WAVE-27K: Bringing together CTI sources to enhance threat intelligence

< Back

Date: 29.07.2024

Abstract

Considering the growing flow of information on the internet, and the increased incident-related data from diverse sources, unstructured text processing gains importance. We have presented an automated approach to link several CTI sources through the mapping of external references. Our method facilitates the automatic construction of datasets, allowing for updates and the inclusion of new samples and labels. Following this method we built a new dataset of unstructured CTI descriptions called Weakness, Attack, Vulnerabilities, and Events 27k (WAVE-27k). Our dataset includes information about 27 different MITRE techniques, containing 22539 samples related one technique and 5262 related to two or more techniques simultaneously. We evaluated five BERT-based models into the WAVE-27K dataset concluding that SecRoBERTa reaches the highest performance with a 77.52% F1 score. Additionally, we compare the performance of the SecRoBERTa on the WAVE-27K dataset and other public datasets. The results show that the model using the WAVE-27K dataset outperforms the others. These results demonstrate that the data within WAVE-27K contains relevant information and that the proposed method effectively built a dataset with a level of quality sufficient to train a machine-learning model.

BIB_text

@Article {
title = {WAVE-27K: Bringing together CTI sources to enhance threat intelligence},
pages = {119-126},
abstract = {

}
date = {2024-07-29},
}