Empirical Validation of Automated Vulnerability Curation and Characterization
Prior research has shown that public vulnerability systems such as the US National Vulnerability Database (NVD) rely on a manual, time-consuming, and error-prone process, which has led to inconsistencies and delays in releasing final vulnerability results. This work provides an approach to curate vulnerability reports in real time and map textual vulnerability reports to machine-readable, structured vulnerability attribute data. Designed to support the time-consuming human analysis done by vulnerability databases, the system leverages the Common Vulnerabilities and Exposures (CVE) list of vulnerabilities and the vulnerability attributes described by the National Institute of Standards and Technology (NIST) Vulnerability Description Ontology (VDO) framework. Our work uses Natural Language Processing (NLP), Machine Learning (ML), and novel Information Theoretical (IT) methods to provide automated techniques for near real-time publishing and characterization of vulnerabilities using 28 attributes in 5 domains. Experimental results indicate that vulnerabilities can be evaluated up to 95 hours earlier than with manual methods, that they can be characterized with F-Measure values over 0.9, and that the proposed automated approach could save up to 47% of the time spent on CVE characterization.
Okutan, A., Mell, P., Mirakhorli, M., Khokhlov, I., Santos, J. C. S., Gonzalez, D., & Simmons, S. (2023). Empirical validation of automated vulnerability curation and characterization. IEEE Transactions on Software Engineering, 49(5), 3241-3260. doi: 10.1109/TSE.2023.3250479