Identify Vulnerability Fix Commits Automatically Using Hierarchical Attention Network

Mingxin Sun; Wenjie Wang; Hantao Feng; Hongu Sun; Yuqing Zhang

Research Article

Cite (BibTeX):

@ARTICLE{10.4108/eai.13-7-2018.164552,
    author={Mingxin Sun and Wenjie Wang and Hantao Feng and Hongu Sun and Yuqing Zhang},
    title={Identify Vulnerability Fix Commits Automatically Using Hierarchical Attention Network},
    journal={EAI Endorsed Transactions on Security and Safety},
    volume={7},
    number={23},
    publisher={EAI},
    journal_a={SESA},
    year={2020},
    month={5},
    keywords={vulnerability detection, GitHub Commits, deep learning, vulnerability patch},
    doi={10.4108/eai.13-7-2018.164552}
}

Mingxin Sun¹, Wenjie Wang¹, Hantao Feng², Hongu Sun², Yuqing Zhang¹,²,*

1: National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, China
2: School of Cyber Engineering, Xidian University, China

*Contact email: zhangyq@ucas.ac.cn

Abstract

Applying machine learning and deep learning to vulnerability detection is an active topic in security research, but the field currently suffers from a lack of datasets. Because vulnerable code can be obtained from vulnerability fix commits, we propose an automatic tool, based on a hierarchical attention network (HAN), that identifies vulnerability fix commits in order to expand existing vulnerability datasets. HAN models the input at the word level and the sentence level separately and attends to how the characteristics of individual words differ across categories, which improves classification performance. Experimental results show that our model achieves 92% in both accuracy and F1. With a vulnerability fix commit in hand, researchers can quickly locate the vulnerable code, and because the amount of open-source software is enormous, extracting vulnerable code from it can effectively expand the current datasets.
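The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes word vectors are already given, omits the recurrent encoders and learned projections that a full HAN applies before attention, and uses hypothetical names (`attention_pool`, `han_encode`, `word_context`, `sentence_context`) and toy 2-dimensional vectors. It only shows how word-level attention yields sentence vectors and how sentence-level attention then yields one vector for the whole commit.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Score each vector by its dot product with a context vector,
    normalize the scores with softmax, and return the weighted sum
    together with the attention weights."""
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def han_encode(document, word_context, sentence_context):
    """Encode a document given as a list of sentences, each a list of
    word vectors. Word-level attention builds one vector per sentence;
    sentence-level attention builds one vector for the document."""
    sentence_vectors = []
    word_weights = []
    for sentence in document:
        vector, weights = attention_pool(sentence, word_context)
        sentence_vectors.append(vector)
        word_weights.append(weights)
    doc_vector, sentence_weights = attention_pool(sentence_vectors, sentence_context)
    return doc_vector, word_weights, sentence_weights


if __name__ == "__main__":
    # Hypothetical toy commit: two "sentences" of 2-dimensional word vectors.
    commit = [
        [[1.0, 0.0], [0.0, 1.0]],  # e.g. words of the commit message
        [[0.5, 0.5]],              # e.g. a single token from the diff
    ]
    vector, word_w, sent_w = han_encode(commit, [1.0, 0.0], [0.0, 1.0])
    print("commit vector:", vector)
    print("word attention:", word_w)
    print("sentence attention:", sent_w)
```

In a trained classifier the commit vector would feed a final layer that predicts whether the commit fixes a vulnerability; the attention weights indicate which words and sentences contributed most to that decision.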

Keywords: vulnerability detection, GitHub Commits, deep learning, vulnerability patch

Received: 2020-04-14
Accepted: 2020-05-05
Published: 2020-05-12
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.13-7-2018.164552

Copyright © 2020 Mingxin Sun et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.