sesa 20(23): e2

Research Article

Identify Vulnerability Fix Commits Automatically Using Hierarchical Attention Network

  • @ARTICLE{10.4108/eai.13-7-2018.164552,
        author={Mingxin Sun and Wenjie Wang and Hantao Feng and Hongu Sun and Yuqing Zhang},
        title={Identify Vulnerability Fix Commits Automatically Using Hierarchical Attention Network},
        journal={EAI Endorsed Transactions on Security and Safety},
        volume={7},
        number={23},
        publisher={EAI},
        journal_a={SESA},
        year={2020},
        month={5},
        keywords={vulnerability detection, GitHub Commits, deep learning, vulnerability patch},
        doi={10.4108/eai.13-7-2018.164552}
    }
    
  • Mingxin Sun
    Wenjie Wang
    Hantao Feng
    Hongu Sun
    Yuqing Zhang
    Year: 2020
    Identify Vulnerability Fix Commits Automatically Using Hierarchical Attention Network
    SESA
    EAI
    DOI: 10.4108/eai.13-7-2018.164552
Mingxin Sun1, Wenjie Wang1, Hantao Feng2, Hongu Sun2, Yuqing Zhang1,2,*
  • 1: National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, China
  • 2: School of Cyber Engineering, Xidian University, China
*Contact email: zhangyq@ucas.ac.cn

Abstract

Applying machine learning and deep learning to vulnerability detection is an active topic in security research, but it is currently held back by a lack of datasets. Because vulnerable code can be obtained from vulnerability fix commits, we propose a tool that automatically identifies vulnerability fix commits, based on a hierarchical attention network (HAN), to expand existing vulnerability datasets. HAN models the input data at the word level and the sentence level separately and attends to how the characteristics of different words change across categories, which improves classification performance. Experimental results show that the accuracy and the F1 score of our model both reach 92%. Given a vulnerability fix commit, researchers can quickly locate the vulnerable code. Because the amount of open-source software is enormous, extracting vulnerable code from it can effectively expand the current datasets.
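
The two-level attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attention_pool`, `encode_commit`), the context vectors, and the toy embeddings are all hypothetical, and the sketch assumes a simplified scoring function that omits the recurrent encoders and learned projection layers a full HAN would normally include. It only shows how word vectors are pooled into sentence vectors, and sentence vectors into a single commit vector, using attention weights.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Pool vectors into one vector using attention weights.

    Simplifying assumption: the score is dot(tanh(v), context), with no
    learned projection matrix or bias.
    """
    scores = [sum(math.tanh(x) * c for x, c in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_commit(commit: List[List[Vector]],
                  word_context: Vector,
                  sentence_context: Vector) -> Tuple[Vector, List[float]]:
    """Word-level attention per sentence, then sentence-level attention."""
    sentence_vectors = [attention_pool(words, word_context)[0] for words in commit]
    return attention_pool(sentence_vectors, sentence_context)


# Toy input (hypothetical): a commit with two "sentences", each a list of
# 2-dimensional word embeddings.
commit = [
    [[0.2, 0.1], [0.9, 0.4], [0.1, 0.0]],
    [[0.5, 0.5], [0.3, 0.8]],
]
vec, weights = encode_commit(commit, word_context=[1.0, 0.5], sentence_context=[0.3, 1.0])
print("commit vector:", vec)
print("sentence attention weights:", weights)
```

In a trained model the resulting commit vector would be passed to a classifier that decides whether the commit is a vulnerability fix; the attention weights indicate which sentences contributed most to that vector.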