Proceedings of The 6th Asia-Pacific Education And Science Conference, AECon 2020, 19-20 December 2020, Purwokerto, Indonesia

Research Article

BR+ for Addressing Imbalanced Multilabel Data Classification Combined with Resampling Technique

  • @INPROCEEDINGS{10.4108/eai.19-12-2020.2309179,
        author={Nilam Novita Sari and Ismaini Zain and Kartika Fithriasari and Amri Muhaimin},
        title={BR+ for Addressing Imbalanced Multilabel Data Classification Combined with Resampling Technique},
        proceedings={Proceedings of The 6th Asia-Pacific Education And Science Conference, AECon 2020, 19-20 December 2020, Purwokerto, Indonesia},
        publisher={EAI},
        proceedings_a={AECON},
        year={2021},
        month={8},
        keywords={multilabel, imbalanced data, br+, smote-nc, tomek link, random forest},
        doi={10.4108/eai.19-12-2020.2309179}
    }
    
Nilam Novita Sari1,*, Ismaini Zain1, Kartika Fithriasari1, Amri Muhaimin1
  • 1: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
*Contact email: nilamnovitasari2013@gmail.com

Abstract

BR+ is a multilabel method that transforms a multilabel problem into binary single-label problems while taking label dependency into account. BR+ can be combined with any base classifier, such as random forest. Random forest is an advantageous classification method, but in the presence of imbalanced classes it performs poorly. Imbalanced data can therefore be handled by applying resampling techniques, here SMOTE-NC and Tomek link (T-Link). The dataset used covers adolescent risk behavior of drug abuse and premarital sex, based on SKAP. The dataset has two labels, which makes this a multilabel problem, and it is imbalanced. Thus, combinations of BR+ (Stat) and the resampling techniques are compared for handling multilabel imbalanced data in the classification of adolescent risk behavior using random forest. The results show that the optimum Mtry is 7 and that the combination of BR+ (Stat) and T-Link is the best method for handling the multilabel imbalanced data.
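
To make the BR+ idea in the abstract concrete, the following is a minimal sketch in Python with scikit-learn, not the authors' implementation. It assumes that BR+ trains one binary classifier per label on the features augmented with the remaining labels, and that the "Stat" variant refreshes predictions label by label in a fixed order. The function names `fit_br_plus` and `predict_br_plus_static`, the synthetic two-label data, and the forest settings are all hypothetical. Resampling (SMOTE-NC or T-Link) is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_br_plus(X, Y, n_estimators=25, seed=0):
    """Fit plain binary-relevance models and label-augmented BR+ models."""
    n_labels = Y.shape[1]
    base, aug = [], []
    for j in range(n_labels):
        # Plain binary relevance: features only, used for initial predictions.
        base.append(
            RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
            .fit(X, Y[:, j])
        )
        # BR+ model: features plus the true values of all other labels.
        others = np.delete(Y, j, axis=1)
        aug.append(
            RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
            .fit(np.hstack([X, others]), Y[:, j])
        )
    return base, aug


def predict_br_plus_static(base, aug, X):
    """Predict with a fixed (static) label order, assumed here for 'Stat'."""
    # Initial guesses for every label come from the plain BR models.
    Y_hat = np.column_stack([m.predict(X) for m in base])
    for j in range(len(aug)):
        # Later labels see the already-updated predictions of earlier labels.
        others = np.delete(Y_hat, j, axis=1)
        Y_hat[:, j] = aug[j].predict(np.hstack([X, others]))
    return Y_hat


# Hypothetical imbalanced two-label data; the second label depends on the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y1 = (X[:, 0] > 0.8).astype(int)
y2 = (X[:, 1] + y1 > 1.0).astype(int)
Y = np.column_stack([y1, y2])

base, aug = fit_br_plus(X, Y)
Y_hat = predict_br_plus_static(base, aug, X)
```

In scikit-learn, the `max_features` argument of `RandomForestClassifier` plays the role of Mtry. Any resampling step would be applied to the training data of each binary problem before calling `fit`.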