Over-sampling imbalanced datasets using the Covariance Matrix

Leguen-deVarona, Ireimis and Madera, Julio and Martínez-López, Yoan and Hernández-Nieto, José Carlos (2020) Over-sampling imbalanced datasets using the Covariance Matrix. EAI Endorsed Transactions on Energy Web, 7 (2020): e2. ISSN 2032-944X

eai.13-7-2018.163982.pdf - Published Version
Available under License Creative Commons Attribution No Derivatives.


INTRODUCTION: Many machine learning tasks involve learning from imbalanced datasets, which leads to misclassification of the minority class. One of the state-of-the-art approaches to "solve" this problem at the data level is the Synthetic Minority Over-sampling Technique (SMOTE), which uses the K-Nearest Neighbors (KNN) algorithm to select and generate new instances.
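For readers unfamiliar with SMOTE, the KNN-based generation step can be illustrated with a minimal sketch. This is not code from the paper: the function name `smote_sketch`, its parameters, and the use of NumPy are assumptions made for illustration, and it implements only the basic idea of interpolating between a minority instance and one of its k nearest minority neighbours.

```python
import numpy as np

def smote_sketch(minority, n_new, k=5, seed=0):
    """Illustrative SMOTE-style over-sampling (hypothetical helper, not the paper's code).

    Each synthetic row lies on the segment between a randomly chosen
    minority instance and one of its k nearest minority neighbours.
    Requires at least two minority instances.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    k = min(k, len(X) - 1)

    # Pairwise Euclidean distances between minority instances.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # an instance is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base instance and one of its neighbours for every new row.
    base = rng.integers(0, len(X), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random gap in [0, 1) along the connecting segment.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[picked] - X[base])
```

Because every synthetic row is a convex combination of two existing minority instances, this basic scheme never produces attribute values outside the range already observed in the minority class.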

OBJECTIVES: This paper presents SMOTE-Cov, a modified SMOTE that uses the Covariance Matrix instead of KNN to balance datasets with continuous attributes and a binary class.

METHODS: We implemented two variants: SMOTE-CovI, which generates new values within the interval of each attribute, and SMOTE-CovO, which allows some values to fall outside the interval of the attributes.
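The abstract does not say how the Covariance Matrix is used to generate instances, so the following is only one plausible reading, given as a sketch under stated assumptions. It assumes that synthetic rows are drawn from a multivariate normal distribution fitted to the minority class, and that the within-interval behaviour can be imitated by clipping each attribute to its observed minimum and maximum. The function name `cov_oversample_sketch` and the `clip_to_range` flag are hypothetical and do not come from the paper.

```python
import numpy as np

def cov_oversample_sketch(minority, n_new, clip_to_range=True, seed=0):
    """Covariance-based over-sampling sketch (assumed scheme, not the authors' algorithm).

    Assumption: new rows are sampled from a Gaussian with the minority
    class mean and covariance matrix, so attribute dependencies are kept.
    clip_to_range=True  -> values stay inside each attribute's observed interval
    clip_to_range=False -> some values may fall outside that interval
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)

    mean = X.mean(axis=0)
    # Attribute-by-attribute covariance matrix (columns are attributes).
    cov = np.atleast_2d(np.cov(X, rowvar=False))

    synthetic = rng.multivariate_normal(mean, cov, size=n_new)

    if clip_to_range:
        # Assumed stand-in for the within-interval variant.
        synthetic = np.clip(synthetic, X.min(axis=0), X.max(axis=0))
    return synthetic
```

Calling the function with `clip_to_range=True` loosely mirrors the description of SMOTE-CovI, and `clip_to_range=False` loosely mirrors SMOTE-CovO. How the published variants actually enforce or relax the interval may differ from this clipping approach.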

RESULTS: The results show that our approach performs similarly to the state-of-the-art approaches.

CONCLUSION: In this paper, a new algorithm is proposed to generate synthetic instances of the minority class, using the Covariance Matrix.

Item Type: Article
Uncontrolled Keywords: Imbalanced datasets, Oversampling, Covariance Matrix, Attribute Dependency
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: EAI Editor II.
Date Deposited: 17 Sep 2020 10:42
Last Modified: 17 Sep 2020 10:42
URI: https://eprints.eudl.eu/id/eprint/427
