A Clustering Analysis Method Based on Wilcoxon-Mann-Whitney Testing

Yuan Cheng; Weinan Jia; Ronghua Chi

Proceedings of the 13th EAI International Conference on Mobile Multimedia Communications, Mobimedia 2020, 27-28 August 2020, Cyberspace

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.4108/eai.27-8-2020.2296732,
    author={Yuan Cheng and Weinan Jia and Ronghua Chi},
    title={A Clustering Analysis Method Based on Wilcoxon-Mann-Whitney Testing},
    proceedings={Proceedings of the 13th EAI International Conference on Mobile Multimedia Communications, Mobimedia 2020, 27-28 August 2020, Cyberspace},
    publisher={EAI},
    proceedings_a={MOBIMEDIA},
    year={2020},
    month={11},
    keywords={clustering analysis, distance measurement, nonparametric statistics, Wilcoxon-Mann-Whitney rank sum test},
    doi={10.4108/eai.27-8-2020.2296732}
}

Yuan Cheng¹,*, Weinan Jia¹, Ronghua Chi²

1: Harbin University of Science and Technology
2: Heilongjiang University of Science and Technology

*Contact email: changuang7@sina.com

Abstract

As the core step of clustering analysis, distance measurement directly influences clustering accuracy. Existing measurements are mostly based on cluster features. However, cluster features may not be sufficient and can lose information about clusters that contain many objects. To improve measurement accuracy, we make full use of the distribution characteristics of the objects in each cluster, using descriptive statistics and the Wilcoxon-Mann-Whitney rank sum test from nonparametric statistics to measure distances during clustering. Furthermore, a two-stage clustering method is proposed to improve the performance of clustering analysis by avoiding the need to assume the number of clusters in advance, discovering clusters of arbitrary shapes, and improving clustering accuracy. Experiments on multiple datasets, compared with other clustering algorithms, illustrate the accuracy and efficiency of the proposed clustering algorithm.
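As a rough illustration of how a rank sum test could act as a distance between two clusters, the following minimal sketch compares the values of a single feature in two clusters. It is not the authors' formulation: the function names, the restriction to one feature, the normal approximation without tie or continuity correction, and the choice of `1 - p` as the distance are all assumptions made for this example.

```python
import math
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U statistic of sample a against sample b; each tie counts as 1/2.

    Equivalent to the rank sum of a minus n_a * (n_a + 1) / 2.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def wmw_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided p-value via the normal approximation.

    Assumption: no tie correction and no continuity correction,
    so this is only a rough guide for small or heavily tied samples.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (mann_whitney_u(a, b) - mean_u) / sd_u
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def wmw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical distance: near 0 for similar distributions, near 1 otherwise."""
    return 1.0 - wmw_p_value(a, b)


if __name__ == "__main__":
    cluster_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    cluster_b = [10.0, 11.0, 12.0, 13.0, 14.0]
    print(wmw_distance(cluster_a, cluster_a))  # identical samples
    print(wmw_distance(cluster_a, cluster_b))  # well-separated samples
```

Under these assumptions, two clusters whose feature values come from similar distributions get a distance close to 0, while clearly shifted distributions get a distance close to 1. Because the comparison uses only ranks, it does not assume any particular distribution for the objects in a cluster.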

Keywords: clustering analysis, distance measurement, nonparametric statistics, Wilcoxon-Mann-Whitney rank sum test

Published: 2020-11-19
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.27-8-2020.2296732