Proceedings of The 6th Asia-Pacific Education And Science Conference, AECon 2020, 19-20 December 2020, Purwokerto, Indonesia

Research Article

Text Mining to Analyse Publication Topics of COVID-19 using HDP and LDA Methods

  • @INPROCEEDINGS{10.4108/eai.19-12-2020.2309174,
        author={Rakhmah Wahyu Mayasari and Kartika Fithiasari and Dedy Dwi Prastyo},
        title={Text Mining to Analyse Publication Topics of COVID-19 using HDP and LDA Methods},
        proceedings={Proceedings of The 6th Asia-Pacific Education And Science Conference, AECon 2020, 19-20 December 2020, Purwokerto, Indonesia},
        publisher={EAI},
        proceedings_a={AECON},
        year={2021},
        month={8},
        keywords={covid-19 hierarchical dirichlet process latent dirichlet allocation text mining},
        doi={10.4108/eai.19-12-2020.2309174}
    }
    
  • Rakhmah Wahyu Mayasari
    Kartika Fithiasari
    Dedy Dwi Prastyo
    Year: 2021
    Text Mining to Analyse Publication Topics of COVID-19 using HDP and LDA Methods
    AECON
    EAI
    DOI: 10.4108/eai.19-12-2020.2309174
Rakhmah Wahyu Mayasari1,*, Kartika Fithiasari2, Dedy Dwi Prastyo3
  • 1: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
  • 2: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
  • 3: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
*Contact email: rakhmah13@mhs.statistika.its.ac.id

Abstract

COVID-19 is a disease caused by the novel coronavirus, and almost all countries are affected by it. This worldwide impact has led many researchers to conduct research related to COVID-19. This study aims to identify the topics covered by the studies published by researchers in various countries. It analyzes data crawled from the full abstracts of publications related to COVID-19 from January 2020 to August 2020. The abstract text was crawled and then preprocessed by removing punctuation, lowercasing, lemmatizing, and removing stopwords. The cleaned data is then analyzed using text mining methods to allocate topics, which can serve as information for future research. The methods used are the Hierarchical Dirichlet Process (HDP) and Latent Dirichlet Allocation (LDA). The LDA method was also found to have a coherence score 42% higher than the HDP method, which means LDA is more appropriate in this case.
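
The preprocessing steps named in the abstract (punctuation removal, lowercasing, lemmatization, stopword removal) can be illustrated with a minimal Python sketch. This is not the authors' code: the stopword list `STOPWORDS` and the helper `naive_lemma` are hypothetical stand-ins chosen so the example runs with the standard library only, and a real pipeline would likely use a full stopword list and a dictionary-based lemmatizer.

```python
import string
from typing import List

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "the", "to", "which"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_lemma(token: str) -> str:
    """Crude stand-in for a lemmatizer: strip a plural 's' from longer tokens.

    Tokens ending in 'ss' or 'us' (e.g. 'coronavirus') are left unchanged.
    """
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us")):
        return token[:-1]
    return token


def preprocess(abstract: str) -> List[str]:
    """Remove punctuation, lowercase, lemmatize naively, then drop stopwords."""
    no_punct = abstract.translate(_PUNCT_TABLE)  # "COVID-19" becomes "COVID19"
    tokens = no_punct.lower().split()
    lemmas = [naive_lemma(t) for t in tokens]
    return [t for t in lemmas if t not in STOPWORDS]
```

For example, `preprocess("COVID-19 is a disease caused by the novel coronavirus.")` returns `['covid19', 'disease', 'caused', 'novel', 'coronavirus']`. Token lists of this kind are the usual input to topic models such as HDP and LDA.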