Semantic N-Gram Topic Modeling

Kherwa, Pooja and Bansal, Poonam (2020) Semantic N-Gram Topic Modeling. EAI Endorsed Transactions on Scalable Information Systems, 7 (26): e7. ISSN 2032-9407

eai.13-7-2018.163131.pdf - Published Version
Available under License Creative Commons Attribution No Derivatives.



This paper presents a novel approach to topic modeling. It differs from traditional vector space model-based topic modeling, which follows the Bag of Words (BOW) approach. The novelty lies in working in a phrase-based vector space: measures such as pointwise mutual information (PMI) and log frequency based mutual dependency (LGMD) are applied to compute each phrase's suitability for a particular topic, and the best-scoring semantic N-Gram phrases and terms are retained for topic modeling. In the experiments, the proposed semantic N-Gram topic modeling is compared with collocation latent Dirichlet allocation (coll-LDA) and with latent Dirichlet allocation (LDA), the most relevant state-of-the-art topic modeling technique. The results show a drastic improvement in perplexity and a significant improvement in coherence score, specifically for short-text data sets such as movie reviews and political blogs.
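As an illustration of the phrase-scoring step mentioned in the abstract, the following is a minimal sketch of the standard PMI score for adjacent word pairs, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The function name `bigram_pmi`, the toy token list, and the plain maximum-likelihood probability estimates are assumptions made for this example; the abstract does not specify the authors' estimator, thresholds, or their LGMD formula, so this should not be read as their implementation.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair.

    Hypothetical helper: probabilities are simple relative frequencies,
    which may differ from the estimates used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # joint probability of the pair
        p_x = unigrams[w1] / n_uni   # marginal probability of first word
        p_y = unigrams[w2] / n_uni   # marginal probability of second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (assumed data, for illustration only).
tokens = "the movie was great the movie was long the plot was great".split()
scores = bigram_pmi(tokens)

# Show the three highest-scoring candidate phrases.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 3))
```

Pairs with high PMI co-occur more often than their individual frequencies would predict, which is what makes them candidates for being treated as a single phrase rather than as independent bag-of-words terms. Raw PMI tends to favor rare pairs, so in practice a minimum count cutoff is usually applied before ranking.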

Item Type: Article
Uncontrolled Keywords: Topic Modeling, Latent Dirichlet Allocation, Point wise Mutual Information, Bag of words, Coherence, Perplexity
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: EAI Editor II.
Date Deposited: 08 Oct 2020 13:52
Last Modified: 08 Oct 2020 13:52
