Word Embedding and String-Matching Techniques for Automobile Entity Name Identification from Web Reviews

Satanu Maity; Nilanjana Das; Mukta Majumder; Dibya Ranjan Dasadhikary

Research Article

@ARTICLE{10.4108/eai.14-5-2021.169918,
    author={Satanu Maity and Nilanjana Das and Mukta Majumder and Dibya Ranjan Dasadhikary},
    title={Word Embedding and String-Matching Techniques for Automobile Entity Name Identification from Web Reviews},
    journal={EAI Endorsed Transactions on Scalable Information Systems},
    volume={8},
    number={33},
    publisher={EAI},
    journal_a={SIS},
    year={2021},
    month={5},
    keywords={Noisy Name Identification, Automobile Discussion Forum, Machine Learning, Support Vector Machine, Conditional Random Field, Word Embedding, String Matching},
    doi={10.4108/eai.14-5-2021.169918}
}

Satanu Maity¹, Nilanjana Das², Mukta Majumder³,*, Dibya Ranjan Dasadhikary⁴

1: Department of Computer Application, Bengal School of Technology and Management, Hooghly, India
2: Midnapore Zone, WBSEDCL, Midnapore, India
3: Department of Computer Science and Application, University of North Bengal, Siliguri, India
4: Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, India

*Contact email: mukta_jgec_it_4@yahoo.co.in

Abstract

With the huge popularity of the Internet, various types of information on a wide range of domains are circulating across social media platforms. Identifying names is a prerequisite for extracting this information for use in diverse natural language processing applications. This paper presents a study on identifying automobile names in noisy web reviews using two widely used machine learning algorithms, Conditional Random Field (CRF) and Support Vector Machine (SVM). The accuracy of machine learning classifiers depends heavily on the size and quality of the training data, which was prepared manually from a discussion forum corpus; since this task is time-consuming and laborious, word embedding is adopted to alleviate the burden. Although word embedding improves the system's performance, it is unable to spot the noisy names that occur in web reviews. Next, a gazetteer-based string-matching technique is proposed; it recognizes a new set of noisy automobile entities, resulting in a considerable improvement in accuracy.
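The abstract does not say how the gazetteer lookup tolerates noisy spellings. The following is only a minimal sketch of one common way to do gazetteer-based fuzzy string matching, not the authors' method. It assumes a normalized Levenshtein similarity, a threshold of 0.8, token windows of up to three words, and a tiny gazetteer; the function names, the threshold, the gazetteer entries and the review text are all hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits, so 'i-20' and 'I20' compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_gazetteer(tokens: List[str], gazetteer: List[str],
                    threshold: float = 0.8,     # hypothetical cut-off
                    max_ngram: int = 3) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, canonical_name) token spans whose
    normalized text has similarity >= threshold to some gazetteer entry."""
    keys: Dict[str, str] = {normalize(name): name for name in gazetteer}
    candidates = []  # (score, span_length, start, canonical_name)
    for start in range(len(tokens)):
        for n in range(1, min(max_ngram, len(tokens) - start) + 1):
            window = normalize("".join(tokens[start:start + n]))
            for key, name in keys.items():
                score = similarity(window, key)
                if score >= threshold:
                    candidates.append((score, n, start, name))
    # Best score first; ties go to the longer span, then to the earlier one.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    taken = [False] * len(tokens)
    matches = []
    for _score, n, start, name in candidates:
        if not any(taken[start:start + n]):
            taken[start:start + n] = [True] * n
            matches.append((start, start + n, name))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical gazetteer and review text, for illustration only.
    gazetteer = ["Swift Dzire", "Hyundai i20", "Innova"]
    review = "compared my swift dzir with a hyundai i-20"
    tokens = review.split()
    for start, end, name in match_gazetteer(tokens, gazetteer):
        print(" ".join(tokens[start:end]), "->", name)
```

Run as a script, this maps the noisy spans "swift dzir" and "hyundai i-20" to the canonical entries "Swift Dzire" and "Hyundai i20". Candidate spans are ranked globally by score rather than taken greedily from left to right, so a leading word such as "a" is not absorbed into "a hyundai i-20" when the shorter span matches exactly.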

Keywords: Noisy Name Identification, Automobile Discussion Forum, Machine Learning, Support Vector Machine, Conditional Random Field, Word Embedding, String Matching

Received: 2021-01-05
Accepted: 2021-04-28
Published: 2021-05-14
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.14-5-2021.169918

Copyright © 2021 Satanu Maity et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license, which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.