Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (1): 99-110    DOI: 10.11925/infotech.2096-3467.2019.0702
Identifying Implicit Features with Word Embedding
Hui Nie, Huan He
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract

[Objective] This paper extracts implicit features from online reviews, aiming to obtain complete product-specific information and user evaluations. [Methods] We compared the performance of the two leading approaches to implicit feature extraction: relation-based inference and classification. We then introduced a word embedding model, an online review corpus, and semantically related words to improve each algorithm's effectiveness. Finally, we examined the impact of dataset equilibrium on the algorithms. [Results] On the non-equilibrium dataset, the classification-based methods identified implicit features better than those based on relation inference. Word embedding significantly improved the quality of the sentence model, increasing recall and F1 scores by 5.91% and 2.48% respectively. On the equilibrium dataset, the relation-inference methods performed better, and the best F1 score was 0.7503 (with word embedding). [Limitations] The corpus for training word embeddings and the balanced dataset both need to be expanded. [Conclusions] Modeling schemes matched to the target dataset, together with equilibrium datasets, yield better results. Word embedding helps optimize the classification-based methods.
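The core idea in the [Methods] passage can be illustrated with a minimal sketch: represent a review sentence as the average of its word-embedding vectors, then assign it to the closest product-feature prototype by cosine similarity, so a sentence that never names the feature can still be linked to it. This is not the authors' code; the tiny 3-dimensional embeddings, the token lists, and the feature prototypes below are invented purely for demonstration.

```python
# Illustrative sketch (assumed toy data, not the paper's implementation):
# infer an implicit product feature from a sentence's averaged embedding.
import math

# Made-up 3-d word vectors; a real system would use embeddings trained
# on a large review corpus (e.g. with word2vec).
EMB = {
    "battery": [0.9, 0.1, 0.0],
    "lasts":   [0.8, 0.2, 0.1],
    "day":     [0.7, 0.1, 0.2],
    "screen":  [0.1, 0.9, 0.0],
    "bright":  [0.2, 0.8, 0.1],
}

def sentence_vector(tokens):
    """Average the embeddings of known tokens (a simple sentence model)."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(3)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Prototype vectors for two candidate features (hypothetical).
FEATURES = {"battery life": EMB["battery"], "screen": EMB["screen"]}

def infer_implicit_feature(tokens):
    """Return the feature whose prototype is closest to the sentence."""
    sv = sentence_vector(tokens)
    return max(FEATURES, key=lambda f: cosine(sv, FEATURES[f]))

# "It lasts a whole day" never mentions the battery explicitly, yet its
# averaged embedding is closest to the "battery life" prototype.
print(infer_implicit_feature(["lasts", "day"]))  # battery life
```

In practice the paper's classification-based variant would feed such embedding-derived sentence representations to a trained classifier rather than a fixed nearest-prototype rule, but the representational benefit of word embeddings is the same.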

Received: 18 June 2019      Published: 14 March 2020
 ZTFLH: TP391.1
Corresponding Authors: Hui Nie     E-mail: issnh@mail.sysu.edu.cn