[Objective] This paper extracts implicit features from online reviews in order to obtain more complete product-specific information and user evaluations.
[Methods] We compared the performance of two leading approaches to implicit feature extraction: relation-based inference and classification. We then introduced a word embedding model, an online review corpus, and semantically related words to improve each algorithm's effectiveness. Finally, we examined how dataset balance affects the algorithms.
[Results] On the imbalanced dataset, the classification-based methods identified implicit features better than the methods based on relation inference. Word embedding significantly improved the quality of the sentence model, increasing the recall and F1 scores by 5.91% and 2.48%, respectively. On the balanced dataset, the relation-inference methods performed better, with a best F1 score of 0.7503 (with word embedding).
[Limitations] Both the corpus used to train the word embeddings and the balanced dataset need to be expanded.
[Conclusions] Choosing a modeling scheme suited to the target dataset, and using balanced datasets, yields better results. Word embedding helps optimize the classification-based methods.
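The abstract does not specify how the sentence model is built from word embeddings, so the following is only a minimal sketch of one common approach: averaging word vectors into a sentence vector and assigning the implicit feature whose class centroid is most similar. The toy vectors, the feature labels ("price", "weight"), and the helper names (`sentence_vector`, `train_centroids`, `predict_feature`) are all hypothetical and are not taken from the paper; the authors' actual classifier may differ.

```python
import math

# Toy 3-dimensional word vectors (assumption). Real embeddings would be
# trained on a large online review corpus.
TOY_EMBEDDINGS = {
    "expensive": [0.9, 0.1, 0.0],
    "cheap": [0.8, 0.2, 0.1],
    "affordable": [0.85, 0.15, 0.05],
    "heavy": [0.1, 0.9, 0.0],
    "light": [0.2, 0.8, 0.1],
    "bulky": [0.15, 0.85, 0.05],
}


def mean_vector(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sentence_vector(tokens, embeddings):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return mean_vector(vectors) if vectors else None


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_sentences, embeddings):
    """Build one centroid per feature label from (tokens, label) pairs."""
    grouped = {}
    for tokens, label in labeled_sentences:
        vec = sentence_vector(tokens, embeddings)
        if vec is not None:
            grouped.setdefault(label, []).append(vec)
    return {label: mean_vector(vecs) for label, vecs in grouped.items()}


def predict_feature(tokens, centroids, embeddings):
    """Return the label of the most similar centroid, or None if no token is known."""
    vec = sentence_vector(tokens, embeddings)
    if vec is None:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training sentences in which the feature is only implied.
training = [
    (["too", "expensive"], "price"),
    (["really", "cheap"], "price"),
    (["very", "heavy"], "weight"),
    (["quite", "light"], "weight"),
]
centroids = train_centroids(training, TOY_EMBEDDINGS)

prediction_price = predict_feature(["so", "affordable"], centroids, TOY_EMBEDDINGS)
prediction_weight = predict_feature(["rather", "bulky"], centroids, TOY_EMBEDDINGS)
print(prediction_price, prediction_weight)
```

In this sketch, "affordable" and "bulky" never appear in the training sentences, yet their vectors lie close to the corresponding centroids. That is the kind of generalization to semantically related words that an embedding-based sentence model is meant to provide.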
Hui Nie, Huan He. Identifying Implicit Features with Word Embedding[J]. Data Analysis and Knowledge Discovery, 2020, 4(1): 99-110.