Extracting Keywords from User Comments: Case Study of Meituan
Zhen Zhang1, Jin Zeng2
1 School of Information Management, Central China Normal University, Wuhan 430079, China
2 School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] This paper aims to extract keywords automatically from user comments, helping both buyers and sellers find valuable information. The extracted keywords support customers' decision making and give merchants feedback for improving online services. [Methods] First, we defined the task of extracting keywords from user comments. Second, we proposed evaluation criteria from the perspectives of merchants and customers. Third, we constructed a language-model-based keyword extraction method (LMKE). Finally, we collected experimental data from Meituan.com and compared the performance of our method with two existing ones, TF-IDF and TextRank. [Results] LMKE scored 0.7665, 0.6701, and 0.6200 on P@5, P@10, and P@20, and 0.8187, 0.7326, and 0.6743 on nDCG@5, nDCG@10, and nDCG@20, respectively. [Limitations] Our dataset was built only from user comments on buffet services in Wuhan, China. [Conclusions] The discriminative LMKE model outperforms both TF-IDF and TextRank.
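The results above are reported as P@k and nDCG@k. As a reading aid, the following is a minimal sketch of how these two ranking metrics are commonly computed for a ranked keyword list. It is not the authors' evaluation code: the function names and the relevance labels are hypothetical, and the ideal ranking for nDCG is assumed to be the judged list itself sorted by gain.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Fraction of the top-k ranked keywords judged relevant (gain > 0).

    Divides by k even when fewer than k keywords were returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for gain in relevance[:k] if gain > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1),
    with ranks starting at 1 (enumerate starts at 0, hence rank + 2)."""
    return sum(gain / math.log2(rank + 2)
               for rank, gain in enumerate(relevance[:k]))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the best possible ordering.

    Assumption: the ideal ordering is the judged list sorted by gain.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for five extracted keywords, in ranked order
# (1 = judged useful, 0 = not useful).
judged = [1, 0, 1, 1, 0]
print(precision_at_k(judged, 5))
print(ndcg_at_k(judged, 5))
```

P@k only counts how many of the top k keywords are relevant, whereas nDCG@k also rewards placing relevant keywords earlier in the list, which is why the two are usually reported together.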
Zhen Zhang, Jin Zeng. Extracting Keywords from User Comments: Case Study of Meituan[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 36-44.