Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 3: 36-44. DOI: 10.11925/infotech.2096-3467.2018.0573
Extracting Keywords from User Comments: Case Study of Meituan
Zhen Zhang1, Jin Zeng2
1School of Information Management, Central China Normal University, Wuhan 430079, China
2School of Information Management, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper automatically extracts keywords from user comments to help both buyers and sellers find valuable information, supporting customers' decision making and providing feedback for improving online services. [Methods] First, we defined the task of extracting keywords from user comments. Second, we proposed evaluation criteria from the perspectives of both merchants and customers. Third, we constructed a language-model-based keyword extraction method (LMKE). Finally, we collected experimental data from Meituan.com and compared our method with two existing methods, TF-IDF and TextRank. [Results] The LMKE method achieved P@5, P@10 and P@20 scores of 0.7665, 0.6701 and 0.6200, and nDCG@5, nDCG@10 and nDCG@20 scores of 0.8187, 0.7326 and 0.6743, respectively. [Limitations] Our dataset contains only user comments on buffet services in Wuhan, China. [Conclusions] The discriminative LMKE model outperforms both TF-IDF and TextRank.
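The results above are reported as P@k and nDCG@k over a ranked keyword list. As a hedged illustration (not the authors' code), the sketch below computes both metrics for one ranked list, assuming graded relevance judgments per keyword (missing or 0 means not relevant) and the common linear-gain nDCG with a log2(i + 1) rank discount; the paper's own gain and judgment scheme may differ. The function names and sample keywords are hypothetical.

```python
import math
from typing import Dict, Sequence


def precision_at_k(ranked: Sequence[str], judgments: Dict[str, float], k: int) -> float:
    """P@k: share of the top-k extracted keywords with a positive relevance grade.

    Divides by k even when fewer than k keywords were extracted (a common convention).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for kw in ranked[:k] if judgments.get(kw, 0) > 0)
    return hits / k


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG@k with linear gain: sum of gain_i / log2(i + 1) over ranks i = 1..k."""
    # enumerate() starts at 0, so rank i = idx + 1 and the discount is log2(idx + 2).
    return sum(g / math.log2(idx + 2) for idx, g in enumerate(gains[:k]))


def ndcg_at_k(ranked: Sequence[str], judgments: Dict[str, float], k: int) -> float:
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    gains = [judgments.get(kw, 0.0) for kw in ranked]
    # The ideal ranking lists all judged keywords from highest to lowest grade.
    ideal = sorted(judgments.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical extractor output for one merchant, best keyword first.
    ranked = ["seafood", "price", "parking", "service", "dessert"]
    # Hypothetical graded judgments (2 = highly relevant, 1 = relevant, missing = 0).
    judgments = {"seafood": 2, "service": 2, "price": 1, "dessert": 1}
    for k in (5, 10):
        print(f"P@{k} = {precision_at_k(ranked, judgments, k):.4f}, "
              f"nDCG@{k} = {ndcg_at_k(ranked, judgments, k):.4f}")
```

Scores for a whole test set would typically be averaged over merchants; how the paper aggregates per-merchant scores is not stated in this abstract.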

Key words: Product Recommendation; User Comments; Keyword Extraction
Received: 22 May 2018; Published: 17 April 2019

Cite this article:

Zhen Zhang, Jin Zeng. Extracting Keywords from User Comments: Case Study of Meituan. Data Analysis and Knowledge Discovery, 2019, 3(3): 36-44.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.0573     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I3/36

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn