Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 3: 36-44     https://doi.org/10.11925/infotech.2096-3467.2018.0573
Research Paper
Extracting Keywords from User Comments: Case Study of Meituan
Zhen Zhang1, Jin Zeng2
1 School of Information Management, Central China Normal University, Wuhan 430079, China
2 School of Information Management, Wuhan University, Wuhan 430072, China

Abstract

[Objective] This paper automatically extracts keywords from large volumes of user comments to help both customers and merchants find valuable information quickly, supporting customers' purchase decisions and giving merchants feedback for improving their service quality. [Methods] First, we defined the task of keyword extraction from user comments. Second, we proposed evaluation criteria from the perspectives of both merchants and customers. Third, we proposed a language model based keyword extraction method (LMKE). Finally, we collected user comments from Meituan.com to build an experimental dataset and compared LMKE with two existing methods, TF-IDF and TextRank. [Results] The highest scores of LMKE were 0.7665, 0.6701, 0.6200, 0.8187, 0.7326 and 0.6743 on P@5, P@10, P@20, nDCG@5, nDCG@10 and nDCG@20, respectively. [Limitations] The dataset consists only of the user comments on buffet restaurants in Wuhan, China, posted on Meituan.com, which limits how far the findings generalize. [Conclusions] LMKE outperforms TF-IDF and TextRank, and within LMKE the discrimination-based strategy achieves the best evaluation scores.
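The results above are reported as P@k and nDCG@k. The abstract does not spell out the formulas, so the sketch below assumes the common textbook definitions: P@k is the share of the top k extracted keywords with a relevance grade above 0, and nDCG@k uses a linear gain rel_i / log2(i + 1) normalized by the DCG of the ideal (descending) ordering of the same judged list. The function names and the graded judgments in the example are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """P@k: share of the top-k extracted keywords judged relevant (grade > 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = relevance[:k]
    # Divide by k, not len(top): missing positions count as non-relevant.
    return sum(1 for r in top if r > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG@k with linear gain: sum of rel_i / log2(i + 1) over ranks i = 1..k."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    # Assumption: the ideal ranking is the same judged list sorted by grade.
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical grades (2 = very useful, 1 = useful, 0 = not useful)
    # for the top 5 keywords one method extracted for one merchant.
    grades = [2, 0, 1, 2, 0]
    print(round(precision_at_k(grades, 5), 4))  # 0.6
    print(round(ndcg_at_k(grades, 5), 4))
```

P@k ignores grade differences and rank order inside the top k, while nDCG@k rewards placing the most useful keywords first, which is why the two metrics can rank methods differently.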

Key words: Product Recommendation; User Comments; Keyword Extraction
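The two baselines named in the abstract, TF-IDF and TextRank, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation, under several assumptions: comments are already word-segmented (Chinese text would first need a segmenter), all comments of one merchant are concatenated into a single document for TF-IDF, and TextRank uses an unweighted co-occurrence window of 2 with a damping factor of 0.85, common TextRank defaults. The function names `tfidf_keywords` and `textrank_keywords` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def tfidf_keywords(docs: Sequence[List[str]], doc_index: int,
                   top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank the terms of docs[doc_index] by tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[doc_index])
    length = len(docs[doc_index])
    scores = {t: (c / length) * math.log(n_docs / df[t]) for t, c in tf.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def textrank_keywords(tokens: List[str], window: int = 2, damping: float = 0.85,
                      iterations: int = 50, top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank terms by PageRank over an undirected word co-occurrence graph."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the words following it inside the window.
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Synchronous update: every new score reads the previous iteration.
        scores = {
            w: (1 - damping) + damping * sum(scores[u] / len(neighbors[u])
                                             for u in neighbors[w])
            for w in neighbors
        }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    # Hypothetical segmented comments, one document per merchant.
    merchants = [
        ["seafood", "fresh", "seafood", "service", "queue"],
        ["dessert", "service", "cheap", "dessert"],
        ["queue", "crowded", "service", "cheap"],
    ]
    print(tfidf_keywords(merchants, 0, top_n=2))
    print(textrank_keywords(merchants[0], top_n=2))
```

TF-IDF rewards terms that are frequent for one merchant but rare across merchants (here "service", which every merchant shares, scores 0), whereas TextRank uses only the merchant's own text and favours words that co-occur with many other words.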
Received: 2018-05-22      Published: 2019-04-17
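Funding: *This work is one of the research outputs of the Key Project of the National Social Science Fund of China "Research on the Mechanism and Model for Integrating and Utilizing Government Open Data Based on the Full Life Cycle" (Grant No. 17ATQ006) and the Major Cultivation Project of the Fundamental Research Funds for the Central Universities "Research on Government Information Services in the Big Data Environment" (Grant No. CCNU16Z02002).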
Cite this article:
Zhen Zhang, Jin Zeng. Extracting Keywords from User Comments: Case Study of Meituan. Data Analysis and Knowledge Discovery, 2019, 3(3): 36-44.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0573      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I3/36