New Technology of Library and Information Service  2015, Vol. 31 Issue (7-8): 104-112    DOI: 10.11925/infotech.1003-3513.2015.07.14
Research Paper
Research on Subject-Oriented High Quality Reviews Mining Model
Tang Xiaobo1, Qiu Xin2
1 Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China;
2 School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] To help consumers identify high quality reviews in massive review collections. [Methods] The paper uses the LDA topic model to classify the topics that consumers care about and, drawing on an improved automatic summarization approach, tracks high quality reviews under each topic, proposing a subject-oriented high quality reviews mining model. [Results] The model automatically extracts high quality reviews under each topic, with precision, recall, and F1 score of 80.73%, 64.90%, and 71.95% respectively; an empirical study demonstrates the model's effectiveness and superiority. [Limitations] The model was compared only with some typical models; other methods have not yet been verified. [Conclusions] The model can effectively mine high quality reviews under different topics from review sets and thus help consumers make purchase decisions more efficiently.
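The three reported figures can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below is only an illustrative consistency check, not the authors' evaluation code; the function name `f1_score` and the variable names are assumptions introduced here, and only the two input percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
reported_precision = 0.8073
reported_recall = 0.6490

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 71.95%"
```

The recomputed value agrees with the reported 71.95%, so the three numbers are internally consistent.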

Received: 2015-01-13
CLC number: G203
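Funding:

This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194).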

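Corresponding author: Qiu Xin, ORCID: 0000-0001-9508-7441, E-mail: 847125278@qq.com
Author contributions: Tang Xiaobo and Qiu Xin proposed the research idea and designed the research plan; Qiu Xin conducted the experiments, collected, cleaned, and analyzed the data, and drafted the paper; Tang Xiaobo and Qiu Xin revised the final version of the paper.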
Cite this article:
Tang Xiaobo, Qiu Xin. Research on Subject-Oriented High Quality Reviews Mining Model[J]. New Technology of Library and Information Service, 2015, 31(7-8): 104-112. DOI: 10.11925/infotech.1003-3513.2015.07.14.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.07.14

