New Technology of Library and Information Service  2015, Vol. 31 Issue (7-8): 104-112     https://doi.org/10.11925/infotech.1003-3513.2015.07.14
Research Paper
Research on Subject-Oriented High Quality Reviews Mining Model
Tang Xiaobo1, Qiu Xin2
1 Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China;
2 School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] To help consumers identify high-quality reviews in massive review collections. [Methods] The LDA topic model is used to classify the topics that consumers care about. Borrowing the idea of improved automatic summarization, the approach then tracks the high-quality reviews under each review topic, which yields the proposed Subject-Oriented High Quality Reviews Mining Model. [Results] The model automatically extracts the high-quality reviews under each topic, with precision, recall, and F1 score of 80.73%, 64.90%, and 71.95%, respectively. An empirical study demonstrates the model's effectiveness and superiority. [Limitations] The model is compared only with some typical models; other methods have not yet been verified. [Conclusions] The model effectively mines high-quality reviews under different topics from review sets and can therefore help consumers make purchase decisions more efficiently.
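
The three reported figures can be checked against one another, since F1 is the harmonic mean of precision and recall. The short sketch below is not taken from the paper: the function name `f1_score` and the variable names are illustrative assumptions, and only the two input percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.8073
reported_recall = 0.6490

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 71.95%"
```

The computed value agrees with the reported F1 of 71.95%, so the three figures are internally consistent.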

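Received: 2015-01-13      Published: 2015-08-25
CLC number: G203
Funding:

This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194).

Corresponding author: Qiu Xin, ORCID: 0000-0001-9508-7441, E-mail: 847125278@qq.com
Author contributions: Tang Xiaobo and Qiu Xin proposed the research idea and designed the research plan; Qiu Xin conducted the experiments, collected, cleaned, and analyzed the data, and drafted the paper; Tang Xiaobo and Qiu Xin revised the final version of the paper.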
Cite this article:
Tang Xiaobo, Qiu Xin. Research on Subject-Oriented High Quality Reviews Mining Model[J]. New Technology of Library and Information Service, 2015, 31(7-8): 104-112.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.07.14      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I7-8/104

