Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 4: 90-98     https://doi.org/10.11925/infotech.2096-3467.2017.1252
Research Paper
Extracting Product Features with NodeRank Algorithm*
Zhou Lixin (周立欣), Lin Jie (林杰)
School of Economics and Management, Tongji University, Shanghai 200092, China
Abstract

[Objective] Drawing on natural language processing techniques and complex network theory, this paper proposes a new method for identifying product features that improves extraction performance. [Methods] We construct a weighted bipartite network of product feature-sentiment word pairs, which describes the relationships between feature words and sentiment words more clearly and intuitively from a network perspective. We then propose the NodeRank algorithm to rank product feature words by importance, which improves the precision of feature word extraction. [Results] In experiments on real review data from JD.com (jd.com), a popular online shopping site in China, the NodeRank algorithm achieved higher precision, recall, and F-score than the baseline algorithms HAC, TF-IDF, and TextRank. [Limitations] The computational complexity of the NodeRank algorithm is relatively high and needs further optimization. [Conclusions] NodeRank is an accurate and effective feature extraction method that can support product feature extraction, product marketing, and other business activities.

Keywords: Feature Extraction; Bipartite Graph; NodeRank Algorithm; Importance Ranking
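Received: 2017-12-11      Published: 2018-05-11
CLC number: TP393
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Measurement Model of User Innovation Value and Interactive Innovation Management Methods in Social Media" (Grant No. 71672128) and the Fundamental Research Funds for the Central Universities project "Big Data-Based Research on Propagation Mechanisms and Models in Social Networks" (Grant No. 1200219368).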
Cite this article:
Zhou Lixin, Lin Jie. Extracting Product Features with NodeRank Algorithm[J]. Data Analysis and Knowledge Discovery, 2018, 2(4): 90-98.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.1252      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I4/90
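  Model framework based on the feature-sentiment word bipartite network
  Schematic of the bipartite network of product feature-sentiment word pairs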
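As a rough illustration of the bipartite network named in the caption above, the following Python sketch builds a weighted feature-sentiment graph from word pairs and ranks the feature nodes. This page does not give the NodeRank formula, so the scoring step here is a generic PageRank-style weighted propagation used purely as a stand-in; it should not be read as the authors' algorithm. The function names, the damping value, and the sample pairs are all hypothetical.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Use the co-occurrence count of each (feature, sentiment) pair as its edge weight."""
    weights = defaultdict(float)
    for feature, sentiment in pairs:
        weights[(feature, sentiment)] += 1.0
    return dict(weights)


def rank_features(weights, damping=0.85, iterations=50):
    """Rank feature words with a PageRank-style propagation (a stand-in, not NodeRank)."""
    # Tag nodes by side so a word may appear on both sides without clashing.
    adj = defaultdict(dict)
    for (feature, sentiment), w in weights.items():
        adj[("F", feature)][("S", sentiment)] = w
        adj[("S", sentiment)][("F", feature)] = w

    nodes = list(adj)
    n = len(nodes)
    strength = {node: sum(adj[node].values()) for node in nodes}
    score = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour passes on its score in proportion to the edge weight.
            incoming = sum(score[nb] * w / strength[nb] for nb, w in adj[node].items())
            updated[node] = (1.0 - damping) / n + damping * incoming
        score = updated

    features = {name: s for (side, name), s in score.items() if side == "F"}
    return sorted(features.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pairs, as might be extracted from review sentences.
sample_pairs = [
    ("screen", "clear"), ("screen", "clear"), ("screen", "good"),
    ("battery", "good"), ("price", "cheap"),
]
ranking = rank_features(build_bipartite(sample_pairs))
for word, value in ranking:
    print(f"{word}\t{value:.4f}")
```

In this toy input, a feature word that co-occurs with more sentiment words, and more often, accumulates a higher score, which is the intuition behind ranking feature words on the network rather than by raw frequency alone.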
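Product name                                       Category       Number of reviews   Number of reviews after cleaning
Huawei G9 Plus (铂雅金, platinum gold) 4G phone    Mobile phone   1,888               1,366
  Basic statistics of the dataset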
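Rank   Feature word         NR        Word frequency   RFF
1      手感 (hand feel)     0.02622   65               0.08541
2      外观 (appearance)    0.02141   61               0.08016
3      屏幕 (screen)        0.01678   39               0.05125
4      电池 (battery)       0.01614   23               0.03022
5      价格 (price)         0.01518   13               0.01708
6      速度 (speed)         0.01446   59               0.07753
7      质量 (quality)       0.01238   23               0.03022
8      感觉 (feeling)       0.01191   7                0.02365
9      机身 (phone body)    0.01098   4                0.0092
10     界面 (interface)     0.01081   6                0.00526
  Ranking of the product's feature words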
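  Precision of feature words at different proportions
  Recall and F-score of feature words at different proportions
  Precision of different feature word identification algorithms
  Recall of different feature word identification algorithms
  F-score of different feature word identification algorithms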
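The captions above refer to precision, recall, and F-score measured over different proportions of the ranked feature words. The sketch below shows the standard definitions of these three metrics applied to the top share of a ranked list. The evaluation protocol used in the paper is not described on this page, so the gold-standard set, the way a proportion is turned into a cutoff, and all names in the code are assumptions for illustration only.

```python
def precision_recall_f(extracted, gold):
    """Standard set-based precision, recall, and F-score (F1)."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


def evaluate_at_proportions(ranked, gold, proportions=(0.5, 1.0)):
    """Score the top share of a ranked feature list for each proportion."""
    results = {}
    for p in proportions:
        # Assumed cutoff rule: keep at least one word, round to the nearest count.
        k = max(1, round(p * len(ranked)))
        results[p] = precision_recall_f(ranked[:k], gold)
    return results


# Hypothetical ranked output and hand-labelled gold set.
ranked_words = ["screen", "battery", "feeling", "price"]
gold_words = {"screen", "battery", "price", "camera"}

for share, (p, r, f) in evaluate_at_proportions(ranked_words, gold_words).items():
    print(f"top {share:.0%}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Taking a smaller proportion of the ranking typically raises precision and lowers recall, which is why reporting both, together with the F-score, across several proportions gives a fuller picture than any single cutoff.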