Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (9): 83-89    DOI: 10.11925/infotech.2096-3467.2017.09.09
Original Article
Extracting Product Features with Weight-based Apriori Algorithm
Li Changbing, Pang Chongpeng, Li Meiping
School of Economics and Management, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract  

[Objective] This paper aims to reduce noise when extracting product features from customer comments. [Methods] First, we used TF-IDF and variance selection to extract the required data. Second, we set thresholds to filter the extracted words and obtain the product feature set. Third, we generated frequent itemsets with the Apriori algorithm. Finally, we tested various thresholds to obtain the optimal sets, which allowed product features to be extracted from user comments automatically. [Results] We evaluated the proposed method on comment texts about mobile phone products. Comparing the automatically extracted features with manually annotated ones, we obtained a precision (P) of 72.44%, a recall (R) of 77.59%, and an F-measure (F) of 74.93%. [Limitations] The precision needs to be improved, and the manually annotated terms may contain human errors. [Conclusions] The Apriori algorithm can extract product features effectively.
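The abstract does not spell out how TF-IDF and variance selection are combined. The sketch below is one plausible reading, under stated assumptions: each comment is already segmented into a token list, TF-IDF is computed per comment, and a word is kept when the variance of its TF-IDF values across comments exceeds a threshold. The names `tfidf_scores`, `select_candidates` and `min_variance` are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized comment, plus document frequencies."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per comment
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return scores, df


def select_candidates(docs, min_variance):
    """Hypothetical variance selection: keep words whose TF-IDF varies across comments."""
    scores, df = tfidf_scores(docs)
    n = len(docs)
    kept = {}
    for word in df:
        values = [s.get(word, 0.0) for s in scores]  # 0.0 where the word is absent
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        if variance > min_variance:
            kept[word] = variance
    return kept


comments = [["screen", "good"], ["battery", "good"], ["screen", "good"]]
print(sorted(select_candidates(comments, 0.0)))  # ['battery', 'screen']
```

A word that appears in every comment has IDF log(1) = 0, so its variance is zero and it is dropped; this is one way a variance filter can remove uninformative, noisy words before mining.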

Key words: Feature Extraction; Apriori Algorithm; TF-IDF; Variance Selection
Received: 24 April 2017      Published: 18 October 2017
CLC Number: G350

Cite this article:

Li Changbing,Pang Chongpeng,Li Meiping. Extracting Product Features with Weight-based Apriori Algorithm. Data Analysis and Knowledge Discovery, 2017, 1(9): 83-89.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.09.09     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I9/83

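Transaction-item matrix (rows T1…Tm are transactions, columns A1…An are items; 1 means the item occurs in the transaction):
     A1  A2  A3  An
T1    0   0   1   1
T2    0   1   0   0
T3    1   1   0   1
…
Tm    1   1   1   0   0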
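A Boolean matrix like this lets Apriori compute the support of an itemset as the fraction of rows in which all of its columns are 1. A minimal sketch, assuming rows are transactions (for example, comments) and columns are candidate feature words; `apriori_matrix` and its parameters are hypothetical names, and plain (unweighted) support is used here.

```python
from itertools import combinations


def apriori_matrix(matrix, items, min_support):
    """Frequent itemsets from a 0/1 matrix where matrix[i][j] == 1 if item j is in transaction i."""
    m = len(matrix)
    # For each item, the set of transaction (row) indices whose cell is 1.
    cols = {item: {i for i, row in enumerate(matrix) if row[j]} for j, item in enumerate(items)}

    def support(itemset):
        # Rows where every column in the itemset is 1, as a fraction of all rows.
        rows = set.intersection(*(cols[x] for x in itemset))
        return len(rows) / m

    frequent = {}
    level = [frozenset([x]) for x in items if support([x]) >= min_support]
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune with the Apriori property, then keep candidates meeting min_support.
        level = [
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
            and support(c) >= min_support
        ]
    return frequent


matrix = [[0, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 1]]  # rows T1-T3 of the matrix above
items = ["A1", "A2", "A3", "An"]
print(sorted(sorted(s) for s in apriori_matrix(matrix, items, 0.5)))  # [['A2'], ['An']]
```

Storing each column as a set of row indices makes support counting a set intersection, which mirrors the column-wise AND of a Boolean matrix.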
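Product features | Correct features identified by the algorithm | Incorrect features identified by the algorithm
Features mined | A | B
Features not mined | C |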
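Reading this table in the usual way, A counts features the algorithm extracted correctly, B features it extracted wrongly, and C true features it missed, which gives P = A/(A+B), R = A/(A+C) and F = 2PR/(P+R). The sketch below codes these formulas; plugging in P = 72.44% and R = 77.59% from the abstract reproduces F ≈ 74.93%. The function names are illustrative.

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def precision_recall_f(a, b, c):
    """a: correct features found, b: incorrect features found, c: correct features missed."""
    p = a / (a + b)  # precision: share of extracted features that are correct
    r = a / (a + c)  # recall: share of true features that were extracted
    return p, r, f_measure(p, r)


print(round(f_measure(0.7244, 0.7759), 4))  # 0.7493
```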
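Manually annotated feature set for the product 手机 (mobile phone), grouped by attribute, with the number of annotated features per attribute:
Appearance design (外观设计), 37 features: 外键 (external keys), 外屏 (external screen), 彩屏 (color screen), 机身 (body), 磨砂 (matte finish), 键盘 (keypad), 外观 (appearance), 内屏 (internal screen), 方向键 (direction keys), 外观设计 (appearance design), 颜色 (color), 手感 (hand feel), 外壳 (casing), 体积 (size), 重量 (weight), 快捷键 (shortcut keys), 金属 (metal), 质感 (texture), 机型 (model), 外形 (shape), 面积 (area), 按键 (keys), 数字键 (number keys), 导航键 (navigation keys), 造型 (styling), 功能键 (function keys), 机体 (body), 材质 (material), 图案 (pattern), 拨号键 (dial key), 外表 (exterior), 数字键盘 (numeric keypad), 红外接口 (infrared port), 尺寸 (dimensions), 按钮 (buttons), 外盖 (outer cover), 机壳 (shell)
Screen (屏幕), 13 features: 分辨率 (resolution), 色彩 (color), 屏保 (screensaver), 画面 (picture), 屏幕 (screen), 清晰度 (sharpness), 亮度 (brightness), 屏幕显示 (screen display), 显示屏 (display), 触摸屏 (touchscreen), 画质 (image quality), 动画 (animation), 透明度 (transparency)
Basic functions (基本功能), 39 features: 功能 (functions), 短信 (SMS), 通话记录 (call log), 计算器 (calculator), 记事本 (notepad), 程序 (programs), 联系人 (contacts), 手写 (handwriting), 信息 (messages), 电话 (phone calls), 短消息 (short messages), 彩信 (MMS), 闹钟 (alarm clock), 日程表 (schedule), 手写输入 (handwriting input), 语音 (voice), 软件 (software), 收音机 (radio), 防火墙 (firewall), 通话质量 (call quality), 电话簿 (phone book), 录音 (voice recording), 电话号码 (phone numbers), 号码 (numbers), 输入法 (input method), 语音拨号 (voice dialing), 键盘输入 (keyboard input), 通话 (calls), 闹铃 (alarm), 通讯录 (address book), 应用程序 (applications), 时钟 (clock), 背光灯 (backlight), 录音器 (voice recorder), 背景灯 (background light), 手电筒 (flashlight), 备忘录 (memo), 收件箱 (inbox), SIM卡 (SIM card)
Camera functions (摄像功能), 11 features: 像素 (pixels), 摄像头 (camera), 彩灯 (colored lights), 图片 (pictures), 闪光灯 (flash), 照片 (photos), 象素 (pixels, variant spelling), 镜头 (lens), 图像 (images), 照相机 (still camera), 摄像机 (video camera)
Entertainment functions (娱乐功能), 6 features: 多媒体 (multimedia), 影音 (audio-video), 媒体播放器 (media player), 游戏 (games), 音频 (audio), 播放器 (player)
Data functions (数据功能), 2 features: 蓝牙 (Bluetooth), 红外线 (infrared)
Phone accessories (手机附件), 10 features: 耳机 (earphones), 手写笔 (stylus), 扩音器 (loudspeaker), 耳塞 (earbuds), 内存卡 (memory card), 存储卡 (storage card), 数据线 (data cable), 充电器 (charger), 防尘盖 (dust cover), 传输线 (transfer cable)
Beautification (美化), 6 features: 壁纸 (wallpaper), 界面 (interface), 背景 (background), 菜单 (menu), 饱和度 (saturation), 主题 (theme)
Performance (性能), 19 features: 信号 (signal), 响应速度 (response speed), 速度 (speed), 识别率 (recognition rate), 待机时间 (standby time), 续航 (battery life), 性能 (performance), 处理速度 (processing speed), 关机 (power off), 操作速度 (operating speed), 网络 (network), 待机 (standby), 反应速度 (reaction speed), 开机 (power on), 传输速度 (transfer speed), 速率 (rate), 反应时间 (reaction time), 智能 (intelligence), 输入速度 (input speed)
Sound (声音), 14 features: 铃音 (ring tone), 铃声 (ringtone), 音量 (volume), 提示音 (alert tone), 声音 (sound), 和弦 (polyphony), 和弦铃声 (polyphonic ringtone), 音质 (sound quality), 音乐 (music), 听筒 (earpiece), 扬声器 (speaker), 音效 (sound effects), 短信铃声 (SMS ringtone), 关机闹钟 (power-off alarm)
Hardware configuration (硬件配置), 18 features: 容量 (capacity), 内置 (built-in), 空间 (space), 储存量 (storage amount), 内存 (memory), 处理器 (processor), 电池 (battery), 硬件 (hardware), 外置 (external), 存储量 (storage amount), 存储容量 (storage capacity), 均衡器 (equalizer), 电池容量 (battery capacity), 储存 (storage), 内存容量 (memory capacity), 电池电量 (battery charge), 存储空间 (storage space), 储存卡 (storage card)
Value for money (性价比), 6 features: 性价比 (cost-performance ratio), 价格 (price), 价位 (price range), 价钱 (price), 价值 (value), 零售价 (retail price)
After-sales feedback (售后反馈), 2 features: 质量 (quality), 客服 (customer service)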
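The counts A, B and C used for precision and recall can be obtained by comparing the automatically extracted features with the manually annotated set. A minimal sketch, assuming exact string matching (the matching rule is an assumption; near-synonyms such as 像素 and 象素 would count as separate terms). `count_abc` is a hypothetical name and the two sets are toy data.

```python
def count_abc(extracted, gold):
    """Return (A, B, C) for an extracted feature set against a manually annotated one."""
    extracted, gold = set(extracted), set(gold)
    a = len(extracted & gold)  # extracted and annotated: correct
    b = len(extracted - gold)  # extracted but not annotated: incorrect
    c = len(gold - extracted)  # annotated but never extracted: missed
    return a, b, c


gold = {"屏幕", "电池", "铃声", "外观", "短信"}  # small excerpt of the annotated set above
extracted = {"屏幕", "电池", "铃声", "效果"}
print(count_abc(extracted, gold))  # (3, 1, 2)
```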
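Rank | Attribute | wsupport
1 | 功能 (functions) | 0.3337
2 | 屏幕 (screen) | 0.2628
3 | 效果 (effect) | 0.2348
4 | 铃声 (ringtone) | 0.2324
5 | 外观 (appearance) | 0.2057
6 | 电话 (phone calls) | 0.2054
7 | 短信 (SMS) | 0.1887
8 | 待机 (standby) | 0.1772
9 | 声音 (sound) | 0.1719
10 | 电池 (battery) | 0.1685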
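This ranking orders attributes by weighted support (wsupport). The exact definition is not given in this section; the sketch below assumes one common form, in which the ordinary support of an itemset is multiplied by the mean weight of its items, with item weights supplied externally (for example, from TF-IDF). Treat the formula, the function names and the toy weights as assumptions rather than the paper's method.

```python
def weighted_support(itemset, transactions, weights):
    """Hypothetical wsupport: mean item weight multiplied by ordinary support."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    support = hits / len(transactions)
    mean_weight = sum(weights[x] for x in itemset) / len(itemset)
    return mean_weight * support


def rank_items(transactions, weights, top_k):
    """Rank single items by weighted support, highest first."""
    scored = [(item, weighted_support([item], transactions, weights)) for item in weights]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


transactions = [["screen", "battery"], ["screen"], ["ringtone"], ["screen", "ringtone"]]
weights = {"screen": 0.8, "battery": 0.5, "ringtone": 0.6}  # assumed, e.g. normalized TF-IDF
for item, ws in rank_items(transactions, weights, 2):
    print(item, round(ws, 4))  # screen 0.6, then ringtone 0.3
```

Weighting lets a frequent but uninformative word rank below a slightly rarer word with a higher weight, which is the motivation for replacing plain support.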
Item weighted-support threshold | P (precision) | R (recall) | F (F-measure)
0.01 | 71.35% | 77.60% | 74.34%
0.012 | 72.08% | 77.59% | 74.73%
0.013 | 72.44% | 77.59% | 74.93%
0.0135 | 72.30% | 77.05% | 74.60%
0.014 | 72.68% | 77.04% | 74.80%
0.015 | 73.01% | 75.41% | 74.19%
0.016 | 72.82% | 73.22% | 73.02%
0.018 | 73.71% | 70.49% | 72.06%
0.02 | 74.09% | 67.21% | 70.48%
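This table is a sweep over the item weighted-support threshold: raising it broadly trades recall for precision, and the figures reported in the abstract are the row with the highest F. The sketch below re-derives that choice from the reported P and R values. It reuses only numbers from the table; selecting the threshold by maximum F is how the reported optimum reads, not a documented procedure.

```python
# (threshold, P %, R %, reported F %) copied from the table above
rows = [
    (0.01, 71.35, 77.60, 74.34),
    (0.012, 72.08, 77.59, 74.73),
    (0.013, 72.44, 77.59, 74.93),
    (0.0135, 72.30, 77.05, 74.60),
    (0.014, 72.68, 77.04, 74.80),
    (0.015, 73.01, 75.41, 74.19),
    (0.016, 72.82, 73.22, 73.02),
    (0.018, 73.71, 70.49, 72.06),
    (0.02, 74.09, 67.21, 70.48),
]


def f_measure(p, r):
    """Harmonic mean of precision and recall (both in percent here)."""
    return 2 * p * r / (p + r)


# The recomputed F agrees with the reported column to within rounding.
for threshold, p, r, f_reported in rows:
    assert abs(f_measure(p, r) - f_reported) < 0.01, threshold

best = max(rows, key=lambda row: f_measure(row[1], row[2]))
print(best[0], round(f_measure(best[1], best[2]), 2))  # 0.013 74.93
```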
Metric | Proposed method | Method of [7] | Method of [11] | Method of [13]
Precision | 72.44% | 70.8% | 70.72% | 62.8%
Recall | 77.59% | 73.3% | 68.35% | 81.8%
F-measure | 74.93% | 72% | 69.51% | 71.05%
Metric | Proposed method | Method of [12] | Method of [4]
Precision | 72.44% | 63.3% | 71.8%
Recall | 77.59% | 68.9% | 76.1%
F-measure | 74.93% | 66% | 73.88%
[1] Zhuang L, Jing F, Zhu X Y. Movie Review Mining and Summarization[C]//Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Arlington, Virginia, USA. New York: ACM, 2006: 43-50.
[2] Kobayashi N, Inui K, Matsumoto Y, et al. Collecting Evaluative Expressions for Opinion Extraction[C]//Proceedings of the 1st International Joint Conference on Natural Language Processing. Berlin, Heidelberg: Springer-Verlag, 2004: 596-605.
[3] Lou Decheng, Yao Tianfang. Semantic Polarity Analysis and Opinion Mining on Chinese Review Sentences[J]. Journal of Computer Applications, 2006, 26(11): 2622-2625.
[4] Hu M, Liu B. Mining Opinion Features in Customer Reviews[C]//Proceedings of the 19th National Conference on Artificial Intelligence. 2004.
[5] Popescu A M, Etzioni O. Extracting Product Features and Opinions from Reviews[C]//Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. 2005.
[6] Du Siqi, Li Honglian, Lv Xueqiang. Application of Chinese Chunk Analysis in Product Feature Extraction[J]. New Technology of Library and Information Service, 2015(9): 26-30.
[7] Wang Yong, Zhang Qin, Yang Xiaojie. Study on the Extraction of Product Features in Chinese Network Reviews[J]. New Technology of Library and Information Service, 2013(12): 70-73.
[8] Lu Yonghe, Liang Minghui. Application of Genetic Algorithms in Improving Text Feature Extraction Method[J]. New Technology of Library and Information Service, 2014(4): 48-57.
[9] Zhang Jian'e. Chinese Keyword Extraction Method Based on TFIDF and Word Relevance Degree[J]. Information Science, 2012, 30(10): 1542-1544, 1555.
[10] Bian Genqing, Wang Yue. An Apriori Algorithm Based on Matrix and Weight Improvement[J]. Microelectronics and Computer, 2017, 34(1): 136-140.
[11] Shi B, Chang K. Mining Chinese Reviews[C]//Proceedings of the 6th IEEE International Conference on Data Mining. 2006.
[12] Li Shi, Ye Qiang, Li Yijun, et al. Research on Product Feature Mining Method of Chinese Network Customer Review[J]. Journal of Management Sciences in China, 2009, 12(2): 142-152. doi: 10.3321/j.issn:1007-9807.2009.02.015
[13] Li Shi, Ye Qiang, Li Yijun, et al. Mining Product Features and Sentiment Orientation from Chinese Online Customer Reviews[J]. Application Research of Computers, 2010, 27(8): 3016-3019. doi: 10.3969/j.issn.1001-3695.2010.08.054
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn