Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 9: 83-89. DOI: 10.11925/infotech.2096-3467.2017.09.09
Original Article
Extracting Product Features with Weight-based Apriori Algorithm
Changbing Li, Chongpeng Pang, Meiping Li
School of Economics and Management, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

[Objective] This paper aims to reduce noise when extracting product features from customer comments. [Methods] First, we used TF-IDF and variance selection to extract candidate words from the comments. Second, we set thresholds to filter the extracted words and obtain a product feature set. Third, we generated frequent itemsets with the Apriori algorithm. Finally, we tested different thresholds to obtain the optimal sets, which allowed product features to be extracted automatically from user comments. [Results] We evaluated the proposed method on comments about mobile phone products. Compared with the manually identified features, the automatically extracted features achieved a precision (P) of 72.44%, a recall (R) of 77.59%, and an F-measure of 74.93%. [Limitations] Precision needs further improvement, and the manually identified features may contain human errors. [Conclusions] The Apriori algorithm can extract product features effectively.
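The abstract only outlines the pipeline, so the following Python sketch shows one way its steps could fit together; it is not the authors' implementation. Assumptions: each comment is already segmented into candidate words (the English tokens stand in for segmented Chinese words); a word survives the TF-IDF/variance step if both its mean TF-IDF across comments and the variance of that score reach a threshold; IDF uses a `+1` smoothing. The function names and threshold values are hypothetical. Frequent itemsets come from a plain Apriori (count, join, prune), and extraction is judged by precision, recall, and F against a manual feature set.

```python
import math
from itertools import combinations


def tfidf_by_doc(docs):
    """Map each word to its TF-IDF score in every comment (0.0 where absent)."""
    n = len(docs)
    df = {}
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    scores = {w: [] for w in df}
    for doc in docs:
        counts = {}
        for w in doc:
            counts[w] = counts.get(w, 0) + 1
        for w in df:
            tf = counts.get(w, 0) / len(doc) if doc else 0.0
            idf = math.log(n / df[w]) + 1.0  # assumed smoothing: +1 keeps very common words above 0
            scores[w].append(tf * idf)
    return scores


def select_candidates(docs, min_mean_tfidf, min_variance):
    """Keep words whose mean TF-IDF and TF-IDF variance both reach the hypothetical thresholds."""
    keep = set()
    for w, xs in tfidf_by_doc(docs).items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        if mean >= min_mean_tfidf and var >= min_variance:
            keep.add(w)
    return keep


def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    candidates = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent, k = {}, 1
    while candidates:
        # Count step: support = fraction of transactions containing the itemset.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join + prune: size-k unions whose every (k-1)-subset is frequent.
        joined = set()
        for a, b in combinations(level, 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                joined.add(u)
        candidates = list(joined)
    return frequent


def extract_features(docs, min_mean_tfidf, min_variance, min_support):
    """Filter words by TF-IDF/variance, then keep the words in frequent 1-itemsets."""
    keep = select_candidates(docs, min_mean_tfidf, min_variance)
    transactions = [frozenset(w for w in doc if w in keep) for doc in docs]
    frequent = apriori(transactions, min_support)
    return {w for itemset in frequent if len(itemset) == 1 for w in itemset}


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(extracted, gold):
    """Precision, recall and F against a manually identified feature set."""
    tp = len(extracted & gold)
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f_measure(p, r)


docs = [
    ["screen", "battery", "good"],
    ["screen", "battery", "bad"],
    ["screen", "camera"],
    ["battery", "camera", "delivery"],
]
features = extract_features(docs, min_mean_tfidf=0.3, min_variance=0.01, min_support=0.5)
print(sorted(features))  # ['battery', 'camera', 'screen']
print(evaluate(features, {"screen", "battery", "camera", "price"}))
```

In this toy run, the opinion words `good` and `bad` and the off-topic `delivery` fall below the mean TF-IDF threshold, which illustrates the kind of noise reduction the paper targets before Apriori runs. The `f_measure` helper also reproduces the abstract's arithmetic: F = 2PR/(P+R) with P = 0.7244 and R = 0.7759 gives about 0.7493, matching the reported 74.93%.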

Key words: Feature Extraction; Apriori Algorithm; TF-IDF; Variance Selection
Received: 24 April 2017      Published: 18 October 2017

Cite this article:

Changbing Li, Chongpeng Pang, Meiping Li. Extracting Product Features with Weight-based Apriori Algorithm. Data Analysis and Knowledge Discovery, 2017, 1(9): 83-89.

