Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (3): 1-8    DOI: 10.11925/infotech.2096-3467.2017.0849
Identifying Potential Customers Based on User-Generated Contents
Jiang Cuiqing, Song Kailun, Ding Yong, Liu Yao
School of Management, Hefei University of Technology, Hefei 230009, China

[Objective] This paper aims to identify potential customers by analyzing user-generated content from product-specific online forums. [Methods] First, we converted the imbalanced dataset into multiple balanced subsets. Then, we used the Stacking classification algorithm to construct the identification model. Finally, we compared the results of the proposed method with five baseline algorithms. [Results] Compared with the BayesNet, Logistic, C4.5, SMO and Naive Bayes algorithms, the F-measure of our method was higher by 17.4, 26.5, 24.1, 29.3 and 40.9 percentage points, respectively. Compared with the Stacking, Bagging and Boosting methods, it was higher by 10.1, 5.9 and 13.1 percentage points, respectively. [Limitations] We only examined the performance of the proposed method on data from the automotive industry. [Conclusions] The proposed method can effectively identify potential customers based on user-generated content.
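
The first step in [Methods], turning one imbalanced dataset into several balanced subsets, can be sketched as follows. The abstract does not say how the subsets are built, so this is only one plausible reading: the majority class is shuffled and cut into chunks the size of the minority class, and every chunk is paired with all minority samples. The function name `balanced_subsets`, the 0/1 label encoding and the fixed seed are assumptions made for illustration, not details from the paper.

```python
import random


def balanced_subsets(samples, labels, minority_label=1, seed=0):
    """Pair every minority-class sample with disjoint chunks of the majority class.

    Assumes binary labels encoded as 0/1 (an illustrative choice, not from the paper).
    Returns a list of subsets, each a list of (sample, label) pairs.
    """
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority = [s for s, y in zip(samples, labels) if y != minority_label]
    if not minority:
        raise ValueError("need at least one minority-class sample")

    rng = random.Random(seed)
    rng.shuffle(majority)  # random but reproducible chunk assignment

    majority_label = 1 - minority_label
    size = len(minority)
    subsets = []
    for start in range(0, len(majority), size):
        chunk = majority[start:start + size]  # the last chunk may be smaller
        subset = [(s, minority_label) for s in minority]
        subset += [(s, majority_label) for s in chunk]
        subsets.append(subset)
    return subsets


# Toy data: 3 potential customers (label 1) among 12 forum users.
users = list(range(12))
labels = [1, 1, 1] + [0] * 9
subsets = balanced_subsets(users, labels)
```

Under this reading, a classifier such as a Stacking ensemble could then be trained on each subset, and every majority sample is used exactly once across the subsets.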

Key words: User-Generated Content; Potential Customer Identification; Stacking Classification Algorithm; Imbalanced Datasets
Received: 22 August 2017      Published: 03 April 2018
Chinese Library Classification (CLC) number: C931

Cite this article:

Jiang Cuiqing, Song Kailun, Ding Yong, Liu Yao. Identifying Potential Customers Based on User-Generated Contents. Data Analysis and Knowledge Discovery, 2018, 2(3): 1-8.

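| Feature category | ID | Description | Notes |
| --- | --- | --- | --- |
| Demographic features | F1-F14 | Whether the user belongs to a given region | 1 if yes, 0 if no |
| | F15 | How long the user has been registered | Time elapsed from registration to the present |
| | F16 | Number of followers the user has in the forum | |
| | F17 | Number of users the user follows in the forum | |
| | F18 | Number of the user's posts marked as featured in the forum | |
| Stylistic features | F19 | Total number of characters in the comment | |
| | F20-F26 | Time words, verbs, adjectives, adverbs, … in the comment | |
| | F27-F29 | Frequency of periods, question marks and exclamation marks in the comment | Consistent with the Chinese part-of-speech tag set of the NLPIR Chinese word segmentation package [30] |
| Sentiment features | F30 | Whether the sentiment orientation of the comment is positive | Consistent with the Chinese sentiment polarity lexicon NTUSD [23]; 1 if yes, 0 if no |
| | F31 | Whether the sentiment orientation of the comment is negative | Consistent with the Chinese sentiment polarity lexicon NTUSD [23]; 1 if yes, 0 if no |
| Behavioral features | F32 | Whether the user is a verified owner of a given car model | 1 if yes, 0 if no |
| | F33 | Whether the user follows a given car model | 1 if yes, 0 if no |
| | F34 | Whether the user belongs to an owners' group for a given car model | 1 if yes, 0 if no |
| | F35 | Total number of comments by the user | |
| | F36 | Total number of posts by the user | |
| | F37 | User reply delay | Time difference between registration time and reply time |
| Keyword features | F38-F508 | Term frequency of each keyword | |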
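
As a rough illustration of how a few of the stylistic features above (F19 and F27-F29) might be computed from one comment, consider the sketch below. Several points are assumptions rather than details from the paper: "frequency" is taken to mean a raw count, punctuation is included in the character total, whitespace is ignored, both full-width and ASCII question and exclamation marks are counted, and F27, F28 and F29 are mapped to periods, question marks and exclamation marks in the order the table lists them. The part-of-speech features (F20-F26) would need a segmenter such as NLPIR and are not shown.

```python
def stylistic_features(comment):
    """Count characters and sentence-final punctuation in one forum comment.

    The feature-to-ID mapping below is assumed from the order in the table.
    """
    text = "".join(comment.split())  # drop whitespace before counting
    return {
        "F19": len(text),                               # total characters
        "F27": text.count("。"),                         # periods
        "F28": text.count("？") + text.count("?"),       # question marks
        "F29": text.count("！") + text.count("!"),       # exclamation marks
    }


# A made-up comment used only for illustration.
example = stylistic_features("这车怎么样？值得买吗？我很喜欢！")
```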
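
| Algorithm | Precision | Recall | F-measure |
| --- | --- | --- | --- |
| Proposed method | 72.2% | 70.3% | 71.2% |
| Bayesian network | 67.8% | 44.5% | 53.8% |
| Logistic regression | 76.0% | 31.7% | 44.7% |
| Decision tree (C4.5) | 55.3% | 41.0% | 47.1% |
| SMO | 82.6% | 28.1% | 41.9% |
| Naive Bayes | 18.9% | 76.2% | 30.3% |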
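
The F-measure column can be checked against the other two columns. The sketch below assumes the reported value is the balanced F-measure (F1), the harmonic mean of precision and recall; the source does not state the weighting, but the recomputed values agree with the reported ones to within 0.1 percentage points. The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) copied from the table above.
reported = {
    "Proposed method": (72.2, 70.3, 71.2),
    "Bayesian network": (67.8, 44.5, 53.8),
    "Logistic regression": (76.0, 31.7, 44.7),
    "Decision tree (C4.5)": (55.3, 41.0, 47.1),
    "SMO": (82.6, 28.1, 41.9),
    "Naive Bayes": (18.9, 76.2, 30.3),
}

recomputed = {name: f_measure(p, r) for name, (p, r, _) in reported.items()}
```

The comparison also shows why the F-measure is the headline metric here: SMO has the highest precision but recovers fewer than a third of the potential customers, while Naive Bayes has the highest recall at very low precision.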
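
| Algorithm | Precision | Recall | F-measure |
| --- | --- | --- | --- |
| Proposed method | 72.2% | 70.3% | 71.2% |
| Stacking ensemble learning | 57.8% | 64.9% | 61.1% |
| Bagging ensemble learning | 65.8% | 64.9% | 65.3% |
| Boosting ensemble learning | 55.6% | 60.8% | 58.1% |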
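
The improvements quoted in the abstract can be reproduced from the F-measure columns of the two results tables: each is the absolute difference, in percentage points, between the proposed method and the compared algorithm. The short check below uses only the reported values; the dictionary keys are shorthand labels chosen for illustration.

```python
proposed_f = 71.2  # F-measure (%) of the proposed method

# Reported F-measure (%) of each compared algorithm, from the two tables.
other_f = {
    "BayesNet": 53.8,
    "Logistic": 44.7,
    "C4.5": 47.1,
    "SMO": 41.9,
    "Naive Bayes": 30.3,
    "Stacking": 61.1,
    "Bagging": 65.3,
    "Boosting": 58.1,
}

# Gain in percentage points, rounded to one decimal like the abstract.
gains = {name: round(proposed_f - f, 1) for name, f in other_f.items()}
```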

[1] Shaw M J, Subramaniam C, Tan G W, et al. Knowledge Management and Data Mining for Marketing[J]. Decision Support Systems, 2001, 31(1): 127-137.
doi: 10.1016/S0167-9236(00)00123-8
[2] Wei Guohua, Kang Zhiying. Research on the Model of Mining Customer Demand Potential Customers Customized Terminal[J]. Information Security & Technology, 2014, 5(3): 79-81.
doi: 10.3969/j.issn.1674-9456.2014.03.027
[3] Li Xingyi. Study on Application of Data Mining Technology in Insurance Target Customer Identification[D]. Guangzhou: Sun Yat-Sen University, 2014.
[4] Wang Yuyuan. Prediction and Analysis of China Mobile Customers Based on Data Mining[D]. Xi’an: Chang’an University, 2016.
[5] Cao Shupeng, Jiang Zhu, Yan Meiyi. Application of Decision Tree Model to Identify Potential Customers of Credit Consumption Loan[J]. Beijing Review of Financial Studies, 2016(2): 36-53.
[6] Ganatra A. Draw Attention to Potential Customer with the Help of Subjective Measures in Sequential Pattern Mining (SPM) Approach[C]// Proceedings of the International Conference on Recent Trends in Information, Telecommunication and Computing. 2014.
[7] Chang H J, Hung L P, Ho C L. An Anticipation Model of Potential Customers’ Purchasing Behavior Based on Clustering Analysis and Association Rules Analysis[J]. Expert Systems with Applications, 2007, 32(3): 753-764.
doi: 10.1016/j.eswa.2006.01.049
[8] Guo Beibei, Fang Zhaoben. Application of SVM in Mining Potential Customers from Web Log[J]. Journal of Industrial Engineering & Engineering Management, 2010, 24(1): 129-133.
[9] Sun L, Duan Z. Web Potential Customer Classification Based on SVM[C]// Proceedings of the 2012 International Conference on Industrial Control and Electronics Engineering. 2012: 568-570.
[10] Guo Linxue. Application of Association Rules and Collaborative Filtering in Automotive E-commerce[J]. Technology and Economic Guide, 2017(8): 31.
[11] Hsieh H P, Li C T, Lin S D. Estimating Potential Customers Anywhere and Anytime Based on Location-Based Social Networks[A]// Machine Learning and Knowledge Discovery in Databases[M]. Springer International Publishing, 2015.
[12] Jiang Cuiqing, Wang Qilin, Liu Shixi, et al. Semi-supervised Learning for Automobile Defect Identification in the Context of Chinese Social Media[J]. Chinese Journal of Management Science, 2014(S1): 677-685.
[13] LocoySpider [CP/OL]. [2016-11-04].
[14] Zheng X, Zhu S, Lin Z. Capturing the Essence of Word-of-Mouth for Social Commerce: Assessing the Quality of Online E-Commerce Reviews by a Semi-Supervised Approach[J]. Decision Support Systems, 2013, 56(1): 211-222.
doi: 10.1016/j.dss.2013.06.002
[15] Abrahams A S, Fan W, Wang G A, et al. An Integrated Text Analytic Framework for Product Defect Discovery[J]. Production & Operations Management, 2015, 24(6): 975-990.
doi: 10.1111/poms.12303
[16] Krishnamoorthy S. Linguistic Features for Review Helpfulness Prediction[J]. Expert Systems with Applications, 2015, 42(7): 3751-3759.
doi: 10.1016/j.eswa.2014.12.044
[17] Liu Y, Jiang C, Zhao H, et al. Using Contextual Features and Multi-view Ensemble Learning in Product Defect Identification from Online Discussion Forums[J]. Decision Support Systems, 2018, 105: 1-12.
doi: 10.1016/j.dss.2017.10.009
[18] Abbasi A, Chen H. CyberGate: A Design Framework and System for Text Analysis of Computer-Mediated Communication[J]. MIS Quarterly, 2008, 32(4): 811-837.
doi: 10.2307/25148873
[19] Abrahams A S, Jiao J, Fan W, et al. What’s Buzzing in the Blizzard of Buzz? Automotive Component Isolation in Social Media Postings[J]. Decision Support Systems, 2013, 55(4): 871-882.
doi: 10.1016/j.dss.2012.12.023
[20] Lee S, Choeh J Y. Predicting the Helpfulness of Online Reviews Using Multilayer Perceptron Neural Networks[J]. Expert Systems with Applications, 2014, 41(6): 3041-3046.
doi: 10.1016/j.eswa.2013.10.034
[21] Almagrabi H, Malibari A, McNaught J. A Survey of Quality Prediction of Product Reviews[J]. International Journal of Advanced Computer Science & Applications, 2015, 6(11): 49-58.
doi: 10.14569/IJACSA.2015.061107
[22] Xu N, Liu H, Chen J, et al. Selecting a Representative Set of Diverse Quality Reviews Automatically[C]// Proceedings of the 2014 SIAM International Conference on Data Mining. 2014.
[23] NTUSD [OL]. [2017-01-05].
[24] Zhu F, Zhang X. Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics[J]. Journal of Marketing, 2010, 74(2): 133-148.
doi: 10.1509/jmkg.74.2.133
[25] Oh C, Sheng O. Investigating Predictive Power of Stock Micro Blog Sentiment in Forecasting Future Stock Price Directional Movement[C]// Proceedings of the Annual International Conference on Information Systems. 2011.
[26] Loughran T, McDonald B. When is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks[J]. Journal of Finance, 2011, 66(1): 35-65.
doi: 10.1111/j.1540-6261.2010.01625.x
[27] Abrahams A S, Jiao J, Wang G A, et al. Vehicle Defect Discovery from Social Media[J]. Decision Support Systems, 2012, 54(1): 87-97.
doi: 10.1016/j.dss.2012.04.005
[28] Law D, Gruss R, Abrahams A S. Automated Defect Discovery for Dishwasher Appliances from Online Consumer Reviews[J]. Expert Systems with Applications, 2017, 67: 84-94.
doi: 10.1016/j.eswa.2016.08.069
[29] Winkler M, Abrahams A S, Gruss R, et al. Toy Safety Surveillance from Online Reviews[J]. Decision Support Systems, 2016, 90: 23-32.
doi: 10.1016/j.dss.2016.06.016 pmid: 5145195
[30] NLPIR [OL]. [2017-01-10].
[31] Wolpert D H. Stacked Generalization[M]. Springer US, 2011.
[32] AutoHome [OL]. [2016-11-14].
[33] WEKA [K/OL]. [2017-01-18].