Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (3): 1-8     https://doi.org/10.11925/infotech.2096-3467.2017.0849
Research Paper
Identifying Potential Customers Based on User-Generated Contents*
Jiang Cuiqing, Song Kailun, Ding Yong, Liu Yao
School of Management, Hefei University of Technology, Hefei 230009, China
Abstract

[Objective] This paper identifies potential customers with purchase intent by analyzing the features of user-generated content in product forums. [Methods] We convert the imbalanced dataset into n balanced datasets and combine them with the Stacking classification algorithm to identify potential customers. We then test the sample data with each base classifier and with the proposed Stacking algorithm for imbalanced datasets, and compare F-measures to validate the proposed algorithm. [Results] The F-measure of the proposed algorithm is 17.4, 26.5, 24.1, 29.3, and 40.9 percentage points higher than those of the five base classifiers (Bayesian network, logistic regression, C4.5 decision tree, SMO, and Naive Bayes, respectively), and 10.1, 5.9, and 13.1 percentage points higher than those of the Stacking, Bagging, and Boosting ensemble learning algorithms, respectively. [Limitations] The corpus comes from the automotive industry, so the findings are limited to that domain. [Conclusions] The proposed method can effectively identify potential customers.

Keywords: User-Generated Content; Potential Customer Identification; Stacking Classification Algorithm; Imbalanced Datasets
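The abstract's first step, turning one imbalanced dataset into several balanced ones, can be illustrated with a minimal sketch. This page does not say how the authors partition the data, so the scheme below (disjoint, shuffled slices of the majority class, each paired with all minority samples) is an assumption, and `make_balanced_subsets` is a hypothetical helper name.

```python
import random


def make_balanced_subsets(samples, seed=0):
    """Return subsets that each hold every minority sample plus an equally
    sized, disjoint slice of the shuffled majority samples.

    `samples` is a list of (features, label) pairs with exactly two labels.
    Assumed scheme for illustration only; not taken from the paper.
    """
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    if len(by_label) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(by_label.values(), key=len)
    random.Random(seed).shuffle(majority)
    k = len(minority)
    # Drop a trailing partial slice so that every subset is exactly balanced.
    n = len(majority) // k
    return [majority[i * k:(i + 1) * k] + minority for i in range(n)]


# Toy data: 3 potential customers (label 1) among 12 forum users.
data = [((i,), 1 if i < 3 else 0) for i in range(12)]
subsets = make_balanced_subsets(data)
for subset in subsets:
    positives = sum(label for _, label in subset)
    print(len(subset), "samples,", positives, "positive")
```

Each subset can then be used to train one base model, so no single model sees the original class skew.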
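Received: 2017-08-22      Published: 2018-04-03
CLC number (ZTFLH):  C931
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Methods for Discovering Product Innovation Needs Based on User-Generated Content in Social Media" (No. 71571059) and the Ministry of Education Humanities and Social Sciences Planning Fund project "Research on the Mechanism by Which Social Media Affects Firm Performance" (No. 15YJA630010).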
Cite this article:
Jiang Cuiqing, Song Kailun, Ding Yong, Liu Yao. Identifying Potential Customers Based on User-Generated Contents[J]. Data Analysis and Knowledge Discovery, 2018, 2(3): 1-8.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.0849      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I3/1
Figure: Framework for identifying potential customers based on user-generated content
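| Feature category | No. | Description | Notes |
| Demographic features | F1-F14 | Whether the user belongs to a given region | 1 if yes, 0 if no |
| | F15 | How long the user has been registered | Time elapsed from registration to the present |
| | F16 | Number of followers the user has on the forum | |
| | F17 | Number of users the user follows on the forum | |
| | F18 | Number of the user's posts marked as featured | |
| Stylistic features | F19 | Total number of characters in the comment | |
| | F20-F26 | Number of time words, verbs, adjectives, adverbs, common nouns, place nouns, and named entities in the comment | Consistent with the Chinese part-of-speech tag set of the NLPIR Chinese word segmentation package [30] |
| | F27-F29 | Frequency of periods, question marks, and exclamation marks in the comment | Consistent with the Chinese part-of-speech tag set of the NLPIR Chinese word segmentation package [30] |
| Sentiment features | F30 | Whether the sentiment of the comment is positive | Consistent with the Chinese sentiment polarity dictionary NTUSD [23]; 1 if yes, 0 if no |
| | F31 | Whether the sentiment of the comment is negative | Consistent with the Chinese sentiment polarity dictionary NTUSD [23]; 1 if yes, 0 if no |
| Behavioral features | F32 | Whether the user is verified for a given car model | 1 if yes, 0 if no |
| | F33 | Whether the user follows a given car model | 1 if yes, 0 if no |
| | F34 | Whether the user belongs to a group for a given car model | 1 if yes, 0 if no |
| | F35 | Total number of comments by the user | |
| | F36 | Total number of posts by the user | |
| | F37 | User reply delay | Time difference between registration time and reply time |
| Keyword features | F38-F508 | Term frequency of each keyword | |
Table: Effective feature set for potential customers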
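A few of the features in the table can be computed with plain string operations, as the rough sketch below shows. It covers only F19, F27-F29, F30-F31, and keyword term frequencies. The tiny word sets are hypothetical stand-ins for the NTUSD dictionary, and the rule for the sentiment flags (whichever polarity has more dictionary hits) is an assumption, as is counting raw punctuation occurrences. Part-of-speech counts (F20-F26) would need a segmenter such as NLPIR and are left out.

```python
# Hypothetical stand-ins for the NTUSD positive and negative word lists.
POSITIVE_WORDS = {"满意", "喜欢", "不错"}
NEGATIVE_WORDS = {"失望", "故障", "太贵"}


def extract_features(comment, keywords):
    """Compute a small, illustrative subset of the table's features."""
    pos_hits = sum(comment.count(word) for word in POSITIVE_WORDS)
    neg_hits = sum(comment.count(word) for word in NEGATIVE_WORDS)
    features = {
        "F19_total_chars": len(comment),
        # Raw occurrence counts of full-width Chinese punctuation.
        "F27_periods": comment.count("。"),
        "F28_question_marks": comment.count("？"),
        "F29_exclamation_marks": comment.count("！"),
        # Assumed rule: the polarity with more dictionary hits wins.
        "F30_positive": int(pos_hits > neg_hits),
        "F31_negative": int(neg_hits > pos_hits),
    }
    for keyword in keywords:
        features["kw_" + keyword] = comment.count(keyword)
    return features


# A made-up forum comment about buying a car.
comment = "这款车外观不错，我很喜欢！落地价多少？准备下个月提车。"
feats = extract_features(comment, ["提车", "落地价"])
for name, value in feats.items():
    print(name, value)
```

Substring counting is a simplification: a real pipeline would segment the text first so that keywords are matched as whole words.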
Figure: Framework of the Stacking classification algorithm for imbalanced data
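Since the framework figure is not reproduced here, the following toy sketch shows one way stacking over balanced subsets could be wired together. It is not the authors' implementation: the nearest-centroid learner stands in for both the base classifiers and the meta-classifier, the held-out split for meta-training is an assumption, and all names (`Centroid`, `fit_stacking`, `predict_stacking`) are hypothetical.

```python
from statistics import mean


def balanced_subsets(samples):
    """Pair each minority-sized chunk of class 0 with all class-1 samples."""
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    k = len(pos)
    return [neg[i:i + k] + pos for i in range(0, len(neg), k)]


class Centroid:
    """Nearest-centroid classifier; assumes both classes occur in training."""

    def fit(self, samples):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, y in samples if y == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return 1 if dist(1) < dist(0) else 0


def fit_stacking(base_train, meta_train):
    """Level 0: one base model per balanced subset.
    Level 1: a meta-model trained on the base models' predictions
    for held-out samples."""
    bases = [Centroid().fit(sub) for sub in balanced_subsets(base_train)]
    meta_rows = [([b.predict(x) for b in bases], y) for x, y in meta_train]
    meta = Centroid().fit(meta_rows)
    return bases, meta


def predict_stacking(model, x):
    bases, meta = model
    return meta.predict([b.predict(x) for b in bases])


# Toy 1-D data: 40 ordinary users (label 0), 8 potential customers (label 1).
neg = [([0.1 * (i % 5)], 0) for i in range(40)]
pos = [([5.0 + 0.1 * (i % 3)], 1) for i in range(8)]
model = fit_stacking(neg[:30] + pos[:6], neg[30:] + pos[6:])
print(len(model[0]), "base models")
print(predict_stacking(model, [5.1]), predict_stacking(model, [0.2]))
```

The meta-model sees only the vector of base predictions, which is what distinguishes stacking from simple majority voting over the same base models.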
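| Algorithm | Precision | Recall | F-measure |
| Proposed algorithm | 72.2% | 70.3% | 71.2% |
| Bayesian network | 67.8% | 44.5% | 53.8% |
| Logistic regression | 76.0% | 31.7% | 44.7% |
| Decision tree (C4.5) | 55.3% | 41.0% | 47.1% |
| SMO | 82.6% | 28.1% | 41.9% |
| Naive Bayes | 18.9% | 76.2% | 30.3% |
Table: Comparison of the proposed algorithm with the base classifiers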
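The F-measure column is consistent with the harmonic mean of the first two columns, which is why the first column is read here as precision. The short check below recomputes it from the tabulated values, under the assumption that the standard balanced F1 definition is the one used.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) copied from the table above.
base_results = {
    "Proposed algorithm": (72.2, 70.3, 71.2),
    "Bayesian network": (67.8, 44.5, 53.8),
    "Logistic regression": (76.0, 31.7, 44.7),
    "Decision tree (C4.5)": (55.3, 41.0, 47.1),
    "SMO": (82.6, 28.1, 41.9),
    "Naive Bayes": (18.9, 76.2, 30.3),
}

for name, (p, r, reported) in base_results.items():
    computed = f_measure(p, r)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

The recomputed values agree with the reported ones to within rounding, and they make the trade-off visible: SMO and logistic regression have high precision but low recall, while Naive Bayes shows the reverse.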
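| Algorithm | Precision | Recall | F-measure |
| Proposed algorithm | 72.2% | 70.3% | 71.2% |
| Stacking ensemble learning | 57.8% | 64.9% | 61.1% |
| Bagging ensemble learning | 65.8% | 64.9% | 65.3% |
| Boosting ensemble learning | 55.6% | 60.8% | 58.1% |
Table: Comparison of the proposed algorithm with common ensemble learning algorithms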
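The improvements quoted in the abstract match the absolute differences between F-measures in the two results tables, that is, percentage points rather than relative gains. The sketch below recomputes both from the tabulated values. The relative figures are shown only for contrast and are not reported by the authors.

```python
PROPOSED_F = 71.2  # F-measure (%) of the proposed algorithm

# Reported F-measures (%) of the baselines, copied from the two tables.
baseline_f = {
    "Bayesian network": 53.8,
    "Logistic regression": 44.7,
    "Decision tree (C4.5)": 47.1,
    "SMO": 41.9,
    "Naive Bayes": 30.3,
    "Stacking": 61.1,
    "Bagging": 65.3,
    "Boosting": 58.1,
}

# Absolute gain in percentage points, as quoted in the abstract.
point_gain = {name: round(PROPOSED_F - f, 1) for name, f in baseline_f.items()}
# Relative gain in percent of the baseline's F-measure, for contrast only.
relative_gain = {
    name: 100 * (PROPOSED_F - f) / f for name, f in baseline_f.items()
}

for name in baseline_f:
    print(f"{name}: +{point_gain[name]} points, "
          f"{relative_gain[name]:.1f}% relative")
```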
References
[1] Shaw M J, Subramaniam C, Tan G W, et al. Knowledge Management and Data Mining for Marketing[J]. Decision Support Systems, 2001, 31(1): 127-137.
doi: 10.1016/S0167-9236(00)00123-8
[2] Wei Guohua, Kang Zhiying. Research on the Model of Mining Customer Demand Potential Customers Customized Terminal[J]. Information Security & Technology, 2014, 5(3): 79-81. (in Chinese)
doi: 10.3969/j.issn.1674-9456.2014.03.027
[3] Li Xingyi. Study on Application of Data Mining Technology in Insurance Target Customer Identification[D]. Guangzhou: Sun Yat-Sen University, 2014. (in Chinese)
[4] Wang Yuyuan. Prediction and Analysis of China Mobile Customers Based on Data Mining[D]. Xi'an: Chang'an University, 2016. (in Chinese)
[5] Cao Shupeng, Jiang Zhu, Yan Meiyi. Application of Decision Tree Model to Identify Potential Customers of Credit Consumption Loan[J]. Beijing Review of Financial Studies, 2016(2): 36-53. (in Chinese)
[6] Ganatra A. Draw Attention to Potential Customer with the Help of Subjective Measures in Sequential Pattern Mining (SPM) Approach[C]// Proceedings of the International Conference on Recent Trends in Information, Telecommunication and Computing. 2014.
[7] Chang H J, Hung L P, Ho C L. An Anticipation Model of Potential Customers' Purchasing Behavior Based on Clustering Analysis and Association Rules Analysis[J]. Expert Systems with Applications, 2007, 32(3): 753-764.
doi: 10.1016/j.eswa.2006.01.049
[8] Guo Beibei, Fang Zhaoben. Application of SVM in Mining Potential Customers from Web Log[J]. Journal of Industrial Engineering & Engineering Management, 2010, 24(1): 129-133. (in Chinese)
[9] Sun L, Duan Z. Web Potential Customer Classification Based on SVM[C]// Proceedings of the 2012 International Conference on Industrial Control and Electronics Engineering. 2012: 568-570.
[10] Guo Linxue. Application of Association Rules and Collaborative Filtering in Automotive E-commerce[J]. Technology and Economic Guide, 2017(8): 31. (in Chinese)
[11] Hsieh H P, Li C T, Lin S D. Estimating Potential Customers Anywhere and Anytime Based on Location-Based Social Networks[A]// Machine Learning and Knowledge Discovery in Databases[M]. Springer International Publishing, 2015.
[12] Jiang Cuiqing, Wang Qilin, Liu Shixi, et al. Semi-supervised Learning for Automobile Defect Identification in the Context of Chinese Social Media[J]. Chinese Journal of Management Science, 2014(S1): 677-685. (in Chinese)
[13] LocoySpider[CP/OL]. [2016-11-04].
[14] Zheng X, Zhu S, Lin Z. Capturing the Essence of Word-of-Mouth for Social Commerce: Assessing the Quality of Online E-Commerce Reviews by a Semi-Supervised Approach[J]. Decision Support Systems, 2013, 56(1): 211-222.
doi: 10.1016/j.dss.2013.06.002
[15] Abrahams A S, Fan W, Wang G A, et al. An Integrated Text Analytic Framework for Product Defect Discovery[J]. Production & Operations Management, 2015, 24(6): 975-990.
doi: 10.1111/poms.12303
[16] Krishnamoorthy S. Linguistic Features for Review Helpfulness Prediction[J]. Expert Systems with Applications, 2015, 42(7): 3751-3759.
doi: 10.1016/j.eswa.2014.12.044
[17] Liu Y, Jiang C, Zhao H, et al. Using Contextual Features and Multi-view Ensemble Learning in Product Defect Identification from Online Discussion Forums[J]. Decision Support Systems, 2018, 105: 1-12.
doi: 10.1016/j.dss.2017.10.009
[18] Abbasi A, Chen H. CyberGate: A Design Framework and System for Text Analysis of Computer-Mediated Communication[J]. MIS Quarterly, 2008, 32(4): 811-837.
doi: 10.2307/25148873
[19] Abrahams A S, Jiao J, Fan W, et al. What's Buzzing in the Blizzard of Buzz? Automotive Component Isolation in Social Media Postings[J]. Decision Support Systems, 2013, 55(4): 871-882.
doi: 10.1016/j.dss.2012.12.023
[20] Lee S, Choeh J Y. Predicting the Helpfulness of Online Reviews Using Multilayer Perceptron Neural Networks[J]. Expert Systems with Applications, 2014, 41(6): 3041-3046.
doi: 10.1016/j.eswa.2013.10.034
[21] Almagrabi H, Malibari A, McNaught J. A Survey of Quality Prediction of Product Reviews[J]. International Journal of Advanced Computer Science & Applications, 2015, 6(11): 49-58.
doi: 10.14569/IJACSA.2015.061107
[22] Xu N, Liu H, Chen J, et al. Selecting a Representative Set of Diverse Quality Reviews Automatically[C]// Proceedings of the 2014 SIAM International Conference on Data Mining. 2014.
[23] NTUSD[OL]. [2017-01-05].
[24] Zhu F, Zhang X. Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics[J]. Journal of Marketing, 2010, 74(2): 133-148.
doi: 10.1509/jmkg.74.2.133
[25] Oh C, Sheng O. Investigating Predictive Power of Stock Micro Blog Sentiment in Forecasting Future Stock Price Directional Movement[C]// Proceedings of the Annual International Conference on Information Systems. 2011.
[26] Loughran T, McDonald B. When is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks[J]. Journal of Finance, 2011, 66(1): 35-65.
doi: 10.1111/j.1540-6261.2010.01625.x
[27] Abrahams A S, Jiao J, Wang G A, et al. Vehicle Defect Discovery from Social Media[J]. Decision Support Systems, 2012, 54(1): 87-97.
doi: 10.1016/j.dss.2012.04.005
[28] Law D, Gruss R, Abrahams A S. Automated Defect Discovery for Dishwasher Appliances from Online Consumer Reviews[J]. Expert Systems with Applications, 2017, 67: 84-94.
doi: 10.1016/j.eswa.2016.08.069
[29] Winkler M, Abrahams A S, Gruss R, et al. Toy Safety Surveillance from Online Reviews[J]. Decision Support Systems, 2016, 90: 23-32.
doi: 10.1016/j.dss.2016.06.016 pmid: 5145195
[30] NLPIR[OL]. [2017-01-10].
[31] Wolpert D H. Stacked Generalization[M]. Springer US, 2011.
[32] AutoHome[OL]. [2016-11-14].
[33] WEKA[K/OL]. [2017-01-18].