Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (11): 102-111     https://doi.org/10.11925/infotech.2096-3467.2020.0059
Research Paper
Predicting Usefulness of Crowd Testing Reports with Deep Learning*
Cai Jingxuan 1, Wu Jiang 1,2, Wang Chengkun 1
1 School of Information Management, Wuhan University, Wuhan 430072, China
2 Center for E-commerce Research and Development, Wuhan University, Wuhan 430072, China
Abstract

[Objective] Taking crowd testing reports as the research object, this paper explores how author attributes, product attributes, text, and images contribute to predicting the usefulness of crowd testing reports. [Methods] We used deep learning to extract text and image features from crowd testing reports and built a usefulness prediction model with a fully connected neural network. Models with different input combinations were trained on a random 80% of the samples, and the remaining samples served as the test set for evaluating predictive performance. [Results] Adding text features alone raised the model's prediction accuracy by 4.24 percentage points; adding image features alone raised it by 5.21 points; adding both text and image features raised it by 6.96 points. [Limitations] The text and image features extracted by deep learning are hard to understand and interpret. Even when the final predictions are fairly accurate, it remains difficult to identify the specific features that each neural network layer represents, or to summarize the prediction rules on which the model bases its final decision. [Conclusions] Both the textual features and the image features of crowd testing reports effectively predict a report's usefulness to consumers, and the two kinds of features mutually verify and can substitute for each other in this prediction.

Key words: Crowd Testing; Signal Theory; Deep Learning; Feature Extraction; Predictive Analysis
Received: 2020-01-15      Published: 2020-09-27
CLC number: G203
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on the Complexity of Decentralization Mechanisms and Risks of the Sharing Economy Driven by Information Asymmetry" (71874131).
Corresponding author: Wu Jiang     E-mail: jiangw@whu.edu.cn
Cite this article:
Cai Jingxuan, Wu Jiang, Wang Chengkun. Predicting Usefulness of Crowd Testing Reports with Deep Learning[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 102-111.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0059      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I11/102
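The evaluation protocol in the abstract (train on a random 80% of the samples, test on the rest) can be illustrated with a minimal sketch. The function name `split_train_test`, the fixed seed, and the use of integer sample ids are assumptions made for illustration; the sample count of 1,550 is the number of observations reported in Table 1, and the authors' actual splitting code is not given on this page.

```python
import random


def split_train_test(sample_ids, train_fraction=0.8, seed=42):
    """Shuffle the sample ids and split them into train and test lists."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility only
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# 1,550 is the number of observations reported in Table 1.
train_ids, test_ids = split_train_test(range(1550))
print(len(train_ids), len(test_ids))  # 1240 310
```

Under this split each input combination would be trained on the same 1,240 reports and scored on the same 310, which keeps the comparison between models fair.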
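| Dimension | Variable | Description | Obs | Mean | S.D. | Max | Min |
|---|---|---|---|---|---|---|---|
| Dependent variable | Like | Number of likes received by the crowd testing report | 1,550 | 27.28 | 33.56 | 385 | 0 |
| Author attributes | Au_level | Author's level on the platform | 1,550 | 9.55 | 3.72 | 26 | 1 |
| Author attributes | Au_article | Total number of articles the author has published on the platform | 1,550 | 24.57 | 31.18 | 246 | 1 |
| Author attributes | Au_test | Number of crowd tests the author has taken part in | 1,550 | 5.10 | 4.82 | 114 | 1 |
| Author attributes | Au_followed | Number of accounts the author follows | 1,550 | 911.74 | 1,486.37 | 10,620 | 0 |
| Author attributes | Au_follower | Number of followers of the author | 1,550 | 57.02 | 66.68 | 1,225 | 0 |
| Product attributes | Prod_type | Type of trial product (beauty and skincare, food, lifestyle, apparel, electronics, mother and baby) | 1,550 | - | - | - | - |
| Product attributes | Prod_price | Price of the trial product | 1,550 | 177.32 | 215.30 | 1,700 | 11.99 |
| Product attributes | Prod_num | Number of people crowd testing the same product at the same time | 1,550 | 7.98 | 7.35 | 32 | 1 |

Table 1  Definitions and descriptive statistics of the basic attributes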
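| Category | Content | Source |
|---|---|---|
| User dictionary | (1) Product and merchant names: 美妆蛋 (beauty blender), 小白鞋 (white sneakers), 良品铺子 (Bestore), etc. (2) Experience descriptions: 沙漠皮 ("desert skin", i.e. very dry skin), 开箱 (unboxing), 神器 (must-have item), etc. (3) Internet slang and abbreviations: 鸡冻 (slang for "excited"), 踩雷 ("stepping on a mine", i.e. a bad purchase), 轻奢 (affordable luxury), etc. | (1) Descriptions of the crowd-tested products on the website (2) High-frequency words that were not segmented correctly |
| Stop-word dictionary | (1) Punctuation (2) Function words (3) Emoticons (4) Other meaningless words: 哈哈 ("haha"), etc. | (1) Harbin Institute of Technology stop-word list (2) High-frequency meaningless words |

Table 2  User dictionary and stop-word list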
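How a user dictionary and a stop-word list such as those in Table 2 act on a report can be shown with a small sketch. This page does not name the word segmenter the authors used, so the example swaps in plain forward maximum matching as a stand-in; the functions `segment` and `preprocess`, the sample sentence, and the punctuation and function-word entries in `stop_words` are all illustrative assumptions. The dictionary words themselves come from Table 2.

```python
def segment(text, vocabulary):
    """Forward maximum matching: at each position take the longest known word."""
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(text, user_dict, stop_words):
    """Segment with the user dictionary, then drop stop words."""
    tokens = segment(text, user_dict | stop_words)
    return [tok for tok in tokens if tok not in stop_words]


# Dictionary words taken from Table 2; the stop-word entries are illustrative.
user_dict = {"美妆蛋", "小白鞋", "良品铺子", "开箱", "神器", "踩雷"}
stop_words = {",", "。", "的", "哈哈"}

print(preprocess("开箱美妆蛋,哈哈", user_dict, stop_words))  # ['开箱', '美妆蛋']
```

Without the user dictionary, a product name such as 美妆蛋 would fall apart into single characters in this sketch, which is the failure the "high-frequency words that were not segmented correctly" source in Table 2 is meant to catch.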
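Fig.1  Perplexity of the topics of crowd testing report text
Fig.2  Image feature extraction process
Fig.3  Feature fusion flowchart (using Model 4 as an example)
Fig.4  Model construction flowchart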
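Fig.3 is not reproduced on this page, so the exact fusion step is unknown. A common way to fuse feature groups before a fully connected network is simple concatenation, and the sketch below assumes that approach. The feature combinations per model follow Table 3; the function `fuse`, the toy vectors, and their dimensions are hypothetical.

```python
# Feature groups used by each model, as listed in Table 3.
FEATURE_SETS = {
    "Model 1": ("author", "product"),
    "Model 2": ("author", "product", "text"),
    "Model 3": ("author", "product", "image"),
    "Model 4": ("author", "product", "text", "image"),
}


def fuse(features, model_name):
    """Concatenate the feature groups used by the given model into one vector."""
    fused = []
    for group in FEATURE_SETS[model_name]:
        fused.extend(features[group])
    return fused


# Toy vectors; the real feature dimensions are not given on this page.
report = {
    "author": [9.0, 24.0, 5.0, 911.0, 57.0],  # level, articles, tests, followed, followers
    "product": [177.32, 8.0],                 # price, number of testers
    "text": [0.1, 0.7, 0.2],                  # hypothetical text features
    "image": [0.3, 0.0, 0.9, 0.4],            # hypothetical image features
}

for name in FEATURE_SETS:
    print(name, len(fuse(report, name)))
```

With these toy inputs the fused vectors have 7, 10, 11, and 14 entries for Models 1 to 4, so the input width of the fully connected network would change with the chosen combination.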
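| Model | Feature combination | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|
| Model 1 | Author attributes + product attributes | 0.7676 | 0.8783 | 0.7917 | 0.7821 |
| Model 2 | Author attributes + product attributes + report text | 0.8624 | 0.8403 | 0.8099 | 0.8245 |
| Model 3 | Author attributes + product attributes + report images | 0.8700 | 0.8571 | 0.8359 | 0.8342 |
| Model 4 | Author attributes + product attributes + report text + report images | 0.8854 | 0.8870 | 0.8658 | 0.8517 |

Table 3  Prediction performance of the models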
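The improvements quoted in the abstract can be recovered from the accuracy column of Table 3 as differences in percentage points relative to Model 1. The short check below uses only the table values; the dictionary `ACCURACY` and the helper `gain_in_points` are names introduced here for illustration.

```python
# Accuracy values copied from Table 3.
ACCURACY = {
    "Model 1": 0.7821,  # author + product attributes (baseline)
    "Model 2": 0.8245,  # + report text
    "Model 3": 0.8342,  # + report images
    "Model 4": 0.8517,  # + report text + report images
}


def gain_in_points(model_name, baseline="Model 1"):
    """Accuracy gain over the baseline, in percentage points."""
    return round((ACCURACY[model_name] - ACCURACY[baseline]) * 100, 2)


for name in ("Model 2", "Model 3", "Model 4"):
    print(name, gain_in_points(name))
# Model 2 4.24
# Model 3 5.21
# Model 4 6.96
```

The combined gain (6.96) is smaller than the sum of the two separate gains (4.24 + 5.21 = 9.45), which is consistent with the conclusion that text and image features partly substitute for each other.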
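| Model | Pretrained image model | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|
| Model 3 | VGG19 | 0.8700 | 0.8571 | 0.8359 | 0.8342 |
| Model 3 | ResNet50 | 0.8430 | 0.8031 | 0.8226 | 0.8167 |
| Model 3 | InceptionV3 | 0.5041 | 0.9419 | 0.6701 | 0.5123 |
| Model 3 | NASNetMobile | 0.7952 | 0.7923 | 0.7790 | 0.7688 |
| Model 4 | VGG19 | 0.8854 | 0.8870 | 0.8658 | 0.8517 |
| Model 4 | ResNet50 | 0.7860 | 0.8381 | 0.8036 | 0.7938 |
| Model 4 | InceptionV3 | 0.8598 | 0.8067 | 0.8316 | 0.8313 |
| Model 4 | NASNetMobile | 0.8850 | 0.7455 | 0.8011 | 0.8105 |

Table 4  Prediction performance with different pretrained image models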
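To read Table 4 at a glance, the pretrained image models can be ranked by accuracy within each prediction model. The sketch below copies the accuracy column from the table; the names `RESULTS` and `best_backbone` are introduced here for illustration, and ranking by accuracy rather than F1 is an assumption about which metric matters most.

```python
# Accuracy values copied from Table 4.
RESULTS = {
    "Model 3": {"VGG19": 0.8342, "ResNet50": 0.8167,
                "InceptionV3": 0.5123, "NASNetMobile": 0.7688},
    "Model 4": {"VGG19": 0.8517, "ResNet50": 0.7938,
                "InceptionV3": 0.8313, "NASNetMobile": 0.8105},
}


def best_backbone(model_name):
    """Return the pretrained image model with the highest accuracy."""
    scores = RESULTS[model_name]
    return max(scores, key=scores.get)


for name in RESULTS:
    print(name, best_backbone(name))
# Model 3 VGG19
# Model 4 VGG19
```

VGG19 comes out on top for both models, and the same holds if the F1 column of Table 4 is used instead.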