Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (8): 48-58    DOI: 10.11925/infotech.2096-3467.2017.08.06
Original Article
Predicting Online Users’ Ratings with Comments
Zhang Hongli, Liu Jiying, Yang Sinan, Xu Jian
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  

[Objective] This study aims to build an effective mechanism for predicting online ratings from Web users’ comments. [Methods] We proposed a model with four modules: acquisition of Web users’ comments, acquisition of predictive variables, prediction analysis, and evaluation of the prediction results. We retrieved 30 movies of different types, together with their user comments, from the Web. 27 movies were used to build the model, which was then tested on the remaining 3 movies. [Results] We employed stepwise regression to select variables, which included the number of raters, the number of users posting comments, the number of people who wanted to watch the movie, and the sentiment value of the positive comments. The predicted scores were quite close to the IMDb scores; the maximum and minimum differences were 0.0644 and 0.0227, respectively. [Limitations] The sample size, the accuracy of the sentiment features, and the compatibility of the model could be improved. [Conclusions] The proposed model effectively predicts movie scores and detects the online “water army”.

Keywords: Rating Prediction; Sentiment Analysis; Regression Analysis; Movie Rating; "Water Army" Detection
Received: 31 May 2017      Published: 28 September 2017
ZTFLH:  G350  

Cite this article:

Zhang Hongli, Liu Jiying, Yang Sinan, Xu Jian. Predicting Online Users’ Ratings with Comments. Data Analysis and Knowledge Discovery, 2017, 1(8): 48-58.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.08.06     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I8/48

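No. | Movie title | Domestic release date | Genre | Production region
1 | 小时代4 (Tiny Times 4) | 2015/7/9 | Romance, Drama, Youth | Chinese mainland, Taiwan (China)
2 | 小时代2 (Tiny Times 2) | 2013/8/8 | Youth, Drama, Romance | Chinese mainland, Taiwan (China)
3 | 恶棍天使 (Devil and Angel) | 2015/12/24 | Comedy, Absurdist, Romance | China
4 | 万物生长 (Ever Since We Love) | 2015/4/17 | Romance, Drama, Campus | China
5 | 捉妖记 (Monster Hunt) | 2015/7/16 | Drama, Comedy, Fantasy | China
6 | 湄公河行动 (Operation Mekong) | 2016/9/30 | Action, Crime | China
7 | 驴得水 (Mr. Donkey) | 2016/10/28 | Comedy, Drama | China
8 | 功夫熊猫3 (Kung Fu Panda 3) | 2016/1/29 | Animation, Comedy, Action | USA, China
9 | 百鸟朝凤 (Song of the Phoenix) | 2016/5/6 | Drama, Culture | China
10 | 七月与安生 (Soul Mate) | 2016/9/14 | Drama, Romance, Youth | China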
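
Predictor variable | Meaning
LcriticNum | Base-10 logarithm of the number of users who rated the movie
LcommentNum | Base-10 logarithm of the number of users who commented on the movie
LwatchedNum | Base-10 logarithm of the number of users who have watched the movie
LdesireNum | Base-10 logarithm of the number of users who want to watch the movie
commentRatio | Ratio of the number of commenters to the number of raters
desireRatio | Share of want-to-watch users in the total of watched and want-to-watch users
sentimentmeanScore | Mean sentiment value of the movie's comments
posmeanScore | Mean sentiment value of the movie's positive comments
negmeanScore | Mean sentiment value of the movie's negative comments
doubanScore | Douban movie rating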
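
The two ratio variables follow directly from the count variables defined above. As a minimal sketch, the snippet below recomputes them from the base-10 logarithms listed for movie 1 (小时代4); the function names are illustrative and do not come from the paper.

```python
def comment_ratio(l_critic_num: float, l_comment_num: float) -> float:
    """commentRatio = commenters / raters, recovered from base-10 logs."""
    return 10 ** (l_comment_num - l_critic_num)


def desire_ratio(l_watched_num: float, l_desire_num: float) -> float:
    """desireRatio = want-to-watch / (watched + want-to-watch)."""
    watched = 10 ** l_watched_num
    desire = 10 ** l_desire_num
    return desire / (watched + desire)


# Log values listed for movie 1 in the data table.
print(round(comment_ratio(4.9019, 4.5759), 3))
print(round(desire_ratio(4.9563, 3.9654), 3))
```

Up to rounding, the results agree with the tabulated commentRatio (0.4720) and desireRatio (0.0927) for that movie.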

No. | Movie title | LcriticNum | LcommentNum | LwatchedNum | LdesireNum | commentRatio | desireRatio | sentimentmeanScore | posmeanScore | negmeanScore | doubanScore
1 | 小时代4 | 4.9019 | 4.5759 | 4.9563 | 3.9654 | 0.4720 | 0.0927 | 0.6022 | 4.3345 | -3.7442 | 4.6
2 | 小时代2 | 5.1045 | 4.7196 | 5.1774 | 3.8624 | 0.4121 | 0.0462 | 0.6174 | 4.2995 | -3.7318 | 5
3 | 恶棍天使 | 4.8992 | 4.6329 | 4.9357 | 3.8567 | 0.5416 | 0.0769 | 0.3044 | 4.1802 | -3.6735 | 4
4 | 万物生长 | 4.9530 | 4.5765 | 5.0190 | 3.9803 | 0.4202 | 0.0838 | 0.5267 | 4.1363 | -3.8332 | 5.9
5 | 捉妖记 | 5.3677 | 4.9937 | 5.4185 | 4.2924 | 0.4226 | 0.0696 | 1.2405 | 4.3430 | -3.4054 | 6.8
6 | 湄公河行动 | 5.3412 | 5.0007 | 5.3659 | 4.5103 | 0.4565 | 0.1224 | 1.4745 | 4.6532 | -3.5063 | 8.1
7 | 驴得水 | 5.1235 | 4.7927 | 5.1492 | 4.4252 | 0.4668 | 0.1588 | 0.4241 | 4.3093 | -4.1345 | 8.3
8 | 功夫熊猫3 | 5.1937 | 4.7917 | 5.2385 | 4.0827 | 0.3962 | 0.0653 | 1.7018 | 4.6260 | -3.0234 | 7.7
9 | 百鸟朝凤 | 4.9233 | 4.5974 | 4.9611 | 4.3204 | 0.4722 | 0.1861 | 2.1067 | 5.5629 | -3.3765 | 8
10 | 七月与安生 | 5.2082 | 4.8858 | 5.2441 | 4.2882 | 0.4760 | 0.0997 | 1.7458 | 4.8169 | -3.3355 | 7.6

Variable | P-value
LcriticNum | 0.142
LcommentNum | 0.217
LwatchedNum | 0.304
LdesireNum | 0.151
commentRatio | 0.359
desireRatio | 0.308
sentimentmeanScore | 0.824
posmeanScore | 0.427
negmeanScore | 0.820
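
Regression method | Variable | P-value
Stepwise regression | LcriticNum | 0.0320
Stepwise regression | LcommentNum | 0.0046
Stepwise regression | LwatchedNum | 0.0728
Stepwise regression | LdesireNum | 0.0027
Stepwise regression | posmeanScore | 0.0020
Ridge regression | LdesireNum | 0.0001
Ridge regression | commentRatio | 0.0336
Ridge regression | posmeanScore | 0.0020
Lasso regression | LdesireNum | 0.0001
Lasso regression | sentimentmeanScore | 0.0003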
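
To make the ridge variant concrete, here is a minimal closed-form sketch using the three variables listed for ridge regression. It is not the authors' implementation: it uses only the ten movies shown in the data table (the paper fitted 27), standardizes the predictors, and uses an arbitrary penalty `lam = 1.0` chosen purely for illustration. The helper names are assumptions.

```python
import math

# (LdesireNum, commentRatio, posmeanScore, doubanScore) for the ten rows
# shown in the data table.
ROWS = [
    (3.9654, 0.4720, 4.3345, 4.6),
    (3.8624, 0.4121, 4.2995, 5.0),
    (3.8567, 0.5416, 4.1802, 4.0),
    (3.9803, 0.4202, 4.1363, 5.9),
    (4.2924, 0.4226, 4.3430, 6.8),
    (4.5103, 0.4565, 4.6532, 8.1),
    (4.4252, 0.4668, 4.3093, 8.3),
    (4.0827, 0.3962, 4.6260, 7.7),
    (4.3204, 0.4722, 5.5629, 8.0),
    (4.2882, 0.4760, 4.8169, 7.6),
]


def standardize(column):
    """Rescale a column to zero mean and unit (population) variance."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(rows, lam):
    """Coefficients of (X'X + lam*I) beta = X'y on standardized predictors."""
    k = len(rows[0]) - 1
    cols = [standardize([r[j] for r in rows]) for j in range(k)]
    y_mean = sum(r[k] for r in rows) / len(rows)
    y = [r[k] - y_mean for r in rows]
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)


ols_coefs = ridge(ROWS, 0.0)    # lam = 0 reduces to ordinary least squares
ridge_coefs = ridge(ROWS, 1.0)  # lam = 1.0 is an arbitrary illustrative penalty
print(ols_coefs)
print(ridge_coefs)
```

The penalty shrinks the coefficient vector toward zero relative to the unpenalized fit, which is the property that makes ridge regression more stable when predictors are correlated.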

Variable | P-value
LcriticNum | 0.0003
LcommentNum | 0.0004
LdesireNum | 0.0002
posmeanScore | 0.0001
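
A linear model on these four variables can be sketched as an ordinary least-squares fit. The fitted coefficients of the paper's model are not reproduced on this page, so the sketch below fits only the ten movies shown in the data table; its coefficients should therefore be expected to differ from the authors' 27-movie model. The helper names (`solve`, `fit_ols`, `predict`) are illustrative.

```python
# (LcriticNum, LcommentNum, LdesireNum, posmeanScore, doubanScore) for the
# ten rows shown in the data table.
ROWS = [
    (4.9019, 4.5759, 3.9654, 4.3345, 4.6),
    (5.1045, 4.7196, 3.8624, 4.2995, 5.0),
    (4.8992, 4.6329, 3.8567, 4.1802, 4.0),
    (4.9530, 4.5765, 3.9803, 4.1363, 5.9),
    (5.3677, 4.9937, 4.2924, 4.3430, 6.8),
    (5.3412, 5.0007, 4.5103, 4.6532, 8.1),
    (5.1235, 4.7927, 4.4252, 4.3093, 8.3),
    (5.1937, 4.7917, 4.0827, 4.6260, 7.7),
    (4.9233, 4.5974, 4.3204, 5.5629, 8.0),
    (5.2082, 4.8858, 4.2882, 4.8169, 7.6),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows):
    """Least-squares fit; the last column of each row is the target."""
    k = len(rows[0]) - 1
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(k + 1)]
    # Centering removes the intercept from the normal equations.
    centered = [[r[j] - means[j] for j in range(k + 1)] for r in rows]
    xtx = [[sum(r[i] * r[j] for r in centered) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * r[k] for r in centered) for i in range(k)]
    coefs = solve(xtx, xty)
    intercept = means[k] - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


def predict(intercept, coefs, features):
    """Predicted score for one movie's predictor values."""
    return intercept + sum(c * x for c, x in zip(coefs, features))


intercept, coefs = fit_ols(ROWS)
residuals = [r[-1] - predict(intercept, coefs, r[:-1]) for r in ROWS]
print(intercept, coefs)
print(max(abs(e) for e in residuals))
```

With only ten observations and five parameters the fit is heavily over-parameterized, so the residuals here say little about out-of-sample accuracy; the paper's evaluation on held-out movies is the relevant check.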

Movie title | LcriticNum | LcommentNum | LdesireNum | posmeanScore
心迷宫 (The Coffin in the Mountain) | 5.1247 | 4.7244 | 4.6835 | 4.9646
七月与安生 (Soul Mate) | 5.2082 | 4.8858 | 4.2882 | 4.8169
我的少女时代 (Our Times) | 5.3919 | 5.0585 | 4.4110 | 4.8415