Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (6): 32-45    DOI: 10.11925/infotech.2096-3467.2021.0793
Sentiment Curve Clustering and Communication Effects of Barrage Videos
Zhang Teng1,2, Ni Yuan1,2, Mo Tong3, Lv Xueqiang4
1School of Economics and Management, Beijing Information Science and Technology University, Beijing 100192, China
2Beijing Knowledge Management Research Base, Beijing 100192, China
3School of Software and Microelectronics, Peking University, Beijing 102600, China
4Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100192, China
Abstract  

[Objective] This paper constructs a clustering model for sentiment time series of bullet-screen (barrage) texts, aiming to predict the communication effects of videos. [Methods] First, we used Word2Vec to expand the sentiment dictionary and optimize the performance of the sentiment classifiers. Then, we applied comprehensive weights to smooth and stabilize the sentiment sequence. Finally, we built a K-shape clustering model based on the SBD measure to analyze sentiment sequence patterns, characteristics, and communication effects. [Results] The optimized model achieved F1 values of 0.89 for subjective/objective classification and 0.79 for polarity classification. The performance of the subjective/objective classifier improved by 123%. Compared with existing time series clustering algorithms based on other distance measures, the proposed model yielded better Davies-Bouldin Index and Silhouette Index scores. [Limitations] The new algorithm did not fully handle Internet buzzwords or sentences without a central adjective. The description and interpretation of the sentiment time series clustering results need further exploration. [Conclusions] The proposed model reduces the irregular noise and the timing phase shift of bullet-screen texts, and the clustering results provide a basis for identifying different communication effects.

Key words: Sentiment Dictionary; Sentiment Curve; Time Series
Received: 04 August 2021      Published: 28 July 2022
ZTFLH: TP393; G250
Fund: Beijing Social Science Foundation (21GLB027)
Corresponding Authors: Ni Yuan     E-mail: niyuan230@163.com

Cite this article:

Zhang Teng, Ni Yuan, Mo Tong, Lv Xueqiang. Sentiment Curve Clustering and Communication Effects of Barrage Videos. Data Analysis and Knowledge Discovery, 2022, 6(6): 32-45.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0793     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I6/32

Technical Route
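| Barrage content (segmented) | Relative posting time (s) | Posting date | Unique user ID |
|---|---|---|---|
| 欢迎/前往/三秦/大地 (welcome / go to / Sanqin / land) | 36.117 | 2019-07-14 | d7faeec |
| 长安/超/美/的/名字 (Chang'an / super / beautiful / [particle] / name) | 36.300 | 2020-02-29 | 9695d4e6 |
| 为/我/大/陕西/增加/弹幕 (for / my / great / Shaanxi / add / barrage) | 37.540 | 2019-09-13 | f2745c64 |
| 壮哉/我/大/陕西 (magnificent / my / great / Shaanxi) | 37.541 | 2019-09-07 | cbd65743 |
| 陕西/冲鸭 (Shaanxi / go for it) | 37.569 | 2019-08-21 | 28d977d4 |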
Barrage Data (Part)
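| Model parameter | Description | Value |
|---|---|---|
| Size | Dimensionality of the word vectors | 400 |
| Window | Maximum distance between the current word and the predicted word | 5 |
| Min_count | Lower threshold on word frequency | 50 |
| Workers | Number of CPU cores used for training | 4 |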
Parameter Setting
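The parameter names in this table resemble the constructor arguments of gensim's Word2Vec class. The page does not name the library, so the following is only a minimal sketch under that assumption, using the gensim 4.x argument name `vector_size` in place of the older `size`. The helper `train_word2vec` and its input `tokenized_comments` are hypothetical names introduced for illustration.

```python
# Hyperparameters taken from the Parameter Setting table.
# Assumption: gensim >= 4.0, where the older `size` argument is called `vector_size`.
W2V_PARAMS = {
    "vector_size": 400,  # "Size": dimensionality of the word vectors
    "window": 5,         # "Window": max distance between current and predicted word
    "min_count": 50,     # "Min_count": ignore words rarer than this
    "workers": 4,        # "Workers": CPU cores used for training
}


def train_word2vec(tokenized_comments, params=W2V_PARAMS):
    """Train a Word2Vec model on an iterable of token lists (hypothetical helper)."""
    # Imported lazily so the parameter dictionary can be inspected without gensim.
    from gensim.models import Word2Vec
    return Word2Vec(sentences=tokenized_comments, **params)
```

With `min_count` set to 50, the training corpus must be large enough for sentiment words to clear that frequency threshold; on a small sample most words would be discarded.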
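| Lexicon | Seed word | Expanded words (similarity) |
|---|---|---|
| Pos (positive sentiment dictionary) | 硬朗 (sturdy) | 幽默 (humorous), 0.9607639312744141; 清晰 (clear), 0.9584671854972839; 优雅 (elegant), 0.9508066177368164; 豁达 (open-minded), 0.9448449015617371; 谦虚 (modest), 0.9414252638816833 |
| Pos (positive sentiment dictionary) | 谦和 (modest and amiable) | 睿智 (wise), 0.9870889186859131; 恢宏 (grand), 0.956283688545227; 高贵 (noble), 0.9113680124282837; 沉稳 (steady), 0.8986016511917114; 秀丽 (graceful), 0.8980205059051514 |
| Pos (positive sentiment dictionary) | 敬佩 (admire) | 艰苦 (arduous), 0.9246152639389038; 幸苦 (laborious), 0.9218361973762512; 健康 (healthy), 0.9206362962722778; 豁达 (open-minded), 0.9205096364021301; 深厚 (profound), 0.9176081418991089 |
| Neg (negative sentiment dictionary) | 肤浅 (shallow) | 枯燥 (dull), 0.9757405519485474; 繁复 (convoluted), 0.9752581119537354; 憋屈 (stifled), 0.9749971628189087; 爆差 (extremely bad), 0.9729041457176208; 残忍 (cruel), 0.9679246544837952 |
| Neg (negative sentiment dictionary) | 尴尬 (awkward) | 过猛 (too intense), 0.9259644746780396; 糟糕 (awful), 0.9231310486793518; 苛刻 (harsh), 0.9080279469490051; 太尬 (too awkward), 0.9031277894973755; 羞耻 (shameful), 0.9018667936325073 |
| Neg (negative sentiment dictionary) | 有毒 (toxic) | 剧透 (spoiler), 0.9450205564498901; 气活 (infuriating), 0.9297248721122742; 破功 (losing composure), 0.923279881477356; 好烦 (so annoying), 0.9159470796585083; 浓密 (dense), 0.9148081541061401 |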
Dictionary Expansion Display
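The expansion shown here pairs each seed word with its five nearest neighbours in the embedding space. As a rough illustration of that idea, the sketch below ranks candidate words by cosine similarity to a seed word. The two-dimensional `toy_vectors`, the function `expand_lexicon`, and the `min_sim` cut-off are all assumptions made for the example; they are not the authors' trained vectors or selection rule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_lexicon(seeds, vectors, top_n=5, min_sim=0.9):
    """For each seed word, return up to top_n neighbours with similarity >= min_sim.

    `min_sim` is a hypothetical threshold, not a value stated in the source.
    """
    expanded = {}
    for seed in seeds:
        if seed not in vectors:
            continue  # seed word is out of vocabulary
        scored = [
            (word, cosine(vectors[seed], vec))
            for word, vec in vectors.items()
            if word != seed
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        expanded[seed] = [(w, s) for w, s in scored[:top_n] if s >= min_sim]
    return expanded


# Toy 2-D embeddings, invented for illustration only.
toy_vectors = {
    "尴尬": [1.0, 0.1],
    "糟糕": [0.9, 0.2],
    "羞耻": [0.8, 0.3],
    "优雅": [-0.7, 0.9],
}

negative_expansion = expand_lexicon(["尴尬"], toy_vectors)
for word, sim in negative_expansion["尴尬"]:
    print(word, round(sim, 4))
```

Neighbours returned this way still need manual screening: the table contains words such as 健康 (healthy) and 浓密 (dense) whose polarity does not obviously follow from their seed word.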
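| Sentiment-bearing predicted as sentiment-free (FN) | Sentiment-free predicted as sentiment-bearing (FP) | Correctly classified as sentiment-bearing (TP) | Correctly classified as sentiment-free (TN) |
|---|---|---|---|
| 302 | 8 | 110 | 80 |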
Subjective and Objective Binary Classification Results of Barrage Based on HowNet
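| Sentiment-bearing predicted as sentiment-free (FN) | Sentiment-free predicted as sentiment-bearing (FP) | Correctly classified as sentiment-bearing (TP) | Correctly classified as sentiment-free (TN) |
|---|---|---|---|
| 44 | 31 | 309 | 116 |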
Subjective and Objective Binary Classification Results of Barrage Based on Expanded Dictionary
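| Positive predicted as negative (FN) | Negative predicted as positive (FP) | Correctly classified as positive (TP) | Correctly classified as negative (TN) |
|---|---|---|---|
| 57 | 23 | 141 | 119 |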
Barrage Sentiment Polarity Classification Results Based on Expanded Dictionary
| Accuracy: $\frac{TP+TN}{TP+TN+FP+FN}$ | Precision: $\frac{TP}{TP+FP}$ | Recall: $\frac{TP}{TP+FN}$ | F1 score |
|---|---|---|---|
| 85.00% | 90.08% | 87.53% | 0.89 |
Subjective and Objective Binary Classification Evaluation Index Based on Extended Sentiment Dictionary Model
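The four indicators can be recomputed directly from a confusion matrix. The sketch below does so for the subjective/objective counts reported for the expanded dictionary (TP = 309, FP = 31, FN = 44, TN = 116). The function name `binary_metrics` is hypothetical, and F1 is taken as the harmonic mean of precision and recall, which is the standard definition but is not spelled out on this page.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the expanded-dictionary subjective/objective classification table.
subjective_metrics = binary_metrics(tp=309, fp=31, fn=44, tn=116)
for name, value in subjective_metrics.items():
    print(f"{name}: {value:.4f}")
```

Recomputed this way, accuracy (0.85) and F1 (about 0.89) agree with the reported values, while precision comes out near 90.88% rather than the printed 90.08%, so the printed figure may contain a typesetting slip.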
| Accuracy: $\frac{TP+TN}{TP+TN+FP+FN}$ | Precision: $\frac{TP}{TP+FP}$ | Recall: $\frac{TP}{TP+FN}$ | F1 score |
|---|---|---|---|
| 76.47% | 85.90% | 71.94% | 0.79 |
Evaluation Index of Sentiment Polarity Binary Classification Effect Based on Extended Sentiment Dictionary Model
Four Typical Emotional Curves in Documentaries
Elbow Method for Determining the Optimal Number of Clusters
| Clustering metric | K-medoids+ED | K-medoids+DTW | K-shape+SBD |
|---|---|---|---|
| DBI_Score | 1.63 | 1.32 | 1.13 |
| Silhouette_Score | 0.44 | 0.41 | 0.47 |
Evaluation Metrics for Time Series Clustering Models
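To make the comparison concrete, the sketch below implements the shape-based distance (SBD) used by K-shape in its usual form: one minus the maximum, over all time shifts, of the cross-correlation normalized by the two series' norms. This is a plain-Python illustration with hypothetical function names and toy curves, not the authors' code. In practice the series are typically z-normalized first, and the cross-correlation is computed with an FFT for speed.

```python
import math


def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both series overlap."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += x[j] * y[i]
    return total


def sbd(x, y):
    """Shape-based distance in [0, 2]; 0 means identical shape up to a time shift."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0.0:
        return 1.0  # convention chosen here for an all-zero series (an assumption)
    n = len(x)
    best = max(cross_correlation(x, y, s) for s in range(-(n - 1), n))
    return 1.0 - best / norm


def euclidean(x, y):
    """Point-by-point Euclidean distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two toy sentiment curves with the same peak, two time steps apart.
early_peak = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
late_peak = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]

print("SBD:", sbd(early_peak, late_peak))
print("Euclidean:", euclidean(early_peak, late_peak))
```

Because SBD searches over shifts, the two toy curves are treated as the same shape even though their Euclidean distance is large, which is the phase-shift tolerance the abstract attributes to the model.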
| Cluster label | Cluster samples |
|---|---|
| 0 | 2, 11, 13, 18, 19, 20 |
| 1 | 1, 6, 7, 10 |
| 2 | 8, 9, 14, 16, 17 |
| 3 | 3, 4, 5, 12, 15 |
Clustering Diversity and Labeling Results
Sentiment Curve K-shape Clustering Results
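| Variable | Basis | Levene statistic | df1 | df2 | Sig. |
|---|---|---|---|---|---|
| Number of likes | Based on mean | 0.717 | 3 | 16 | 0.556 |
| Number of likes | Based on median | 0.650 | 3 | 16 | 0.594 |
| Number of likes | Based on median with adjusted df | 0.650 | 3 | 9.252 | 0.602 |
| Number of likes | Based on trimmed mean | 0.714 | 3 | 16 | 0.558 |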
Levene’s Test Results
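For readers unfamiliar with the test, the mean-based Levene statistic is a one-way ANOVA F statistic computed on each observation's absolute deviation from its group mean. The sketch below shows that calculation. The like counts in `hypothetical_likes` are invented for illustration; only the group sizes (6, 4, 5, 5) follow the clustering table, which is what yields degrees of freedom of 3 and 16.

```python
def levene_statistic(groups):
    """Mean-based Levene statistic W with its degrees of freedom (df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        deviations.append([abs(v - mean) for v in g])
    group_means = [sum(z) / len(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    between = sum(
        len(z) * (zm - grand_mean) ** 2 for z, zm in zip(deviations, group_means)
    )
    within = sum(
        (v - zm) ** 2 for z, zm in zip(deviations, group_means) for v in z
    )
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical like counts for four clusters of sizes 6, 4, 5 and 5.
hypothetical_likes = [
    [31000, 28500, 33200, 29800, 30500, 32100],
    [9800, 10400, 8900, 11200],
    [24100, 25600, 22900, 23800, 26300],
    [15200, 16800, 14100, 15900, 17300],
]

w, df1, df2 = levene_statistic(hypothetical_likes)
print(f"W = {w:.3f}, df1 = {df1}, df2 = {df2}")
```

The significance value comes from comparing W with an F distribution on (df1, df2) degrees of freedom, which this sketch leaves out. In the reported table every significance value exceeds 0.05, so equal variances across clusters are not rejected.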
| (I) Cluster | (J) Cluster | Mean difference (I-J) | Std. error | Sig. | 95% CI lower bound | 95% CI upper bound |
|---|---|---|---|---|---|---|
| 0 | 1 | 20811.167* | 2538.410 | 0 | 15429.98 | 26192.36 |
| 0 | 2 | 6380.667* | 2381.239 | 0.016 | 1332.66 | 11428.67 |
| 0 | 3 | 15193.667* | 2381.239 | 0 | 10145.66 | 20241.67 |
| 1 | 0 | -20811.167* | 2538.410 | 0 | -26192.36 | -15429.98 |
| 1 | 2 | -14430.500* | 2637.993 | 0 | -20022.80 | -8838.20 |
| 1 | 3 | -5617.500* | 2637.993 | 0.049 | -11209.80 | -25.20 |
| 2 | 0 | -6380.667* | 2381.239 | 0.016 | -11428.67 | -1332.66 |
| 2 | 1 | 14430.500* | 2637.993 | 0 | 8838.20 | 20022.80 |
| 2 | 3 | 8813.000* | 2487.124 | 0.003 | 3540.53 | 14085.47 |
| 3 | 0 | -15193.667* | 2381.239 | 0 | -20241.67 | -10145.66 |
| 3 | 1 | 5617.500* | 2637.993 | 0.049 | 25.20 | 11209.80 |
| 3 | 2 | -8813.000* | 2487.124 | 0.003 | -14085.47 | -3540.53 |
Between-Group ANOVA Analysis Results