Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (8): 100-112    DOI: 10.11925/infotech.2096-3467.2020.1013
Predicting Online Music Playbacks and Influencing Factors
Liu Yuanchen, Wang Hao, Gao Yaqi
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract  

[Objective] This paper predicts the play counts of online music playlists and explores the factors that influence them, aiming to help online music platforms evaluate playlist quality. [Methods] First, we used a web crawler to collect the numerical and text features of playlists from NetEase Cloud Music. Second, we pre-trained text representations with Word2Vec and BERT. Third, we built Random Forest (RF), XGBoost, and deep neural network (DNN) models to predict play counts. [Results] The DNN achieved higher prediction accuracy than RF and XGBoost. A playlist's initial play count and its numbers of comments, favorites, and shares had the most significant impact on its subsequent play count. However, adding the text features reduced the prediction accuracy. [Limitations] Because NetEase Cloud Music updates its data every day, we only examined play counts collected over a 12-hour interval between two crawls. [Conclusions] This study could help online music platforms make a preliminary judgment of the popularity of their playlists.
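The abstract does not say how word vectors for a playlist's name and description were pooled into a single feature vector. A common choice, and the assumption made here, is to average the vectors of the words that appear in the Word2Vec vocabulary. The sketch below assumes the Chinese text has already been segmented into tokens, and the `embeddings` lookup table is a hypothetical stand-in for a trained Word2Vec model.

```python
from typing import Dict, List, Sequence


def mean_word_vector(tokens: Sequence[str],
                     embeddings: Dict[str, Sequence[float]],
                     dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens (assumed pooling strategy).

    Out-of-vocabulary tokens are ignored; if no token is known, a zero vector
    of length `dim` is returned so every playlist still gets a feature vector.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

Averaging discards word order, which is one reason contextual models such as BERT are often compared against it.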

Key words: Prediction of Music List Playing Amount; NetEase Cloud Music; Random Forest; XGBoost; DNN
Received: 18 October 2020      Published: 15 September 2021
CLC Number: TP391
Fund: National Social Science Fund of China (17ZDA291); Youth of Excellence in Social Sciences of Jiangsu Province; Tang Scholar of Nanjing University
Corresponding author: Wang Hao, ORCID: 0000-0002-0131-0823, E-mail: ywhaowang@nju.edu.cn

Cite this article:

Liu Yuanchen, Wang Hao, Gao Yaqi. Predicting Online Music Playbacks and Influencing Factors. Data Analysis and Knowledge Discovery, 2021, 5(8): 100-112.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.1013     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I8/100

Research Framework
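Category | Code | Category | Code
Nostalgic (怀旧) | 1 | Lonely (孤独) | 7
Fresh (清新) | 2 | Touched (感动) | 8
Romantic (浪漫) | 3 | Excited (兴奋) | 9
Sad (伤感) | 4 | Happy (快乐) | 10
Healing (治愈) | 5 | Quiet (安静) | 11
Relaxing (放松) | 6 | Longing (思念) | 12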
Music List Category and Numerical Representation
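The category encoding in the table can be expressed as a lookup table. This is a minimal sketch that assumes the crawled category tags are exactly the Chinese tag names listed in the table; `CATEGORY_CODES` and `encode_category` are illustrative names, not taken from the paper.

```python
# Playlist category tag -> integer code, following the category table.
CATEGORY_CODES = {
    "怀旧": 1,   # Nostalgic
    "清新": 2,   # Fresh
    "浪漫": 3,   # Romantic
    "伤感": 4,   # Sad
    "治愈": 5,   # Healing
    "放松": 6,   # Relaxing
    "孤独": 7,   # Lonely
    "感动": 8,   # Touched
    "兴奋": 9,   # Excited
    "快乐": 10,  # Happy
    "安静": 11,  # Quiet
    "思念": 12,  # Longing
}


def encode_category(tag: str) -> int:
    """Map a crawled category tag to its code; reject tags outside the 12 categories."""
    try:
        return CATEGORY_CODES[tag]
    except KeyError:
        raise ValueError(f"unknown playlist category: {tag!r}") from None
```

A single integer column works for tree-based models such as RF and XGBoost, which can split on any threshold; for a neural network, a one-hot encoding would avoid implying an order among the categories. Which of the two the authors used for the DNN is not stated in this excerpt.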
Music List Play Amount Distribution
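No. | Field | Type | Description | Feature group
1 | Playlist URL | String | Uniquely identifies each playlist (identifier only; not used as a feature) |
2 | Initial play count | Numeric | Play count of the playlist at the first crawl | F1
3 | Favorites | Numeric | Number of users who have added the playlist to their favorites so they can replay it easily |
4 | Shares | Numeric | Number of users who have shared the playlist to other platforms (e.g., WeChat Moments, Weibo, QQ Zone) |
5 | Comments | Numeric | Number of comments users have posted on the playlist |
6 | Number of songs | Numeric | Total number of songs in the playlist | F2
7 | Recommendation rank | Numeric | Position of the playlist in the listing of its category, numbered in order |
8 | Category | Numeric | 12 categories, coded 1 to 12 |
9 | Creation time | Numeric | Creation date converted to the number of days before 2020-05-24 (the day the data were crawled) |
10 | User nickname | String | Name of the user who created the playlist (identifier only; not used as a feature) |
11 | User posts | Numeric | Total number of posts on the creator's profile page | F3
12 | User followings | Numeric | Number of other users the creator follows |
13 | User followers | Numeric | Number of other users who follow the creator |
14 | User level | Numeric | Creator's account level; a higher level generally indicates higher activity |
15 | Playlists created by user | Numeric | Total number of playlists the creator has created |
16 | Playlist name | Text | Text consisting of Chinese, English, special symbols, etc.; cannot be empty | F4
17 | Playlist description | Text | Text consisting of Chinese, English, special symbols, etc.; usually describes the playlist's characteristics and content |
18 | Play count after 12 h | Numeric | Play count at the second crawl, 12 h after the first; the value predicted in this paper |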
Music List Data Information
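The data table implies two preprocessing steps: converting each playlist's creation date into days before the crawl date (field 9), and pairing the first-crawl record with the play count from the second crawl 12 h later (field 18), keyed by playlist URL (field 1). The sketch below assumes both crawls are held in dictionaries keyed by URL; the record layout and function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

CRAWL_DATE = date(2020, 5, 24)  # crawl date given for field 9


def days_before_crawl(created: date, crawl_date: date = CRAWL_DATE) -> int:
    """Field 9: playlist creation date expressed as days before the crawl date."""
    return (crawl_date - created).days


def pair_snapshots(first: Dict[str, dict],
                   second: Dict[str, int]) -> List[Tuple[dict, int]]:
    """Join the two crawls on playlist URL.

    `first` maps URL -> feature record from the first crawl (hypothetical layout);
    `second` maps URL -> play count observed 12 h later, i.e. the target.
    Playlists missing from the second crawl are skipped.
    """
    return [(record, second[url]) for url, record in first.items() if url in second]
```

The URL is used only as a join key here, consistent with the table's note that it is not selected as a feature.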
Parameter | Value
n_estimators | 150
max_depth | 50
max_features | auto
min_samples_leaf | 1
bootstrap | True
Parameter Setting of RF Algorithm
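The parameter names in the RF table match the keyword arguments of scikit-learn's `RandomForestRegressor`, so it seems likely (though the excerpt does not say so) that the model was built with that library. The sketch below records the settings as keyword arguments. One adjustment is needed for current library versions: for regressors, `max_features="auto"` meant "use all features" and has since been removed, so the equivalent value `1.0` (fraction of features) is used. The `random_state` argument is an added assumption for reproducibility.

```python
# RF settings from the table, as scikit-learn keyword arguments.
RF_PARAMS = {
    "n_estimators": 150,
    "max_depth": 50,
    # Table value "auto": for regressors this meant all features; newer
    # scikit-learn releases dropped the string, and 1.0 expresses the same thing.
    "max_features": 1.0,
    "min_samples_leaf": 1,
    "bootstrap": True,
}


def build_random_forest(random_state: int = 0):
    """Create the regressor; requires scikit-learn at call time."""
    from sklearn.ensemble import RandomForestRegressor
    return RandomForestRegressor(random_state=random_state, **RF_PARAMS)
```

Usage would be `build_random_forest().fit(X_train, y_train)`, where `X_train` and `y_train` are the feature matrix and 12 h play counts.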
Parameter | Value
learning_rate | 0.1
n_estimators | 150
max_depth | 10
min_child_weight | 1
gamma | 0.4
colsample_bytree | 0.9
subsample | 0.8
Parameter Setting of XGBoost Algorithm
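Likewise, the XGBoost settings correspond to keyword arguments of the `XGBRegressor` class in the `xgboost` Python package. This is a sketch under the assumption that the scikit-learn-style wrapper was used; `random_state` is again an added assumption.

```python
# XGBoost settings from the table, as XGBRegressor keyword arguments.
XGB_PARAMS = {
    "learning_rate": 0.1,     # shrinkage applied to each new tree
    "n_estimators": 150,      # number of boosting rounds
    "max_depth": 10,          # maximum depth of each tree
    "min_child_weight": 1,    # minimum sum of instance weight in a leaf
    "gamma": 0.4,             # minimum loss reduction required to split a node
    "colsample_bytree": 0.9,  # fraction of features sampled for each tree
    "subsample": 0.8,         # fraction of training rows sampled for each tree
}


def build_xgboost(random_state: int = 0):
    """Create the regressor; requires the xgboost package at call time."""
    from xgboost import XGBRegressor
    return XGBRegressor(random_state=random_state, **XGB_PARAMS)
```

Compared with the RF settings, the trees here are much shallower (depth 10 versus 50), which is typical: boosting adds many weak learners sequentially, whereas a random forest averages deep, low-bias trees.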
Hidden layers | R2 | Acc
2 | 0.8660 | 0.7456
3 | 0.8714 | 0.7712
4 | 0.8880 | 0.8154
5 | 0.8888 | 0.7968
6 | 0.8890 | 0.8440
7 | 0.8804 | 0.8380
8 | 0.8665 | 0.8032
The Relationship Between the Number of DNN Hidden Layers and R2, Acc
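R2 in these tables is presumably the standard coefficient of determination, 1 - SS_res / SS_tot, which compares the model's squared error with that of always predicting the mean play count. A minimal stdlib sketch follows. The definition of Acc is not given in this excerpt, so it is not reproduced here.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # error of the mean
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

An R2 of 0.889 therefore means the model's squared error is about 11% of the squared error of a constant mean prediction.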
Batch_size | R2 | Acc
32 | 0.8242 | 0.6721
64 | 0.8802 | 0.7913
128 | 0.8890 | 0.8440
256 | 0.8886 | 0.8217
512 | 0.8879 | 0.8182
The Relationship Between the Batch_size of DNN and R2, Acc
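Batch size controls how many samples contribute to each gradient update: with N training samples, one epoch performs ceil(N / batch_size) updates, so smaller batches give more but noisier updates. The sketch below shows one epoch of shuffled mini-batching; reshuffling per epoch with a fixed seed is an assumption, not something the excerpt specifies.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def minibatches(samples: Sequence[T], batch_size: int,
                seed: int = 0) -> Iterator[List[T]]:
    """Yield one epoch of shuffled mini-batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # new order each epoch if the seed changes
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]
```

For example, 300 samples with `batch_size=128` yield batches of 128, 128, and 44 samples.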
The Relationship Between epoch and train_loss and val_loss
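The excerpt shows training and validation loss against the number of epochs, and the final setting uses 20 epochs, but it does not say how that number was chosen. One common approach, sketched here as an assumption, is early stopping: keep the epoch with the lowest validation loss and stop scanning once the loss has not improved for a hypothetical `patience` number of epochs.

```python
from typing import Sequence


def best_epoch(val_losses: Sequence[float], patience: int = 3) -> int:
    """Return the 1-based epoch with the lowest validation loss seen before
    the loss failed to improve for `patience` consecutive epochs."""
    if not val_losses:
        raise ValueError("need at least one epoch")
    best, best_idx, waited = float("inf"), 0, 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_idx, waited = loss, i, 0
        else:
            waited += 1
            if waited >= patience:
                break  # later epochs are never examined
    return best_idx + 1
```

Rising validation loss while training loss keeps falling is the usual sign of overfitting, which is what this rule guards against.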
Parameter | Value
Hidden layers | 6
batch_size | 128
epoch | 20
Units per layer (output dimension) | 300/150/50/10/5/1
dropout | 0.1
Optimizer (learning-rate method) | Adam
Activation | ReLU
Parameter Setting of DNN
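Reading the six listed unit counts as six stacked fully connected layers (an interpretation: the final layer of 1 unit would be the regression output), the sketch below derives each layer's weight-matrix shape and the total number of trainable parameters. The input size depends on which feature groups are used, so it is left as a parameter. ReLU, dropout 0.1, and Adam add no trainable parameters, and their exact placement is not specified in this excerpt.

```python
from typing import List, Sequence, Tuple

UNITS: Tuple[int, ...] = (300, 150, 50, 10, 5, 1)  # per-layer output sizes
DROPOUT = 0.1  # dropout rate from the table; contributes no parameters


def layer_shapes(n_inputs: int,
                 units: Sequence[int] = UNITS) -> List[Tuple[int, int]]:
    """(input size, output size) of each dense layer in the stack."""
    shapes = []
    fan_in = n_inputs
    for n_out in units:
        shapes.append((fan_in, n_out))
        fan_in = n_out  # each layer feeds the next
    return shapes


def count_parameters(n_inputs: int, units: Sequence[int] = UNITS) -> int:
    """Weights plus biases across all dense layers."""
    return sum(i * o + o for i, o in layer_shapes(n_inputs, units))
```

With a hypothetical 10 input features, the network has 56,571 parameters, most of them in the 300-to-150 layer; the funnel shape progressively compresses the features toward a single predicted play count.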
Comparative Experiments with Different Algorithms
Classification of Music List Features
Comparative Experiments with Different Features of the Two Models
Comparing the Predicted Values of DNN with the True Values
R2 and Acc of Different Datasets