Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (7): 56-69    DOI: 10.11925/infotech.2096-3467.2021.1449
Mining Online User Profiles and Self-Presentations: Case Study of NetEase Music Community
Wu Jiang1,2,3, Liu Tao3, Liu Yang1,3
1Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
2Center for E-commerce Research and Development, Wuhan University, Wuhan 430072, China
3School of Information Management, Wuhan University, Wuhan 430072, China
[Objective] This paper explores the patterns, evolution, and group differences of online users’ self-presentation topics, and their influence on community recognition. [Methods] First, we identified online users of the NetEase Music community and constructed their profiles from the perspectives of qualification and participation. Second, we adopted the BERT model to cluster users’ short comments and identified their self-presentation topics. Third, we used cosine similarity to analyze topic evolution and group differences. Finally, we used covariance analysis to examine the impact of self-presentation topics on community recognition. [Results] We identified eight self-presentation topics; the proportion of “reviews” decreased while that of “recollection” increased. “Interaction” topics were more popular in the “relax” style than in other styles. The proportion of each topic was almost the same at different times. Under the “recollection” theme, the cosine similarity of quality users was higher than that of other users. The cosine similarity of continuous participants was higher than that of inactive participants. The impact of users’ self-presentation topics on their community recognition was significant at the 0.1 level. [Limitations] More research is needed to examine users of other online communities. [Conclusions] “Recollection” is the most popular of users’ self-presentation topics, which are affected by style and time. Topics became more diverse as the community developed, and there were obvious differences among user groups.

Key words: Self-Presentation; User Profile; BERT Topic Clustering; Group Differences; Online Community
Received: 24 December 2021      Published: 24 August 2022
CLC Number: F49 G203
Fund: Key Projects of Philosophy and Social Sciences Research, Ministry of Education (20JZD024)
Corresponding Author: Liu Yang, ORCID: 0000-0002-9410-1755

Wu Jiang, Liu Tao, Liu Yang. Mining Online User Profiles and Self-Presentations: Case Study of NetEase Music Community. Data Analysis and Knowledge Discovery, 2022, 6(7): 56-69.

Research Framework of Users’ Self-presentation in Online Community
Number of Comments by Style
Number of Comments by Length
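| Attribute | Field | Variable name |
| --- | --- | --- |
| User qualification | Registration time | days |
| | Number of fans | fans |
| | Number of playlist subscriptions received | subscribe |
| | Paid membership | vip |
| | Paid membership level | viplevel |
| User participation | Number of playlists created | playlist |
| | Number of posts created | event |
| | Number of accounts followed | follows |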
Variables Definition
Construction Method of User Profile
| Variable | Pearson value with days |
| --- | --- |
| fans | 0.0208 |
| subscribe | 0.0153 |
| viplevel | 0.2444 |
Pearson Values Between days and fans, subscribe, viplevel
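As a rough illustration of how the Pearson values in this table could be computed, the sketch below implements the standard Pearson correlation coefficient in plain Python. The function name `pearson` and the toy records are hypothetical and are not data or code from the study; only the variable names `days` and `viplevel` come from the variable definitions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy records (assumption, not data from the study).
days = [30, 400, 900, 1500, 2100]
viplevel = [0, 1, 0, 3, 2]
r = pearson(days, viplevel)
print(f"Pearson r between days and viplevel: {r:.4f}")
```

A value near 0, as reported for fans and subscribe, indicates almost no linear relationship with registration time.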
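| Value range | fans (share of users) | subscribe (share of users) |
| --- | --- | --- |
| [0,5) | 46.13% | 87.21% |
| [0,10) | 67.70% | 92.77% |
| [0,20) | 84.19% | 95.70% |
| [0,50) | 94.35% | 97.62% |
| [0,100) | 97.10% | 98.37% |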
Distribution of fans, subscribe
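The ranges in this table are cumulative: each row counts all users whose value falls in [0, upper bound). A minimal sketch of that calculation follows. The helper `share_below` and the sample `fans` list are hypothetical and only illustrate the arithmetic.

```python
def share_below(values, upper_bounds):
    """Return {upper: fraction of values in [0, upper)} for each bound."""
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    return {u: sum(1 for v in values if 0 <= v < u) / n for u in upper_bounds}


# Hypothetical fan counts for ten users (assumption, not data from the study).
fans = [0, 2, 3, 7, 12, 18, 45, 60, 150, 4]
shares = share_below(fans, [5, 10, 20, 50, 100])
for upper, share in shares.items():
    print(f"[0,{upper}): {share:.2%}")
```

Because the bins are nested, the shares can only stay equal or grow as the upper bound increases, which matches the pattern in the table.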
Clustering Results of BERT and LDA
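| Topic category | Meaning | Topic | Share | Keywords |
| --- | --- | --- | --- | --- |
| Recollection | Stories related to users’ past experiences, such as love, family, and school days | Topic 1 | 3.95% | boy, girl, like, friend, breakup |
| | | Topic 4 | 4.40% | primary school, student, classmate, study, music |
| | | Topic 8 | 7.26% | sorry, I love you, give up, others |
| | | Topic 10 | 8.69% | feeling, maybe, time, never again |
| | | Topic 14 | 5.93% | final year of high school, school, three years, recall |
| | | Topic 16 | 2.81% | junior high school, girl, student, summer vacation, mind |
| | | Topic 25 | 5.10% | get better, girl, not enough, complain, sincerity |
| Life reflections | Users’ thoughts and insights about life | Topic 3 | 4.86% | hope, world, loneliness, give up, discover |
| | | Topic 24 | 7.43% | longing, fade, everyone, forever |
| Messages | Messages left in the song’s comment section to make wishes, set goals, etc. | Topic 2 | 3.67% | college entrance exam, one year, keep going, time, university |
| | | Topic 6 | 2.47% | effort, encouragement, examinee, possible, turning point |
| | | Topic 13 | 7.01% | want, decide, dream, goal, forward |
| Song information | Information related to the song, such as the singer or song recommendations | Topic 18 | 0.19% | a few songs, Kay Tse (谢安琪), Ode to Joy (欢乐颂), Lao Fan (老樊) |
| | | Topic 20 | 1.74% | revisit, style, rhythm, original song, guitar |
| | | Topic 26 | 0.67% | sound, instrumental, masterpiece, creation, understanding |
| Reviews | Users’ evaluations of the song and the feelings it evokes | Topic 7 | 5.47% | hear, sounds good, a song, on repeat |
| | | Topic 17 | 4.55% | so sad, lingering, playlist, that line |
| | | Topic 23 | 1.84% | niche, dare not, calm, warm, nostalgia |
| Interaction | Users’ attempts to interact, such as asking for likes | Topic 15 | 0.04% | good morning, good noon, wishing the boss, like |
| | | Topic 21 | 0.10% | online dating?, is there any, anyone there?, keep you company |
| Free association | Users’ free-wheeling ideas and comments, generally with little relation to the song | Topic 9 | 0.32% | travel around, rock, big shot, battle robe, Aunt Lan (兰姨) |
| | | Topic 11 | 4.36% | hehe (嘿嘿), 豪任, shake it, hehe (呵呵) |
| | | Topic 12 | 0.03% | hard to endure, 寡呱, check in, commander |
| | | Topic 19 | 8.38% | resist, return, red candle, remind |
| | | Topic 22 | 2.47% | Soviet Union, Red Army, polygon, national |
| Current state | Users’ current environment or state | Topic 5 | 6.26% | evening, birthday, temperature drop, overtime, now |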
Topics of Users’ Self-presentation
Distribution of Users’ Self-presentation Topics by Year
Distribution of Users’ Self-presentation Topics by Style
Proportion of Users’ Self-presentation Topics by Style
Distribution of Users’ Self-presentation Topics at Different Times
Proportion of Users’ Self-presentation Topics at Different Times
| Topic | L1 | L2 | L3 | L4 |
| --- | --- | --- | --- | --- |
| Topic 18 | 1.58% | 2.09% | 2.51% | 3.92% |
| Topic 1 | 3.23% | 4.24% | 5.14% | 8.09% |
| Topic 4 | 3.68% | 4.87% | 5.87% | 9.15% |
| Topic 8 | 3.57% | 4.74% | 5.72% | 9.06% |
| Topic 10 | 3.51% | 4.66% | 5.58% | 8.74% |
| Topic 14 | 3.53% | 4.70% | 5.68% | 8.84% |
| Topic 16 | 1.77% | 2.36% | 2.85% | 4.40% |
| Topic 25 | 3.74% | 4.93% | 5.94% | 9.32% |
| Topic 2 | 1.39% | 1.85% | 2.29% | 3.50% |
| Topic 6 | 2.11% | 2.81% | 3.39% | 5.24% |
| Topic 13 | 3.70% | 4.90% | 5.91% | 9.25% |
| Topic 3 | 3.58% | 4.77% | 5.71% | 8.97% |
| Topic 24 | 3.67% | 4.89% | 5.87% | 9.17% |
| Topic 9 | 1.11% | 1.45% | 1.75% | 2.77% |
| Topic 11 | 3.18% | 4.19% | 5.06% | 8.02% |
| Topic 22 | 2.79% | 3.70% | 4.47% | 6.92% |
| Topic 21 | 1.67% | 2.19% | 2.66% | 4.17% |
Cosine Similarity Between Users’ Qualification and Self-presentation Topics
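| Topic | Peripheral participants | Initial participants | Continuous participants |
| --- | --- | --- | --- |
| Topic 5 | 2.68% | 2.41% | 16.02% |
| Topic 23 | 2.69% | 2.40% | 15.55% |
| Topic 26 | 2.67% | 2.39% | 15.54% |
| Topic 1 | 2.67% | 2.40% | 15.63% |
| Topic 16 | 1.48% | 1.32% | 8.59% |
| Topic 2 | 1.16% | 1.02% | 6.84% |
| Topic 6 | 1.76% | 1.57% | 10.22% |
| Topic 9 | 0.91% | 0.81% | 5.36% |
| Topic 19 | 2.48% | 2.24% | 15.04% |
| Topic 15 | 0.38% | 0.34% | 2.26% |
| Topic 21 | 1.38% | 1.24% | 8.07% |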
Cosine Similarity Between Users’ Participation and Self-presentation Topics
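The abstract names cosine similarity as the measure behind these group comparisons, but this page does not say which vectors are compared. The sketch below therefore only shows the measure itself, assuming each group is represented by some numeric vector (for example, embedding components or topic proportions). The function name and both example vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical vectors for a user group and a topic (assumption, not study data).
group_vector = [0.40, 0.10, 0.20]
topic_vector = [0.20, 0.10, 0.40]
print(f"cosine similarity: {cosine_similarity(group_vector, topic_vector):.4f}")
```

The measure depends only on the direction of the two vectors, not their length, so groups of very different sizes can still be compared.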
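| Source | Partial SS | df | MS | F | Prob>F |
| --- | --- | --- | --- | --- | --- |
| Model | 7.22×10^7 | 136 | 531,045 | 4.9 | 0.00*** |
| topic | 1.41×10^6 | 7 | 201,809 | 1.9 | 0.07* |
| style | 3.53×10^6 | 11 | 320,468 | 2.9 | 0.00*** |
| year | 2.73×10^7 | 6 | 4,543,720 | 41.8 | 0.00*** |
| comment_num | 2.29×10^5 | 1 | 228,765 | 2.1 | 0.15 |
| year×topic | 7.13×10^6 | 38 | 187,523 | 1.7 | 0.00*** |
| style×topic | 9.28×10^6 | 73 | 127,170 | 1.2 | 0.15 |
| Residual | 1.17×10^9 | 10,703 | 108,820 | | |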
Covariance Analysis Results
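The F column of the covariance analysis can be checked from the table itself: each F value is the mean square (MS) of that row divided by the residual mean square. The short sketch below repeats that arithmetic using the MS values reported in the table. The dictionary layout and the helper `f_ratio` are illustrative choices, not the authors' code.

```python
# Mean squares (MS) copied from the covariance analysis table.
MS_RESIDUAL = 108_820
mean_squares = {
    "Model": 531_045,
    "topic": 201_809,
    "style": 320_468,
    "year": 4_543_720,
    "comment_num": 228_765,
    "year×topic": 187_523,
    "style×topic": 127_170,
}


def f_ratio(ms_effect, ms_residual):
    """F statistic for one effect: effect MS divided by residual MS."""
    if ms_residual <= 0:
        raise ValueError("residual mean square must be positive")
    return ms_effect / ms_residual


# Round to one decimal place, the precision used in the table.
f_values = {name: round(f_ratio(ms, MS_RESIDUAL), 1)
            for name, ms in mean_squares.items()}
for name, f in f_values.items():
    print(f"{name}: F = {f}")
```

The recomputed values agree with the reported F column to one decimal place, which also confirms that the residual row reads as df = 10,703 and MS = 108,820.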
