Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (7): 1-13    DOI: 10.11925/infotech.2096-3467.2018.1065
Identifying Hierarchy Evolution of User Interests with LDA Topic Model
Lixin Xia, Jieyan Zeng, Chongwu Bi, Guanghui Ye
School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract  

[Objective] This study explores the hierarchical structure of user interests and the laws governing its evolution, aiming to improve the quality of personalized information services. [Methods] First, we used the LDA topic model to extract the topics of users’ tags. Then, we calculated each tag’s degree of interest and combined it with the tag’s topic to identify the user’s interests. Finally, we built a “core-edge” structure of user interests from the interest network and analyzed how the hierarchy evolved. [Results] The “core-edge” structure of user interests gradually converged and stabilized as the interest domain became established. Over the time series, interests followed three main evolution patterns: staying in the core layer, fading from the core layer to the edge layer, and being promoted from the edge layer to the core layer. [Limitations] More research is needed to predict users’ interests at future time nodes. [Conclusions] The proposed method could accurately evaluate existing users’ dynamic interests and the evolution laws of their hierarchy, which helps optimize personalized information services.

Key words: Social Tags; LDA; User Interest; Hierarchical Structure
Received: 25 September 2018      Published: 06 September 2019
ZTFLH:  TP393  
Corresponding Authors: Jieyan Zeng     E-mail: aryuki@163.com

Cite this article:

Lixin Xia,Jieyan Zeng,Chongwu Bi,Guanghui Ye. Identifying Hierarchy Evolution of User Interests with LDA Topic Model. Data Analysis and Knowledge Discovery, 2019, 3(7): 1-13.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.1065     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I7/1

Topic1 Topic2 Topic3 Topic4 Topic5 (each topic: tag, probability)
rock 0.397 piano 0.286 MeFiMusicChallenge 0.304 guitar 0.406 experimental 0.239
parody 0.048 mandolin 0.051 Americana 0.021 pop 0.201 140bpm 0.052
traditional 0.048 powerpop 0.048 SwampVST 0.018 trance 0.030 rap 0.047
future 0.024 Instrumental 0.030 PS1 0.018 Canada 0.012 whoamI 0.022
hardrock 0.018 takemetochurch 0.022 FPPCover 0.018 dirge 0.009 fretlessbanjo 0.022
135bpm 0.018 acoustic 0.016 cosmic 0.018 lucky 0.007 synthpop 0.015
vocoder 0.013 cassette 0.012 rpm2016 0.018 intro 0.005 mixing 0.012
trippy 0.009 standard 0.005 SoundCollage 0.018 vocalsonly 0.005 CocteauTwins 0.010
may 0.007 cover 0.003 mooc 0.018 holidaymusic2012 0.004 losangeles 0.008
amen 0.007 ukulele 0.003 bedroompop 0.018 acoustic 0.004 minnesota 0.008
Topic6 Topic7 Topic8 Topic9 Topic10 (each topic: tag, probability)
synthwave 0.043 cover 0.351 electronic 0.259 acoustic 0.421 country 0.128
jazz 0.030 folk 0.269 live 0.108 80s 0.025 lo-fi 0.054
MusicByWomen 0.024 scifi 0.009 demo 0.102 cat 0.023 citysongs 0.041
lilfriendys 0.024 musical 0.006 drone 0.091 anthem 0.013 Ummagma 0.031
newtime 0.024 tenorukulele 0.006 harp 0.028 musicchallenge 0.013 onetake 0.030
70bpm 0.024 mp3 0.006 analog 0.021 nudisco 0.013 hardcore 0.026
interactive 0.024 powerpop 0.002 long 0.015 scifi 0.010 heartbreak 0.023
dreamy 0.019 annoying 0.001 okcomputer 0.012 murder 0.007 sufjan 0.023
disco 0.017 wizard 0.001 electricguitar 0.012 upupup 0.006 112bpm 0.022
autoharp 0.013 experimental 0.001 pony 0.007 death 0.006 altrock 0.020
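Several tags in the topic-tag table appear under more than one topic ("acoustic", "cover", "experimental", "powerpop"). One rough way to read such a table is to map each tag to the topic that gives it the highest probability. The Python sketch below does this for a few rows copied from the table; the helper name `dominant_topic` and the dictionary layout are illustrative assumptions, not part of the paper. It compares per-topic tag probabilities directly, which is only a heuristic: a proper posterior would also weight each topic by its prevalence, which the table does not give.

```python
# Tag probabilities copied from the table above (only the rows needed here).
# A tag that is not listed under a topic is simply absent from that topic's dict.
topic_tag_prob = {
    "Topic2": {"piano": 0.286, "powerpop": 0.048, "acoustic": 0.016, "cover": 0.003},
    "Topic4": {"guitar": 0.406, "acoustic": 0.004},
    "Topic5": {"experimental": 0.239},
    "Topic7": {"cover": 0.351, "powerpop": 0.002, "experimental": 0.001},
    "Topic9": {"acoustic": 0.421},
}


def dominant_topic(tag, table):
    """Return (topic, probability) for the topic that rates `tag` highest, or None."""
    candidates = [(probs[tag], topic) for topic, probs in table.items() if tag in probs]
    if not candidates:
        return None  # the tag does not occur in any listed topic
    prob, topic = max(candidates)
    return topic, prob


for tag in ("acoustic", "cover", "experimental", "powerpop"):
    print(tag, dominant_topic(tag, topic_tag_prob))
```

With these rows, "acoustic" maps to Topic9, "cover" to Topic7, "experimental" to Topic5, and "powerpop" to Topic2.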
No.  User ID  Absolute centrality  Relative centrality  Betweenness
1    174730   27376                1.719                0.075
2    186265   20393                1.280                0.056
3    46851    9878                 0.620                0.027
4    7418     9674                 0.607                0.027
5    17619    9336                 0.586                0.026
...  ...      ...                  ...                  ...
46   148146   1464                 0.092                0.004
47   77623    1432                 0.090                0.004
48   39114    1417                 0.089                0.004
49   189309   1385                 0.087                0.004
50   11806    1345                 0.084                0.004
Tag          Interest strength  Interest stability  Degree of interest
dreampop     0.068              0.128               0.128
indie        0.061              0.123               0.123
alternative  0.061              0.114               0.114
indietronic  0.020              0.051               0.051
indiepop     0.027              0.045               0.045
synthpop     0.041              0.045               0.045
lofi         0.007              0.042               0.042
electronic   0.020              0.041               0.041
shoegaze     0.027              0.036               0.036
altpop       0.068              0.034               0.034
Topic    Similarity  Topic    Similarity
Topic32  0.847       Topic37  0.153
Topic43  0.430       Topic18  0.129
Topic48  0.298       Topic34  0.083
Topic5   0.224       Topic25  0.051
Topic50  0.181       Topic12  0.047
Period (months)  Number of interests  Number of co-occurrence relations (pairs)
2013.01-2013.06  20                   352
2013.07-2013.12  21                   390
2014.01-2014.06  29                   792
2014.07-2014.12  30                   846
2015.01-2015.06  31                   818
2015.07-2015.12  30                   792
2016.01-2016.06  29                   580
2016.07-2016.12  30                   526
Period (months)  Number of core-layer interests  Number of edge-layer interests  Core/edge ratio
2013.01-2013.06  14                              6                               2.333
2013.07-2013.12  15                              6                               2.500
2014.01-2014.06  24                              5                               4.800
2014.07-2014.12  25                              5                               5.000
2015.01-2015.06  19                              12                              1.583
2015.07-2015.12  18                              12                              1.500
2016.01-2016.06  16                              13                              1.231
2016.07-2016.12  16                              14                              1.143
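The core/edge ratio in the table above is the number of core-layer interests divided by the number of edge-layer interests in each period. A minimal Python sketch that recomputes that column from the counts; the names `rows`, `core_edge_ratio`, `ratios` and `peak_period` are illustrative and not taken from the paper.

```python
# (period, core-layer count, edge-layer count) copied from the table above.
rows = [
    ("2013.01-2013.06", 14, 6),
    ("2013.07-2013.12", 15, 6),
    ("2014.01-2014.06", 24, 5),
    ("2014.07-2014.12", 25, 5),
    ("2015.01-2015.06", 19, 12),
    ("2015.07-2015.12", 18, 12),
    ("2016.01-2016.06", 16, 13),
    ("2016.07-2016.12", 16, 14),
]


def core_edge_ratio(core, edge):
    """Core-layer count divided by edge-layer count, rounded to three decimals."""
    return round(core / edge, 3)


ratios = {period: core_edge_ratio(core, edge) for period, core, edge in rows}

# Period in which the core layer is largest relative to the edge layer.
peak_period = max(ratios, key=ratios.get)

for period, ratio in ratios.items():
    print(period, ratio)
print("peak:", peak_period)
```

The recomputed values agree with the table to three decimals. The ratio peaks in 2014.07-2014.12 and declines in every later period.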
Interest  Absolute degree centrality  Relative degree centrality
Topic5    34                          0.031
Topic6    34                          0.031
Topic15   34                          0.031
Topic18   34                          0.031
Topic22   34                          0.031
Topic24   34                          0.031
Topic25   34                          0.031
Topic37   33                          0.030
Topic7    32                          0.029
Topic48   30                          0.027
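Degree centrality in an interest co-occurrence network measures how many other interests a given interest is linked to. The Python sketch below shows the general idea on made-up data. It assumes each tagged resource has already been reduced to a set of interests, and it uses the common normalization degree / (n - 1) for the relative value. Both are assumptions: this page does not state how co-occurrence was counted, and the relative values in the table above evidently use a different normalization, which is not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of interests attached to each tagged resource.
resources = [
    {"Topic5", "Topic6", "Topic15"},
    {"Topic5", "Topic6"},
    {"Topic15", "Topic48"},
    {"Topic5"},
]


def cooccurrence_edges(item_sets):
    """Count how often each unordered pair of interests appears together."""
    edges = Counter()
    for items in item_sets:
        for a, b in combinations(sorted(items), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Return {interest: (absolute_degree, relative_degree)}.

    Absolute degree is the number of distinct neighbours; relative degree
    divides it by (n - 1), where n is the number of interests in the network.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    result = {}
    for node, nbrs in neighbours.items():
        relative = len(nbrs) / (n - 1) if n > 1 else 0.0
        result[node] = (len(nbrs), relative)
    return result


edges = cooccurrence_edges(resources)
centrality = degree_centrality(edges)
for node in sorted(centrality, key=lambda k: -centrality[k][0]):
    print(node, centrality[node])
```

In this toy network Topic15 co-occurs with all three other interests, so it has the highest degree.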
Interest 2013.06 2013.12 2014.06 2014.12 2015.06 2015.12 2016.06 2016.12
Topic5
Topic15
Topic24
Topic48
Topic18
Topic25
Topic7
Topic37
Topic22
Topic6
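The abstract names three evolution types for an interest over the time series: always in the core layer, fading from the core layer to the edge layer, and being promoted from the edge layer to the core layer. A minimal Python sketch of how per-period layer labels could be mapped to those types follows. The rule used here (compare the first and last observed layer) and the example sequences are assumptions for illustration; they are not the paper's procedure or its data.

```python
def classify_evolution(layers):
    """Classify one interest from its per-period layer labels.

    `layers` is a list with one entry per period: "core", "edge", or None
    when the interest is absent in that period.
    """
    observed = [layer for layer in layers if layer is not None]
    if not observed:
        return "unobserved"
    if all(layer == "core" for layer in observed):
        return "always core"
    first, last = observed[0], observed[-1]
    if first == "core" and last == "edge":
        return "core to edge"
    if first == "edge" and last == "core":
        return "edge to core"
    return "other"  # e.g. always in the edge layer


# Hypothetical layer sequences over eight half-year periods.
examples = {
    "InterestA": ["core"] * 8,
    "InterestB": ["core", "core", "core", "core", "edge", "edge", "edge", "edge"],
    "InterestC": [None, None, "edge", "edge", "core", "core", "core", "core"],
}

for name, layers in examples.items():
    print(name, classify_evolution(layers))
```

Interests that fit none of the three named types fall into "other" rather than being forced into a category.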