Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (7): 1-13    DOI: 10.11925/infotech.2096-3467.2018.1065
Research Paper
Identifying Hierarchy Evolution of User Interests with LDA Topic Model *
Lixin Xia, Jieyan Zeng, Chongwu Bi, Guanghui Ye
School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract

[Objective] This study explores the hierarchical structure of user interests and the laws governing its evolution, aiming to improve the quality of personalized information services and meet users' information needs. [Methods] First, we use the LDA topic model to extract topics from users' tags. Then, we define a formula for tag interest degree and combine it with the extracted tag topics to perceive user interests dynamically. Finally, we partition user interests into a core-edge structure based on the constructed interest network and analyze how the interest hierarchy evolves. [Results] The core-edge structure of user interests gradually converges and stabilizes as the user's interest domain becomes established. Over the time series, the promotion and demotion of interests between layers follows three main patterns: remaining in the core layer throughout, fading from the core layer to the edge layer, and being promoted from the edge layer to the core layer. [Limitations] Predicting and evaluating user interests at future time points on the basis of the observed evolution laws requires further study. [Conclusions] The proposed method can perceive and predict users' dynamically changing interests more precisely, evaluate the degree of each interest over the time series, divide interests into hierarchical levels, and thereby derive the evolution laws of the interest hierarchy, which helps optimize personalized information services.

Keywords: Social Tags; LDA; User Interest; Hierarchical Structure
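Received: 2018-09-25
CLC number: TP393
Funding: *This work is supported by the Major Program of the National Social Science Foundation of China, "Research on Knowledge Discovery of Network Resources Based on Multi-dimensional Aggregation" (13&ZD183); the Natural Science Foundation of Hubei Province, "Research on the Construction Mode of the Smart City 'Impression Cloud' Based on Social Tag Mining" (2018CFB387); and the Fundamental Research Funds for the Central Universities, "Research on City Portraits Based on Social Tag Mining" (CCNU18QN040).
Corresponding author: Jieyan Zeng     E-mail: aryuki@163.com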
Cite this article:
Lixin Xia, Jieyan Zeng, Chongwu Bi, Guanghui Ye. Identifying Hierarchy Evolution of User Interests with LDA Topic Model[J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 1-13. DOI: 10.11925/infotech.2096-3467.2018.1065.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1065
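Figure 1  Construction of the dynamic user interest perception model
Figure 2  Multi-stage evolution of user interest degree
Figure 3  Research process for user interest evolution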
Topic 1            Topic 2               Topic 3                   Topic 4                 Topic 5
rock 0.397         piano 0.286           MeFiMusicChallenge 0.304  guitar 0.406            experimental 0.239
parody 0.048       mandolin 0.051        Americana 0.021           pop 0.201               140bpm 0.052
traditional 0.048  powerpop 0.048        SwampVST 0.018            trance 0.030            rap 0.047
future 0.024       Instrumental 0.030    PS1 0.018                 Canada 0.012            whoamI 0.022
hardrock 0.018     takemetochurch 0.022  FPPCover 0.018            dirge 0.009             fretlessbanjo 0.022
135bpm 0.018       acoustic 0.016        cosmic 0.018              lucky 0.007             synthpop 0.015
vocoder 0.013      cassette 0.012        rpm2016 0.018             intro 0.005             mixing 0.012
trippy 0.009       standard 0.005        SoundCollage 0.018        vocalsonly 0.005        CocteauTwins 0.010
may 0.007          cover 0.003           mooc 0.018                holidaymusic2012 0.004  losangeles 0.008
amen 0.007         ukulele 0.003         bedroompop 0.018          acoustic 0.004          minnesota 0.008

Topic 6            Topic 7               Topic 8                   Topic 9                 Topic 10
synthwave 0.043    cover 0.351           electronic 0.259          acoustic 0.421          country 0.128
jazz 0.030         folk 0.269            live 0.108                80s 0.025               lo-fi 0.054
MusicByWomen 0.024 scifi 0.009           demo 0.102                cat 0.023               citysongs 0.041
lilfriendys 0.024  musical 0.006         drone 0.091               anthem 0.013            Ummagma 0.031
newtime 0.024      tenorukulele 0.006    harp 0.028                musicchallenge 0.013    onetake 0.030
70bpm 0.024        mp3 0.006             analog 0.021              nudisco 0.013           hardcore 0.026
interactive 0.024  powerpop 0.002        long 0.015                scifi 0.010             heartbreak 0.023
dreamy 0.019       annoying 0.001        okcomputer 0.012          murder 0.007            sufjan 0.023
disco 0.017        wizard 0.001          electricguitar 0.012      upupup 0.006            112bpm 0.022
autoharp 0.013     experimental 0.001    pony 0.007                death 0.006             altrock 0.020
Table 1  Topic-word clustering results (each cell gives a tag and its probability within the topic)
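The topic-word probabilities in Table 1 are the kind of output an LDA model produces when each user's tag list is treated as a document. The sketch below is a minimal collapsed Gibbs sampler in plain Python, offered only as an illustration. The toy tag lists, the hyperparameters alpha and beta, the iteration count, and the helper names lda_gibbs and top_tags are all assumptions, not the authors' data, settings, or implementation.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA; returns per-topic tag probabilities."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]         # tags of doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of tag w in topic k
    topic_total = [0] * num_topics                       # total tags assigned to topic k
    assign = []                                          # current topic of every tag token

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every tag token.
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, resample its topic, then add it back.
                update(d, w, assign[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + v * beta)
                    for k in range(num_topics)
                ]
                assign[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, assign[d][i], +1)

    # Smoothed topic-word distributions; each one sums to 1 over the vocabulary.
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + v * beta) for w in vocab}
        for k in range(num_topics)
    ]


def top_tags(phi_k, n=3):
    """Return the n most probable tags of one topic."""
    return sorted(phi_k, key=phi_k.get, reverse=True)[:n]


# Toy tag lists (hypothetical users, tags borrowed from Table 1 for flavor only).
toy_docs = [
    ["rock", "hardrock", "guitar"],
    ["rock", "guitar", "hardrock", "rock"],
    ["piano", "mandolin", "acoustic"],
    ["piano", "acoustic", "ukulele"],
    ["rock", "guitar"],
    ["piano", "mandolin"],
]
phi = lda_gibbs(toy_docs, num_topics=2)
for k, dist in enumerate(phi, start=1):
    print(f"Topic {k}:", top_tags(dist))
```

Because sampling is random, the topics found on such a small toy corpus depend on the seed; only the structure of the output (one probability distribution over tags per topic) is meant to mirror Table 1.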
No.  User ID  Absolute centrality  Relative centrality  Betweenness
1    174730   27,376               1.719                0.075
2    186265   20,393               1.280                0.056
3    46851    9,878                0.620                0.027
4    7418     9,674                0.607                0.027
5    17619    9,336                0.586                0.026
...  ...      ...                  ...                  ...
46   148146   1,464                0.092                0.004
47   77623    1,432                0.090                0.004
48   39114    1,417                0.089                0.004
49   189309   1,385                0.087                0.004
50   11806    1,345                0.084                0.004
Table 2  Top 50 core users, 2013-2016
Figure 4  Comparison of tag prediction accuracy
Tag          Interest intensity  Interest stability  Interest degree
dreampop     0.068               0.128               0.128
indie        0.061               0.123               0.123
alternative  0.061               0.114               0.114
indietronic  0.020               0.051               0.051
indiepop     0.027               0.045               0.045
synthpop     0.041               0.045               0.045
lofi         0.007               0.042               0.042
electronic   0.020               0.041               0.041
shoegaze     0.027               0.036               0.036
altpop       0.068               0.034               0.034
Table 3  User tag interest degree at time point $d_1$ (Top 10)
Topic     Similarity  Topic     Similarity
Topic 32  0.847       Topic 37  0.153
Topic 43  0.430       Topic 18  0.129
Topic 48  0.298       Topic 34  0.083
Topic 5   0.224       Topic 25  0.051
Topic 50  0.181       Topic 12  0.047
Table 4  Interests of user "174730" at time point $d_1$
Period (months)  Number of interests  Number of co-occurrence relations (pairs)
2013.01-2013.06  20                   352
2013.07-2013.12  21                   390
2014.01-2014.06  29                   792
2014.07-2014.12  30                   846
2015.01-2015.06  31                   818
2015.07-2015.12  30                   792
2016.01-2016.06  29                   580
2016.07-2016.12  30                   526
Table 5  Time series data statistics
Period (months)  Core-layer interests (count)  Edge-layer interests (count)  Core/edge ratio
2013.01-2013.06  14                            6                             2.333
2013.07-2013.12  15                            6                             2.500
2014.01-2014.06  24                            5                             4.800
2014.07-2014.12  25                            5                             5.000
2015.01-2015.06  19                            12                            1.583
2015.07-2015.12  18                            12                            1.500
2016.01-2016.06  16                            13                            1.231
2016.07-2016.12  16                            14                            1.143
Table 6  Core-edge interest statistics over the time series
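The last column of Table 6 is the number of core-layer interests divided by the number of edge-layer interests, for example 14 / 6 ≈ 2.333 for the first period, and core plus edge counts add up to the interest counts reported in Table 5. The short Python check below recomputes both from the published counts; the variable and function names are illustrative assumptions.

```python
# (period, core-layer count, edge-layer count) as listed in Table 6.
table6 = [
    ("2013.01-2013.06", 14, 6),
    ("2013.07-2013.12", 15, 6),
    ("2014.01-2014.06", 24, 5),
    ("2014.07-2014.12", 25, 5),
    ("2015.01-2015.06", 19, 12),
    ("2015.07-2015.12", 18, 12),
    ("2016.01-2016.06", 16, 13),
    ("2016.07-2016.12", 16, 14),
]


def core_edge_ratio(core, edge):
    """Ratio of core-layer to edge-layer interests, rounded to 3 decimals."""
    return round(core / edge, 3)


ratios = [core_edge_ratio(core, edge) for _, core, edge in table6]
totals = [core + edge for _, core, edge in table6]

for (period, core, edge), ratio in zip(table6, ratios):
    print(f"{period}: core={core}, edge={edge}, ratio={ratio:.3f}")
```

The ratio peaks at 5.000 in the second half of 2014 and then declines toward 1.143, while the total number of interests stays near 30 from 2014 onward.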
Figure 5  finalfitness correlation coefficients
Interest  Absolute degree centrality  Relative degree centrality
Topic5    34                          0.031
Topic6    34                          0.031
Topic15   34                          0.031
Topic18   34                          0.031
Topic22   34                          0.031
Topic24   34                          0.031
Topic25   34                          0.031
Topic37   33                          0.030
Topic7    32                          0.029
Topic48   30                          0.027
Table 7  Top 10 user interests by centrality
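Degree centrality, as reported in Table 7, counts how many other interests a given interest co-occurs with in the interest network. The Python sketch below builds a small co-occurrence network and computes an absolute and a relative degree for each node. The toy records and function names are hypothetical, and the relative value here uses the common normalization by n - 1; the normalization behind the relative values in Table 7 is not stated on this page and evidently differs, so the numbers are not comparable.

```python
from itertools import combinations


def cooccurrence_graph(records):
    """Map each interest to the set of interests it co-occurs with."""
    neighbors = {}
    for interests in records:
        unique = sorted(set(interests))
        for node in unique:
            neighbors.setdefault(node, set())  # keep isolated interests as nodes
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degree_centrality(neighbors):
    """Return {interest: (absolute degree, degree / (n - 1))}."""
    n = len(neighbors)
    return {
        node: (len(nbrs), len(nbrs) / (n - 1) if n > 1 else 0.0)
        for node, nbrs in neighbors.items()
    }


# Hypothetical records: interests observed together in one time window.
toy_records = [
    ["Topic5", "Topic6", "Topic15"],
    ["Topic5", "Topic18"],
    ["Topic22"],
]
centrality = degree_centrality(cooccurrence_graph(toy_records))
for node, (absolute, relative) in sorted(centrality.items()):
    print(f"{node}: absolute={absolute}, relative={relative:.2f}")
```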
Interest  2013.06  2013.12  2014.06  2014.12  2015.06  2015.12  2016.06  2016.12
Topic5
Topic15
Topic24
Topic48
Topic18
Topic25
Topic7
Topic37
Topic22
Topic6
Table 8  Core-edge hierarchy evolution of interests
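The abstract names three evolution patterns for an interest over the time series: remaining in the core layer throughout, fading from the core layer to the edge layer, and being promoted from the edge layer to the core layer. The Python sketch below labels a per-period sequence of layers with one of these patterns. The decision rule (compare the first and last period) is a simplifying assumption rather than the authors' procedure, and the example sequences are invented, not the values of Table 8.

```python
def classify_evolution(layers):
    """Label a chronological list of 'core'/'edge' layers with an evolution pattern."""
    if not layers:
        raise ValueError("layers must contain at least one period")
    if any(layer not in ("core", "edge") for layer in layers):
        raise ValueError("each layer must be 'core' or 'edge'")
    if all(layer == "core" for layer in layers):
        return "always core"
    first, last = layers[0], layers[-1]
    if first == "core" and last == "edge":
        return "core-to-edge fading"
    if first == "edge" and last == "core":
        return "edge-to-core promotion"
    return "other"  # e.g. always edge, or a temporary excursion


# Invented sequences over eight half-year periods (not taken from Table 8).
examples = {
    "interest A": ["core"] * 8,
    "interest B": ["core"] * 4 + ["edge"] * 4,
    "interest C": ["edge"] * 3 + ["core"] * 5,
}
for name, layers in examples.items():
    print(name, "->", classify_evolution(layers))
```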