New Technology of Library and Information Service  2015, Vol. 31 Issue (9): 9-16    DOI: 10.11925/infotech.1003-3513.2015.09.02
Research Paper
User Interest Prediction Combing Topic Model and Multi-time Function
Gui Sisi1, Lu Wei1,2, Huang Shihao1, Zhou Pengcheng1
1 School of Information Management, Wuhan University, Wuhan 430072, China;
2 Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract

[Objective] User interests are not static; they change over time. This paper predicts user interests by combining a topic model with a multi-time function. [Methods] A topic model generates the user's interests. For each interest, a multi-time function weights every occurrence of that interest, and the weighted occurrences are used to predict the distribution of the user's interests at the next time point. [Results] On the search engine log data provided by Sogou Lab, the method is compared with a memory-based user profile model and a forgetting-curve-based multi-step user profile model. Both cosine similarity and Kullback-Leibler (KL) divergence indicate that the proposed method predicts user interests more accurately. [Limitations] The method is tested only on the Sogou search log and needs further examination on other data sets. [Conclusions] Taking every time point in a user's history into account yields more accurate prediction of user interests.
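The weighting and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the multi-time function, so the exponential decay used here is an assumption chosen only for illustration, as are the function names, the `decay` parameter, and the toy topic distributions. Each historical time point contributes its topic distribution with a weight that shrinks with its distance from the prediction target; the prediction is then compared with an observed distribution using cosine similarity and KL divergence.

```python
import math


def predict_interest(history, decay=0.5):
    """Predict the topic distribution at the next time point.

    history: list of per-period topic distributions, oldest first.
    Every occurrence of a topic is weighted by exp(-decay * age), where
    age is the number of periods between that occurrence and the
    prediction target. The exponential form is an assumption, not the
    paper's multi-time function.
    """
    n_topics = len(history[0])
    n_periods = len(history)
    scores = [0.0] * n_topics
    for t, dist in enumerate(history):
        age = n_periods - t  # most recent period has age 1
        weight = math.exp(-decay * age)
        for k, p in enumerate(dist):
            scores[k] += weight * p
    total = sum(scores)
    if total == 0:
        raise ValueError("history contains no topic mass")
    return [s / total for s in scores]


def cosine_similarity(p, q):
    """Cosine of the angle between two topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against log(0) for empty topics in q."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


# Toy data (hypothetical): three topics over two historical periods.
history = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
observed = [0.2, 0.7, 0.1]

predicted = predict_interest(history)
print("predicted:", predicted)
print("cosine:", cosine_similarity(predicted, observed))
print("KL:", kl_divergence(observed, predicted))
```

A higher cosine similarity and a lower KL divergence both indicate that the predicted distribution is closer to the observed one, which is how the two measures are read in the comparison reported above.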

Received: 2015-04-03
CLC number: TP393
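Funding:

This work is one of the research outcomes of the Major Project of Key Research Bases of Humanities and Social Sciences of the Ministry of Education, "Research on Fine-Grained Web Information Retrieval Models and Framework Construction" (No. 10JJD630014), and of the General Program of the National Natural Science Foundation of China, "Semantic Recognition of Academic Text and Knowledge Graph Construction Oriented to Lexical Function" (No. 71473183).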

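Corresponding author: Lu Wei, ORCID: 0000-0002-0929-7416, E-mail: weilu@whu.edu.cn
Author contributions: Gui Sisi: proposed the research question, designed the implementation plan, analyzed and processed the data, and drafted and revised the paper; Lu Wei: designed the research plan and revised the final version of the paper; Huang Shihao: preprocessed the Sogou data set and generated user interests with the topic model; Zhou Pengcheng: implemented the memory-based user interest model and the forgetting-curve-based multi-stage quantification model of user interest degree on the Sogou data set.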
Cite this article:
Gui Sisi, Lu Wei, Huang Shihao, Zhou Pengcheng. User Interest Prediction Combing Topic Model and Multi-time Function [J]. New Technology of Library and Information Service, 2015, 31(9): 9-16. DOI: 10.11925/infotech.1003-3513.2015.09.02.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.09.02
