1. Computer School, Beijing Information Science and Technology University, Beijing 100101, China;
2. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
[Objective] Discovering micro-blog users' interests plays an important role in personalized recommendation on micro-blog social networks and helps improve user satisfaction. [Methods] In addition to mining a user's own micro-blogs, this paper analyzes the micro-blogs of the users that this user follows, as well as the social correlation among them. By computing the similarity between their micro-blogs and the intimacy between them, it uncovers further user interests. The results from these two aspects are then combined to obtain the user's interest set. [Results] Experiments on a dataset collected from Sina Micro-blog show that both the precision rate and the recall rate rise by more than 15% compared with the traditional method. [Limitations] The stop-word list used in data preprocessing is incomplete, because the list is not learned automatically. In addition, user interest sets must be tagged manually in order to calculate the precision rate and recall rate. [Conclusions] The experimental results show that the proposed method discovers user interests more effectively and accurately than the traditional method.
石伟杰, 徐雅斌. 微博用户兴趣发现研究[J]. 现代图书情报技术, 2015, 31(1): 52-58.
Shi Weijie, Xu Yabin. Research on Discovering Micro-blog User Interests. New Technology of Library and Information Service, 2015, 31(1): 52-58.