[Objective] This paper aims to discover the topics of multimedia content, such as images or videos, in microblogs.
[Context] The text of a multimedia microblog is usually brief, and its topic is generally carried by the multimedia content itself (images or videos), so traditional text mining methods may not apply to these cases.
[Methods] The text space of a multimedia microblog is extended using its hot comments. An LDA topic model is then used to infer the classification and mine topic features. Finally, the topic features of the multimedia microblog are expressed in the form 'topic tag - feature words'.
[Results] The training set consists of 99,823 Sina microblogs collected with a crawler toolset, and the test set consists of 151 hot multimedia microblogs with all of their comments. The results show that the classification directory built in this paper is complete, that topic tags are inferred with 88.6% accuracy, and that relevant feature words are mined with 76.0% accuracy.
[Conclusions] The experimental results show that the proposed algorithm can effectively discover the topic features of multimedia microblogs.
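The pipeline described in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the "likes" field used to rank hot comments, the hyperparameters, and the placeholder tags such as `topic_0` are all assumptions made for illustration. The sketch also fits a small collapsed Gibbs sampler directly on the toy corpus, whereas the paper trains on a separate microblog corpus and then infers topics for the test microblogs. Tokens are assumed to be already segmented.

```python
import random
from collections import Counter


def extend_with_comments(post_tokens, comments, top_n=2):
    """Append the tokens of the top_n most-liked comments to a short post."""
    hot = sorted(comments, key=lambda c: c["likes"], reverse=True)[:top_n]
    extended = list(post_tokens)
    for comment in hot:
        extended.extend(comment["tokens"])
    return extended


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                        # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def topic_features(doc_index, doc_topic, topic_word, n_words=3):
    """Return (placeholder topic tag, feature words) for one document."""
    counts = doc_topic[doc_index]
    k = max(range(len(counts)), key=counts.__getitem__)
    words = [w for w, c in topic_word[k].most_common(n_words) if c > 0]
    return f"topic_{k}", words


# Hypothetical toy microblogs: a very short text plus its comments.
posts = [
    {"tokens": ["video"], "comments": [
        {"likes": 50, "tokens": ["goal", "football", "match"]},
        {"likes": 3, "tokens": ["lol"]},
        {"likes": 20, "tokens": ["football", "team", "win"]}]},
    {"tokens": ["photo"], "comments": [
        {"likes": 40, "tokens": ["noodles", "recipe", "tasty"]},
        {"likes": 15, "tokens": ["recipe", "cook", "noodles"]}]},
]

docs = [extend_with_comments(p["tokens"], p["comments"]) for p in posts]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for d in range(len(docs)):
    tag, words = topic_features(d, doc_topic, topic_word)
    print(f"{tag} - {words}")
```

Each printed line has the 'topic tag - feature words' shape described above. In a real setting the numeric placeholder would be replaced by a label from a classification directory, and the comment text would first pass through word segmentation and stop-word removal.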
Ye Chuan, Ma Jing. Research on Topic Discovery Algorithm of Multimedia Microblog Comments Information [J]. New Technology of Library and Information Service (现代图书情报技术), 2015, 31(11): 51-59.