New Technology of Library and Information Service  2015, Vol. 31 Issue (11): 51-59    DOI: 10.11925/infotech.1003-3513.2015.11.08
Research on Topic Discovery Algorithm of Multimedia Microblog Comments Information
Ye Chuan, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract  

[Objective] This paper aims to discover the topics of multimedia content, such as images and videos, in microblogs. [Context] The text of a multimedia microblog is usually brief, and its topic is generally carried by the multimedia content itself, such as images or videos, so traditional text mining methods may not apply. [Methods] The text space of a multimedia microblog is extended with its hot comments. An LDA topic model is then used to infer the classification and mine the topic features. Finally, the topic features of the multimedia microblog are expressed in the form 'topic tag - feature words'. [Results] The training set was built from 99,823 Sina microblogs collected with a crawler tool set, and the test set from 151 hot multimedia microblogs together with all of their comments. The results show that the classification directory built in this paper is complete, that topic tags are inferred with 88.6% accuracy, and that the accuracy of mining the relevant feature words is 76.0%. [Conclusions] The experimental results show that the new algorithm can effectively discover the topic features of multimedia microblogs.
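
The pipeline summarized in the Methods part of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes comments are already tokenized and carry a like count, that "hot" means most liked, and that LDA is fitted with a plain collapsed Gibbs sampler. The names `extend_text`, `gibbs_lda`, and `describe`, the hyperparameters `alpha` and `beta`, and the `topic-k` tag format are all hypothetical; the abstract does not say how topic tags are named.

```python
import random
from collections import Counter


def extend_text(post_tokens, comments, top_n=2):
    """Append the tokens of the top_n most-liked comments to the post's tokens.

    Assumption: a comment is a dict with "likes" and "tokens" keys.
    """
    hot = sorted(comments, key=lambda c: c["likes"], reverse=True)[:top_n]
    return post_tokens + [tok for c in hot for tok in c["tokens"]]


def gibbs_lda(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample from the conditional distribution over topics.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def describe(doc_counts, topic_word, n_words=3):
    """Return (dominant topic index, its most frequent words) for one document."""
    k = max(range(len(doc_counts)), key=doc_counts.__getitem__)
    words = [w for w, c in topic_word[k].most_common(n_words) if c > 0]
    return k, words


if __name__ == "__main__":
    comments = [
        {"likes": 9, "tokens": ["football", "goal"]},
        {"likes": 1, "tokens": ["hello"]},
        {"likes": 5, "tokens": ["match", "goal"]},
    ]
    docs = [
        extend_text(["video"], comments),
        ["football", "match", "goal", "team"],
        ["recipe", "noodles", "soup", "recipe"],
    ]
    doc_topic, topic_word = gibbs_lda(docs, n_topics=2)
    k, words = describe(doc_topic[0], topic_word)
    print(f"topic-{k} - {', '.join(words)}")
```

In practice a library implementation of LDA and a real Chinese word segmenter would replace the toy sampler and the pre-tokenized input. The sketch only shows how comment text can enlarge a short post before topic inference.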

Received: 06 July 2015      Published: 06 April 2016
CLC Number: TP391; G35

Cite this article:

Ye Chuan, Ma Jing. Research on Topic Discovery Algorithm of Multimedia Microblog Comments Information. New Technology of Library and Information Service, 2015, 31(11): 51-59.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.11.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I11/51

