New Technology of Library and Information Service  2015, Vol. 31 Issue (11): 51-59    DOI: 10.11925/infotech.1003-3513.2015.11.08
Research on Topic Discovery Algorithm of Multimedia Microblog Comments Information
Ye Chuan, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract  

[Objective] This paper aims to discover the topics of multimedia microblogs, whose content consists mainly of images or videos. [Context] The text of a multimedia microblog is usually brief, and its topic is generally carried by the multimedia content such as images or videos, so traditional text mining methods may not apply. [Methods] The text space of a multimedia microblog is extended with its hot comments. An LDA topic model is then used to infer the classification and mine the topic features. Finally, the topic features of the multimedia microblog are expressed in the form 'topic tag - feature words'. [Results] The training set consists of 99 823 Sina microblogs collected with a crawler tool set, and the test set consists of 151 hot multimedia microblogs together with all of their comments. The results show that the classification directory built in this paper is complete, the topic tag is inferred with 88.6% accuracy, and the accuracy of relevant feature word mining is 76.0%. [Conclusions] The experimental results show that the new algorithm can effectively discover the topic features of multimedia microblogs.
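
The pipeline outlined in the abstract (extend the text with hot comments, fit LDA, report 'topic tag - feature words') can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extend_text`, `lda_gibbs`, and `describe`, the use of like counts to rank "hot" comments, the hyperparameters, the toy English tokens, and the placeholder tags `topic_0`, `topic_1` are all assumptions made for illustration. In particular, the paper infers tags from a classification directory, which this sketch does not reproduce. The sampler is a plain collapsed Gibbs sampler for LDA written with the standard library only.

```python
import random
from collections import Counter


def extend_text(post_tokens, comments, top_n=2):
    """Append the tokens of the top_n most-liked comments to the post's tokens."""
    hot = sorted(comments, key=lambda c: c["likes"], reverse=True)[:top_n]
    return list(post_tokens) + [tok for c in hot for tok in c["tokens"]]


def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word) counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def describe(doc_index, doc_topic, topic_word, top_n=3):
    """Return (placeholder tag, feature words) for a document's dominant topic."""
    counts = doc_topic[doc_index]
    k = max(range(len(counts)), key=counts.__getitem__)
    words = [w for w, c in topic_word[k].most_common(top_n) if c > 0]
    return f"topic_{k}", words


# Toy data (hypothetical): short post text plus comments with like counts.
posts = [
    {"tokens": ["watch", "this"], "comments": [
        {"likes": 90, "tokens": ["goal", "match", "team"]},
        {"likes": 40, "tokens": ["goal", "striker"]},
        {"likes": 1, "tokens": ["first"]}]},
    {"tokens": ["so", "good"], "comments": [
        {"likes": 70, "tokens": ["recipe", "noodles", "spicy"]},
        {"likes": 30, "tokens": ["noodles", "delicious"]}]},
]

docs = [extend_text(p["tokens"], p["comments"]) for p in posts]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
tag, words = describe(0, doc_topic, topic_word)
print(tag, "-", words)
```

On a corpus this small the topics are not meaningful; the sketch only shows how the comment text, rather than the brief post text, supplies most of the tokens that the topic model sees.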

Received: 06 July 2015      Published: 06 April 2016
CLC Number: TP391; G35

Cite this article:

Ye Chuan, Ma Jing. Research on Topic Discovery Algorithm of Multimedia Microblog Comments Information. New Technology of Library and Information Service, 2015, 31(11): 51-59.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.11.08     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I11/51

