New Technology of Library and Information Service, 2015, Vol. 31, Issue 5: 57-64. DOI: 10.11925/infotech.1003-3513.2015.05.08
Microblog Hotspot Detection Based on Semantic Analysis and Similarity Strength
Wu Ni, Zhao Pengwei, Qin Chunxiu
School of Economics and Management, Xidian University, Xi'an 710071, China
Abstract  

[Objective] To improve microblog hotspot detection by addressing two weaknesses of traditional methods: the lack of semantic understanding and the limitations of their clustering algorithms. [Methods] This paper uses Information Gain and Latent Semantic Analysis to construct a word-document matrix, and then proposes a two-step clustering algorithm that applies an improved K-means algorithm for hotspot detection and an incremental clustering algorithm for hotspot updating. Similarity strength is also adopted to overcome the low accuracy of traditional methods, which first fix the number of hot topics and only then detect the topics. [Results] Compared with previous methods, the proposed method achieves a recall of 91.3% and a precision of 92.9%, improving the clustering effect. It can also update the data, which reduces the complexity of the experiment. [Limitations] The experimental data cover only a short time span, so the effect of hotspot updating is not prominent. [Conclusions] Experimental results show that the proposed method achieves good accuracy.
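The abstract names the pipeline but not its formulas. As a rough illustration only, the Python sketch below shows the generic techniques it refers to: an LSA document space obtained by truncated SVD of a term-document matrix, a two-step clustering in which a single-pass (incremental) pass decides the number of clusters before a K-means refinement, and incremental updating in which new posts are folded into the existing LSA space. It is not the authors' implementation. Information Gain term selection and Chinese word segmentation are omitted (posts are assumed to be pre-tokenized), the paper's "similarity strength" measure is replaced by a plain cosine-similarity threshold, and all function names and the parameters `k` and `threshold` are hypothetical.

```python
import numpy as np


def term_doc_matrix(docs, vocab=None):
    """Raw-count term-document matrix: rows are terms, columns are documents.
    Words outside a given vocab are ignored (used when folding in new posts)."""
    if vocab is None:
        vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            if w in index:
                A[index[w], j] += 1.0
    return A, vocab


def lsa(A, k):
    """Truncated SVD A ~ U_k S_k V_k^T; document rows are V_k S_k (= A^T U_k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k], Vt[:k].T * s[:k]


def normalize(X):
    """Scale rows to unit length so that dot products become cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def single_pass(X, threshold, sums=None):
    """Incremental clustering: a row joins the most similar cluster if the cosine
    similarity reaches `threshold` (hypothetical stand-in for similarity strength),
    otherwise it opens a new cluster. Each cluster is stored as the sum of its
    unit-length member vectors."""
    X = normalize(X)
    sums = [] if sums is None else [s.copy() for s in sums]
    labels = []
    for x in X:
        sims = [float(x @ s) / (np.linalg.norm(s) or 1.0) for s in sums]
        best = int(np.argmax(sims)) if sims else -1
        if best >= 0 and sims[best] >= threshold:
            sums[best] += x
            labels.append(best)
        else:
            sums.append(x.copy())
            labels.append(len(sums) - 1)
    return np.array(labels), sums


def refine_kmeans(X, labels, iters=10):
    """Second step: cosine K-means started from the single-pass clusters,
    so the number of clusters is not fixed in advance."""
    X = normalize(X)
    k = int(labels.max()) + 1
    for _ in range(iters):
        cents = normalize(np.array([X[labels == c].sum(axis=0) for c in range(k)]))
        new_labels = np.argmax(X @ cents.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def detect_hotspots(docs, k=2, threshold=0.5):
    """Build the LSA space, then cluster in two steps (single pass, then K-means)."""
    A, vocab = term_doc_matrix(docs)
    Uk, X = lsa(A, k)
    labels, _ = single_pass(X, threshold)
    labels = refine_kmeans(X, labels)
    Xn = normalize(X)
    sums = [Xn[labels == c].sum(axis=0) for c in range(int(labels.max()) + 1)]
    return labels, sums, vocab, Uk


def update_hotspots(new_docs, sums, vocab, Uk, threshold=0.5):
    """Fold new posts into the existing LSA space (d -> U_k^T d) and assign them incrementally."""
    B, _ = term_doc_matrix(new_docs, vocab)
    return single_pass(B.T @ Uk, threshold, sums)
```

With toy data such as two earthquake posts and two football posts, the sketch groups each pair into its own cluster. A later post whose words all fall outside the original vocabulary folds in as a zero vector and therefore opens a new cluster. Choosing clusters by a similarity threshold rather than a preset k mirrors, in simplified form, the abstract's motivation for not fixing the number of hot topics in advance.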

Key words: Latent semantic analysis; Similarity strength; Two-step clustering; Hotspot detection
Received: 17 November 2014; Published: 11 June 2015
CLC Number: G353

Cite this article:

Wu Ni, Zhao Pengwei, Qin Chunxiu. Microblog Hotspot Detection Based on Semantic Analysis and Similarity Strength. New Technology of Library and Information Service, 2015, 31(5): 57-64.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.05.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I5/57


Related articles:
[1] Shihai Tian, Deli Lyu. An Early Warning Algorithm for Public Opinion of Safety Emergency[J]. Data Analysis and Knowledge Discovery, 2017, 1(2): 11-18.
[2] Zhao Yiping, Bi Qiang. Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites[J]. New Technology of Library and Information Service, 2016, 32(3): 41-49.
[3] Li Guolei, Chen Xianlai, Xia Dong, Yang Rong. Latent Semantic Analysis of Electronic Medical Record Text for Clinical Decision Making[J]. Data Analysis and Knowledge Discovery, 2016, 32(3): 50-57.
[4] Xia Dong, Xiao Xiaodan, Li Guolei, Chen Xianlai. Research on Correspondence Between Keyword and Chinese Library Classification Based on Latent Semantic Analysis[J]. New Technology of Library and Information Service, 2014, 30(12): 92-96.
[5] Liu Sa, Zhang Chengzhi. Survey of Multilingual Document Representation[J]. New Technology of Library and Information Service, 2010, 26(6): 33-41.
[6] Wang Song, Dai Yisheng, Li Baozhen. Explore Network Resource Topics from Social Annotations System Based on PLSA[J]. New Technology of Library and Information Service, 2010, 26(3): 47-51.
[7] Wang Wei, Xu Xin. Online Public Opinion Hotspot Detection and Analysis Based on Document Clustering[J]. New Technology of Library and Information Service, 2009(3): 74-79.