New Technology of Library and Information Service  2011, Vol. 27 Issue (12): 52-57    DOI: 10.11925/infotech.1003-3513.2011.12.08
Study on Web Topic Online Clustering Approach Based on Single-Pass Algorithm
Zhu Hengmin1,2, Zhu Weiwei2
1. Department of Information Management, Nanjing University, Nanjing 210093, China;
2. College of Economics & Management, Nanjing University of Posts & Telecommunications, Nanjing 210046, China
Abstract  To track the dynamics of Web information in a timely manner, this paper studies an online Web topic clustering approach based on the Single-Pass algorithm. The clustering process of the approach is analyzed first, and then the key problems are discussed: the extraction and weight calculation of features, and the representation and updating of topic clusters. Experiments are designed to compare how the weight factor of features in the title, the weight calculation and normalization methods for features, and the vector dimension of the topic cluster affect cluster quality and time efficiency.
Key words: Internet public opinion; Topic mining; Online clustering; Single-Pass
Received: 26 September 2011      Published: 02 February 2012
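
The abstract names the main ingredients of the approach (feature weighting with a title weight factor, normalization, a vector representation of each topic cluster, and a one-pass assignment loop) but this page does not give the details. The following is a minimal Python sketch of a generic Single-Pass clustering loop under stated assumptions, not the authors' implementation: the function names, whitespace tokenization, raw term-frequency weights, the `title_weight`, `threshold`, and `max_dims` parameters, and the running-mean centroid update are all illustrative choices. Chinese Web text would need a word segmenter in place of `split()`.

```python
import math
from collections import Counter


def vectorize(title, body, title_weight=2.0):
    """Build a unit-length term-frequency vector.

    Assumption: each title term adds title_weight to its count,
    each body term adds 1.0.
    """
    vec = Counter()
    for tok in body.lower().split():
        vec[tok] += 1.0
    for tok in title.lower().split():
        vec[tok] += title_weight
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if not a or not b:
        return 0.0
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.3, max_dims=50):
    """Assign each (title, body) pair to a cluster in one pass, in arrival order."""
    clusters = []  # each cluster: {"centroid": dict, "members": [doc indices]}
    for i, (title, body) in enumerate(docs):
        vec = vectorize(title, body)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Update the centroid as a running mean, then keep only the
            # max_dims heaviest terms to bound the cluster vector dimension.
            n = len(best["members"])
            merged = Counter()
            for t, w in best["centroid"].items():
                merged[t] += w * n
            for t, w in vec.items():
                merged[t] += w
            best["centroid"] = {
                t: w / (n + 1) for t, w in merged.most_common(max_dims)
            }
            best["members"].append(i)
        else:
            # No sufficiently similar topic exists yet: start a new cluster.
            seed = dict(Counter(vec).most_common(max_dims))
            clusters.append({"centroid": seed, "members": [i]})
    return clusters


docs = [
    ("earthquake rescue", "rescue teams reach earthquake zone"),
    ("earthquake rescue update", "earthquake rescue teams continue work"),
    ("football final", "team wins football final match"),
]
print([c["members"] for c in single_pass(docs)])  # prints [[0, 1], [2]]
```

In this sketch the threshold controls how readily a new topic is created, and `max_dims` trades cluster quality against the cost of each similarity computation, which are the kinds of trade-offs the experiments in the abstract compare.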
CLC number: G353.1

Cite this article:

Zhu Hengmin, Zhu Weiwei. Study on Web Topic Online Clustering Approach Based on Single-Pass Algorithm. New Technology of Library and Information Service, 2011, 27(12): 52-57.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.12.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I12/52
