New Technology of Library and Information Service  2011, Vol. 27 Issue (12): 52-57    DOI: 10.11925/infotech.1003-3513.2011.12.08
Study on Web Topic Online Clustering Approach Based on Single-Pass Algorithm
Zhu Hengmin1,2, Zhu Weiwei2
1. Department of Information Management, Nanjing University, Nanjing 210093, China;
2. College of Economics & Management, Nanjing University of Posts & Telecommunications, Nanjing 210046, China
Abstract  To track the dynamics of Web information in a timely manner, an online Web topic clustering approach based on the Single-Pass algorithm is studied. The clustering process of the approach is analyzed first, and then the key problems are discussed: the extraction and weight calculation of features, and the representation and updating of topic clusters. An experiment is designed to compare how the title weight factor of features, the weight calculation and normalization methods of features, and the vector dimension of the topic cluster affect cluster quality and time efficiency.
Key words: Internet public opinion; Topic mining; Online clustering; Single-Pass
Received: 26 September 2011      Published: 02 February 2012
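
The abstract names the main components of the approach: feature weighting with a title weight factor, weight normalization, a topic-cluster vector of bounded dimension, and Single-Pass assignment. The Python sketch below illustrates a generic Single-Pass clustering loop with those components. It is not the authors' implementation. The term-frequency weighting, L2 normalization, cosine similarity, running-mean centroid update, and every name and default value (`weigh`, `single_pass`, `title_factor`, `threshold`, `max_dim`) are assumptions made for illustration.

```python
import math
from collections import Counter


def weigh(title_tokens, body_tokens, title_factor=2.0):
    """Term-frequency weights with title terms boosted, then L2-normalized.

    Assumption: plain TF plus an additive title boost; the paper compares
    several weighting and normalization schemes not reproduced here.
    """
    counts = Counter(body_tokens)
    for tok in title_tokens:
        counts[tok] += title_factor
    norm = math.sqrt(sum(w * w for w in counts.values()))
    return {t: w / norm for t, w in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TopicCluster:
    """A topic represented by a centroid truncated to `max_dim` terms."""

    def __init__(self, vec, max_dim):
        self.size = 0
        self.centroid = {}
        self.max_dim = max_dim
        self.add(vec)

    def add(self, vec):
        # Running mean of member vectors; because of truncation the mean
        # is only approximate for terms that were dropped earlier.
        n = self.size
        terms = set(self.centroid) | set(vec)
        merged = {
            t: (self.centroid.get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
            for t in terms
        }
        top = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
        self.centroid = dict(top[: self.max_dim])
        self.size = n + 1


def single_pass(doc_vectors, threshold=0.3, max_dim=50):
    """Assign each document, in arrival order, to the most similar cluster,
    or open a new cluster when no similarity reaches `threshold`."""
    clusters, labels = [], []
    for vec in doc_vectors:
        best, best_sim = None, 0.0
        for idx, cluster in enumerate(clusters):
            sim = cosine(vec, cluster.centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= threshold:
            clusters[best].add(vec)
            labels.append(best)
        else:
            clusters.append(TopicCluster(vec, max_dim))
            labels.append(len(clusters) - 1)
    return labels, clusters
```

Each document is compared only against the current cluster centroids and is never revisited, which is what makes the scheme suitable for online use. The result depends on arrival order and on the chosen threshold, so both would need tuning on real data.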
CLC Number: G353.1

 

Cite this article:

Zhu Hengmin, Zhu Weiwei. Study on Web Topic Online Clustering Approach Based on Single-Pass Algorithm. New Technology of Library and Information Service, 2011, 27(12): 52-57.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.12.08     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I12/52
