Study on Web Topic Online Clustering Approach Based on Single-Pass Algorithm
Zhu Hengmin1,2, Zhu Weiwei2
1. Department of Information Management, Nanjing University, Nanjing 210093, China;
2. College of Economics & Management, Nanjing University of Posts & Telecommunications, Nanjing 210046, China

Abstract To track the dynamics of Web information in a timely manner, an online Web topic clustering approach based on the Single-Pass algorithm is studied. The clustering process of the approach is analyzed first, and then the key problems are discussed: the extraction and weight calculation of features, and the representation and updating of topic clusters. An experiment is designed to compare how the weight factor assigned to title features, the weight calculation and normalization methods for features, and the vector dimension of the topic cluster affect cluster quality and time efficiency.
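
The clustering process named in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the threshold, the weighting formula, or the cluster update rule, so everything below is an assumption made for illustration only: term-frequency weights with L2 normalization, a hypothetical `title_weight` factor added for title terms, cosine similarity, a fixed `threshold`, a running-mean centroid, and a `max_dim` cap on the topic vector. The function names are likewise hypothetical.

```python
import math
from collections import Counter


def vectorize(title_tokens, body_tokens, title_weight=2.0):
    """Build an L2-normalized term-frequency vector.

    Assumption: each title occurrence adds `title_weight` to the term's count.
    """
    counts = Counter(body_tokens)
    for tok in title_tokens:
        counts[tok] += title_weight
    norm = math.sqrt(sum(w * w for w in counts.values()))
    return {t: w / norm for t, w in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def truncate(vec, max_dim):
    """Keep only the `max_dim` highest-weighted features."""
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:max_dim]
    return dict(top)


def single_pass(docs, threshold=0.3, max_dim=50):
    """Assign each (doc_id, vector) pair to a cluster in one pass, in arrival order."""
    clusters = []  # each cluster: {"centroid": dict, "members": [doc_id, ...]}
    for doc_id, vec in docs:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Assumed update rule: running mean of member vectors, then truncation.
            n = len(best["members"])
            terms = set(best["centroid"]) | set(vec)
            merged = {
                t: (best["centroid"].get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
                for t in terms
            }
            best["centroid"] = truncate(merged, max_dim)
            best["members"].append(doc_id)
        else:
            # No sufficiently similar topic exists, so the document seeds a new one.
            clusters.append({"centroid": truncate(vec, max_dim), "members": [doc_id]})
    return clusters
```

In this sketch, `title_weight`, the normalization step, and `max_dim` correspond to the three factors the experiment compares; changing them alters both the similarity scores and the cost of each comparison.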
Received: 26 September 2011
Published: 02 February 2012