Algorithm and Experiment Research of Textual Document Clustering Based on Improved K-means
Cen Yonghua 1,2, Wang Xiaorong 2, Ji Yonghui 1
1 (Department of Information Management, Nanjing University, Nanjing 210093, China)
2 (Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094, China)
Abstract After a concise introduction to the connotation, functions, and general process of textual document clustering, this paper expounds the basic mechanism of an improved K-means clustering method in which the initial centroids are selected according to the minimum-maximum principle, designs the corresponding algorithm, and implements the clustering system. Several experiments on 300 academic articles and their related feature words show that the proposed algorithm performs well.
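The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common reading of the minimum-maximum principle: each new initial centroid is the document whose minimum distance to the centroids already chosen is the largest, and standard K-means then runs from those seeds. The function names, the use of cosine distance, and the choice of the first document as the starting seed are all assumptions made for illustration, not details taken from the paper.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; an assumed distance for term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def max_min_centroids(vectors, k, distance=cosine_distance):
    """Return the indices of k initial centroids chosen by max-min distance."""
    chosen = [0]  # assumption: seed with the first document
    # min_dist[i] is the distance from document i to its nearest chosen seed.
    min_dist = [distance(v, vectors[0]) for v in vectors]
    while len(chosen) < k:
        # Pick the document that is farthest from all seeds chosen so far.
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], distance(v, vectors[nxt]))
    return chosen


def kmeans(vectors, k, max_iterations=20, distance=cosine_distance):
    """Standard K-means iterations started from the max-min seeds."""
    seeds = max_min_centroids(vectors, k, distance)
    centroids = [list(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling `kmeans(doc_vectors, k)` on a list of equal-length term-weight vectors returns one cluster label per document. Spreading the seeds far apart in this way is intended to reduce the sensitivity of K-means to a poor random start.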
Received: 18 August 2008
Published: 25 December 2008
Corresponding Author:
Cen Yonghua
E-mail: yhcen@163.com
About authors: Cen Yonghua, Wang Xiaorong, Ji Yonghui