An Improved Method for Determining Optimal Number of Clusters in K-means Clustering Algorithm
Bian Peng1,2, Zhao Yan3, Su Yuzhao1,2
1. National Science Library, Chinese Academy of Sciences, Beijing 100190, China;
2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China;
3. Computer Science and Application Department, Zhengzhou Institute of Aeronautical Industry Management, Zhengzhou 450015, China
Abstract Motivated by the text clustering requirements of the embedded NSTL recommendation system, this paper studies the BWP algorithm and analyzes its shortcomings. An improved algorithm is then proposed that optimizes the calculation of the within-cluster distance for clusters containing a single sample. Building on BWP, the improved algorithm widens the range of candidate cluster numbers and turns a locally optimal choice of cluster number into a globally optimal one. Experimental results show that the method is effective and efficient.
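The abstract does not give the BWP formula, so the following is only a rough sketch of the kind of index it appears to refer to, not the authors' algorithm. It assumes a common between-within formulation: for each sample, b is the smallest mean distance to the samples of any other cluster, w is the mean distance to the other samples of its own cluster, and the score is (b - w) / (b + w). The function names `bwp_scores` and `mean_bwp` are hypothetical, and setting w = 0 for a single-sample cluster is an assumption made here for illustration; the paper's own treatment of that case may differ.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bwp_scores(points, labels):
    """Per-sample (b - w) / (b + w) score under the assumptions stated above."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        # Within-cluster distance: mean distance to the other members.
        # dist(p, p) is 0, so summing over the whole cluster is harmless.
        # Assumption: a single-sample cluster gets w = 0 instead of 0/0.
        if len(own) > 1:
            w = sum(dist(p, q) for q in own) / (len(own) - 1)
        else:
            w = 0.0
        # Between-cluster distance: smallest mean distance to another cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - w) / (b + w) if b + w > 0 else 0.0)
    return scores


def mean_bwp(points, labels):
    """Average score over all samples; higher suggests a better partition."""
    scores = bwp_scores(points, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(mean_bwp(pts, [0, 0, 1, 1]))  # well-separated partition
    print(mean_bwp(pts, [0, 1, 0, 1]))  # poorly chosen partition
```

Under this formulation, one would run K-means for each candidate number of clusters and keep the value whose partition gives the largest average score. The explicit branch for single-sample clusters marks the spot where the unmodified ratio would be undefined.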
Received: 12 July 2011
Published: 02 December 2011