K-means算法研究综述

doi:10.11925/infotech.1003-3513.2011.05.05

现代图书情报技术

2011, Vol. 27

Issue (5): 28-35 https://doi.org/10.11925/infotech.1003-3513.2011.05.05

知识组织与知识管理

本期目录 | 过刊浏览 | 高级检索

K-means算法研究综述

吴夙慧¹, 成颖¹, 郑彦宁², 潘云涛²

1. 南京大学信息管理系南京 210093;
2. 中国科学技术信息研究所北京 100038

Survey on K-means Algorithm

Wu Suhui¹, Cheng Ying¹, Zheng Yanning², Pan Yuntao²

1. Department of Information Management, Nanjing University, Nanjing 210093,China;
2. Institute of Scientific & Technical Information of China, Beijing 100038,China

摘要
参考文献
相关文章
Metrics

全文: PDF (957 KB) HTML
输出: BibTeX | EndNote (RIS)

摘要对聚类分析中的基本算法K-means算法中的K值确定、初始聚类中心选择以及分类属性数据处理等主要问题进行综述,理清K-means算法的整个发展脉络及算法研究中的热点和难点,提出改进K-means聚类算法的思路。

	服务

	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	RSS
	作者相关文章
	吴夙慧
	成颖
	郑彦宁
	潘云涛

关键词 ： K-means算法, 聚类算法, K值, 初始聚类中心

Abstract：The main problems of K-means algorithm which is a basic algorithm in clustering are outlined in this paper, such as determination of the optimal clusters, selection of initial centers, and categorical data clustering, etc. The development, the hot spots and difficulties of this algorithm are clarified, and some ideas are introduced to improve the algorithm efficiency.

Key words： K-means algorithm Clustering algorithm Number of clusters Initial clustering centers

收稿日期: 2011-03-21 出版日期: 2011-07-11

G202

基金资助:

本文系教育部人文社会科学研究项目“危机事件的网络信息扩散规律与控制机理研究”(项目编号:10YJC630396)和湖北省教育厅人文社会科学项目“企业内部的知识协同行为与协同网络研究”(项目编号:2010q052)的研究成果之一。

引用本文:

吴夙慧, 成颖, 郑彦宁, 潘云涛. K-means算法研究综述[J]. 现代图书情报技术, 2011, 27(5): 28-35.
Wu Suhui, Cheng Ying, Zheng Yanning, Pan Yuntao. Survey on K-means Algorithm. New Technology of Library and Information Service, 2011, 27(5): 28-35.

链接本文:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2011.05.05 或 https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2011/V27/I5/28

[1] Hartigan J A. Clustering Algorithms [M]. New York: John Wiley & Sons Inc., 1975.

[2] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations . In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, University of California Press,1967:281-297.

[3] Cox D R. Note on Grouping [J].Journal of the American Statistical Association, 1957, 52(280): 543-547.

[4] Fisher W D. On Grouping for Maximum Homogeneity[J].Journal of the American Statistical Association, 1958, 53(284):789-798.

[5] Sebestyen G S. Decision Making Process in Pattern Recognition[M]. New York: Macmillan,1962.

[6] Wilpon J G, Rabiner L R. A Modified K-means Clustering Algorithm for Use in Isolated Word Recognition[J]. IEEE Transactions on Acoustics Speech and Signal Processing, 1985,33(3):587-594.

[7] McBratney A B, De Gruijter J J. A Continuum Approach to Soil Classification by Modified Fuzzy K-means with Extragrades[J]. Journal of Soil Science,1992,43(1):159-175.

[8] Simek J F. A K-means Approach to the Analysis of Spatial Structure in Upper Paleolithic Habitation Sites[M]. Oxford, England: B.A.R.,1984:400-401.

[9] Hartigan J A. Asymptotic Distributions for Clustering Criteria[J].The Annuals of Statistics, 1978,6(1):117-131.

[10] Pollard D. Strong Consistency of K-means Clustering[J].The Annuals of Statistics,1981,9(1):135-140.

[11] Pollard D. A Central Limit Theorem for K-means Clustering[J].The Annuals of Probability, 1982,10(4):919-926.

[12] Creator R, Kaufman L, Source R, et al. K-means Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,1984,6(1):81-87.

[13] Chinrungrueng C, Sequin C H. Optimal Adaptive K-means Algorithm with Dynamic Adjustment of Learning Rate[J]. IEEE Transactions on Neural Networks,1995,6(1):157-169.

[14] Davies D L, Bouldin D W. A Cluster Separation Measure[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,1979(2):224-227.

[15] Dunn J C.A Fuzzy Relative of the Isodata Process and Its Use in Detecting Compact Well-Separated Clusters[J]. Cybernetics and Systems,1973,3(3):32-57.

[16] Rezaee M R, Lelieveldt B P, Reiber J H. A New Cluster Validity Index for the Fuzzy C-Means[J]. Pattern Recognition Letters,1998,19(3-4):237-246.

[17] Hubert L J, Arabie P. Comparing Partitions[J]. Journal of Classification,1985,2(1): 193-218.

[18] Bezdek J C,Pal N R. Some New Indexes of Cluster Validity[J]. IEEE Transactions on Systems, Man, and Cybernetics,1998,28(3):301-315.

[19] 李永森,杨善林,马溪骏,等. 空间聚类算法中的 K值优化问题研究[J]. 系统仿真学报 ,2006,18(3):573-576.

[20] 张逸清,刘文才.聚类数的确定[J]. 计算机与数字工程 ,2007,35(2):42-44.

[21] 张忠平,王爱杰,柴旭光.简单有效的确定聚类数目算法[J]. 计算机工程与应用 ,2009,45(15):166-168.

[22] Bandyopadhyay S, Maulik U. Genetic Clustering for Automatic Evolution of Clusters and Application to Image Classification[J]. Pattern Recognition,2002,35(6):1197-1208.

[23] Lin H J, Yang F W, Kao Y T. An Efficient GA-based Clustering Technique [J]. Tamkang Journal of Science and Engineering,2005,8(2):113-122.

[24] Liu Y, Ye M, Peng J, et al. Finding the Optimal Number of Clusters Using Genetic Algorithms .In:Proceedings of IEEE International Conference on Cybernetic Intelligent Systems.2008:1325-1330.

[25] 巩敦卫,蒋余庆,张勇,等.基于微粒群优化聚类数目的K-均值算法[J]. 控制理论与应用 ,2009,26(10):1175-1179.

[26] Van Der Merwe D W, Engelbrecht A P. Data Clustering Using Particle Swarm Optimization . In:Proceedings of the 2003 Congress on Evolutionary Computation, Canberra,Australia.2003:215-220.

[27] Omran M, Engelbrecht A P, Salman A. Particle Swarm Optimization Method for Image Clustering[J]. International Journal of Pattern Recognition and Artificial Intelligence,2005,19(3):297-321.

[28] Xu L, Krzyzak A, Oja E. Rival Penalized Competitive Learning for Clustering Analysis, RBF Net,and Curve Detection[J]. IEEE Transactions on Neural Networks, 1993, 4(4):636-649.

[29] Xu L, Krzyzak A, Oja E. Unsupervised and Supervised Classification by Rival Penalized Competitive Learning . In:Proceedings of the 11th International Conference on Pattern Recognition.1992:496-499.

[30] Pelleg D, Moore A. X-means: Extending K-means with Efficient Estimation of the Number of Clusters . In:Proceeding of the 17th International Conference on Machine Learning.2000:727-734.

[31] Duda R O, Hart P E. Pattern Classification and Scene Analysis[M]. New York: John Wiley & Sons Inc.,1973.

[32] Katsavounidis I, Kuo C J, Zhang Z. A New Initialization Technique for Generalized Loyd Iteration[J]. IEEE Signal Processing Letters,1994,1(10):144-146.

[33] Khan S S, Ahmad A. Cluster Center Initialization Algorithm for K-means Clustering[J]. Pattern Recognition Letters,2004,25(11):1293-1302.

[34] Redmond S J, Heneghan C. A Method for Initialising the K-means Clustering Algorithm Using Kd-trees[J]. Pattern Recognition Letters,2007,28(8):965-973.

[35] 张文君,顾行发,陈良富,等.基于均值-标准差的K均值初始聚类中心选取算法[J]. 遥感学报 ,2006,10(5):715-721.

[36] 牛琨,张舒博,陈俊亮.融合网格密度的聚类中心初始化方案[J]. 北京邮电大学学报 ,2007,30(2):6-10.

[37] 张文明,吴江,袁小蛟. 基于密度和最近邻的K-means文本聚类算法[J]. 计算机应用 ,2010,30(7):1933-1935.

[38] Metropolis N, Rosenbluth A W, Rosenbluth M N, et al. Equation of State Calculations by Fast Computing Machines[J]. Journal of Chemical Physics, 1953,21(6):1087-1092.

[39] Klein R W, Dubes R C. Experiments in Projection and Clustering by Simulated Annealing[J]. Pattern Recognition,1989,22(2):213-220.

[40] Dong J, Qi M. K-means Optimization Algorithm for Solving Clustering Problem . In:Proceedings of the 2nd International Workshop on Knowledge Discovery and Data Mining,Moscow.2009:52-55.

[41] Bhuyan J N, Raghavan V V, Elayavalli V K. Genetic Algorithm for Clustering with an Ordered Representation .In:Proceedings of the 4th International Conference Genetic Algorithms,San Diego, CA,USA.1991.

[42] Jones D R, Beltramo M A. Solving Partitioning Problems with Genetic Algorithms .In: Proceedings of the 4th International Conference Genetic Algorithms,San Diego,CA,USA.1991:442-494.

[43] Babu G P, Murty N M. A Near-optimal Initial Seed Selection in K-means Algorithm Using a Genetic Algorithm[J]. Pattern Recognit Letters,1993,14(10):763-769.

[44] Krishna K, Murty N M. Genetic K-means Algorithm[J]. IEEE Transactions on Systems, Man and Cybernetics,1999,29(3):433-439.

[45] Maulik U, Bandyopadhyay S. Genetic Algorithm-based Clustering Technique[J].Pattern Recognition,2000,33(9):1455-1465.

[46] Laszlo M, Mukherjee S. A Genetic Algorithm that Exchanges Neighboring Centers for K-means Clustering[J]. Pattern Recognition Letters,2007,28(16):2359-2366.

[47] Bradley P S, Fayyad U M. Refining Initial Points for K-means Clustering . In: Proceedings of the 15th International Conference on Machine Learning.1998:91-99.

[48] Bandyopadhyay S, Maulik U, Pakhira M K. Clustering Using Simulated Annealing with Probabilistic Redistribution[J]. International Journal of Pattern Recognition and Artificial Intelligence,2001,15(2):269-285.

[49] Zalik K R. An Efficient K-means Clustering Algorithm[J].Pattern Recognition Letters, 2008,29(9):1385-1391.

[50] Cao F,Liang J,Jiang G. An Initialization Method for the K-means Algorithm Using Neighborhood Model[J]. Computers & Mathematics with Applications,2009,58(3):474-483.

[51] 刘立平,孟志青. 一种选取初始聚类中心的方法[J]. 计算机工程与应用 ,2004,40(8):179-180.

[52] 王汉芝,刘振全. 一种新的确定均值算法初始聚类中心的方法[J]. 天津科技大学学报 ,2005,20(4):76-79.

[53] 戴文华. 基于遗传算法的文本分类及聚类研究[M]. 北京:科学出版社, 2008.

[54] Ralambondarmy H. A Conceptual Version of the K-means Algorithm[J]. Pattern Recognition Lcttcrs,1995,16(11):1147-1157.

[55] Huang Z X. A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining . In:Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery.1997:1-8.

[56] Huang Z X. Extensions to the K-means Algorithm for Clustering Large Data Sets with Categorical Values[J]. Data Mining and Knowledge Discovery,1998,2(3):283-304.

[57] Chaturvedi A D, Green P E, Carroll J D. K-modes Clustering[J]. Journal of Classification, 2001,18(1):35-55.

[58] Huang Z X, Ng M K. A Note on K-modes Clustering[J]. Journal of Classification,2003,20(2): 257-261.

[59] Sun Y, Zhu Q M, Chen Z X. An Iterative Initial-points Refinement Algorithm for Categorical Data Clustering[J]. Pattern Recognition Letters, 2002,23(7):875-884.

[60] 蒋盛益,李庆华.一种增强的K-means 聚类算法[J]. 计算机工程与科学 ,2006,28(11):56-59.

[61] Roy D K, Sharma L K. Genetic K-means Clustering Algorithm for Mixed Numeric and Categorical Data Sets[J]. International Journal of Artificial Intelligence & Applications,2010,1(2):23-28.

[1]	卢利农,祝忠明,张旺强,王小春. 基于Lingo3G聚类算法的机构知识库跨库知识整合与知识指纹服务实现[J]. 数据分析与知识发现, 2021, 5(5): 127-132.
[2]	温廷新,李洋子,孙静霜. 基于多因素特征选择与AFOA/K-means的新闻热点发现方法^*[J]. 数据分析与知识发现, 2019, 3(4): 97-106.
[3]	刘洪伟, 高鸿铭, 陈丽, 詹明君, 梁周扬. 基于用户浏览行为的兴趣识别管理模型^*[J]. 数据分析与知识发现, 2018, 2(2): 74-85.
[4]	龚凯乐,成颖,孙建军. 基于参与者共现分析的博文聚类研究^*[J]. 现代图书情报技术, 2016, 32(10): 50-58.
[5]	赵辉, 刘怀亮. 面向用户生成内容的短文本聚类算法研究[J]. 现代图书情报技术, 2013, 29(9): 88-92.
[6]	李英吴园园宁福锦. 基于PSO的K-means改进算法在证券客户细分中的应用[J]. 现代图书情报技术, 2010, 26(7/8): 88-94.
[7]	吉雍慧. 数字图书馆中的检索结果聚类和关联推荐研究[J]. 现代图书情报技术, 2008, 24(2): 69-75.

Viewed

Full text

Abstract

Cited

Shared

Discussed