[Objective] This paper develops a new clustering algorithm that automatically calculates the cut-off distance and selects the cluster centers. [Methods] First, we proposed a new adaptive algorithm (ADPC) based on information entropy and the cut-off distance. Then, we extracted the cluster centers with the help of inflection points determined by the slope trend of the weights in the sorted chart. Finally, we compared the performance of the ADPC algorithm with that of the DBSCAN, DPC, DGCCD, and ACP algorithms on UCI and synthetic datasets. [Results] The ADPC algorithm automatically identified the cluster centers and significantly improved the precision, F-measure, normalized mutual information, and runtime. [Limitations] The proposed algorithm’s performance on high-dimensional data and its efficiency on large data sets need to be improved. [Conclusions] The proposed ADPC algorithm could effectively identify the cluster centers and the cut-off distance for low-dimensional or arbitrary data sets.
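The abstract does not give the formulas behind these steps, so the following is only a minimal sketch of the general idea, not the paper's ADPC algorithm. It assumes the standard density peaks quantities (local density `rho`, distance to the nearest denser point `delta`, weight `gamma = rho * delta`), a Gaussian density kernel, a minimum-entropy rule for choosing the cut-off distance from a candidate list, and a largest-drop rule as a crude stand-in for the inflection point in the sorted weights. All function names are hypothetical.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix as nested lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def local_density(dist, dc):
    """Gaussian-kernel local density for cut-off distance dc (assumed kernel)."""
    n = len(dist)
    return [
        sum(math.exp(-((dist[i][j] / dc) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]


def density_entropy(rho):
    """Shannon entropy of the densities normalized to sum to 1."""
    total = sum(rho)
    return -sum((r / total) * math.log(r / total) for r in rho if r > 0)


def choose_cutoff(dist, candidates):
    """Assumed rule: pick the candidate dc whose density has the lowest entropy."""
    return min(candidates, key=lambda dc: density_entropy(local_density(dist, dc)))


def delta_values(dist, rho):
    """Distance from each point to its nearest point of strictly higher density."""
    n = len(rho)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return delta


def select_centers(rho, delta):
    """Sort weights gamma = rho * delta and cut at the largest drop (assumed rule)."""
    gamma = [r * d for r, d in zip(rho, delta)]
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    drops = [gamma[order[k]] - gamma[order[k + 1]] for k in range(len(order) - 1)]
    cut = drops.index(max(drops))
    return order[: cut + 1]
```

A typical call sequence would be `dist = pairwise_distances(points)`, `dc = choose_cutoff(dist, candidates)`, `rho = local_density(dist, dc)`, `delta = delta_values(dist, rho)`, and finally `select_centers(rho, delta)`, which returns the indices of the points treated as cluster centers. The candidate list for `dc` is supplied by the caller here; how the paper searches for it is not stated in the abstract.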
杨震, 王红军, 周宇. 一种截断距离和聚类中心自适应的聚类算法*[J]. 数据分析与知识发现, 2018, 2(3): 39-48.
Yang Zhen, Wang Hongjun, Zhou Yu. A Clustering Algorithm with Adaptive Cut-off Distance and Cluster Centers. Data Analysis and Knowledge Discovery, 2018, 2(3): 39-48.