Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (3): 39-48    DOI: 10.11925/infotech.2096-3467.2017.0889
A Clustering Algorithm with Adaptive Cut-off Distance and Cluster Centers
Yang Zhen, Wang Hongjun, Zhou Yu
Electronic Engineering Institute of PLA, Hefei 230037, China
Abstract  

[Objective] This paper develops a new clustering algorithm that automatically calculates the cut-off distance and selects the cluster centers. [Methods] First, we proposed a new adaptive algorithm based on information entropy and the cut-off distance. Then, we extracted the cluster centers using the inflection points determined by the slope trend of the weights in the sorted chart. Finally, we compared the performance of the ADPC algorithm with that of the DBSCAN, DPC, DGCCD, and ACP algorithms on UCI and synthetic datasets. [Results] The ADPC algorithm automatically identified the cluster centers and significantly improved the precision, F-measure, normalized mutual information, and runtime. [Limitations] The proposed algorithm's performance on high-dimensional data and its efficiency in processing large data sets need to be improved. [Conclusions] The proposed ADPC algorithm can effectively identify cluster centers and the cut-off distance for low-dimensional or arbitrary data sets.
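
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' ADPC implementation. It assumes the standard density peaks quantities (a Gaussian-kernel local density rho, the distance delta to the nearest denser point, and the weight gamma = rho * delta), picks the cut-off distance from a hypothetical candidate list by minimizing the entropy of the normalized densities, and uses the largest drop in the sorted weights as a simple stand-in for the slope-based inflection test. All function names and the toy data are illustrative assumptions.

```python
import math

def pairwise_distances(points):
    # Full Euclidean distance matrix (fine for small toy inputs).
    return [[math.dist(p, q) for q in points] for p in points]

def local_density(dist, dc):
    # Gaussian-kernel density; the self term (distance 0) contributes 1.
    return [sum(math.exp(-(d / dc) ** 2) for d in row) for row in dist]

def density_entropy(rho):
    # Shannon entropy of the densities normalized to sum to 1.
    total = sum(rho)
    return -sum((r / total) * math.log(r / total) for r in rho)

def choose_cutoff(dist, candidates):
    # Assumed criterion: keep the candidate cut-off with the lowest entropy.
    return min(candidates, key=lambda dc: density_entropy(local_density(dist, dc)))

def nearest_denser(dist, rho):
    # delta[i]: distance to the nearest point with strictly higher density;
    # parent[i]: index of that point, or -1 for a global density maximum.
    n = len(rho)
    delta, parent = [], []
    for i in range(n):
        denser = [j for j in range(n) if rho[j] > rho[i]]
        if denser:
            j = min(denser, key=lambda k: dist[i][k])
            delta.append(dist[i][j])
            parent.append(j)
        else:
            delta.append(max(dist[i]))
            parent.append(-1)
    return delta, parent

def pick_centers(rho, delta):
    # Sort the weights in descending order and cut at the largest drop,
    # a simplified substitute for a slope-trend inflection test.
    gamma = [r * d for r, d in zip(rho, delta)]
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    drops = [gamma[order[k]] - gamma[order[k + 1]] for k in range(len(order) - 1)]
    knee = max(range(len(drops)), key=lambda k: drops[k])
    return order[:knee + 1]

def assign_labels(rho, parent, centers):
    # Each non-center point inherits the label of its nearest denser point.
    labels = [-1] * len(rho)
    for label, c in enumerate(centers):
        labels[c] = label
    for i in sorted(range(len(rho)), key=lambda i: rho[i], reverse=True):
        if labels[i] == -1 and parent[i] != -1:
            labels[i] = labels[parent[i]]
    return labels

# Toy data: two small, well-separated groups of five points each.
points = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
          (5, 5), (5.1, 5), (5, 5.1), (4.9, 5), (5, 4.9)]
dist = pairwise_distances(points)
dc = choose_cutoff(dist, [0.05, 0.1, 0.2, 0.5, 1.0])
rho = local_density(dist, dc)
delta, parent = nearest_denser(dist, rho)
centers = pick_centers(rho, delta)
labels = assign_labels(rho, parent, centers)
print(dc, sorted(centers), labels)
```

On this toy input the two middle points, (0, 0) and (5, 5), are selected as centers and every other point joins the group of its nearest denser neighbor. The quadratic distance matrix illustrates why efficiency on large data sets is a concern for this family of methods.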

Key words: Clustering; Cut-off Distance; Slope Change; ADPC
Received: 06 September 2017      Published: 03 April 2018
Chinese Library Classification (CLC): TP391

Cite this article:

Yang Zhen,Wang Hongjun,Zhou Yu. A Clustering Algorithm with Adaptive Cut-off Distance and Cluster Centers. Data Analysis and Knowledge Discovery, 2018, 2(3): 39-48.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0889     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I3/39

Dataset        Samples   Dimensions   Classes
L3             312       2            3
R15            600       2            15

Dataset        Samples   Dimensions   Classes
Iris           150       4            3
Aggregation    788       2            7
Waveform       5000      21           3
Wine           178       13           3