Uncertain Data Clustering Algorithm Based on Local Density
Luo Yanfu1, Qian Xiaodong2
1School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2School of Economics and Management, Lanzhou Jiaotong University, Lanzhou 730070, China
[Objective] This paper proposes a new algorithm for clustering uncertain data that aims to reduce the shortcomings inherited from classic algorithms. [Methods] First, we modified the uncertain distance measure and compared the probability differences between two uncertain objects. Then, we defined the cluster centers and proposed a new algorithm that assigns the data to their clusters based on the concepts of maximum supporting points and density chain regions. [Results] We examined the proposed algorithm on two data sets from the UCI Machine Learning Repository. The F values on the two data sets increased by 13.23% and 23.44% compared with the traditional algorithms (UK-Means and FDBSCAN). However, the algorithm took longer to calculate the distance matrix, so the overall clustering time was only slightly shorter than that of the traditional algorithms. [Limitations] There was no appropriate method for setting the parameter of the proposed algorithm, and the time complexity of clustering was high. [Conclusions] The proposed algorithm can quickly determine the cluster centers and complete the clustering tasks. The value of t (the only parameter) strongly influences the clustering results.
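The general idea of choosing cluster centers by local density over uncertain objects can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper: it assumes each uncertain object is given as a set of sampled points, uses the mean pairwise Euclidean distance between samples as a stand-in for an uncertain distance, and applies a generic density-peaks style rule instead of maximum supporting points and density chain regions. The names `expected_distance_matrix`, `density_peaks`, `cutoff`, and `n_centers` are hypothetical, and `cutoff` is not claimed to correspond to the parameter t.

```python
import numpy as np

def expected_distance_matrix(objects):
    """Mean pairwise Euclidean distance between the samples of every two objects.

    Assumption: each uncertain object is an array of shape (n_samples, n_dims).
    """
    n = len(objects)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diff = objects[i][:, None, :] - objects[j][None, :, :]
            dist[i, j] = dist[j, i] = np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return dist

def density_peaks(dist, cutoff, n_centers):
    """Generic density-peaks style clustering on a precomputed distance matrix."""
    n = dist.shape[0]
    # Local density: number of other objects closer than the cutoff.
    rho = (dist < cutoff).sum(axis=1) - 1
    order = np.argsort(-rho, kind="stable")  # densest objects first
    delta = np.zeros(n)                      # distance to nearest denser object
    parent = np.full(n, -1)
    delta[order[0]] = dist.max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        k = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, k], k
    # Centers: objects that are both dense and far from any denser object.
    centers = np.argsort(-(rho * delta), kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Every other object inherits the label of its nearest denser object.
    for i in order:
        if labels[i] == -1:
            if parent[i] >= 0:
                labels[i] = labels[parent[i]]
            else:
                labels[i] = labels[centers[np.argmin(dist[i, centers])]]
    return centers, labels

# Illustrative synthetic data: two well-separated groups of uncertain objects.
rng = np.random.default_rng(0)
means = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(10, 0.5, (10, 2))])
objects = [m + rng.normal(0, 0.1, (20, 2)) for m in means]
dist = expected_distance_matrix(objects)
centers, labels = density_peaks(dist, cutoff=2.0, n_centers=2)
```

In this sketch the distance matrix is the most expensive step, since every pair of objects requires comparing all of their samples. This is in line with the abstract's remark that computing the distance matrix takes most of the time.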
Luo Yanfu, Qian Xiaodong. Uncertain Data Clustering Algorithm Based on Local Density[J]. Data Analysis and Knowledge Discovery, 2017, 1(12): 84-91.