[Objective] This paper proposes a new algorithm for clustering uncertain data that aims to reduce the shortcomings inherited from classic algorithms. [Methods] First, we modified the uncertain distance measure and compared the probability differences between two uncertain objects. Then, we defined the cluster centers and proposed a new algorithm that assigns the data to clusters based on the concepts of maximum supporting points and density chain regions. [Results] We evaluated the proposed algorithm on two data sets from the UCI Machine Learning Repository. The F values on the two data sets increased by 13.23% and 23.44% compared with the traditional algorithms (UK-Means and FDBSCAN). Because the algorithm spent more time computing the distance matrix, its overall clustering time was only slightly shorter than that of the traditional algorithms. [Limitations] There is no appropriate method for setting the algorithm's parameter, and the time complexity of clustering is high. [Conclusions] The proposed algorithm can quickly determine the cluster centers and complete the clustering task. The value of t, its only parameter, strongly influences the clustering results.
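The abstract reports F values but does not state which variant of the F-measure was used. As an illustration only, the sketch below computes one common clustering F-measure: for each true class it takes the best F1 score over all clusters, then weights the result by class size. The function name `clustering_f_measure` and this choice of variant are assumptions, not details taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted clustering F-measure (assumed variant, not from the paper).

    For every true class, find the cluster with the highest F1 score,
    then average those best scores weighted by class size.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    # overlap[(cls, clu)] = number of points of class cls placed in cluster clu
    overlap = Counter(zip(true_labels, cluster_labels))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            hits = overlap[(cls, clu)]
            if hits == 0:
                continue  # no shared points, so F1 is zero for this pair
            precision = hits / clu_size
            recall = hits / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    found = [0, 0, 0, 1]
    print(round(clustering_f_measure(truth, found), 4))
```

Cluster identifiers need not match class identifiers, since only the overlap between groups matters. A perfect clustering scores 1.0 under this variant.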
Luo Yanfu, Qian Xiaodong. Uncertain Data Clustering Algorithm Based on Local Density[J]. Data Analysis and Knowledge Discovery, 2017, 1(12): 84-91.