Uncertain Data Clustering Algorithm Based on Local Density
Luo Yanfu1, Qian Xiaodong2
1 School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2 School of Economics and Management, Lanzhou Jiaotong University, Lanzhou 730070, China
Abstract [Objective] This paper proposes a new algorithm for clustering uncertain data, aiming to overcome shortcomings inherited from the classic algorithms. [Methods] First, we modified the uncertain distance measure and compared the probability differences between two uncertain objects. Then, we defined the cluster centers and proposed a new algorithm that assigns the data to clusters based on the concepts of maximum supporting points and density chain regions. [Results] We evaluated the proposed algorithm on two data sets from the UCI Machine Learning Repository. The F values on the two data sets increased by 13.23% and 23.44% compared with the traditional algorithms (UK-Means and FDBSCAN). Because the algorithm took longer to calculate the distance matrix, its overall clustering time was only slightly shorter than that of the traditional algorithms. [Limitations] No appropriate method is available for setting the algorithm's parameter, and the time complexity of clustering is high. [Conclusions] The proposed algorithm can quickly determine the cluster centers and complete the clustering task. The value of t, its only parameter, strongly influences the clustering results.
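The abstract does not give the modified distance or the exact definition of a cluster center, so the following is only an illustrative sketch of how local density can single out candidate centers among uncertain objects. It assumes that each uncertain object is a list of sample points, that the distance between two objects is the mean Euclidean distance over all sample pairs, and that density is a neighbor count within a radius. The names `expected_distance`, `density_peak_scores`, and `cutoff` are hypothetical and are not taken from the paper; in particular, `cutoff` is not claimed to be the parameter t.

```python
import math
from itertools import product


def expected_distance(a, b):
    """Mean Euclidean distance over all sample pairs of two uncertain objects.

    Assumption: an uncertain object is a list of equally likely sample points.
    """
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def density_peak_scores(objects, cutoff):
    """Return (rho, delta) for each object.

    rho[i]   : number of other objects closer than `cutoff` (local density).
    delta[i] : distance to the nearest object of higher density; for the
               densest object, the largest distance to any other object.
    Density ties are broken by index so that exactly one object ranks first.
    """
    n = len(objects)
    # Pairwise distance matrix; this is the quadratic-cost step.
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = expected_distance(objects[i], objects[j])

    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff)
           for i in range(n)]

    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


# Hypothetical toy data: three objects near (0, 0) and two near (10, 10).
objects = [
    [(0.0, 0.0), (0.2, 0.0)],
    [(0.1, 0.1), (0.0, 0.2)],
    [(0.2, 0.2), (0.1, 0.0)],
    [(10.0, 10.0), (10.2, 10.0)],
    [(10.1, 10.1), (10.0, 10.2)],
]
rho, delta = density_peak_scores(objects, cutoff=1.0)
# Objects with both high density and high delta are candidate centers.
scores = [r * dl for r, dl in zip(rho, delta)]
candidates = sorted(range(len(objects)), key=lambda i: -scores[i])[:2]
print(rho, candidates)
```

In this sketch, building the pairwise distance matrix dominates the running time, which is consistent with the abstract's remark that computing the distance matrix is the slow step.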
Received: 24 July 2017
Published: 29 December 2017