Please wait a minute...
Advanced Search
数据分析与知识发现  2018, Vol. 2 Issue (7): 81-88     https://doi.org/10.11925/infotech.2096-3467.2017.1333
  研究论文 本期目录 | 过刊浏览 | 高级检索 |
一种基于Huffman树的FCM聚类算法*
肖满生, 周丽娟(), 文志诚
湖南工业大学计算机学院 株洲 412007
A Fuzzy C-Means Algorithm Based on Huffman Tree
Xiao Mansheng, Zhou Lijuan(), Wen Zhicheng
School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China
全文: PDF (3038 KB)   HTML ( 2
输出: BibTeX | EndNote (RIS)      
摘要 

目的】解决传统的FCM算法随机选取初始聚类中心、对噪声敏感、只适合均衡分布的样本聚类问题。【方法】提出一种基于Huffman树的FCM新算法, 该算法设计一种高密度样本的相异度矩阵构建Huffman树并获取初始聚类中心, 进而给出非归一化约束的样本隶属度函数。【结果】通过人造样本及图像数据集、UCI数据集的实验对比结果表明, 算法在聚类精度、运算时间等指标上比基于高斯核FCM算法及传统FCM算法更有优势。【局限】仅凭实验或经验确定样本密度调节因子$\beta $, 尚缺乏理论依据。【结论】本研究在现实生活中对含有大量噪声样本及样本分布非均衡的数据集聚类有一定的实际应用价值。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
肖满生
周丽娟
文志诚
关键词 样本密度相异度Huffman树隶属度    
Abstract

[Objective] This paper tries to solve the issues facing traditional FCM algorithm, such as randomly choosing initial cluster center, sensitive to noise, and only capable of clustering the equally distributed samples. [Methods] We proposed a new FCM clustering algorithm based on Huffman tree with dissimilarity degree matrix of high density sample sets. The new algorithm could get initial clustering centers, and then generate the membership function of the non-normalized constraint samples. [Results] We examined the proposed algorithm with man-made samples, images, and UCI datasets. The clustering accuracy and the computation time of the new algorithm were better than algorithms based on the Gauss kernel or traditional FCM. [Limitations] The $\beta $ of the sample density adjustment factor was decided by experiment or experience without theoretical supports. [Conclusions] The proposed algorithm could be used for clustering data sets with high level of noise and distributed unequally.

Key wordsDensity of Samples    Dissimilarity Degree    Huffman Tree    Membership
收稿日期: 2017-12-28      出版日期: 2018-08-15
ZTFLH:  TP391 G35  
基金资助:*本文系2018年湖南省自然科学基金项目“非归一化约束下模糊C均值聚类及其在图像处理中的应用研究”(项目编号: 2018JJ4068)和2016年湖南省教育厅科研项目“基于数据融合的网络态势感知技术研究”(项目编号: 16C0480)的研究成果之一
引用本文:   
肖满生, 周丽娟, 文志诚. 一种基于Huffman树的FCM聚类算法*[J]. 数据分析与知识发现, 2018, 2(7): 81-88.
Xiao Mansheng,Zhou Lijuan,Wen Zhicheng. A Fuzzy C-Means Algorithm Based on Huffman Tree. Data Analysis and Knowledge Discovery, 2018, 2(7): 81-88.
链接本文:  
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.1333      或      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I7/81
  相异度矩阵的计算过程
  Huffman树的构建过程
  人造仿真样本数据集及其聚类划分结果
样本子集 均值坐标 协方差矩阵 样本数
第一类 (4.5,6.2) 18.7 1.5 200
1.5 19.6
第二类 (10.9,4.6) 2.3 0.3 50
-0.3 1.8
第三类 (10.8,7.6) 2.3 -0.5 50
-0.15 2.5
  人造仿真样本数据集基本情况表
  “笑脸”图像样本及其聚类中心获取实验结果
算法 Max(%) Min(%) Ave(%) Run(ms)
GFCM 51.7 42.6 47.15 33.6
GKFCM 87.40 75.33 81.37 43.4
HFCM 94.5 94.5 94.5 30.3
  人造仿真样本数据集三种算法聚类结果对比
算法 Max(%) Min(%) Ave(%) Run(ms)
GFCM 82.67 46.33 64.5 123.4
GKFCM 84.66 54.22 69.44 165.5
HFCM 93.67 93.67 93.67 132.4
  “笑脸”图像数据集三种算法聚类结果对比
算法 Max(%) Min(%) Ave(%) Run(ms)
GFCM 88.67 50.33 69.50 3.34
GKFCM 82.62 75.82 79.22 4.88
HFCM 88.67 88.67 88.67 2.76
  IRIS数据集三种算法的聚类结果对比
  IRIS数据集三算法获取的聚类中心距离对比
[1] Chaturvedi K, Patel R, Swami D K.Fuzzy C-Means Based Inference Mechanism for Association Rule Mining: A Clinical Data Mining Approach[J]. International Journal of Advanced Computer Science and Applications, 2015, 6(6): 103-110.
[2] He L H, Wen Y, Wan M, et al.Multi-Channel Features Based Automated Segmentation of Diffusion Tensor Imaging Using an Improved FCM with Spatial Constraints[J]. Neurocomputing, 2014, 137(5): 107-114.
doi: 10.1016/j.neucom.2013.09.051
[3] Li M, Gao Z, Pei Z, et al.Fuzzy Markov Model Based on FCM for Electromagnetic Environment Parameters Prediction[J]. Journal of Information and Computational Science, 2015, 12(5): 1713-1722.
doi: 10.12733/jics20105534
[4] Kalam R, Thomas C, Rahiman M A.Gaussian Kernel Based Fuzzy C-Means Clustering Algorithm for Image Segmentation[J]. Computer Science & Information Technology, 2016, 6(4): 47-56.
doi: 10.5121/csit.2016.60405
[5] Bezdek J C.Fuzzy Mathematics in Pattern Classification PHP Thesis[D]. Applied Math Center, Ithaca, Cornell University, 1973: 67-84.
[6] Dunn J C.A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters[J]. Journal of Cybernetics, 1973(1): 32-57
doi: 10.1080/01969727308546046
[7] Oh S, Ahn C W, Jeon M.A New Evolutionary Approach to Cluster Validation Index[J]. Journal of Computational and Theoretical Nanoscience, 2010, 7(5): 806-812.
doi: 10.1166/jctn.2010.1424
[8] Lin B, Wen W, Liu S, et al.Incremental Kernel Fuzzy C-Means with Optimizing Cluster Center Initialization and Delivery[J]. Kybernetes, 2016, 45(8): 1273-1291.
doi: 10.1108/K-08-2015-0209
[9] Zhang B, Qin S, Wang W, et al.Data Stream Clustering Based on Fuzzy C-Mean Algorithm and Entropy Theory[J]. Signal Processing, 2016: 111-116.
doi: 10.1016/j.sigpro.2015.10.014
[10] Kannan S R, Ramathilagam S, Chung P C.Effective Fuzzy C-Means Clustering Algorithms for Data Clustering Problems[J]. Expert Systems with Applications, 2012, 39(7): 6292-6300.
doi: 10.1016/j.eswa.2011.11.063
[11] Mohamad F, Nosratallah F, Mohammad T.Parameter Optimization of Improved Fuzzy C-Means Clustering Algorithm for Brain MR Image Segmentation[J]. Journal Engineering Applications of Artificial Intelligence, 2010, 23(2): 160-168.
doi: 10.1016/j.engappai.2009.10.002
[12] Usman Q.A Dissimilarity Measure Based Fuzzy C-Means (FCM) Clustering Algorithm[J]. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 2014, 26(1): 229-238.
doi: 10.3233/IFS-120730
[13] 孟海东, 马娜娜, 宋宇晨, 等. 基于密度函数加权的模糊C均值聚类算法研究[J]. 计算机工程与应用, 2012, 48(27): 123-127.
doi: 10.3778/j.issn.1002-8331.2012.27.026
[13] (Meng Haidong, Ma Nana, Song Yuchen, et al.Research on Fuzzy C-Means Clustering Algorithm Based on Density Function Weighted[J]. Computer Engineering and Applications, 2012, 48(27): 123-127.)
doi: 10.3778/j.issn.1002-8331.2012.27.026
[14] Zhu C J, Yang S Z, Zhao Q, et al.Robust Semi-Supervised Kernel-FCM Algorithm Incorporating Local Spatial Information for Remote Sensing Image Classification[J]. Journal of the Indian Society of Remote Sensing, 2014, 42(1): 35-49.
doi: 10.1007/s12524-013-0296-x
[15] Zhang Q, Lu J, Wei H, et al.Dynamic Hand Gesture Segmentation Method Based on Unequai-Probabilities Background Difference and Improved FCM Algorithm[J]. International Journal of Innovative Computing, Information and Control, 2015, 11(5): 1823-1834
[16] Siarry P, Oulhadj H.A Benaichouche Multiobjective Improved Spatial Fuzzy C-Means Clustering for Image Segmentation Combining Pareto-Optimal Clusters[J]. Journal of Heuristics, 2016, 22(4): 1-22.
doi: 10.1007/s10732-015-9300-7
[17] Bache K, Lichman M. UCI Machine Learning Repository [EB/OL]. [2017-10-10]. .
[1] 刘开第,庞彦军,眭辉强. 信息系统服务质量模糊综合评价的一种新算法*[J]. 现代图书情报技术, 2008, 24(5): 89-93.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 2015 《数据分析与知识发现》编辑部
地址:北京市海淀区中关村北四环西路33号 邮编:100190
电话/传真:(010)82626611-6626,82624938
E-mail:jishu@mail.las.ac.cn