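[Objective] This paper improves the CFSFDP (Clustering by Fast Search and Find of Density Peaks) algorithm, which selects cluster centers by the product of local density and distance and can therefore produce unsatisfactory clustering results. [Methods] We propose a CFSFDP algorithm based on particle swarm optimization. Particle swarm optimization searches for the best local density and distance thresholds in CFSFDP, which yields cluster centers with relatively high local density and distance and reduces the influence of discrete points on center selection. Experiments were run on datasets randomly drawn from the candidate question bank of a college entrance examination consulting platform. [Results] Across the different datasets, the proposed algorithm clearly improved on the basic CFSFDP algorithm in accuracy, recall, and F-measure. [Limitations] Semantic relations were not considered during text processing. [Conclusions] The proposed method clusters well. Applied to a college entrance examination consulting question bank, it can effectively reduce the workload of the consulting staff and help answer candidates' questions quickly.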
[Objective] This paper aims to improve the unsatisfactory performance of the CFSFDP (Clustering by Fast Search and Find of Density Peaks) algorithm by using particle swarm optimization. [Methods] First, we used particle swarm optimization to search for the optimal local density and distance thresholds, and determined the cluster centers from these thresholds. The resulting cluster centers have relatively high local density and distance, which reduces the influence of discrete points on center selection. Then, we evaluated the proposed method on datasets randomly selected from the question-answer database of a college entrance exam consulting platform. [Results] Across the datasets tested, the modified CFSFDP algorithm outperformed the original one in accuracy, recall, and F-measure. [Limitations] We did not take semantic relations into account when processing the texts. [Conclusions] The proposed algorithm achieves good clustering results and can improve the efficiency of the consulting personnel.
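The center-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a cutoff-kernel local density with a hypothetical cutoff `d_c`, Euclidean distance between points, and two hypothetical thresholds `rho_min` and `delta_min` that stand in for the values the particle swarm search would return. The search itself and its fitness function are not specified in this section and are omitted here.

```python
import numpy as np

def density_peaks(X, d_c, rho_min, delta_min):
    """Density-peak clustering with centers picked by two thresholds.

    d_c, rho_min and delta_min are illustrative parameters; in the paper the
    thresholds are found by particle swarm optimization (not shown here).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: number of other points closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Points sorted by decreasing density; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # The densest point gets the largest distance to any other point.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        # Nearest point that ranks higher in density.
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j
    # Centers must clear BOTH thresholds, instead of ranking by rho * delta.
    centers = np.flatnonzero((rho >= rho_min) & (delta >= delta_min))
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1 and parent[i] != -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Toy data: two 5 x 5 grids of points, far apart from each other.
g = np.linspace(-0.6, 0.6, 5)
blob = np.array([(x, y) for x in g for y in g])
X = np.vstack([blob, blob + 10.0])
labels, centers = density_peaks(X, d_c=0.5, rho_min=6, delta_min=5.0)
```

On this toy data each grid yields one center. A point that is isolated but far from everything has a large distance and a low density, so it fails the density threshold and is not chosen as a center, which is the effect on discrete points that the abstract describes. Points whose chain of denser neighbors never reaches a center keep the label -1.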