[Objective] This paper aims to improve the efficiency and accuracy of hot topic discovery by studying feature reduction methods and clustering algorithms for news text. [Methods] Building on the traditional TF-IDF formula, four weighting factors (symbol, part of speech, position and length) are introduced to realize multi-factor feature selection. The Ameliorated Fruit Fly Optimization Algorithm (AFOA) is constructed from four aspects: coding, fitness function, adaptive step length and population fitness variance. AFOA is used to optimize the initial cluster centers of K-means, and the optimized K-means is used to find hot topics. Multi-factor feature selection is then used to identify the hot topics, and the hot topics are ranked using TOPSIS. [Results] Experiments show that multi-factor feature selection and the AFOA/K-means algorithm each significantly improve clustering quality, and they verify the overall effectiveness of the proposed method. [Limitations] The method is only applicable to Chinese news texts. [Conclusions] The proposed method offers a new approach for research on Chinese news hotspot discovery.
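The TOPSIS ranking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which criteria or weights are used to rank hot topics, so the function name `topsis_rank`, the three example criteria (report count, source count, time span in days), the weights and the sample values are all hypothetical. The sketch uses the standard TOPSIS formulation with vector normalization, not necessarily the authors' exact variant.

```python
import math


def topsis_rank(matrix, weights, benefit):
    """Return a TOPSIS closeness score in [0, 1] for each row of `matrix`.

    matrix  : list of rows, one per alternative (here: one per topic)
    weights : one weight per criterion (hypothetical values in this sketch)
    benefit : True if larger is better for that criterion, False otherwise
    """
    n_criteria = len(weights)

    # Vector-normalize each column; fall back to 1.0 for an all-zero column.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_criteria)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in matrix
    ]

    # Ideal and anti-ideal points, criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical topics scored on (report count, source count, time span).
    topics = ["topic A", "topic B", "topic C"]
    data = [[120.0, 15.0, 3.0], [40.0, 5.0, 1.0], [80.0, 20.0, 2.0]]
    scores = topsis_rank(data, [0.5, 0.3, 0.2], [True, True, True])
    for name, score in sorted(zip(topics, scores), key=lambda p: -p[1]):
        print(f"{name}: {score:.3f}")
```

A topic that is best on every criterion receives a score of 1, and one that is worst on every criterion receives 0; topics in between are ordered by their relative distance to the two reference points.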
Tingxin Wen, Yangzi Li, Jingshuang Sun. News Hotspots Discovery Method Based on Multi Factor Feature Selection and AFOA/K-means[J]. Data Analysis and Knowledge Discovery, 2019, 3(4): 97-106. DOI: 10.11925/infotech.2096-3467.2018.0757.