[Objective] This paper proposes a model for detecting the topics of trending news stories, with the aim of improving the news-reading experience. [Methods] We modified the TF-IDF method by adding balanced paragraph weighting (WTF-IDF). We also improved K-means clustering by using sub-topic vectors in a hierarchical clustering scheme. Finally, we extracted high-frequency words from titles with the new model. [Results] With three extracted keywords, the F1 value of our model was 5.4% higher than that of the TF-IDF method. Hierarchical clustering based on WTF-IDF and sub-topic vectors was 3.1% more accurate than single-layer K-means clustering. [Limitations] Our model does not include a phrase extraction method, and the hierarchical clustering method is complex. [Conclusions] The proposed method can effectively detect the topics of trending news reports.
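The abstract does not give the exact WTF-IDF formula, so the following is only a minimal sketch of what paragraph-weighted TF-IDF could look like. It assumes that each occurrence of a term counts in proportion to a weight attached to its paragraph. The function names, the weights `[2.0, 1.0]`, the smoothed IDF form, and the toy corpus are all illustrative assumptions, not the authors' method or data.

```python
import math
from collections import Counter

def weighted_tf(paragraphs, weights):
    """Term frequency in which each occurrence counts by its paragraph weight.

    paragraphs: list of token lists, one per paragraph.
    weights: one weight per paragraph (hypothetical values, not from the paper).
    """
    tf = Counter()
    for tokens, w in zip(paragraphs, weights):
        for tok in tokens:
            tf[tok] += w
    total = sum(tf.values())
    return {t: c / total for t, c in tf.items()} if total else {}

def idf(corpus):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        # Count each term once per document, regardless of paragraph.
        df.update({tok for para in doc for tok in para})
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def top_keywords(doc, weights, idf_table, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    tf = weighted_tf(doc, weights)
    scores = {t: tf[t] * idf_table.get(t, 1.0) for t in tf}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy corpus (invented for illustration): each document is a list of
# paragraphs, and each paragraph is a list of tokens.
corpus = [
    [["flood", "hits", "city"], ["rescue", "teams", "arrive", "city"]],
    [["market", "rally", "continues"], ["investors", "cheer", "market"]],
    [["city", "council", "votes"], ["budget", "vote", "delayed"]],
]
idf_table = idf(corpus)

# Assumption: the lead paragraph counts twice as much as the second one.
weighted = top_keywords(corpus[0], [2.0, 1.0], idf_table, k=3)
# Uniform weights reduce the scheme to ordinary TF-IDF.
uniform = top_keywords(corpus[0], [1.0, 1.0], idf_table, k=3)
```

With uniform weights the sketch reduces to plain TF-IDF, which makes it easy to compare the two rankings on the same document, in the spirit of the three-keyword comparison reported in the results.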