[Objective] This study applies hot news topic detection to extract hot topics from news and reduce the reading burden on users.
[Methods] Keywords are extracted with a TF-IDF method weighted by position within balanced paragraphs (WTF-IDF). Taking K-Means clustering as the base method, topic clustering is performed hierarchically by introducing sub-topic vectors. Topics are then described using high-frequency words extracted from the news titles.
[Results] Compared with the TF-IDF method, the WTF-IDF method increases F1 by 5.4 when three keywords are extracted. Compared with single-layer K-Means clustering, hierarchical clustering based on WTF-IDF and sub-topic vectors yields an improvement of 5.27 percentage points.
[Limitations] Phrases are not considered in keyword extraction, and the hierarchical clustering method increases time complexity.
[Conclusions] The proposed keyword extraction and hierarchical clustering methods improve hot topic detection, and the topic phrases obtained from the topic descriptions are reasonably representative and readable.
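
The keyword extraction and topic description steps named under [Methods] can be illustrated with a minimal sketch. The abstract does not give the actual weighting scheme, so everything specific here is an assumption made for illustration: the function names `weighted_tfidf` and `describe_topic`, the `lead_weight` parameter (which simply boosts terms in the first paragraph), and the smoothed IDF formula are hypothetical and are not the authors' WTF-IDF definition. Input is assumed to be already tokenized.

```python
import math
from collections import Counter


def weighted_tfidf(docs, lead_weight=2.0, top_k=3):
    """Return the top_k keywords per document.

    docs: list of documents; each document is a list of paragraphs;
          each paragraph is a list of tokens.
    lead_weight: ASSUMED position weight applied to first-paragraph tokens
                 (a stand-in for the paper's paragraph-position weighting).
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each token.
    df = Counter()
    for doc in docs:
        df.update({tok for para in doc for tok in para})

    keywords = []
    for doc in docs:
        # Position-weighted term counts.
        tf = Counter()
        for index, para in enumerate(doc):
            weight = lead_weight if index == 0 else 1.0
            for tok in para:
                tf[tok] += weight
        total = sum(tf.values())

        # Smoothed IDF (an assumption; the paper's variant is not stated).
        scores = {
            tok: (count / total) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        }
        # Sort by descending score, then alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords.append([tok for tok, _ in ranked[:top_k]])
    return keywords


def describe_topic(titles, top_k=2):
    """Describe a topic cluster by the most frequent tokens in its titles."""
    counts = Counter(tok for title in titles for tok in title)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:top_k]]


if __name__ == "__main__":
    docs = [
        [["flood", "rescue"], ["city", "flood", "report"]],
        [["election", "vote"], ["city", "report", "vote"]],
    ]
    print(weighted_tfidf(docs, top_k=2))
    print(describe_topic([["flood", "hits", "city"], ["city", "flood", "rescue"]]))
```

The keyword lists produced this way could serve as the features for the clustering step; the hierarchical K-Means stage with sub-topic vectors is not sketched here because the abstract does not describe how those vectors are built.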
Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui. Research of Hot News Topic Detection Based on Equalized Paragraph and Sub-topic Vector [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.