Abstract
[Objective] This study applies hot news topic detection to extract hot topics from news and reduce the reading burden on users.
[Methods] Keywords are extracted with a TF-IDF variant (WTF-IDF) that weights terms by balanced paragraph position. Topics are clustered with K-Means as the base method, extended to hierarchical clustering by introducing sub-topic vectors. Each topic is described by high-frequency words extracted from news titles.
[Results] Compared with the TF-IDF method, the WTF-IDF method increases F1 by 5.4 when three keywords are extracted. Compared with single-layer K-Means clustering, hierarchical clustering based on WTF-IDF and sub-topic vectors shows an improvement of 5.27 percentage points.
[Limitations] Keyword extraction does not consider phrases, and the hierarchical clustering method increases time complexity.
[Conclusions] The proposed keyword extraction and hierarchical clustering methods improve hot topic detection, and the topic phrases obtained from the topic descriptions are reasonably representative and readable.
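As a rough illustration of the position-weighted keyword extraction described under Methods, the following minimal sketch scores terms by a TF-IDF in which each occurrence is weighted by the paragraph it appears in. The abstract does not give the actual weighting formula, so the function names (`paragraph_weight`, `weighted_tfidf_keywords`), the weight values (1.5 for the first paragraph, 1.2 for the last, 1.0 otherwise), and the smoothed IDF are all assumptions made for illustration, not the authors' WTF-IDF definition.

```python
import math
from collections import Counter


def paragraph_weight(index, total, first=1.5, last=1.2, middle=1.0):
    """Hypothetical position weight favoring the first and last paragraphs.

    The values are illustrative assumptions, not taken from the paper.
    """
    if index == 0:
        return first
    if index == total - 1:
        return last
    return middle


def weighted_tfidf_keywords(docs, k=3):
    """Return the top-k keywords for each document.

    docs: list of documents; each document is a list of paragraphs,
    and each paragraph is a list of tokens.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update({tok for para in doc for tok in para})

    results = []
    for doc in docs:
        # Position-weighted term frequency.
        wtf = Counter()
        for i, para in enumerate(doc):
            w = paragraph_weight(i, len(doc))
            for tok in para:
                wtf[tok] += w
        total = sum(wtf.values()) or 1.0

        # Smoothed IDF (an assumption; other IDF variants would also work).
        scores = {
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in wtf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


if __name__ == "__main__":
    sample_docs = [
        [["flood", "city"], ["rescue", "city"], ["flood"]],
        [["election", "city"], ["vote", "election"]],
        [["match", "team"], ["team", "city"]],
    ]
    print(weighted_tfidf_keywords(sample_docs, k=2))
```

In this sketch a term that appears in the opening or closing paragraph contributes more to the score than one appearing mid-document, while the IDF factor still discounts terms that occur in every document. The top-k keyword lists could then serve as the document vectors for K-Means clustering.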