Data Analysis and Knowledge Discovery    DOI: 10.11925/infotech.2096-3467.2020.0361
Research of Hot News Topic Detection Based on Equalized Paragraph and Sub-topic Vector
Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui
(Institute of Scientific and Technical Information of China  Beijing  100038)
[Objective] This study applies hot news topic detection to extract hot topics from news and reduce the reading burden on users.

[Methods] Keywords are extracted with a position-weighted TF-IDF method (WTF-IDF), in which word positions are weighted over equalized paragraphs. With K-Means as the base method, topic clustering is completed by hierarchical clustering that introduces sub-topic vectors. Topic descriptions are generated from the high-frequency words in news titles.
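
The position-weighted keyword extraction step can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, so everything specific here is an assumption made for illustration: the document is taken to be already split into equalized paragraphs, the weights in `POSITION_WEIGHTS` are hypothetical values, and the helper names (`paragraph_weight`, `weighted_tf`, `idf`, `top_keywords`) are invented for this example and are not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical position weights; the paper's actual values are not stated here.
POSITION_WEIGHTS = {"first": 1.5, "last": 1.2, "middle": 1.0}


def paragraph_weight(index, n_paragraphs):
    """Return the assumed weight for the paragraph at position `index`."""
    if index == 0:
        return POSITION_WEIGHTS["first"]
    if index == n_paragraphs - 1:
        return POSITION_WEIGHTS["last"]
    return POSITION_WEIGHTS["middle"]


def weighted_tf(paragraphs):
    """Position-weighted term frequency for one document.

    `paragraphs` is a list of token lists, one list per paragraph.
    """
    counts = Counter()
    for i, tokens in enumerate(paragraphs):
        weight = paragraph_weight(i, len(paragraphs))
        for token in tokens:
            counts[token] += weight
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def idf(corpus):
    """Smoothed inverse document frequency over a corpus of documents."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for paragraphs in corpus:
        doc_freq.update({tok for tokens in paragraphs for tok in tokens})
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1
            for tok, df in doc_freq.items()}


def top_keywords(paragraphs, idf_scores, k=3):
    """Return the k highest-scoring tokens; ties are broken alphabetically."""
    tf = weighted_tf(paragraphs)
    scores = {tok: tf[tok] * idf_scores.get(tok, 1.0) for tok in tf}
    return sorted(scores, key=lambda tok: (-scores[tok], tok))[:k]


# Toy corpus: two documents, each a list of tokenized paragraphs.
corpus = [
    [["flood", "hits", "city"], ["rain", "continues"], ["flood", "relief", "begins"]],
    [["market", "rises"], ["stocks", "gain"], ["market", "closes", "higher"]],
]
idf_scores = idf(corpus)
print(top_keywords(corpus[0], idf_scores, k=3))
```

With `k=3` the sketch mirrors the three-keyword setting mentioned in the results. Replacing `POSITION_WEIGHTS` with all-ones weights reduces it to ordinary TF-IDF, which is the baseline the comparison is made against.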

[Results] Compared with the TF-IDF method, the F1 of the WTF-IDF method increased by 5.4 when three keywords were extracted. Compared with single-layer K-Means clustering, hierarchical clustering based on WTF-IDF and sub-topic vectors improved by 5.27 percentage points.

[Limitations] Phrases are not considered in keyword extraction, and the hierarchical clustering method increases time complexity.

[Conclusions] The proposed keyword extraction and hierarchical clustering methods improve hot topic detection, and the topic phrases obtained from the topic descriptions are reasonably representative and readable.

Key words: Equalized paragraph; Sub-topic vector; Hot topic detection; Hierarchical clustering
Published: 10 July 2020
