Data Analysis and Knowledge Discovery    DOI: 10.11925/infotech.2096-3467.2020.0361
Research of Hot News Topic Detection Based on Equalized Paragraph and Sub-topic Vector
Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui
(Institute of Scientific and Technical Information of China  Beijing  100038)

[Objective] This paper applies hot news topic detection to extract hot topics from news and reduce the reading burden on users.

[Methods] Keywords are extracted with a TF-IDF method that weights terms by their position in equalized (balanced) paragraphs. With K-Means clustering as the base method, topics are clustered hierarchically by introducing sub-topic vectors. High-frequency words from the titles are then extracted to describe each topic.
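The position-weighted keyword extraction step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, so the equal-length split of each document into three parts, the `position_weights` values, the smoothed IDF, and the function names `equalized_paragraphs`, `weighted_tfidf`, and `top_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def equalized_paragraphs(tokens, n_parts=3):
    """Split a token list into n_parts contiguous chunks of near-equal length."""
    size, extra = divmod(len(tokens), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(tokens[start:end])
        start = end
    return parts


def weighted_tfidf(docs, position_weights=(1.5, 1.0, 1.2)):
    """Score terms per document as position-weighted TF times smoothed IDF.

    position_weights are hypothetical values, not taken from the paper:
    here the opening and closing parts count more than the middle.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency of each term

    scores = []
    for tokens in docs:
        wtf = Counter()
        parts = equalized_paragraphs(tokens, len(position_weights))
        for part, weight in zip(parts, position_weights):
            for tok in part:
                wtf[tok] += weight  # each occurrence counts with its part's weight
        total = sum(wtf.values()) or 1.0
        scores.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in wtf.items()
        })
    return scores


def top_keywords(score, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    ["flood", "rescue", "team", "city", "rain", "flood"],
    ["market", "stock", "city", "rain", "price", "market"],
]
scores = weighted_tfidf(docs)
print(top_keywords(scores[0], k=3))
```

In this sketch a term that appears in a heavily weighted part of the document outranks a term with the same raw count in the middle, which is the general idea behind weighting by paragraph position.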

[Results] Compared with the TF-IDF method, the WTF-IDF method raises F1 by 5.4 when three keywords are extracted. Compared with single-layer K-Means clustering, hierarchical clustering based on WTF-IDF and sub-topic vectors improves by 5.27 percentage points.

[Limitations] Phrases are not considered in keyword extraction, and the hierarchical clustering method increases time complexity.
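For readers unfamiliar with the metric, the F1 score for a fixed number of extracted keywords can be computed as in the following sketch. The function name `keyword_f1` and the sample keyword lists are hypothetical, and the abstract does not state how the authors matched extracted keywords against reference keywords, so exact string matching is assumed here.

```python
def keyword_f1(extracted, gold):
    """Return (precision, recall, F1) for extracted keywords against a gold set.

    Assumes exact string matching; the paper's matching rule is not given
    in the abstract.
    """
    extracted_set, gold_set = set(extracted), set(gold)
    hits = len(extracted_set & gold_set)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(extracted_set)
    recall = hits / len(gold_set)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Three extracted keywords compared with four reference keywords.
precision, recall, f1 = keyword_f1(
    ["flood", "rescue", "team"],
    ["flood", "rescue", "city", "rain"],
)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```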

[Conclusions] The proposed keyword extraction and hierarchical clustering methods improve hot topic detection, and the topic phrases obtained from the topic descriptions are reasonably representative and readable.

Key words Equalized paragraph      Sub-topic vector      Hot topic detection      Hierarchical clustering      
Published: 10 July 2020
ZTFLH:  TP391  

Cite this article:

Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui. Research of Hot News Topic Detection Based on Equalized Paragraph and Sub-topic Vector. Data Analysis and Knowledge Discovery.


[1] Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui, Peng Keyun. Detecting News Topics Based on Equalized Paragraph and Sub-topic Vector[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 70-79.