Data Analysis and Knowledge Discovery  0, Vol. Issue (): 1-     https://doi.org/10.11925/infotech.2096-3467.2020.0361
Research of Hot News Topic Detection Based on Equalized Paragraph and Sub-topic Vector
Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui
(Institute of Scientific and Technical Information of China, Beijing 100038)
Abstract (translated from the Chinese version)

[Objective] Hot news topic detection is used to extract hot news topics and reduce users' news-reading burden.

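[Methods] Keywords are extracted with a position-weighting scheme based on equalized paragraphs, built on the TF-IDF method. With K-Means clustering as the base method, sub-topic vectors are introduced into hierarchical clustering to cluster topics. High-frequency words from titles are extracted to describe each topic.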
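The position-weighted keyword extraction step can be illustrated with a minimal sketch. It is not the authors' WTF-IDF implementation: the abstract does not say how paragraphs are equalized or which weights are used, so the paragraph split is taken as given and the weights `first_w` and `last_w`, the function name, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def weighted_tfidf_keywords(docs, doc_index, top_k=3, first_w=2.0, last_w=1.5):
    """Rank the terms of docs[doc_index] by a position-weighted TF-IDF score.

    docs: list of documents; each document is a list of paragraphs and
          each paragraph is a list of tokens (already segmented).
    first_w, last_w: hypothetical weights for terms in the first and last
          paragraph; terms in any other paragraph get weight 1.0.
    """
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update({tok for para in doc for tok in para})

    # Term frequency, with each occurrence scaled by its paragraph's weight.
    doc = docs[doc_index]
    weighted_tf = Counter()
    for i, para in enumerate(doc):
        if i == 0:
            w = first_w
        elif i == len(doc) - 1:
            w = last_w
        else:
            w = 1.0
        for tok in para:
            weighted_tf[tok] += w

    # Smoothed IDF (an assumption): a term found in every document scores 0.
    n_docs = len(docs)
    scores = {
        tok: tf * math.log((1 + n_docs) / (1 + df[tok]))
        for tok, tf in weighted_tf.items()
    }

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]
```

With `top_k=3` the function returns three keywords per article, the setting at which the abstract reports its F1 comparison.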

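[Results] With three keywords extracted, the WTF-IDF method improves F1 by 3.19% over the previous best position-weighted TF-IDF method. Hierarchical clustering based on WTF-IDF and sub-topic vectors improves precision (P) by 3.39% over K-Means clustering with hierarchical TF-IDF.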
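The F1 measure used to compare keyword extractors can be made concrete with a short sketch. The function name and the example keyword lists are hypothetical, and the abstract does not state the evaluation protocol (for example, how scores are averaged over articles), so this shows only the per-article computation.

```python
def keyword_prf(extracted, gold):
    """Return (precision, recall, F1) for one article's extracted keywords.

    extracted: keywords produced by the extractor (e.g. the top 3).
    gold: reference keywords for the same article.
    """
    extracted_set, gold_set = set(extracted), set(gold)
    hits = len(extracted_set & gold_set)  # keywords found in both sets
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall; 0 when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: 3 extracted keywords, 4 reference keywords, 2 matches.
p, r, f1 = keyword_prf(["flood", "relief", "city"],
                       ["flood", "city", "rain", "rescue"])
```

Here precision is 2/3 and recall is 2/4, giving an F1 of 4/7.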

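[Limitations] Keyword extraction does not consider phrases, and the hierarchical clustering method increases the algorithm's time complexity.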

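[Conclusions] The proposed keyword extraction and hierarchical clustering methods improve hot news topic detection, and the topic phrases produced by topic description achieve a certain degree of representativeness and readability.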

Key words: Equalized paragraph; Sub-topic vector; Hot topic detection; Hierarchical clustering
Abstract

[Objective] This study uses hot news topic detection to extract hot news topics and reduce the reading burden on users.

[Methods] Keywords are extracted with a TF-IDF-based method that weights terms by their position in equalized paragraphs. With K-Means clustering as the base method, topics are clustered by introducing sub-topic vectors into hierarchical clustering. High-frequency words from titles are extracted to describe each topic.
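The topic description step, which takes high-frequency title words from a cluster, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: titles are assumed to be already segmented into words, each word is counted once per title, and the function name, `top_n`, and the stopword set are hypothetical.

```python
from collections import Counter


def describe_topic(titles, top_n=3, stopwords=frozenset()):
    """Describe a topic cluster by the most frequent words in its titles.

    titles: list of tokenized titles (each a list of words) belonging to
            one topic cluster.
    """
    counts = Counter()
    for title in titles:
        # Count each word once per title (an assumption) so that a single
        # repetitive title cannot dominate the description.
        counts.update({w for w in title if w not in stopwords})

    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:top_n]


# Hypothetical cluster of three tokenized news titles.
cluster_titles = [
    ["flood", "hits", "city"],
    ["city", "flood", "relief"],
    ["relief", "funds", "for", "flood"],
]
topic_words = describe_topic(cluster_titles, stopwords=frozenset({"for"}))
```

The returned words can then be joined into a short topic phrase for display.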

[Results] Compared with the TF-IDF method, the F1 of the WTF-IDF method increased by 5.4 when three keywords were extracted. Compared with single-layer K-Means clustering, hierarchical clustering based on WTF-IDF and sub-topic vectors improved by 5.27 percentage points.

[Limitations] Keyword extraction does not consider phrases, and the hierarchical clustering method increases time complexity.

[Conclusions] The proposed keyword extraction and hierarchical clustering methods improve hot topic detection, and the topic phrases produced by topic description achieve a certain degree of representativeness and readability.


Key words: Equalized paragraph; Sub-topic vector; Hot topic detection; Hierarchical clustering
Published: 2020-07-10
Chinese Library Classification (ZTFLH): TP391; G250
Cite this article:
Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui. Research of Hot News Topic Detection Based on Equalized Paragraph and Sub-topic Vector [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.0361      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y0/V/I/1
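[1] Wei Jiaze, Dong Cheng, He Yanqing, Liu Zhihui, Peng Keyun. Research of Hot News Topic Detection Based on Equalized Paragraph and Sub-topic Vector*[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 70-79.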