Research of Title Party News Identification Technology Based on Topic Sentence Similarity
Wang Zhichao¹, Weng Nan², Wang Yu³
1. Institute of Information Science & Technology, Shanghai Jiaotong University, Shanghai 200240, China
2. School of Management & Engineering, Nanjing University, Nanjing 210093, China
3. School of Management, Dalian University of Technology, Dalian 116024, China
Abstract: To address the growing volume of "title party" (clickbait) news on the Web, this paper presents a new algorithm for identifying such news. First, it analyzes the composition of news pages and, based on their features, proposes an approach to news title extraction and information extraction. Second, to address the problem of extracting coherent topic sentences from news pages, it proposes a topic sentence extraction algorithm that starts from the relationship matrix of sentences. Third, it computes the similarity between the extracted news title and the candidate set of topic sentences; this similarity value is the main basis for judging whether an article is title party news. Finally, experimental results show that the method is effective and feasible.
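
The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not define the relationship matrix, the similarity measure, or the decision rule, so everything below is an assumption made for illustration: whitespace tokenization, cosine similarity over term-frequency vectors, ranking sentences by the row sums of a pairwise similarity matrix, and the hypothetical parameters `k` and `threshold`. It is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real word segmentation)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two token lists using term-frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_sentences(sentences, k=2):
    """Rank sentences by total similarity to the others and keep the top k.

    Assumption: the "relationship matrix" is modeled here as pairwise
    cosine similarity with a zero diagonal.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    matrix = [
        [cosine(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = [sum(row) for row in matrix]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


def is_title_party(title, sentences, k=2, threshold=0.2):
    """Flag the article when the title is dissimilar to every topic sentence.

    Assumption: k and threshold are hypothetical values, not taken from the paper.
    """
    candidates = topic_sentences(sentences, k)
    title_tokens = tokenize(title)
    best = max((cosine(title_tokens, tokenize(s)) for s in candidates), default=0.0)
    return best < threshold


body = [
    "the city council approved the new budget",
    "the budget funds new roads in the city",
    "council members praised the budget vote",
]
print(is_title_party("city council approved new budget", body))
print(is_title_party("shocking secret celebrities never reveal", body))
```

In this sketch a title that shares vocabulary with the body's most central sentences is treated as consistent with the article, while a title with no overlap is flagged. For Chinese news text, the whitespace tokenizer would need to be replaced by a proper word segmenter.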
Received: 16 September 2011
Published: 06 January 2012