New Technology of Library and Information Service  2011, Vol. Issue (11): 48-53    DOI: 10.11925/infotech.1003-3513.2011.11.08
Research of Title Party News Identification Technology Based on Topic Sentence Similarity
Wang Zhichao1, Weng Nan2, Wang Yu3
1. Institute of Information Science & Technology, Shanghai Jiaotong University, Shanghai 200240, China;
2. School of Management & Engineering, Nanjing University, Nanjing 210093, China;
3. School of Management, Dalian University of Technology, Dalian 116024, China
Abstract  To address the growing volume of title party (clickbait) news on the Web, this paper presents a new algorithm for identifying such news. First, it analyzes the composition of a news page and proposes an approach to news title extraction and news information extraction based on the features of the page. Second, to extract coherent topic sentences from the news page, it proposes a topic sentence extraction algorithm that starts from the relationship matrix of the sentences. Third, it computes the similarity between the extracted news title and the candidate set of topic sentences; this similarity value is the main basis for judging whether an item is title party news. Experimental results show that the method is effective and feasible.
Key words: Title party news; News title extraction; News information extraction; Sentence similarity computing
Received: 16 September 2011      Published: 06 January 2012
CLC Number: TP391
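
The pipeline outlined in the abstract (sentence relationship matrix, topic sentence selection, title-to-topic-sentence similarity) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the ranking rule, or the decision threshold, so everything below is an assumption made for illustration: similarity is taken as Jaccard overlap of character bigrams, topic sentences are the rows of the relationship matrix with the largest sums, and the function names, `k`, and `threshold` are hypothetical rather than the authors' method.

```python
from typing import List, Tuple


def bigrams(text: str) -> set:
    """Character bigrams of a sentence, ignoring whitespace and case."""
    chars = [c for c in text.lower() if not c.isspace()]
    return {"".join(chars[i:i + 2]) for i in range(len(chars) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams (an assumed stand-in measure)."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def topic_sentences(sentences: List[str], k: int = 2) -> List[str]:
    """Pick the k sentences most similar to the rest of the text."""
    n = len(sentences)
    # Relationship matrix: entry [i][j] is the similarity of sentences i and j.
    matrix = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    # Assumed ranking rule: a sentence's score is the sum of its matrix row.
    scores = [sum(row) for row in matrix]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


def is_title_party(title: str, sentences: List[str],
                   k: int = 2, threshold: float = 0.1) -> Tuple[bool, float]:
    """Flag a title whose best similarity to any topic sentence is low."""
    candidates = topic_sentences(sentences, k)
    best = max((similarity(title, s) for s in candidates), default=0.0)
    return best < threshold, best
```

Character bigrams are used here only because they avoid the need for a Chinese word segmenter; a real system would need its threshold and `k` tuned on labeled news pages.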

Cite this article:

Wang Zhichao, Weng Nan, Wang Yu. Research of Title Party News Identification Technology Based on Topic Sentence Similarity. New Technology of Library and Information Service, 2011, (11): 48-53.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.11.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V/I11/48
