New Technology of Library and Information Service  2009, Vol. 3 Issue (3): 57-61    DOI: 10.11925/infotech.1003-3513.2009.03.10
Extracting Topic Sentences from Web Text Based on Sentence Relationship Map
He Wei  Wang Yu
(School of Management, Dalian University of Technology, Dalian 116024, China)
Abstract  

Web text carries little structural information and considerable noise. To address this, sentences are treated as nodes and the similarities between them as edges, so that a relationship map describes how the sentences of a text relate to one another. The topic sentences of a text are then obtained by finding the nodes with the most edges. Because short texts score poorly under word-frequency similarity measures, sentence similarity is defined as semantic similarity computed with a semantic dictionary. In a test on a public Internet corpus, the method achieved an acceptability of 80.6%.
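The relationship-map idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `threshold` parameter, and the sample sentences are hypothetical, and the word-overlap (Jaccard) measure is only a stand-in for the dictionary-based semantic similarity the abstract refers to, which would be passed in through the `similarity` parameter.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity: word-set overlap (NOT the paper's semantic measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def build_relationship_map(sentences, similarity=jaccard, threshold=0.2):
    """Return adjacency sets keyed by sentence index.

    An edge links two sentences whose similarity reaches the threshold
    (the threshold value is an assumption, not taken from the paper).
    """
    edges = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if similarity(sentences[i], sentences[j]) >= threshold:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def topic_sentences(sentences, top_k=1, **kwargs):
    """Pick the top_k sentences whose nodes have the most edges."""
    edges = build_relationship_map(sentences, **kwargs)
    # Sort by descending edge count; ties fall back to document order.
    ranked = sorted(edges, key=lambda i: (-len(edges[i]), i))
    return [sentences[i] for i in ranked[:top_k]]


sentences = [
    "the library opens a new reading room",
    "the new reading room of the library has more seats",
    "students like the new reading room",
    "the weather was cold yesterday",
]
print(topic_sentences(sentences, threshold=0.4))
```

With a threshold of 0.4, the first sentence is linked to the second and third while the off-topic fourth sentence has no edges, so the first sentence is selected. Ranking by edge count is the simplest reading of "the nodes with the most edges"; how ties and the threshold are handled in the original work is not stated in the abstract.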

Key words: Topic Sentence; Sentence Relationship Map; Sentence Similarity
Received: 29 December 2008      Published: 25 March 2009
ZTFLH: TP391
Corresponding Authors: Wang Yu     E-mail: ywang@dlut.edu.cn
About authors: He Wei, Wang Yu

Cite this article:

He Wei, Wang Yu. Extracting Topic Sentences from Web Text Based on Sentence Relationship Map. New Technology of Library and Information Service, 2009, 3(3): 57-61.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.03.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V3/I3/57

