The Subject Extraction Based on Topic Segmentation and PageRank Algorithm
Duan Xiaoli, Wang Yu
School of Management, Dalian University of Technology, Dalian 116024, China
Abstract To make subject extraction more complete, this paper first divides the text into topic segments, reconstructs a sentence relation graph for each segment, and then ranks the sentences with the PageRank algorithm. The sentence with the maximum weight is taken as the topic sentence. Experiments show that the topic sentences extracted by this algorithm cover the full text well.
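
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the topic segments are already given as lists of tokenized sentences, that sentence relations are measured by Jaccard word overlap, that the damping factor is 0.85, and that one topic sentence is picked per segment. The names `similarity`, `pagerank`, and `topic_sentences` are hypothetical.

```python
from typing import List, Optional

Sentence = List[str]  # a sentence as a list of words (assumed pre-tokenized)


def similarity(a: Sentence, b: Sentence) -> float:
    """Jaccard word overlap; an assumed stand-in for the paper's relation measure."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def pagerank(weights: List[List[float]], damping: float = 0.85,
             iterations: int = 50) -> List[float]:
    """Weighted PageRank by power iteration over a sentence graph."""
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]  # total edge weight leaving each node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sum[j] > 0:
                    rank += weights[j][i] / out_sum[j] * scores[j]
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def topic_sentences(segments: List[List[Sentence]]) -> List[Optional[int]]:
    """Return, per topic segment, the index of its top-ranked sentence."""
    picks: List[Optional[int]] = []
    for sentences in segments:
        n = len(sentences)
        if n == 0:
            picks.append(None)  # nothing to rank in an empty segment
            continue
        # Build the sentence relation graph for this segment only.
        weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                    for j in range(n)] for i in range(n)]
        scores = pagerank(weights)
        picks.append(max(range(n), key=scores.__getitem__))
    return picks
```

Ranking inside each segment, rather than over the whole document at once, is what lets every topic contribute a sentence, which is the coverage property the abstract emphasizes.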
Received: 10 November 2010
Published: 07 January 2011