New Technology of Library and Information Service  2010, Vol. 26 Issue (12): 34-39    DOI: 10.11925/infotech.1003-3513.2010.12.06
The Subject Extraction Based on Topic Segmentation and PageRank Algorithm
Duan Xiaoli, Wang Yu
School of Management, Dalian University of Technology, Dalian 116024, China

To improve the completeness of subject extraction, this paper first divides the text into theme segments, reconstructs a sentence relation map for each theme segment, and then ranks the sentences with the PageRank algorithm. The sentence with the maximum weight is taken as the topic sentence. Experiments show that the topic sentences extracted by this algorithm cover the full text well.
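
The pipeline summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the topic segmentation is assumed to be already done (the input is a list of segments, each a list of sentences), sentences are tokenized by whitespace, edges in the sentence relation map are weighted by bag-of-words cosine similarity, and one top-ranked sentence is returned per segment. The names `cosine`, `pagerank`, and `topic_sentences` are hypothetical, and the damping factor 0.85 is the commonly used default rather than a value taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an adjacency matrix."""
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score that is
            # proportional to the weight of the edge j -> i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def topic_sentences(segments):
    """Return the highest-ranked sentence of each non-empty segment."""
    result = []
    for sentences in segments:
        if not sentences:
            continue
        # Assumption: whitespace tokenization. Chinese text would need
        # a word segmenter at this step instead.
        bags = [Counter(s.lower().split()) for s in sentences]
        n = len(bags)
        # Sentence relation map: similarity-weighted graph, no self-loops.
        weights = [
            [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)
        ]
        scores = pagerank(weights)
        best = max(range(n), key=scores.__getitem__)
        result.append(sentences[best])
    return result
```

Calling `topic_sentences` on a list of pre-segmented sentence lists returns one sentence per non-empty segment. Ranking inside each segment rather than over the whole text is the design choice that keeps every theme represented in the output.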

Key words: Topic sentence extraction; Subject segmenting; Sentence relation map; PageRank algorithm
Received: 10 November 2010      Published: 07 January 2011



Cite this article:

Duan Xiaoli, Wang Yu. The Subject Extraction Based on Topic Segmentation and PageRank Algorithm. New Technology of Library and Information Service, 2010, 26(12): 34-39.


