New Technology of Library and Information Service  2010, Vol. 26 Issue (12): 34-39    DOI: 10.11925/infotech.1003-3513.2010.12.06
The Subject Extraction Based on Topic Segmentation and PageRank Algorithm
Duan Xiaoli, Wang Yu
School of Management, Dalian University of Technology, Dalian 116024, China
Abstract  

To make subject extraction more complete, this paper first divides a text into theme segments, then reconstructs a sentence relation map for each theme segment and ranks its sentences with the PageRank algorithm. The sentence with the maximum weight is then taken as the topic sentence. Experiments show that the topic sentences extracted by this algorithm cover the full text well.
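
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how segments are found or how sentence relations are weighted, so the sketch assumes the text is already split into theme segments, uses bag-of-words cosine similarity as the edge weight, uses a damping factor of 0.85, and picks one top-ranked sentence per segment. The names `tokenize`, `cosine`, `pagerank`, and `topic_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a stand-in for a real (e.g. Chinese) word segmenter."""
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a square weight matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of edge j -> i.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def topic_sentences(segments):
    """Return the highest-ranked sentence of each non-empty theme segment.

    `segments` is a list of lists of sentences; the segmentation itself is
    assumed to have been done beforehand.
    """
    result = []
    for sentences in segments:
        if not sentences:
            continue
        bags = [Counter(tokenize(s)) for s in sentences]
        n = len(sentences)
        # Sentence relation map for this segment: similarity-weighted graph
        # with no self-loops.
        weights = [
            [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)
        ]
        scores = pagerank(weights)
        best = max(range(n), key=scores.__getitem__)
        result.append(sentences[best])
    return result


if __name__ == "__main__":
    demo = [
        ["apple banana", "apple banana cherry date", "cherry date"],
        ["only one sentence here"],
    ]
    print(topic_sentences(demo))
```

In the demo, the middle sentence of the first segment shares words with both of its neighbours, so it collects the most rank and is returned for that segment. Building one graph per segment, rather than one for the whole text, is what lets every theme contribute a sentence.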

Key words: Topic sentence extraction; Subject segmenting; Sentence relation map; PageRank algorithm
Received: 10 November 2010      Published: 07 January 2011
CLC Number: TP391

Cite this article:

Duan Xiaoli, Wang Yu. The Subject Extraction Based on Topic Segmentation and PageRank Algorithm. New Technology of Library and Information Service, 2010, 26(12): 34-39.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.12.06     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I12/34


