Extracting Topic Sentences from Web Text Based on Sentence Relationship Map*

doi:10.11925/infotech.1003-3513.2009.03.10

New Technology of Library and Information Service (现代图书情报技术)

2009, Vol. 3, Issue (3): 57-61 https://doi.org/10.11925/infotech.1003-3513.2009.03.10

Knowledge Organization and Knowledge Management


He Wei (何维), Wang Yu (王宇)

(School of Management, Dalian University of Technology, Dalian 116024, China)


Abstract

Web text carries little structural information and a great deal of noise. To address this, sentences are treated as nodes and the similarities between them as edges, so that a sentence relationship map describes how the sentences of a text relate to one another. Extracting the topic sentences of a text then becomes a search for the nodes with the most edges. With the help of a semantic dictionary, sentence similarity is defined as semantic similarity, which addresses the low word-frequency similarity of short texts. Tests on a publicly available Internet corpus show that the extracted topic sentences reach an average acceptability of 80.6%.
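The graph formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary `TOY_DICTIONARY`, the Jaccard-style `similarity` function, the whitespace tokenization, and the `threshold` value are all assumptions made for illustration. Real Chinese text would first need word segmentation and a full semantic dictionary.

```python
from itertools import combinations

# Hypothetical toy "semantic dictionary": maps a word to a concept code.
# Illustrative stand-in only; not the dictionary used in the paper.
TOY_DICTIONARY = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "price": "cost", "cost": "cost", "expensive": "cost",
    "rain": "weather", "sunny": "weather",
}


def concepts(sentence):
    """Map each whitespace token to its concept code (or itself if unknown)."""
    return {TOY_DICTIONARY.get(tok, tok) for tok in sentence.lower().split()}


def similarity(a, b):
    """Jaccard overlap of concept sets; an assumed stand-in for the
    semantic sentence similarity mentioned in the abstract."""
    ca, cb = concepts(a), concepts(b)
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)


def build_graph(sentences, threshold=0.35):
    """Return adjacency sets: an edge links two sentences whose
    similarity reaches the (assumed) threshold."""
    edges = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if similarity(sentences[i], sentences[j]) >= threshold:
            edges[i].add(j)
            edges[j].add(i)
    return edges


def topic_sentence(sentences, threshold=0.35):
    """Pick the sentence whose node has the most edges.
    Ties are broken in favour of the earliest sentence."""
    if not sentences:
        return None
    edges = build_graph(sentences, threshold)
    best = max(range(len(sentences)), key=lambda i: (len(edges[i]), -i))
    return sentences[best]


if __name__ == "__main__":
    text = [
        "the car price is high",
        "an automobile is expensive",
        "truck cost rises",
        "sunny weather today",
    ]
    print(topic_sentence(text))
```

Mapping words to concept codes before comparing is what lets "car" and "automobile" count as a match even though the surface words differ, which is the motivation the abstract gives for using semantic rather than word-frequency similarity on short texts.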

Key words: topic sentence, sentence relationship map, sentence similarity

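Received: 2008-12-29 Published: 2009-03-25

TP391

Funding:

* This work is one of the research results of the National Natural Science Foundation of China project "Research on Several Basic Scientific Problems in Enterprise (Organization) Knowledge Management" (Grant No. 70431001).

Corresponding author: Wang Yu E-mail: ywang@dlut.edu.cn

About the authors: He Wei, Wang Yu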

Cite this article:

He Wei, Wang Yu. Extracting Topic Sentences from Web Text Based on Sentence Relationship Map[J]. New Technology of Library and Information Service, 2009, 3(3): 57-61.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2009.03.10 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2009/V3/I3/57


