Study on Web Text Extraction Based on CPN Networks

doi:10.11925/infotech.1003-3513.2008.11.13

New Technology of Library and Information Service

2008, Vol. 24, Issue (11): 65-71

Information Analysis and Research

Chen Jingwen, Peng Zhe

(Center for Studies of Information Resource of Wuhan University, Wuhan 430072, China)


Abstract

This paper studies the use of a CPN neural network to extract the main text of Web pages, and proposes an approach to addressing the shortcomings of traditional extraction techniques in generality, scalability, and maintainability.

Keywords: information extraction, CPN neural network

Received: 2008-06-17 Published: 2008-11-25

G202

Corresponding author: Chen Jingwen E-mail: chenjw_2001@yahoo.com

About the authors: Chen Jingwen, Peng Zhe

Cite this article:

Chen Jingwen, Peng Zhe. Study on Web Text Extraction Based on CPN Networks[J]. New Technology of Library and Information Service, 2008, 24(11): 65-71.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2008.11.13 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2008/V24/I11/65
