New Technology of Library and Information Service  2008, Vol. 24 Issue (11): 65-71    DOI: 10.11925/infotech.1003-3513.2008.11.13
Study on Web Text Extraction Based on CPN Networks
Chen Jingwen  Peng Zhe
(Center for Studies of Information Resource of Wuhan University, Wuhan 430072, China)
Abstract  

This paper proposes an approach to Web text extraction, based on CPN (counterpropagation) neural networks, that addresses the problems of generality, scalability, and maintainability found in traditional methods.

Key words: Information extraction; CPN neural network
Received: 17 June 2008      Published: 25 November 2008
CLC number: G202

 
Corresponding Authors: Chen Jingwen     E-mail: chenjw_2001@yahoo.com
About authors: Chen Jingwen, Peng Zhe

Cite this article:

Chen Jingwen, Peng Zhe. Study on Web Text Extraction Based on CPN Networks. New Technology of Library and Information Service, 2008, 24(11): 65-71.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.11.13     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I11/65
