Study on Web Text Extraction Based on CPN Networks
Chen Jingwen, Peng Zhe
(Center for Studies of Information Resource of Wuhan University, Wuhan 430072, China)
Abstract: This paper proposes an approach to Web text extraction that addresses the generality, scalability, and maintainability problems of traditional methods.
Received: 17 June 2008
Published: 25 November 2008
Corresponding author:
Chen Jingwen
E-mail: chenjw_2001@yahoo.com
About the authors: Chen Jingwen, Peng Zhe