System Design and Implementation of University Laboratory Web Information Extraction Based on Rules
Hua Bolin1, Guo Jiang2
1 (Institute of Scientific and Technical Information of China, Beijing 100038, China)
2 (Beijing Used Vehicle Trading Market Inc., Beijing 100070, China)
Abstract This paper analyzes university laboratory Web information and summarizes the characteristics of laboratory information, which are then used to formulate rules for laboratory Web information. It designs an information extraction system for university laboratories and presents the system architecture and technical architecture of labIE. It also describes the design of table recognition rules and the methodology for constructing characteristic predicates.
Received: 21 May 2009
Published: 25 October 2009
Corresponding Author:
Hua Bolin
E-mail: huabolin@istic.ac.cn
About authors: Hua Bolin, Guo Jiang