System Design and Implementation of University Laboratory Web Information Extraction Based on Rules
Hua Bolin¹, Guo Jiang²
¹Institute of Scientific and Technical Information of China, Beijing 100038, China; ²Beijing Used Vehicle Trading Market Inc., Beijing 100070, China
This paper analyzes university laboratory Web information, summarizes the characteristics of laboratory information, and uses these characteristics to formulate extraction rules for laboratory Web information. It designs an information extraction system for university laboratories, labIE, and presents its system architecture and technical architecture. It also describes the design of the table-recognition rules and the methodology for constructing characteristic predicates.
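The abstract does not give labIE's actual rule syntax or predicate set, so the following is only a rough illustration of how table-recognition rules can be composed from characteristic predicates. It is a minimal sketch under stated assumptions: it uses Python's standard-library `html.parser`, and the label vocabulary `FIELD_LABELS`, the keyword list `LAB_KEYWORDS`, the functions `is_field_label`, `mentions_lab`, `is_lab_info_table` and `extract_fields`, and the threshold `min_label_rows=2` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical label vocabulary (English and Chinese); labIE's real rule lexicon
# is not given in the abstract.
FIELD_LABELS = {"name", "director", "phone", "email", "address", "research areas",
                "名称", "负责人", "电话", "邮箱", "地址", "研究方向"}
# Hypothetical keywords signalling that a table is about a laboratory.
LAB_KEYWORDS = ("laboratory", "实验室")


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in the order they close
        self._open = []    # stack of tables still open, so inner tables close first
        self._cell = None  # text fragments of the current <td>/<th>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open.append([])
        elif tag == "tr" and self._open:
            self._open[-1].append([])
        elif tag in ("td", "th") and self._open:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._open and self._cell is not None:
            rows = self._open[-1]
            if not rows:  # tolerate a cell that appears without a <tr>
                rows.append([])
            # Collapse runs of whitespace inside the cell text.
            rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "table" and self._open:
            self.tables.append(self._open.pop())

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Characteristic predicates: boolean tests on a single cell's text.
def is_field_label(text):
    """True if the cell reads like a field label, e.g. 'Phone:' or '负责人：'."""
    return text.rstrip(":：").strip().lower() in FIELD_LABELS


def mentions_lab(text):
    """True if the cell mentions a laboratory."""
    lowered = text.lower()
    return any(k in lowered for k in LAB_KEYWORDS)


# Table-recognition rule built from the predicates above.
def is_lab_info_table(rows, min_label_rows=2):
    """Accept a table when enough rows are label/value pairs and some cell
    mentions a laboratory; navigation and layout tables usually fail these tests."""
    label_rows = sum(1 for r in rows if len(r) >= 2 and is_field_label(r[0]))
    has_lab = any(mentions_lab(c) for r in rows for c in r)
    return label_rows >= min_label_rows and has_lab


def extract_fields(html):
    """Return one {label: value} dict per table that passes the rule."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    records = []
    for rows in collector.tables:
        if is_lab_info_table(rows):
            records.append({r[0].rstrip(":：").strip().lower(): r[1]
                            for r in rows if len(r) >= 2 and is_field_label(r[0])})
    return records


if __name__ == "__main__":
    page = """
    <table><tr><td><a href="/">Home</a></td><td>News</td></tr></table>
    <table>
      <tr><th>Name:</th><td>Key Laboratory of Text Mining</td></tr>
      <tr><th>Director:</th><td>Zhang San</td></tr>
      <tr><th>Phone:</th><td>010-12345678</td></tr>
    </table>
    """
    print(extract_fields(page))
    # [{'name': 'Key Laboratory of Text Mining', 'director': 'Zhang San', 'phone': '010-12345678'}]
```

In this sketch the navigation table is rejected because none of its cells satisfies either predicate, while the second table passes and yields its label/value pairs. Requiring several label rows plus a laboratory mention, rather than trusting the `<table>` tag alone, is one way to separate content tables from the layout tables common on university sites; the threshold and vocabularies would need tuning against real pages.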
Hua Bolin, Guo Jiang. System Design and Implementation of University Laboratory Web Information Extraction Based on Rules[J]. New Technology of Library and Information Service, 2009, (10): 62-66.