New Technology of Library and Information Service (现代图书情报技术), 2004, Vol. 20, Issue 2: 68-71     https://doi.org/10.11925/infotech.1003-3513.2004.02.19
Section: Network Resources and Construction
Automatic Web Information Extraction Based on DOM (基于DOM的Web信息自动抽取)
Wu Wei (吴伟)   Liu Youhua (刘友华)
(Department of Information Management, Nanjing University, Nanjing 210093, China)
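Abstract (Chinese version, translated)

This paper proposes an approach to automatically extracting information from Web pages. Using WebBrowser and DOM techniques, it implements locating elements on a Web page, filling in forms automatically, submitting forms automatically, retrieving the query results automatically, and extracting the required information from them, thereby achieving automatic extraction of Web page information. The paper also gives the implementation details of this method and sample code.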
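
The steps named in the abstract (locate elements, fill in a form, submit it, extract fields from the result page) can be illustrated with a small sketch. This is not the paper's code: it assumes well-formed XHTML and uses Python's standard `xml.dom.minidom` in place of the WebBrowser component, and it only builds the URL a form submission would request rather than contacting a server. The page sources, the `example.org` address, the field names, and the function names `fill_form`, `node_text`, and `extract_rows` are all hypothetical.

```python
from urllib.parse import urlencode, urljoin
from xml.dom import minidom

# Hypothetical, well-formed pages standing in for a live search form and its result page.
SEARCH_PAGE = """<html><body>
<form action="/search" method="get">
  <input type="text" name="title" value=""/>
  <input type="hidden" name="lang" value="en"/>
  <input type="submit" name="go" value="Search"/>
</form>
</body></html>"""

RESULT_PAGE = """<html><body>
<table id="results">
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td>DOM Level 1</td><td>1998</td></tr>
  <tr><td>DOM Level 2 Events</td><td>2000</td></tr>
</table>
</body></html>"""


def fill_form(page_source, base_url, values):
    """Locate the first form, fill its named inputs, and return the GET URL a submit would request."""
    doc = minidom.parseString(page_source)
    form = doc.getElementsByTagName("form")[0]
    fields = {}
    for node in form.getElementsByTagName("input"):
        name = node.getAttribute("name")
        # Skip unnamed inputs and the submit button itself (a simplification).
        if not name or node.getAttribute("type") == "submit":
            continue
        # Use the caller's value if given, otherwise keep the page's default.
        fields[name] = values.get(name, node.getAttribute("value"))
    return urljoin(base_url, form.getAttribute("action")) + "?" + urlencode(fields)


def node_text(node):
    """Concatenate the direct text children of a DOM node."""
    return "".join(
        child.data for child in node.childNodes if child.nodeType == child.TEXT_NODE
    ).strip()


def extract_rows(page_source, table_id):
    """Return the data rows (lists of cell text) of the table with the given id."""
    doc = minidom.parseString(page_source)
    for table in doc.getElementsByTagName("table"):
        if table.getAttribute("id") == table_id:
            rows = []
            for tr in table.getElementsByTagName("tr"):
                cells = [node_text(td) for td in tr.getElementsByTagName("td")]
                if cells:  # header rows contain only <th> cells and are skipped
                    rows.append(cells)
            return rows
    return []


if __name__ == "__main__":
    print(fill_form(SEARCH_PAGE, "http://example.org/catalog/", {"title": "DOM extraction"}))
    print(extract_rows(RESULT_PAGE, "results"))
```

Real-world HTML is often not well-formed, so a tolerant parser or a browser-hosted DOM (as in the paper's WebBrowser-based approach) would be needed in practice; the sketch only shows the traversal pattern.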

Keywords (translated): Web page; automatic information extraction; DOM; WebBrowser
Abstract

More and more Web sites are built on a database-driven architecture, and the pages of these sites are created dynamically. This paper proposes and implements a method for automatically extracting information from such dynamic pages using WebBrowser and DOM techniques. In addition, the paper illustrates the implementation details and code through a prototype.

Key words: Dynamic Web; Automatic information extraction; DOM; WebBrowser
Received: 2003-09-15      Published: 2004-01-06
CLC number: TP311
Corresponding author: Wu Wei (吴伟)     E-mail: wuweibox@hotmail.com
About the authors: Wu Wei, Liu Youhua
Cite this article:
Wu Wei, Liu Youhua. Automatic Web Information Extraction Based on DOM[J]. New Technology of Library and Information Service, 2004, 20(2): 68-71.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2004.02.19      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2004/V20/I2/68

