New Technology of Library and Information Service  2004, Vol. 20 Issue (2): 68-71    DOI: 10.11925/infotech.1003-3513.2004.02.19
Automatic Web Information Extraction Based on DOM
Wu Wei, Liu Youhua
(Department of Information Management, Nanjing University, Nanjing 210093, China)
Abstract  

More and more Web sites are built on database-driven architectures, and their pages are generated dynamically. This paper proposes and implements a method for automatically extracting information from such dynamic pages using the WebBrowser control and DOM techniques. It also illustrates the implementation details and code through a prototype.
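
The general idea of DOM-based extraction can be sketched as follows. This is only an illustration under stated assumptions, not the authors' prototype: the abstract describes obtaining the DOM of a live page through a WebBrowser control, whereas this sketch parses a small, well-formed sample page with Python's standard `xml.dom.minidom`. The sample markup and the names `SAMPLE_PAGE`, `node_text`, and `extract_records` are hypothetical.

```python
from xml.dom import minidom

# Hypothetical, well-formed page fragment standing in for a database-driven
# result page. A real dynamic page would first have to be loaded and rendered
# (the paper uses a WebBrowser control for that step).
SAMPLE_PAGE = """<html><body>
<table id="results">
<tr><th>Title</th><th>Year</th></tr>
<tr><td>Automatic Web Information Extraction Based on DOM</td><td>2004</td></tr>
<tr><td>Another Record</td><td>2003</td></tr>
</table>
</body></html>"""


def node_text(node):
    """Concatenate the text of all descendant text nodes of a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(node_text(child))
    return "".join(parts).strip()


def extract_records(markup):
    """Return one dict per data row of the first table, keyed by header text."""
    doc = minidom.parseString(markup)
    table = doc.getElementsByTagName("table")[0]
    rows = table.getElementsByTagName("tr")
    headers = [node_text(th) for th in rows[0].getElementsByTagName("th")]
    records = []
    for row in rows[1:]:
        cells = [node_text(td) for td in row.getElementsByTagName("td")]
        # Skip rows whose cell count does not match the header row.
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


if __name__ == "__main__":
    for record in extract_records(SAMPLE_PAGE):
        print(record)
```

Because the records are located by their position in the DOM tree rather than by string matching, the same traversal can be reused for every page generated from the same template.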

Key words: Dynamic Web; Automatic information extraction; DOM; WebBrowser
Received: 15 September 2003      Published: 06 January 2004
CLC Number: TP311
Corresponding Authors: Wu Wei     E-mail: wuweibox@hotmail.com
About authors: Wu Wei, Liu Youhua

Cite this article:

Wu Wei, Liu Youhua. Automatic Web Information Extraction Based on DOM. New Technology of Library and Information Service, 2004, 20(2): 68-71.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2004.02.19     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2004/V20/I2/68

