New Technology of Library and Information Service  2004, Vol. 20 Issue (2): 68-71    DOI: 10.11925/infotech.1003-3513.2004.02.19
Automatic Web Information Extraction Based on DOM
Wu Wei   Liu Youhua
(Department of Information Management, Nanjing University,Nanjing 210093,China)
Abstract  

More and more Web sites are built on a database-driven architecture, and the pages of these sites are created dynamically. This paper proposes and implements a method for automatically extracting information from such dynamic pages using the WebBrowser control and the DOM (Document Object Model). The paper also illustrates the implementation details and code through a prototype.
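To make the DOM step concrete, here is a minimal sketch of walking a page's DOM tree and pulling out repeated records. It assumes the dynamic page has already been rendered and serialized as well-formed XHTML; the paper itself works through the WebBrowser control, whereas this sketch uses Python's `xml.dom.minidom` only as a stand-in DOM. The page markup, the table id `results`, the column names, and the function `extract_records` are hypothetical, and the "header row plus data rows" rule is an illustrative assumption, not the authors' algorithm.

```python
from xml.dom import minidom

# Hypothetical result page, standing in for a database-driven page
# that has already been rendered and serialized as well-formed XHTML.
PAGE = """<html><body>
<table id="results">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
</body></html>"""


def text_of(node):
    """Concatenate the text of all descendant text nodes, similar to textContent."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts).strip()


def extract_records(markup, table_id):
    """Hypothetical extractor: map each data row of a table to its header cells."""
    doc = minidom.parseString(markup)
    # Locate the target table by matching its id attribute directly.
    table = next((t for t in doc.getElementsByTagName("table")
                  if t.getAttribute("id") == table_id), None)
    if table is None:
        raise ValueError(f"no <table id={table_id!r}> found")
    rows = table.getElementsByTagName("tr")
    # Assumption: the first row holds the column names in <th> cells.
    header = [text_of(th) for th in rows[0].getElementsByTagName("th")]
    records = []
    for row in rows[1:]:
        cells = [text_of(td) for td in row.getElementsByTagName("td")]
        records.append(dict(zip(header, cells)))
    return records


if __name__ == "__main__":
    for record in extract_records(PAGE, "results"):
        print(record)
```

Because pages generated from a database are usually filled into a fixed template, records on such pages tend to share the same position in the DOM tree, which is what lets a fixed traversal like this one be reused across pages. Note that `minidom` rejects HTML that is not well-formed XML; a browser DOM such as the one exposed by the WebBrowser control does not have that restriction.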

Key words: Dynamic Web; Automatic information extraction; DOM; WebBrowser
Received: 15 September 2003      Published: 06 January 2004
ZTFLH: TP311
 
Corresponding Authors: Wu Wei     E-mail: wuweibox@hotmail.com
About the authors: Wu Wei, Liu Youhua

Cite this article:

Wu Wei, Liu Youhua. Automatic Web Information Extraction Based on DOM. New Technology of Library and Information Service, 2004, 20(2): 68-71.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2004.02.19     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2004/V20/I2/68

