New Technology of Library and Information Service, 2009, Vol. 25, Issue (7-8): 93-96. DOI: 10.11925/infotech.1003-3513.2009.07-08.18
Automatic Extraction of Job Listing Page Link Information in Job Web Station
Fang Hong¹, Lv Taizhi²
¹ Department of Information Engineering, Jiangsu Marine Institute, Nanjing 211170, China
² School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract  

This paper applies URL clustering and JavaScript interpretation to automatically extract job links and next-page links from job listing pages. Experimental results show that these techniques are effective.
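The abstract does not specify the clustering criteria, so the following is only a minimal sketch of how URL clustering can separate job links from next-page links. It assumes that links are grouped by a simple structural signature (host, path segments with digit runs generalized, and the names of the query parameters), that the largest cluster other than the listing page's own pattern holds the job links, and that links sharing the listing page's pattern are pagination links. All function names and example URLs are hypothetical. Links with `javascript:` hrefs, which the paper handles by interpreting JavaScript, are simply skipped here.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urljoin, urlsplit


def url_pattern(url: str) -> tuple:
    """Reduce a URL to a structural signature (host, generalized path, query keys).

    Digit runs inside path segments become '<d>' and only the sorted query
    parameter names are kept, so /job/1001.html and /job/2002.html share one
    pattern, as do list?page=1 and list?page=2. (Assumed scheme, not the paper's.)
    """
    parts = urlsplit(url)
    segments = tuple(re.sub(r"\d+", "<d>", seg) for seg in parts.path.split("/") if seg)
    keys = tuple(sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True)))
    return (parts.netloc.lower(), segments, keys)


def cluster_links(base_url: str, hrefs: list[str]) -> dict[tuple, list[str]]:
    """Resolve hrefs against the page URL and group them by url_pattern."""
    clusters: dict[tuple, list[str]] = defaultdict(list)
    for href in hrefs:
        href = href.strip()
        if href.lower().startswith("javascript:"):
            # Script-driven links would need a JavaScript interpreter; skipped here.
            continue
        absolute = urljoin(base_url, href)
        key = url_pattern(absolute)
        if absolute not in clusters[key]:  # de-duplicate within a cluster
            clusters[key].append(absolute)
    return dict(clusters)


def guess_job_links(base_url: str, hrefs: list[str], min_size: int = 3) -> list[str]:
    """Heuristic: the largest cluster that does not match the listing page's own pattern."""
    own = url_pattern(base_url)
    candidates = [urls for key, urls in cluster_links(base_url, hrefs).items() if key != own]
    if not candidates:
        return []
    best = max(candidates, key=len)
    return best if len(best) >= min_size else []


def guess_next_page_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Heuristic: links sharing the listing page's pattern, excluding the page itself."""
    own = url_pattern(base_url)
    return [u for u in cluster_links(base_url, hrefs).get(own, []) if u != base_url]


if __name__ == "__main__":
    page = "http://jobs.example.com/list?page=1"
    links = ["/job/1001.html", "/job/1002.html", "/job/1003.html",
             "/about.html", "list?page=2", "javascript:goPage(2)"]
    print(guess_job_links(page, links))
    print(guess_next_page_links(page, links))
```

For the hypothetical page above, the three `/job/<d>.html` links form the job cluster and `list?page=2` is reported as a next-page link, while `javascript:goPage(2)` is left for a JavaScript-aware stage. Generalizing digits is a design choice that assumes job and page identifiers are numeric; sites that use other identifier schemes would need a different signature.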

Key words: URL clustering; Listing page; Information extraction; Seeking job
Received: 02 June 2009    Published: 25 August 2009
CLC number: TP393.09
Corresponding author: Fang Hong. E-mail: fanghong_jmi@sina.com
About the authors: Fang Hong, Lv Taizhi

Cite this article:

Fang Hong, Lv Taizhi. Automatic Extraction of Job Listing Page Link Information in Job Web Station. New Technology of Library and Information Service, 2009, 25(7-8): 93-96.

URL: http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.07-08.18 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V25/I7-8/93

