Automatic Extraction of Job Listing Page Link Information on Job Websites
Fang Hong1, Lv Taizhi2
1 (Department of Information Engineering, Jiangsu Marine Institute, Nanjing 211170, China)
2 (School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract: This paper applies URL clustering and JavaScript interpretation to automatically extract job links and next-page links from job listing pages. Experiments show that these techniques are effective.
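The abstract does not say how the URL clustering is performed. As a rough illustration only, one plausible approach groups the links on a listing page by structural pattern and treats the largest group as the likely job links. The function names, the digit-masking heuristic, and the example.com URLs below are assumptions made for this sketch, not the authors' method.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def url_pattern(url: str) -> str:
    """Reduce a URL to a structural pattern (hypothetical heuristic)."""
    parts = urlsplit(url)
    # Mask every run of digits in the path, so /job/101.html and
    # /job/102.html collapse to the same pattern.
    path = re.sub(r"\d+", "{n}", parts.path)
    # Keep only the sorted query parameter names, not their values.
    keys = sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True))
    return f"{parts.netloc}{path}?{'&'.join(keys)}"


def cluster_links(urls):
    """Group URLs that share the same structural pattern."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_pattern(url)].append(url)
    return dict(clusters)


def largest_cluster(urls):
    """Return the biggest group, assumed here to hold the job links."""
    clusters = cluster_links(urls)
    return max(clusters.values(), key=len) if clusters else []


if __name__ == "__main__":
    # Illustrative links as they might appear on one listing page.
    links = [
        "http://jobs.example.com/job/101.html",
        "http://jobs.example.com/job/102.html",
        "http://jobs.example.com/job/103.html",
        "http://jobs.example.com/list?page=2",
        "http://jobs.example.com/about",
    ]
    for link in largest_cluster(links):
        print(link)
```

Links that exist only as JavaScript calls would not appear in such a list until the script has been evaluated, which is presumably where the JavaScript interpretation step mentioned in the abstract comes in.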
Received: 02 June 2009
Published: 25 August 2009
Corresponding Author:
Fang Hong
E-mail: fanghong_jmi@sina.com
About authors: Fang Hong, Lv Taizhi