Automatic Extraction of Job Listing Page Link Information in Job Web Station

doi:10.11925/infotech.1003-3513.2009.07-08.18

New Technology of Library and Information Service (现代图书情报技术), 2009, Vol. 25, Issue (7-8): 93-96
https://doi.org/10.11925/infotech.1003-3513.2009.07-08.18

Section: Information Analysis and Research


Fang Hong (方宏)¹, Lv Taizhi (吕太之)²

¹ Department of Information Engineering, Jiangsu Marine Institute, Nanjing 211170, China
² School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China

Abstract

This paper combines URL clustering and JavaScript interpretation to automatically identify and extract job links and next-page (pagination) links from the job listing pages of job-hunting websites. Experiments show that these techniques are effective.

Keywords: URL clustering, listing page, information extraction, job seeking
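
As a rough illustration of what URL clustering on a listing page can look like, the sketch below groups links by a structural signature and treats the largest group as the candidate job links. This is not the authors' algorithm, which this page does not describe. The signature definition (host, path with digit runs masked, sorted query keys), the function names, and the `jobs.example.com` URLs are all assumptions made for the example.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def url_signature(url):
    """Reduce a URL to a structural pattern (an assumed heuristic):
    host, path with digit runs masked, and the sorted query keys."""
    parts = urlparse(url)
    path = re.sub(r"\d+", "{n}", parts.path)
    keys = tuple(sorted(parse_qs(parts.query, keep_blank_values=True)))
    return (parts.netloc, path, keys)


def cluster_urls(urls):
    """Group URLs that share the same structural signature."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_signature(url)].append(url)
    return dict(clusters)


def largest_cluster(urls):
    """Return the biggest group, taken here as the candidate job links."""
    clusters = cluster_urls(urls)
    return max(clusters.values(), key=len) if clusters else []


# Hypothetical links as they might appear on one listing page.
links = [
    "http://jobs.example.com/job/1001.html",
    "http://jobs.example.com/job/1002.html",
    "http://jobs.example.com/job/1003.html",
    "http://jobs.example.com/list?page=2",
    "http://jobs.example.com/list?page=3",
    "http://jobs.example.com/about.html",
]
job_links = largest_cluster(links)
```

With these sample links, the three `/job/` URLs share one signature and form the largest group, while the two `/list?page=` URLs form a smaller group that a pagination detector could inspect separately. Links generated by JavaScript would need a script interpreter before they could be clustered this way, and that step is not covered by the sketch.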

Received: 2009-06-02 Published: 2009-08-25

TP393.09

Corresponding author: Fang Hong (方宏) E-mail: fanghong_jmi@sina.com

About the authors: Fang Hong, Lv Taizhi

Cite this article:

Fang Hong, Lv Taizhi. Automatic Extraction of Job Listing Page Link Information in Job Web Station [J]. New Technology of Library and Information Service, 2009, 25(7-8): 93-96.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2009.07-08.18 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2009/V25/I7-8/93
