New Technology of Library and Information Service  2011, Vol. 27 Issue (6): 20-26    DOI: 10.11925/infotech.1003-3513.2011.06.04
Study on Talents Description Web Page Automatic Recognition System
Xu Jian1, Wen Haosheng2
1. School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China;
2. Shenzhen Thunder Network Technology Company Ltd., Shenzhen 518057, China
Abstract  This paper presents a system that automatically recognizes talents description Web pages, applied to university Web pages crawled with the Nutch crawler. Recognition draws on four sources of features: the Web page URL, the title tag content, the anchor text, and the Web page content. Each feature value is computed by matching the text against a name list, a positive feature word list, and a negative feature word list. The system then passes these feature values to LibSVM, which classifies each page as a talents description Web page or not.
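As a rough illustration of the feature computation described in the abstract, the sketch below counts list matches in each of the four page fields and writes the result as one line in LibSVM's sparse input format (`label index:value ...`). The word lists, the field names, the use of raw match counts as feature values, and the helper names are all assumptions made for this example; the abstract does not specify the actual lists or the scoring used by the authors.

```python
import re

# Hypothetical lists for illustration only; the paper's real lists are not given.
NAME_LIST = {"xu jian", "wen haosheng"}
POSITIVE_WORDS = {"professor", "faculty", "biography", "cv", "people"}
NEGATIVE_WORDS = {"news", "course", "admission", "event"}

# The four feature sources named in the abstract (field names are assumed).
FIELDS = ("url", "title", "anchor", "content")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_matches(tokens, phrases):
    """Count occurrences of each (possibly multi-word) phrase in the token list."""
    total = 0
    for phrase in phrases:
        words = phrase.split()
        n = len(words)
        total += sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words
        )
    return total


def extract_features(page):
    """Return 12 values: (name, positive, negative) match counts per field."""
    features = []
    for field in FIELDS:
        tokens = tokenize(page.get(field, ""))
        features.append(count_matches(tokens, NAME_LIST))
        features.append(count_matches(tokens, POSITIVE_WORDS))
        features.append(count_matches(tokens, NEGATIVE_WORDS))
    return features


def to_libsvm_line(label, features):
    """Format one example as a LibSVM sparse line, omitting zero-valued features."""
    parts = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + parts)


if __name__ == "__main__":
    sample_page = {
        "url": "http://example.edu/faculty/xu-jian",
        "title": "Xu Jian - Professor",
        "anchor": "Xu Jian",
        "content": "Xu Jian is a professor. Recent news: new paper accepted.",
    }
    print(to_libsvm_line(1, extract_features(sample_page)))
```

Lines produced this way can be collected into a training file and passed to LibSVM's command-line tools or one of its language bindings. Keeping the name, positive, and negative counts as separate features per field lets the classifier weight, for example, a name in the URL differently from a name in the page body.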
Key words: LibSVM; Talents description Web page; Automatic classification; Classification feature extraction
Received: 09 May 2011      Published: 15 August 2011
CLC number: G250

Cite this article:

Xu Jian, Wen Haosheng. Study on Talents Description Web Page Automatic Recognition System. New Technology of Library and Information Service, 2011, 27(6): 20-26.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.06.04     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I6/20
