Study on Talents Description Web Page Automatic Recognition System
Xu Jian1, Wen Haosheng2
1. School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China;
2. Shenzhen Thunder Network Technology Company Ltd., Shenzhen 518057, China

Abstract This paper presents a system that automatically recognizes talent description Web pages, and implements recognition methods for university talent description Web pages crawled by the Nutch crawler. The recognition process uses features drawn from four sources: the Web page URL, the title tag content, the anchor text, and the page content. The value of each feature is computed by matching the text against a name list, a positive feature word list, and a negative feature word list. Based on these feature values, the system uses LibSVM to recognize talent description Web pages automatically.
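The feature computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Page` class, the function names, the contents of the three word lists, and the use of simple substring counts as feature values are all assumptions made for the example. The resulting vector is the kind of input that could then be passed to an SVM trainer such as LibSVM, which is not shown here.

```python
from dataclasses import dataclass

# Hypothetical lists for illustration only; the abstract does not give
# the actual name list or feature word lists used by the system.
NAME_LIST = {"xu jian", "wen haosheng"}
POSITIVE_WORDS = {"professor", "faculty", "biography", "cv", "profile"}
NEGATIVE_WORDS = {"news", "course", "admission", "event"}


@dataclass
class Page:
    """The four text sources named in the abstract (hypothetical container)."""
    url: str
    title: str
    anchor_text: str
    content: str


def count_matches(text, terms):
    """Count how many terms occur as substrings of the lowercased text."""
    lowered = text.lower()
    return sum(1 for term in terms if term in lowered)


def field_features(text):
    """Three values per field: name, positive-word and negative-word matches."""
    return [
        count_matches(text, NAME_LIST),
        count_matches(text, POSITIVE_WORDS),
        count_matches(text, NEGATIVE_WORDS),
    ]


def feature_vector(page):
    """Concatenate the per-field features into one 12-dimensional vector."""
    vector = []
    for field in (page.url, page.title, page.anchor_text, page.content):
        vector.extend(field_features(field))
    return vector
```

Substring counting is only one possible way to turn list matches into numeric values; weighted or normalized scores would fit the same structure.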
Received: 09 May 2011
Published: 15 August 2011