New Technology of Library and Information Service  2016, Vol. 32 Issue (1): 24-31    DOI: 10.11925/infotech.1003-3513.2016.01.05
A Study on Hub Page Recognition Using URL Features
Ce Zhang1, Yuncheng Du1,2, Ran Liang2
1Open Laboratory of TRS Software, Beijing Information Science and Technology University, Beijing 100085, China
2Beijing TRS Information Technology Co. Ltd., Beijing 100101, China
[Objective] To address the low efficiency of traditional hub page recognition methods by building a simple data sample. [Methods] The method takes URL features as the basis for recognition and uses a Support Vector Machine (SVM) to classify the page type. [Results] The method achieves a precision of 91.2% and improves efficiency by nearly 60%. [Limitations] When the URL features are not obvious, or even run contrary to the actual page type, recognition accuracy drops sharply. [Conclusions] The experimental results show that the method has a clear advantage in efficiency and can improve the efficiency of the collection system.
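As a rough illustration of the kind of input such a classifier might take, the sketch below turns a URL into a small numeric feature vector. The abstract does not list the features actually used, so the five features here, the `INDEX_NAMES` set, and the function name `url_features` are hypothetical choices for illustration only. The resulting vectors could then be passed to any SVM implementation for training.

```python
from urllib.parse import urlparse

# Hypothetical file stems that often name listing (hub) pages; an assumption,
# not taken from the paper.
INDEX_NAMES = {"index", "default", "list", "home"}


def url_features(url):
    """Map a URL to a list of simple numeric features (illustrative only)."""
    parsed = urlparse(url)
    path = parsed.path
    segments = [s for s in path.split("/") if s]
    last = segments[-1] if segments else ""
    stem = last.rsplit(".", 1)[0].lower()
    return [
        len(segments),                                  # path depth
        1 if path == "" or path.endswith("/") else 0,   # directory-style path
        1 if stem in INDEX_NAMES else 0,                # index-like file name
        sum(ch.isdigit() for ch in path),               # digit count in path
        1 if parsed.query else 0,                       # has a query string
    ]


if __name__ == "__main__":
    for u in (
        "http://example.com/news/",
        "http://example.com/news/2015/0625/12345.html",
    ):
        print(u, url_features(u))
```

Under these assumed features, a shallow directory-style URL and a deep, digit-heavy article URL produce clearly different vectors, which is the kind of separation a classifier trained on URL features would rely on.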

Key words: URL features; Hub pages; SVM
Received: 25 June 2015      Published: 04 February 2016

Ce Zhang, Yuncheng Du, Ran Liang. A Study on Hub Page Recognition Using URL Features. New Technology of Library and Information Service, 2016, 32(1): 24-31.

