New Technology of Library and Information Service  2014, Vol. 30 Issue (11): 53-58    DOI: 10.11925/infotech.1003-3513.2014.11.08
A Semi-supervised Web Scientific and Technical Information Classification Model
Li Chuanxi, Zhang Zhixiong, Liu Jianhua, Qian Li
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
[Objective] The differences among items of open Web scientific and technical information are minor, so general rule-based and statistical learning methods cannot classify this information effectively enough for practical applications. [Methods] By analyzing the content and structure of Web pages and using open resources (such as domain ontologies and thesauri) to self-learn domain features, this paper proposes a semi-supervised classification model for scientific and technical information. [Results] The experimental results show that the proposed method achieves a precision of 0.9016, a recall of 0.8756 and an F1 score of 0.8884, outperforming Naive Bayes classification. [Limitations] Applying the proposed method to a new domain still requires domain seed features to be supplied. [Conclusions] The proposed method classifies scientific and technical information effectively and meets the needs of in-depth information analysis and processing.
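
The three reported scores can be checked against each other with the standard definition of F1 as the harmonic mean of precision and recall. The snippet below is only a minimal sketch of that arithmetic; the function name `f1_score` is a hypothetical helper, not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
reported_precision = 0.9016
reported_recall = 0.8756

# Rounded to four decimals for comparison with the reported F1 score.
f1 = round(f1_score(reported_precision, reported_recall), 4)
print(f1)
```

Under this definition the reported precision and recall agree with the reported F1 score of 0.8884 to four decimal places.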

Key words: Web scientific and technical information; Scientific and technical information classification model; Open resources
Received: 20 May 2014      Published: 18 December 2014
PACS:  TP181  

