New Technology of Library and Information Service  2014, Vol. 30 Issue (5): 83-89    DOI: 10.11925/infotech.1003-3513.2014.05.11
Semantic Annotation of Species Description Text in Chinese by Combining Naïve Bayes Algorithm with Bootstrapping Method
Duan Yufeng¹, Zhu Wenjing², Chen Qiao¹, Cui Hong³
1 Business School, East China Normal University, Shanghai 200241, China;
2 Institute of Scientific and Technical Information of Shanghai, Shanghai Library, Shanghai 200031, China;
3 School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719, USA
[Objective] To reduce the cost of machine learning for semantic annotation of Chinese species description text by reducing the size of the required training dataset. [Methods] Based on the Bootstrapping method, a weakly supervised learning method is designed that starts from a small amount of labeled data and alternates learning and tagging iteratively. Each iteration expands the knowledge base and thereby continuously improves annotation ability. [Results] The average F-value reaches 0.9112 on a dataset of 15,041 sentences. [Limitations] Annotation efficiency may be relatively low on sparse data. [Conclusions] The experimental results show that the proposed algorithm not only dramatically reduces the amount of training data that machine learning requires, but also improves annotation efficiency.
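The abstract does not give implementation details. As a rough illustration of the general idea named in the title (a Naïve Bayes classifier retrained inside a Bootstrapping loop), here is a minimal sketch, assuming: sentences are already segmented into tokens, each sentence carries a single semantic label, and an automatically tagged sentence joins the knowledge base only when its posterior probability reaches a confidence threshold. The function names `train_nb`, `predict` and `bootstrap`, the `threshold` and `max_iter` parameters, and the labels used below are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return (best_label, posterior) using Laplace (add-one) smoothing."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    v = len(vocab)
    log_scores = {}
    for label, n in class_counts.items():
        denom = sum(token_counts[label].values()) + v
        score = math.log(n / total)  # log prior
        for t in tokens:
            # Unseen tokens get count 0, i.e. probability 1 / denom.
            score += math.log((token_counts[label][t] + 1) / denom)
        log_scores[label] = score
    # Normalise log joint scores into posterior probabilities.
    m = max(log_scores.values())
    exp = {k: math.exp(s - m) for k, s in log_scores.items()}
    z = sum(exp.values())
    best = max(exp, key=exp.get)
    return best, exp[best] / z


def bootstrap(seed, unlabeled, threshold=0.9, max_iter=10):
    """Alternate training and tagging, growing the labeled set each round.

    Hypothetical policy: every sentence tagged with posterior >= threshold
    is added to the knowledge base; the rest stay in the pool.
    """
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_iter):
        model = train_nb(labeled)
        confident, rest = [], []
        for tokens in pool:
            label, p = predict(model, tokens)
            (confident if p >= threshold else rest).append((tokens, label))
        if not confident:  # nothing new learned: stop iterating
            break
        labeled.extend(confident)
        pool = [tokens for tokens, _ in rest]
    return train_nb(labeled), labeled, pool
```

For example, with four seed sentences labeled `morphology` or `distribution`, a call such as `bootstrap(seed, unlabeled, threshold=0.8)` tags the sentences it is confident about, retrains on the enlarged set, and leaves sentences made only of unseen tokens in the returned pool. The threshold trades coverage against the risk of feeding tagging errors back into later iterations.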

Key words: Bootstrapping method; Naïve Bayes; Species description text; Semantic annotation
Received: 15 January 2014      Published: 06 June 2014
CLC number: TP391

Cite this article:

Duan Yufeng, Zhu Wenjing, Chen Qiao, Cui Hong. Semantic Annotation of Species Description Text in Chinese by Combining Naïve Bayes Algorithm with Bootstrapping Method. New Technology of Library and Information Service, 2014, 30(5): 83-89.
