New Technology of Library and Information Service  2012, Vol. 28 Issue (5): 41-47    DOI: 10.11925/infotech.1003-3513.2012.05.06
Study on Semantic Markup of Species Description Text in Chinese Based on Auto-learning Rules
Duan Yufeng¹, Hei Zhenzhen¹, Ju Fei¹, Cui Hong²
1. Business School, East China Normal University, Shanghai 200241, China;
2. School of Information Resources & Library Science, University of Arizona, Tucson 85719, USA
Abstract  This paper combines automatically learned rules with leading words to perform semantic markup of Chinese species description text, using a data set of 1,000 documents randomly collected from Flora of China. Experimental results indicate that the overall markup performance (F value) of the rule-based algorithm designed in this study reaches 0.930, with most elements falling in the range 0.724-0.964. The algorithm outperforms the Naive Bayesian classification algorithm, and the results also show that leading words help optimize the algorithm.
Key words: Rules; Leading words; Species description text; Semantic markup
Received: 26 March 2012      Published: 24 July 2012



Duan Yufeng, Hei Zhenzhen, Ju Fei, Cui Hong. Study on Semantic Markup of Species Description Text in Chinese Based on Auto-learning Rules. New Technology of Library and Information Service, 2012, 28(5): 41-47.

