Semantic Annotation of Species Description Text in Chinese by Combining Naïve Bayes Algorithm with Bootstrapping Method
Duan Yufeng1, Zhu Wenjing2, Chen Qiao1, Cui Hong3
1 Business School, East China Normal University, Shanghai 200241, China;
2 Institute of Scientific and Technical Information of Shanghai, Shanghai Library, Shanghai 200031, China;
3 School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719, USA
Abstract [Objective] To reduce the cost of machine learning in annotating Chinese species description text by shrinking the required training dataset. [Methods] Building on the Bootstrapping method, this study designs a weakly supervised learning method that starts from a small amount of data and performs learning and tagging iteratively. Each iteration expands the knowledge base, which steadily improves annotation ability. [Results] The average F-value reaches 0.9112 on a dataset of 15,041 sentences. [Limitations] Annotation efficiency may be relatively low on sparse data. [Conclusions] The experimental data show that the algorithm not only dramatically reduces the amount of training data that machine learning requires, but also increases annotation efficiency.
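The iterative learn-and-tag loop described under [Methods] can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the confidence threshold, or the stopping rule, so the function names (`train_nb`, `predict`, `bootstrap`), the 0.75 threshold, the `leaf`/`flower` labels, and the English tokens standing in for Chinese text are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    class_counts = Counter(label for _, label in labeled)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return the most probable label and its posterior probability."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        # Laplace (add-one) smoothing over the known vocabulary.
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][t] + 1) / denom)
        log_scores[label] = score
    # Turn log scores into normalised posteriors.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def bootstrap(seed, unlabeled, threshold=0.75, max_rounds=10):
    """Alternate training and tagging, absorbing confident predictions."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train_nb(labeled)
        confident, remaining = [], []
        for tokens in pool:
            label, prob = predict(model, tokens)
            if prob >= threshold:
                confident.append((tokens, label))
            else:
                remaining.append(tokens)
        if not confident:  # nothing new was learned, so stop
            break
        labeled.extend(confident)  # expand the knowledge base
        pool = remaining
    return train_nb(labeled), labeled, pool


# Hypothetical toy data: two seed sentences and three untagged ones.
seed = [
    (["leaf", "blade", "ovate"], "leaf"),
    (["flower", "petal", "white"], "flower"),
]
unlabeled = [
    ["leaf", "blade", "green"],
    ["petal", "white", "fragrant"],
    ["blade", "green", "margin"],
]
model, labeled, pool = bootstrap(seed, unlabeled)
for tokens, label in labeled:
    print(label, tokens)
```

In this toy run the third untagged sentence falls below the threshold in the first round and is tagged only after the first two sentences have been absorbed into the training set, which is the effect the abstract attributes to expanding the knowledge base.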
Received: 15 January 2014
Published: 06 June 2014