New Technology of Library and Information Service, 2014, Vol. 30, Issue (5): 83-89. DOI: 10.11925/infotech.1003-3513.2014.05.11
RESEARCH ON APPLICATION
Semantic Annotation of Species Description Text in Chinese by Combining Naïve Bayes Algorithm with Bootstrapping Method
Duan Yufeng¹, Zhu Wenjing², Chen Qiao¹, Cui Hong³
¹ Business School, East China Normal University, Shanghai 200241, China;
² Institute of Scientific and Technical Information of Shanghai, Shanghai Library, Shanghai 200031, China;
³ School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719, USA
Abstract  

[Objective] To reduce the cost of machine learning for semantic annotation of Chinese species description text by reducing the size of the required training dataset. [Methods] Based on the Bootstrapping method, a weakly supervised learning method is designed that starts from a small amount of labeled data and performs learning and tagging iteratively. Each iteration expands the knowledge base, so annotation ability improves continuously. [Results] The average F-value reaches 0.9112 on a dataset of 15,041 sentences. [Limitations] Annotation efficiency may be relatively low on sparse data. [Conclusions] The experimental results show that the proposed algorithm not only dramatically reduces the amount of training data that machine learning requires, but also improves annotation efficiency.
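The iterative learn-then-tag loop described under [Methods] can be sketched in a few dozen lines. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the abstract does not give the feature set, the rule for accepting new tags, or the tag set, so the character unigram/bigram features, the posterior threshold of 0.9, the labels `leaf` and `flower`, and the names `features`, `NaiveBayes`, `bootstrap`, and `macro_f1` are all hypothetical. It also assumes the reported "average F-value" is an unweighted (macro) mean of per-tag F1 scores, which the abstract does not state.

```python
import math
from collections import Counter, defaultdict


def features(sentence):
    """Assumption: character unigrams plus bigrams as features for Chinese text."""
    chars = [c for c in sentence if not c.isspace()]
    return chars + [a + b for a, b in zip(chars, chars[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = features(sentence)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {y: sum(c.values()) for y, c in self.feature_counts.items()}
        return self

    def posterior(self, sentence):
        """Return {label: P(label | sentence)}, computed in log space for stability."""
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / n)  # log prior
            for f in features(sentence):
                # Smoothed log likelihood of each feature under this label
                score += math.log((self.feature_counts[label][f] + 1) / (self.totals[label] + v))
            log_scores[label] = score
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {label: math.exp(s - top) / z for label, s in log_scores.items()}


def bootstrap(seed, unlabeled, threshold=0.9, max_iter=10):
    """Alternate learning and tagging, starting from a small labeled seed.

    Each round retrains on the labeled pool (the growing "knowledge base"),
    tags the remaining sentences, and accepts only tags whose posterior
    reaches `threshold` (a hypothetical acceptance rule). Stops when a
    round accepts nothing or `max_iter` rounds have run.
    """
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_iter):
        model = NaiveBayes().fit([s for s, _ in labeled], [y for _, y in labeled])
        confident, remaining = [], []
        for sentence in pool:
            post = model.posterior(sentence)
            best = max(post, key=post.get)
            if post[best] >= threshold:
                confident.append((sentence, best))
            else:
                remaining.append(sentence)
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
    # Final model trained on everything accepted so far.
    model = NaiveBayes().fit([s for s, _ in labeled], [y for _, y in labeled])
    return model, labeled, pool


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 = 2PR / (P + R); the averaging is an assumption."""
    scores = []
    for label in set(gold) | set(pred):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data with hypothetical labels; not the paper's corpus or tag set.
    seed = [("叶卵形，先端尖", "leaf"), ("叶片椭圆形，边缘有锯齿", "leaf"),
            ("花白色，花瓣5枚", "flower"), ("花序顶生，花黄色", "flower")]
    model, labeled, pool = bootstrap(seed, ["叶披针形，先端渐尖", "花紫色，花瓣4枚"])
    for sentence, label in labeled[len(seed):]:
        print(sentence, label)
```

Accepting only high-confidence tags is meant to keep early mistakes out of the knowledge base. The trade-off in this sketch is that sentences whose features rarely appear in the labeled pool may never cross the threshold and stay untagged, which is one plausible way the sparse-data limitation noted above could arise.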

Key words: Bootstrapping method; Naïve Bayes; Species description text; Semantic annotation
Received: 15 January 2014      Published: 06 June 2014
CLC Number: TP391

Cite this article:

Duan Yufeng, Zhu Wenjing, Chen Qiao, Cui Hong. Semantic Annotation of Species Description Text in Chinese by Combining Naïve Bayes Algorithm with Bootstrapping Method. New Technology of Library and Information Service, 2014, 30(5): 83-89.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.05.11     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I5/83

