New Technology of Library and Information Service  2012, Vol. 28 Issue (5): 41-47    DOI: 10.11925/infotech.1003-3513.2012.05.06
Study on Semantic Markup of Species Description Text in Chinese Based on Auto-learning Rules
Duan Yufeng1, Hei Zhenzhen1, Ju Fei1, Cui Hong2
1. Business School, East China Normal University, Shanghai 200241, China;
2. School of Information Resource & Library Science, University of Arizona, Tucson 85719, USA
Abstract  This paper combines automatically learned rules with leading words to perform semantic markup of Chinese species description text, using a data set of 1,000 documents randomly sampled from Flora of China. Experimental results show that the rule-based algorithm designed in this study reaches an overall markup performance (F value) of 0.930, with the F values of most elements falling between 0.724 and 0.964. The algorithm outperforms the Naive Bayesian classification algorithm, and the results also indicate that leading words help optimize it.
Key words: Rules; Leading words; Species description text; Semantic markup
Received: 26 March 2012      Published: 24 July 2012
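
The abstract reports results as F values, both overall and per markup element. As a rough illustration of how such per-element scores are typically computed, the following minimal sketch assumes that F denotes the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The element labels ("leaf", "flower", "fruit"), the toy data, and the function name per_label_f1 are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_label_f1(gold, predicted):
    """Return {label: (precision, recall, f1)} for two parallel label lists.

    Assumption: F is the balanced F1 measure; the paper's exact
    definition is not given in the abstract.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correctly marked clause
        else:
            fp[p] += 1          # predicted label was wrong
            fn[g] += 1          # gold label was missed

    scores = {}
    for label in set(gold) | set(predicted):
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy data: one label per clause of a species description.
gold = ["leaf", "leaf", "flower", "fruit", "flower"]
predicted = ["leaf", "flower", "flower", "fruit", "flower"]

scores = per_label_f1(gold, predicted)
for label in sorted(scores):
    precision, recall, f1 = scores[label]
    print(f"{label}: P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

Under this reading, a per-element range such as 0.724-0.964 would correspond to the spread of the third value in each tuple across element types.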



Cite this article:

Duan Yufeng, Hei Zhenzhen, Ju Fei, Cui Hong. Study on Semantic Markup of Species Description Text in Chinese Based on Auto-learning Rules. New Technology of Library and Information Service, 2012, 28(5): 41-47.

