Research on Chinese Chemical Name Recognition Based on Heuristic Rules
Li Nan, Zheng Rongting, Ji Jiuming, Teng Qingqing
(Library of East China University of Science and Technology, Shanghai 200237, China)
Abstract This paper proposes a method of domain name recognition based on heuristic rules, to overcome the shortcomings of traditional solutions in specific domains. It first studies Chinese chemical names to obtain their domain features and statistical language features, and then, on the basis of these features, puts forward several heuristic rules that are applicable to domain name recognition in chemical literature. A comparison experiment shows that this method clearly improves the efficiency of domain name recognition.
Received: 09 April 2010
Published: 25 May 2010
Corresponding Authors:
Li Nan
E-mail: ajen@ecust.edu.cn