New Technology of Library and Information Service  2010, Vol. 26 Issue (6): 48-52    DOI: 10.11925/infotech.1003-3513.2010.06.08
Research on Recognition of Chinese Chemical Substance Names
Zheng Rongting, Li Nan, Ji Jiuming, Teng Qingqing
(Library of East China University of Science and Technology, Shanghai 200237, China)
Abstract  

This article uses a conditional random field (CRF) model to compare the recognition performance and efficiency of two labeling schemes for Chinese chemical substance names: character-based labeling and word-based labeling. The experimental results show that character-based labeling achieves better recognition performance than word-based labeling, at the cost of more time. The article also examines how the number of features affects experimental performance.
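To make the two labeling schemes concrete, the following minimal sketch tags one sentence at the character level and at the word level. The B/I/O tag set, the sample sentence, the hand-written word segmentation, and the helper names `char_labels` and `word_labels` are all assumptions chosen for illustration; the abstract does not specify the tag set or segmenter used in the experiment.

```python
# Hypothetical illustration: the BIO tag set, the sample sentence, and the
# hand-written segmentation are assumptions, not taken from the article.

def char_labels(text, spans):
    """Tag every character as B/I/O given (start, end) entity spans."""
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return list(zip(text, tags))


def word_labels(words, spans):
    """Tag every word as B/I/O; a word is tagged only if it lies wholly inside a span."""
    labeled, pos = [], 0
    for word in words:
        w_start, w_end = pos, pos + len(word)
        tag = "O"
        for start, end in spans:
            if start <= w_start and w_end <= end:
                tag = "B" if w_start == start else "I"
                break
        labeled.append((word, tag))
        pos = w_end
    return labeled


text = "加入氢氧化钠溶液"                 # "add sodium hydroxide solution"
spans = [(2, 6)]                          # character offsets of 氢氧化钠 (sodium hydroxide)
words = ["加入", "氢氧化", "钠", "溶液"]   # assumed segmenter output

chars = char_labels(text, spans)    # one training token per character
tokens = word_labels(words, spans)  # one training token per word

for unit, tag in chars:
    print(unit, tag)
print("---")
for unit, tag in tokens:
    print(unit, tag)
```

In this sketch the character-level sequence has eight tokens and the word-level sequence has four. Longer sequences are one plausible reason a character-based model would take more time, which would be consistent with the trade-off reported in the abstract, though the abstract itself does not state the cause.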

Key words: CRF; Chinese chemical substance names; Labeled on char; Labeled on word; Quantity of feature
Received: 12 April 2010      Published: 26 July 2010
CLC number: TP393
Fund:

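*This work is one of the research results of the Shanghai Municipal Science and Technology Commission Soft Science Research Fund project "Research on the Collaboration Mechanism of the Shanghai R&D Public Service Platform Based on Knowledge Integration" (Project No. 056921012).
*This paper was presented at the 2010 academic symposium "Application, Service and Innovation of Library Information Technology".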

Corresponding Authors: Ji Jiuming     E-mail: jjm@mail.lib.ecust.edu.cn

Cite this article:

Zheng Rongting, Li Nan, Ji Jiuming, Teng Qingqing. Research on Recognition of Chinese Chemical Substance Names. New Technology of Library and Information Service, 2010, 26(6): 48-52.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2010.06.08     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2010/V26/I6/48

