This article uses a conditional random field (CRF) model to compare the recognition performance and recognition efficiency of two approaches to recognizing Chinese chemical substance names: labeling by character and labeling by word. The experimental results show that character-based labeling achieves better recognition performance than word-based labeling, at the cost of more processing time. The article also examines how the number of features affects recognition performance.
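To make the distinction between the two labeling units concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple B/I/O tag scheme (the tag set actually used is not stated in this abstract), and the example sentence, its word segmentation, and the helper `bio_tags` are hypothetical illustrations.

```python
def bio_tags(units, entity_spans):
    """Tag each unit (a character or a word) as B, I, or O.

    entity_spans holds (start, end) character offsets, end exclusive.
    A unit is tagged only if it lies entirely inside one entity span.
    The B/I/O scheme is an assumption made for illustration.
    """
    tags = []
    offset = 0
    for unit in units:
        start, end = offset, offset + len(unit)
        tag = "O"
        for ent_start, ent_end in entity_spans:
            if start >= ent_start and end <= ent_end:
                # The first unit of an entity gets B, later units get I.
                tag = "B" if start == ent_start else "I"
                break
        tags.append(tag)
        offset = end
    return tags


# Hypothetical pre-segmented sentence; the chemical name is the second word.
words = ["加入", "氯化钠", "溶液"]
sentence = "".join(words)
entity_spans = [(2, 5)]  # character offsets of "氯化钠"

char_tags = bio_tags(list(sentence), entity_spans)  # one tag per character
word_tags = bio_tags(words, entity_spans)           # one tag per word

print(list(zip(sentence, char_tags)))
print(list(zip(words, word_tags)))
```

In this sketch the character-based sequence has seven labeled positions while the word-based sequence has three, which gives a rough sense of why the character-based approach may need more processing time for the same text.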
Zheng Rongting, Li Nan, Ji Jiuming, Teng Qingqing. Research on Recognition of Chinese Chemical Substance Names[J]. New Technology of Library and Information Service, 2010, 26(6): 48-52.