New Technology of Library and Information Service  2014, Vol. 30 Issue (1): 72-78    DOI: 10.11925/infotech.1003-3513.2014.01.11
INFORMATION ANALYSIS AND RESEARCH
Chinese Organization Name Recognition in User Query Log
Guan Xiaoda1, Lv Xueqiang1, Li Zhuo1, Zheng Luexing1, 2
1Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; 2Institute of Computational Linguistics, Peking University, Beijing 100871, China
Abstract  [Objective] To address the shortage of annotated query log data and the information asymmetry that hinder organization name recognition in user query logs. [Methods] The paper proposes a method for automatically creating training data, which alleviates the shortage of annotated user query log data. The authors introduce adhesion features and construct a CRF model that recognizes organization names by integrating context information. [Results] Experiments on the Sogou user query log show that precision reaches 72.80%, recall reaches 86.73%, and the F-measure reaches 79.16%. The method improves the F-measure by 30% compared with the traditional organization name recognition method. [Limitations] A model trained on the automatically created training set will have a larger error than one trained on standard annotated user query log data. The size of the organization name set affects the completeness of the model's context knowledge. [Conclusions] The experimental results demonstrate that the method is effective.
Key words: User query log; Chinese organization name; Corpus construction; Adhesion feature; CRF
Received: 14 February 2014      Published: 14 February 2014
CLC number: TP391
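
The three figures reported in the abstract can be checked against each other. The sketch below assumes that the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state the weighting, so this is an assumption, and the function name `f_measure` is chosen here for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall. The abstract
    does not say which weighting the authors used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as percentages.
reported_precision = 72.80
reported_recall = 86.73

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.2f}%")  # prints "F-measure: 79.16%"
```

Under this assumption the computed value agrees with the 79.16% stated in the abstract.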

Cite this article:

Guan Xiaoda, Lv Xueqiang, Li Zhuo, Zheng Luexing. Chinese Organization Name Recognition in User Query Log. New Technology of Library and Information Service, 2014, 30(1): 72-78.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.01.11     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I1/72
