Chinese Organization Name Recognition in User Query Log
Guan Xiaoda1, Lv Xueqiang1, Li Zhuo1, Zheng Luexing1,2
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; 2 Institute of Computational Linguistics, Peking University, Beijing 100871, China
Abstract [Objective] To address the shortage of annotated data and the information asymmetry that hamper organization name recognition in user query logs. [Methods] The paper proposes an automatic method for creating training data, which alleviates the shortage of annotated user query log data. The authors introduce adhesion features and construct a CRF model that recognizes organization names by integrating context information. [Results] Experiments on the Sogou user query log show that precision reached 72.80%, recall reached 86.73%, and F-measure reached 79.16%. The method improves F-measure by 30% compared with the traditional organization name recognition method. [Limitations] A model trained on the automatically created training set is likely to have a larger error than one trained on manually annotated user query log data. The size of the organization name set affects the completeness of the model's context knowledge. [Conclusions] The experimental results demonstrate that the method is effective.
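The three reported figures can be checked against one another. The sketch below assumes that "F-measure" here means the balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state the weighting, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
precision = 0.7280
recall = 0.8673

f1 = f_measure(precision, recall)
print(f"F-measure: {f1 * 100:.2f}%")  # prints "F-measure: 79.16%"
```

Under that assumption the computed value agrees with the 79.16% stated in the abstract.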
Received: 14 February 2014
Published: 14 February 2014