New Technology of Library and Information Service  2014, Vol. 30 Issue (12): 71-77    DOI: 10.11925/infotech.1003-3513.2014.12.09
The Automatic Identification of Chinese Names in Query Logs
Zeng Zhen, Lv Xueqiang, Li Zhuo
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
Abstract  

[Objective] Query logs contain many personal names, and recognizing them can improve search engine performance. [Methods] This paper presents a method for identifying Chinese personal names in query logs. Based on the internal structure of names and their context information, seven features are extracted, a suitable feature template is chosen, and a conditional random field (CRF) model is applied for preliminary identification of personal names. For query strings in which the CRF fails to mark names, a Bayesian conditional probability formula is designed, according to the characteristics of those strings, to select additional names. [Results] In experiments on Sogou Web query logs, the precision of name recognition reaches 95%, and the F-measure of the machine learning method is 91%. [Limitations] A certain amount of manually annotated training corpus is required. [Conclusions] The results validate the effectiveness of this name recognition method and show that it has a positive impact on name recognition.
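
The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the paper's seven features or give its Bayesian formula, so everything below is an assumption made for illustration: the tiny `SURNAMES` set, the seven feature names in `char_features`, and the generic Bayes' rule in `name_posterior` are hypothetical stand-ins, not the authors' definitions. The per-character feature dictionaries are the kind of input a CRF toolkit could consume.

```python
from typing import Dict, List

# Tiny illustrative surname set (assumption); a real system would load
# a full list of common Chinese surnames.
SURNAMES = {"王", "李", "张", "刘", "陈"}


def char_features(query: str, i: int) -> Dict[str, object]:
    """Return seven illustrative features for the character at index i.

    These are hypothetical features mixing internal name structure
    (surname membership) with context (neighbouring characters, position).
    """
    ch = query[i]
    prev_ch = query[i - 1] if i > 0 else "<BOS>"
    next_ch = query[i + 1] if i + 1 < len(query) else "<EOS>"
    return {
        "char": ch,
        "is_surname": ch in SURNAMES,
        "prev_char": prev_ch,
        "next_char": next_ch,
        "prev_is_surname": prev_ch in SURNAMES,
        "is_query_start": i == 0,
        "is_query_end": i == len(query) - 1,
    }


def query_features(query: str) -> List[Dict[str, object]]:
    """Build one feature dictionary per character of the query string."""
    return [char_features(query, i) for i in range(len(query))]


def name_posterior(p_ctx_given_name: float,
                   p_name: float,
                   p_ctx_given_other: float) -> float:
    """Generic Bayes' rule: P(name | context).

    A stand-in for the second stage; the probabilities would be
    estimated from annotated query logs.
    """
    evidence = (p_ctx_given_name * p_name
                + p_ctx_given_other * (1.0 - p_name))
    return p_ctx_given_name * p_name / evidence if evidence > 0 else 0.0


if __name__ == "__main__":
    for feats in query_features("李娜照片"):
        print(feats)
    print(name_posterior(0.8, 0.1, 0.05))
```

In this sketch, a candidate string that the sequence labeler left unmarked would be accepted as a name when `name_posterior` exceeds some chosen threshold; the threshold and the probability estimates are left open because the abstract does not specify them.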

Key words: Query log; Name recognition; Feature template; Conditional Random Fields; Conditional probability
Received: 26 May 2014      Published: 20 January 2015
CLC number: TP391

Cite this article:

Zeng Zhen, Lv Xueqiang, Li Zhuo. The Automatic Identification of Chinese Names in Query Logs. New Technology of Library and Information Service, 2014, 30(12): 71-77.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.12.09     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I12/71

