The Automatic Identification of Chinese Names in Query Logs
Zeng Zhen, Lv Xueqiang, Li Zhuo
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China

Abstract [Objective] Query logs contain many personal names, and recognizing them can improve search engine performance. [Methods] This paper presents a method for identifying Chinese names in query logs. Based on the internal structural characteristics of names and their context information, seven features are extracted, a suitable feature template is chosen, and a conditional random field (CRF) model is applied for preliminary identification of personal names. Then, based on the characteristics of the query strings in which the CRF fails to mark names, a Bayesian conditional probability formula is designed to select additional names. [Results] In experiments on Sogou Web query logs, the precision of name recognition reaches 95%, and the F-measure of the machine learning method is 91%. [Limitations] A certain amount of manually annotated training corpus is required. [Conclusions] The results validate the effectiveness of the method for recognizing names in query logs.
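The abstract does not give the Bayesian conditional probability formula, so the following is only a minimal sketch of the general idea: using Bayes' rule to estimate how likely a query string is a name given its first character. The function names, the smoothing parameters `alpha` and `vocab`, and the toy training strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def train_counts(names, non_names):
    """Count first characters in labeled name and non-name strings.

    Both inputs are hypothetical toy lists, not data from the paper.
    """
    name_first = Counter(s[0] for s in names if s)
    other_first = Counter(s[0] for s in non_names if s)
    return name_first, other_first, len(names), len(non_names)


def p_name_given_first_char(ch, name_first, other_first,
                            n_names, n_other, alpha=1.0, vocab=5000):
    """Bayes' rule: P(name | ch) from P(ch | name) and P(ch | other).

    alpha and vocab are assumed add-alpha smoothing settings, so that
    unseen characters do not produce zero probabilities.
    """
    total = n_names + n_other
    p_name = n_names / total
    p_other = n_other / total
    p_ch_name = (name_first[ch] + alpha) / (n_names + alpha * vocab)
    p_ch_other = (other_first[ch] + alpha) / (n_other + alpha * vocab)
    numerator = p_ch_name * p_name
    return numerator / (numerator + p_ch_other * p_other)


# Toy labeled strings (assumed for illustration only).
names = ["王伟", "李娜", "王芳"]
non_names = ["天气预报", "王者荣耀", "火车票"]
counts = train_counts(names, non_names)

for ch in "李王天":
    print(ch, round(p_name_given_first_char(ch, *counts), 3))
```

In this sketch a character seen more often at the start of names than of other strings scores above 0.5, and a threshold on that score could be used to pick candidates the CRF left unmarked. A real system would condition on more evidence than the first character.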
Received: 26 May 2014
Published: 20 January 2015