New Technology of Library and Information Service  2014, Vol. 30 Issue (12): 71-77    DOI: 10.11925/infotech.1003-3513.2014.12.09
The Automatic Identification of Chinese Names in Query Logs
Zeng Zhen, Lv Xueqiang, Li Zhuo
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
[Objective] Query logs contain many personal names, and recognizing them can improve search engine performance. [Methods] This paper presents a method for identifying names in query logs. Based on the internal structure of names and their context information, seven features are extracted, a suitable feature template is chosen, and a conditional random field (CRF) model is applied for preliminary identification of person names. For query strings in which the CRF fails to mark names, a Bayesian conditional probability formula is designed, based on the characteristics of those strings, to select additional names. [Results] In experiments on Sogou Web query logs, the precision of name recognition reaches 95%, and the F-measure of the machine learning method is 91%. [Limitations] A certain amount of manually annotated training corpus is required. [Conclusions] The results validate the effectiveness of the method and show that it has a positive impact on name recognition.
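
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the seven features or give the Bayesian formula, so the surname set, the feature names, and the helper functions below are hypothetical and chosen only to show what a character-level feature template and a Bayes-rule rescoring step might look like.

```python
# Hypothetical surname list; a real system would load a full surname dictionary.
COMMON_SURNAMES = {"王", "李", "张", "刘", "陈"}


def char_features(query, i):
    """Build an illustrative feature dict for the character at index i.

    These features (current character, surname flag, neighbours, position)
    are assumptions, not the seven features used in the paper.
    """
    ch = query[i]
    return {
        "char": ch,
        "is_surname": ch in COMMON_SURNAMES,
        "prev_char": query[i - 1] if i > 0 else "<BOS>",
        "next_char": query[i + 1] if i < len(query) - 1 else "<EOS>",
        "position": i,
    }


def query_features(query):
    """Feature sequence for a whole query, one dict per character.

    A sequence of this shape is what a CRF toolkit would typically consume.
    """
    return [char_features(query, i) for i in range(len(query))]


def bayes_name_probability(p_context_given_name, p_name, p_context):
    """Bayes' rule: P(name | context) = P(context | name) * P(name) / P(context).

    The probabilities are assumed to be estimated from an annotated corpus.
    """
    if p_context == 0:
        return 0.0
    return p_context_given_name * p_name / p_context
```

In such a setup, the feature sequences would be passed to a CRF for the first pass, and the Bayes-rule score would be applied only to candidate strings the CRF left unlabeled, keeping those above a chosen threshold.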

Key words: Query log; Name recognition; Feature template; Conditional Random Fields; Conditional probability
Received: 26 May 2014      Published: 20 January 2015
Cite this article:

Zeng Zhen, Lv Xueqiang, Li Zhuo. The Automatic Identification of Chinese Names in Query Logs. New Technology of Library and Information Service, 2014, 30(12): 71-77.

