New Technology of Library and Information Service  2015, Vol. 31 Issue (6): 49-56    DOI: 10.11925/infotech.1003-3513.2015.06.08
Named Entity Recognition from Search Log
Ren Yuwei1, Lv Xueqiang1, Li Zhuo2, Xu Liping2
1 Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
2 Beijing Research Center of Urban System Engineering, Beijing 100089, China
Abstract  

[Objective] Recognizing named entities in search logs helps improve the quality of search services. [Methods] Candidate named entities are extracted using seed named entities and template matching. After the candidates are clustered, recognition features are extracted for each candidate, including its frequency, the number of distinct templates it matches, and the template weight. These features are fused into a formula for the named entity recognition weight, and the parameters that control each feature's influence are tuned. [Results] After labeling and counting the extracted named entities, the average P@500 reaches 75%, which is 7% higher than the Paşca method. [Limitations] Named entities that are only weakly sensitive to the templates cannot be extracted correctly. [Conclusions] The P@N values computed on the extracted results show that the method is effective.
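
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weight formula, so the fusion used here (a weighted sum of log frequency, distinct-template count, and summed template weight, with hypothetical parameters `alpha`, `beta`, `gamma`) is an assumption, as are all function names and the toy queries. The K-means clustering step is omitted for brevity.

```python
import math
from collections import Counter, defaultdict

def extract_templates(queries, seeds):
    """Build (prefix, suffix) templates from queries that contain a seed entity."""
    templates = Counter()
    for q in queries:
        for s in seeds:
            if s in q and q != s:
                prefix, _, suffix = q.partition(s)
                templates[(prefix, suffix)] += 1
    return templates

def match_candidates(queries, templates):
    """Return {candidate: Counter({template: hits})} for queries fitting a template."""
    hits = defaultdict(Counter)
    for q in queries:
        for prefix, suffix in templates:
            fits = q.startswith(prefix) and q.endswith(suffix)
            if fits and len(q) > len(prefix) + len(suffix):
                cand = q[len(prefix):len(q) - len(suffix)].strip()
                if cand:
                    hits[cand][(prefix, suffix)] += 1
    return hits

def rank_candidates(hits, templates, seeds, alpha=1.0, beta=1.0, gamma=1.0):
    """Rank candidates by an ASSUMED fusion of the three features in the abstract."""
    total = sum(templates.values())
    scores = {}
    for cand, per_template in hits.items():
        if cand in seeds:
            continue  # seeds are already known entities
        freq = sum(per_template.values())       # frequency
        n_templates = len(per_template)         # number of different templates
        # Assumed template weight: share of seed-derived template occurrences.
        t_weight = sum(templates[t] / total for t in per_template)
        scores[cand] = alpha * math.log1p(freq) + beta * n_templates + gamma * t_weight
    return sorted(scores, key=lambda c: -scores[c])

def precision_at_n(ranked, gold, n):
    """P@N: fraction of the top-n ranked candidates that are true entities."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for c in ranked[:n] if c in gold) / n

# Toy data (hypothetical queries, a single seed entity).
queries = ["beijing weather", "hotels in beijing", "shanghai weather",
           "hotels in shanghai", "paris weather", "free weather"]
seeds = {"beijing"}
templates = extract_templates(queries, seeds)
hits = match_candidates(queries, templates)
ranked = rank_candidates(hits, templates, seeds)
```

In this toy run, a candidate that matches both templates outranks candidates that match only one, which is the intuition behind counting distinct templates. A non-entity such as "free" still matches one template, which is why the ranked list is scored with `precision_at_n` against labeled entities.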

Key words: Search log; Template weight; K-means clustering; Feature weight; Seed named entity
Received: 28 October 2014      Published: 08 July 2015
CLC Number: TP391

Cite this article:

Ren Yuwei, Lv Xueqiang, Li Zhuo, Xu Liping. Named Entity Recognition from Search Log. New Technology of Library and Information Service, 2015, 31(6): 49-56.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2015.06.08     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2015/V31/I6/49

[1] CNNIC. 第34次中国互联网络发展状况统计报告[R]. 中国互联网络信息中心, 2014. (CNNIC. The Report of the 34th China Internet Development Statistics [R]. Information Center of the China Internet Network, 2014.)
[2] 王丹, 樊兴华. 面向短文本的命名实体识别[J]. 计算机应用, 2009, 29(1): 143-145, 171. (Wang Dan, Fan Xinghua. Named Entity Recognition for the Short Text [J]. Journal of Computer Applications, 2009, 29(1): 143-145, 171.)
[3] 曹雷, 郭嘉丰, 程学旗. 基于二部图半监督方法的查询日志实体挖掘[J]. 山东大学学报: 理学版, 2012, 47(5): 32-37, 42. (Cao Lei, Guo Jiafeng, Cheng Xueqi. Bipartite Graph Based Semi-supervised Method for Entity Mining from the Query Log [J]. Journal of Shandong University: Natural Science, 2012, 47(5): 32-37, 42.)
[4] 伍大勇, 刘挺. 基于随机游走模型的查询日志中命名实体挖掘[J]. 智能计算机与应用, 2012, 2(4): 22-26, 30. (Wu Dayong, Liu Ting. Mining Named Entities in Query Log Using Random Walk Model [J]. Intelligent Computer and Applications, 2012, 2(4): 22-26, 30.)
[5] 翟海军, 郭勇, 郭嘉丰, 等. 基于转移学习的命名实体挖掘技术[J]. 上海交通大学学报, 2011, 45(2): 164-167. (Zhai Haijun, Guo Yong, Guo Jiafeng, et al. A Named Entity Mining Method Based on Transfer Learning [J]. Journal of Shanghai Jiao Tong University, 2011, 45(2): 164-167.)
[6] 翟海军, 郭嘉丰, 王小磊, 等. 基于用户查询日志的命名实体挖掘[J]. 中文信息学报, 2010, 24(1): 71-76, 116. (Zhai Haijun, Guo Jiafeng, Wang Xiaolei, et al. Mining Named Entities from Query Logs [J]. Journal of Chinese Information Processing, 2010, 24(1): 71-76, 116.)
[7] 曹雷, 郭嘉丰, 白露, 等. 基于半监督话题模型的用户查询日志命名实体挖掘[J]. 中文信息学报, 2012, 26(5): 26-32. (Cao Lei, Guo Jiafeng, Bai Lu, et al. Named Entity Mining from Query Log Through Semi-supervised Topic Modeling [J]. Journal of Chinese Information Processing, 2012, 26(5): 26-32.)
[8] 张磊, 王斌, 靖红芳, 等. 中文网页搜索日志中的特殊命名实体挖掘[J]. 哈尔滨工业大学学报, 2011, 43(5): 119-122. (Zhang Lei, Wang Bin, Jing Hongfang, et al. Mining for Special Named Entities from Chinese Web Search Query Logs [J]. Journal of Harbin Institute of Technology, 2011, 43(5): 119-122.)
[9] Du J, Zhang Z, Yan J, et al. Using Search Session Context for Named Entity Recognition in Query[C]. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2010: 765-766.
[10] Jonnalagadda S, Cohen T, Wu S, et al. Using Empirically Constructed Lexical Resources for Named Entity Recognition [J]. Biomedical Informatics Insights, 2013, 6(1): 17-27.
[11] Gross O, Doucet A, Toivonen H. Named Entity Filtering Based on Concept Association Graphs [C]. In: Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Samos, Greece. 2013.
[12] Dalvi B, Xiong C, Callan J. A Language Modeling Approach to Entity Recognition and Disambiguation for Search Queries [C]. In: Proceedings of the 1st International Workshop on Entity Recognition & Disambiguation. ACM, 2014: 45-54.
[13] Wen B, Xiao S, Luo Y, et al. Unsupervised Chinese Personal Name Recognition Using Search Session [J]. Journal of Computational Information Systems, 2013, 9(6): 2201-2208.
[14] Paşca M. Weakly-supervised Discovery of Named Entities Using Web Search Queries [C]. In: Proceedings of the 16th ACM Conference on Conference on Information and Knowledge Management. ACM, 2007: 683-690.
[15] Levenshtein V I. Binary Codes Capable of Correcting Deletions, Insertions and Reversals [J]. Soviet Physics Doklady, 1966, 10: 707-710.
