New Technology of Library and Information Service, 2015, Vol. 31, Issue 6: 49-56   https://doi.org/10.11925/infotech.1003-3513.2015.06.08
Research Paper
Named Entity Recognition from Search Log
Ren Yuwei¹, Lv Xueqiang¹, Li Zhuo², Xu Liping²
¹ Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
² Beijing Research Center of Urban System Engineering, Beijing 100089, China

Abstract

[Objective] Recognizing named entities in search logs helps clarify users' search intent and improve the quality of search engine services. [Methods] Candidate named entities are extracted using seed named entities and template matching, and the candidates are then clustered. After clustering, recognition features are extracted for each candidate: its frequency, the number of distinct templates it matches, and the weights of those templates. These features are fused into a formula for computing a named entity recognition weight, and the parameters controlling each feature's influence are tuned. [Results] Annotating and counting the named entities extracted according to this weight shows that P@500 averages about 75%, which is 7% higher than the Paşca method. [Limitations] Named entities that are only weakly sensitive to templates cannot be extracted accurately. [Conclusions] The P@N values of the extraction results, compared with those of other methods, demonstrate the effectiveness of the method.
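The template-matching and feature-fusion steps described in the abstract can be illustrated with a minimal Python sketch. It assumes that a template is a query containing a seed entity with the seed replaced by a hypothetical `<E>` slot, that a template's weight is the share of seeds that produced it, and that the fused weight is a hypothetical linear combination `alpha*log(1+freq) + beta*log(1+templates) + gamma*sum(template weights)`. The abstract does not give the paper's actual formula, the clustering step is omitted, and all function names and parameters are illustrative only.

```python
import math
from collections import Counter, defaultdict

PLACEHOLDER = "<E>"  # hypothetical marker for the entity slot in a template


def extract_templates(queries, seeds):
    """Map each template to the set of seed entities it was derived from.

    A template is a query containing a seed, with the (first occurrence of
    the) seed replaced by PLACEHOLDER. Plain substring matching also works
    for Chinese queries, which have no spaces between words."""
    template_seeds = defaultdict(set)
    for q in queries:
        for s in seeds:
            if s in q:
                t = q.replace(s, PLACEHOLDER, 1)
                if t != PLACEHOLDER:  # a bare slot would match every query
                    template_seeds[t].add(s)
    return template_seeds


def match_candidates(queries, templates, seeds):
    """Return {candidate: Counter(template -> number of matching queries)}."""
    hits = defaultdict(Counter)
    for q in queries:
        for t in templates:
            prefix, suffix = t.split(PLACEHOLDER, 1)
            if (len(q) > len(prefix) + len(suffix)
                    and q.startswith(prefix) and q.endswith(suffix)):
                cand = q[len(prefix):len(q) - len(suffix)]
                if cand not in seeds:  # seeds are already known entities
                    hits[cand][t] += 1
    return hits


def score_candidates(hits, template_seeds, n_seeds,
                     alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical fused weight combining frequency, the number of
    distinct templates, and the summed template weights."""
    scores = {}
    for cand, per_template in hits.items():
        freq = sum(per_template.values())
        n_templates = len(per_template)
        tw = sum(len(template_seeds[t]) / n_seeds for t in per_template)
        scores[cand] = (alpha * math.log1p(freq)
                        + beta * math.log1p(n_templates)
                        + gamma * tw)
    return scores


def rank_candidates(queries, seeds, **params):
    template_seeds = extract_templates(queries, seeds)
    hits = match_candidates(queries, template_seeds, seeds)
    scores = score_candidates(hits, template_seeds, len(seeds), **params)
    # Highest weight first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda c: (-scores[c], c))


def precision_at_n(ranked, gold, n):
    """P@N: fraction of the top-N ranked candidates that are true entities."""
    return sum(1 for c in ranked[:n] if c in gold) / n


if __name__ == "__main__":
    queries = ["titanic cast", "titanic box office", "avatar cast",
               "avatar box office", "inception cast", "inception box office",
               "inception review", "matrix cast", "live cast", "podcast"]
    ranked = rank_candidates(queries, {"titanic", "avatar"})
    print(ranked)  # ['inception', 'live', 'matrix']
    print(precision_at_n(ranked, {"inception", "matrix"}, 2))  # 0.5
```

In this toy data the non-entity "live" ties with "matrix" because each matches one high-weight template once, which illustrates why the feature parameters need tuning and why precision is reported as P@N over annotated output.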

Key words: Search log; Template weight; K-means clustering; Feature weight; Seed named entity
Received: 2014-10-28      Published: 2015-07-08
CLC number: TP391
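Funding:

This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Ontology-Based Automatic Patent Indexing" (Grant No. 61271304) and the Key Project of the Beijing Municipal Education Commission Science and Technology Development Plan, also a Class B Key Project of the Beijing Natural Science Foundation, "Research on Domain-Oriented Precise Search Methods for Multimodal Internet Information" (Grant No. KZ201311232037).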

Corresponding author: Ren Yuwei, ORCID: 0000-0002-0236-476X, E-mail: wisdomryw@sina.cn
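Author contributions: Lv Xueqiang proposed the research topic; Ren Yuwei proposed the research approach, designed the research plan, conducted the experiments, analyzed the data, and drafted the paper; Li Zhuo revised the paper; Xu Liping revised the final version.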
Cite this article:
Ren Yuwei, Lv Xueqiang, Li Zhuo, Xu Liping. Named Entity Recognition from Search Log. New Technology of Library and Information Service, 2015, 31(6): 49-56.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2015.06.08 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2015/V31/I6/49

References

[1] CNNIC. The Report of the 34th China Internet Development Statistics [R]. China Internet Network Information Center, 2014. (in Chinese)
[2] Wang Dan, Fan Xinghua. Named Entity Recognition for the Short Text [J]. Journal of Computer Applications, 2009, 29(1): 143-145, 171. (in Chinese)
[3] Cao Lei, Guo Jiafeng, Cheng Xueqi. Bipartite Graph Based Semi-supervised Method for Entity Mining from the Query Log [J]. Journal of Shandong University: Natural Science, 2012, 47(5): 32-37, 42. (in Chinese)
[4] Wu Dayong, Liu Ting. Mining Named Entities in Query Log Using Random Walk Model [J]. Intelligent Computer and Applications, 2012, 2(4): 22-26, 30. (in Chinese)
[5] Zhai Haijun, Guo Yong, Guo Jiafeng, et al. A Named Entity Mining Method Based on Transfer Learning [J]. Journal of Shanghai Jiao Tong University, 2011, 45(2): 164-167. (in Chinese)
[6] Zhai Haijun, Guo Jiafeng, Wang Xiaolei, et al. Mining Named Entities from Query Logs [J]. Journal of Chinese Information Processing, 2010, 24(1): 71-76, 116. (in Chinese)
[7] Cao Lei, Guo Jiafeng, Bai Lu, et al. Named Entity Mining from Query Log Through Semi-supervised Topic Modeling [J]. Journal of Chinese Information Processing, 2012, 26(5): 26-32. (in Chinese)
[8] Zhang Lei, Wang Bin, Jing Hongfang, et al. Mining for Special Named Entities from Chinese Web Search Query Logs [J]. Journal of Harbin Institute of Technology, 2011, 43(5): 119-122. (in Chinese)
[9] Du J, Zhang Z, Yan J, et al. Using Search Session Context for Named Entity Recognition in Query[C]. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2010: 765-766.
[10] Jonnalagadda S, Cohen T, Wu S, et al. Using Empirically Constructed Lexical Resources for Named Entity Recognition [J]. Biomedical Informatics Insights, 2013, 6(1): 17-27.
[11] Gross O, Doucet A, Toivonen H. Named Entity Filtering Based on Concept Association Graphs [C]. In: Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Samos, Greece. 2013.
[12] Dalvi B, Xiong C, Callan J. A Language Modeling Approach to Entity Recognition and Disambiguation for Search Queries [C]. In: Proceedings of the 1st International Workshop on Entity Recognition & Disambiguation. ACM, 2014: 45-54.
[13] Wen B, Xiao S, Luo Y, et al. Unsupervised Chinese Personal Name Recognition Using Search Session [J]. Journal of Computational Information Systems, 2013, 9(6): 2201-2208.
[14] Paşca M. Weakly-supervised Discovery of Named Entities Using Web Search Queries [C]. In: Proceedings of the 16th ACM Conference on Conference on Information and Knowledge Management. ACM, 2007: 683-690.
[15] Levenshtein V I. Binary Codes Capable of Correcting Deletions, Insertions and Reversals [J]. Soviet Physics Doklady, 1966, 10: 707-710.

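Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 North Fourth Ring Road West, Zhongguancun, Haidian District, Beijing 100190
Tel/Fax: (010) 82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn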