Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (2): 79-89    DOI: 10.11925/infotech.2096-3467.2018.0449
Research Paper
Automatically Rating Query Ambiguity with Alt-Metrics*
(Chinese title, translated: A Validation Study on the Substitutability of Automatic Metrics for Rating the Degree of Query Ambiguity)
Sisi Gui1,2, Xiaojuan Zhang3, Xin Wang1,2
1 School of Information Management, Wuhan University, Wuhan 430072, China
2 Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072, China
3 School of Computer and Information Science, Southwest University, Chongqing 400715, China
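Abstract (translated from the Chinese)

[Objective] To address the problem of rating the degree of query ambiguity, this paper analyzes the correlation among automatic rating metrics and the agreement between automatic metrics and human rating, in order to identify an automatic metric that can, to some extent, substitute for other automatic metrics and for human rating. [Methods] We selected automatic rating metrics based on document, user, and query-term features, and improved one query-term-based metric using the frequency of the categories corresponding to the query terms. Correlations among the automatic rating results were analyzed with the Pearson correlation coefficient and the symmetric AP correlation coefficient; agreement between the automatic metrics and the human rating results was analyzed with macro-averaged F1 and macro-averaged accuracy. [Results] Correlations among the automatic metrics were weak. The improved metric agreed best with human rating, with a macro-averaged F1 of 0.623 and a macro-averaged accuracy of 0.707. [Limitations] Because directory-style websites cover only a limited share of query terms, some automatic metrics cannot be applied to all ambiguous queries, so the number of ambiguous queries available for testing substitutability was small. [Conclusions] Automatic metrics substitute poorly for one another. The frequency of the categories corresponding to query terms improves the agreement of query-term-based automatic metrics. Compared with existing automatic metrics, the improved metric agrees best with human rating results and can, to some extent, replace human rating.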
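The two correlation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, rankings are assumed to contain the same queries with no ties, and the symmetric AP correlation is taken here to be the mean of the two directional AP correlations.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def tau_ap(ranking, reference):
    """AP correlation of `ranking` with respect to `reference`.

    Both arguments list the same items, most ambiguous first (assumed: no ties).
    For each item below the top, count the fraction of items ranked above it
    that the reference also ranks above it, then rescale to [-1, 1].
    """
    ref_pos = {item: i for i, item in enumerate(reference)}
    n = len(ranking)
    total = 0.0
    for i in range(1, n):
        item = ranking[i]
        correct = sum(1 for above in ranking[:i] if ref_pos[above] < ref_pos[item])
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0


def symmetric_tau_ap(a, b):
    """Symmetric variant: average the AP correlation in both directions."""
    return 0.5 * (tau_ap(a, b) + tau_ap(b, a))


def rank_by_score(scores):
    """Turn a {query: ambiguity score} mapping into a ranking, highest first."""
    return [q for q, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]
```

To compare two automatic metrics, each metric's per-query scores would be passed to `pearson` directly, or converted with `rank_by_score` and passed to `symmetric_tau_ap`. Unlike Kendall's tau, the AP correlation penalizes disagreements near the top of the ranking more heavily.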

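Keywords (translated from the Chinese): Degree of Query Ambiguity; Automatic Rating; Human Rating; Substitutability; Correlation; Agreement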
Abstract

[Objective] This paper aims to find automatic metrics for rating query ambiguity that can substitute for other automatic metrics and for human rating. [Methods] First, we chose several existing automatic metrics based on documents, users, and queries. Then, we modified one of them using the occurrences of the categories associated with each query. Finally, we examined the relationship between the modified metric and other automatic or human rating metrics: correlations were tested with the Pearson and symmetric AP correlation coefficients, and degrees of agreement were tested with macro-averaged accuracy and macro-averaged F1. [Results] The proposed metric showed the highest agreement with human rating, with a macro-averaged F1 of 0.623 and a macro-averaged accuracy of 0.707. [Limitations] The proposed metric was examined only with data from online directories. [Conclusions] Automatic rating metrics for query ambiguity can hardly be replaced by their automatic counterparts. Considering the occurrences of top-level categories for each query can improve the degree of agreement of automatic metrics. Compared with existing automatic metrics, the proposed metric agrees best with human rating and can, to some extent, replace it.
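The agreement measures named in the abstract can be illustrated with a short sketch. The abstract does not state the exact formulas, so the definitions below are assumptions: macro-averaged F1 is taken as the unweighted mean of per-class F1, and macro-averaged accuracy as the unweighted mean of per-class one-vs-rest accuracy. The function name and the label values in the usage note are hypothetical.

```python
def macro_scores(gold, pred):
    """Return (macro-averaged F1, macro-averaged accuracy).

    gold: human-assigned ambiguity labels, one per query.
    pred: labels produced by an automatic metric, in the same order.
    Each class is scored one-vs-rest, then the per-class values are averaged
    with equal weight (assumed definition, see the lead-in).
    """
    labels = sorted(set(gold) | set(pred))
    n = len(gold)
    f1_values = []
    acc_values = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_values.append(f1)
        acc_values.append((tp + tn) / n)
    return sum(f1_values) / len(labels), sum(acc_values) / len(labels)
```

For example, `macro_scores(["ambiguous", "ambiguous", "clear", "clear"], ["ambiguous", "clear", "clear", "clear"])` scores an automatic labeling that disagrees with the human labels on one query out of four. Averaging per class rather than per query keeps a rare ambiguity level from being swamped by a frequent one.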

Key words: Query Ambiguity Rating; Automatic Rating; Human Rating; Alternativeness; Correlation; Agreement
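Received: 2018-04-23
Funding: *This work is one of the research outcomes of the National Social Science Fund of China Youth Project "Research on Query Recommendation Models Integrating User Personalization and Real-time Intent" (Grant No. 15CTQ019).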
Cite this article:
Sisi Gui, Xiaojuan Zhang, Xin Wang. Automatically Rating Query Ambiguity with Alt-Metrics[J]. Data Analysis and Knowledge Discovery, 2019, 3(2): 79-89. DOI: 10.11925/infotech.2096-3467.2018.0449.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0449
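Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn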