Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (3): 66-75     https://doi.org/10.11925/infotech.2096-3467.2018.0550
Research Paper
Temporal Intent Classification with Query Expression Feature
Sisi Gui1,2, Wei Lu3, Xiaojuan Zhang4
1School of Information Management, Wuhan University, Wuhan 430072, China
2Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072, China
3Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
4School of Computer and Information Science, Southwest University, Chongqing 400715, China
Abstract

[Objective] This paper examines how effective extractable query expression features are for temporal intent classification and compares the accuracy of different types of classifiers, to inform subsequent research. [Methods] Query expression features are grouped by their relevance to time into three types: atemporal, implicit temporal, and explicit temporal. On this basis, a supervised classifier and a semi-supervised classifier are each used to test the effectiveness of different feature combinations and the accuracy of the different classifiers. [Results] Among the three feature types, using only explicit temporal features yields the highest average classification accuracy, and "whether the query contains a year" is a strong feature. Accuracy differs little across classifiers. The best macro-average accuracy, 81.14%, exceeds that of previous systems evaluated on the Temporal Query Intent Classification (TQIC) subtask. [Limitations] Because of limited access to datasets, the results are validated on only 300 queries. Only existing query expression features are considered; no new feature for temporal intent classification is proposed or tested. [Conclusions] Query expression features that are strongly related to time improve the accuracy of temporal intent classification, whereas statistical features such as query length yield little improvement.
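
The three feature types described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_features`, the cue-word list `IMPLICIT_CUES`, and the year range matched by `YEAR_PATTERN` are all assumptions made for illustration, with one example feature per type.

```python
import re

# Hypothetical cue words, chosen for illustration only;
# the lexicon actually used in the paper is not given on this page.
IMPLICIT_CUES = {"today", "latest", "forecast", "history", "upcoming"}

# Matches a standalone four-digit year from 1900 to 2099.
# The range is an assumption, not taken from the paper.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_features(query: str) -> dict:
    """Return one example feature per feature type for a query string."""
    tokens = query.lower().split()
    return {
        # Atemporal (statistical) feature: number of whitespace-separated terms.
        "query_length": len(tokens),
        # Implicit temporal feature: presence of any cue word.
        "has_implicit_cue": any(t in IMPLICIT_CUES for t in tokens),
        # Explicit temporal feature: whether the query contains a year.
        "contains_year": bool(YEAR_PATTERN.search(query)),
    }


if __name__ == "__main__":
    for q in ["olympics 2012 medal table", "latest weather forecast"]:
        print(q, extract_features(q))
```

A feature dictionary of this kind could then be passed to any supervised or semi-supervised classifier; the choice of classifier is left open here because the page does not name the algorithms used.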

Key words: Temporal Intent; Supervised Classification; Semi-supervised Classification; Feature Engineering
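Received: 2018-05-17      Published: 2019-04-17
Funding: This paper is an outcome of the Youth Project of the National Social Science Fund of China, "Research on a Query Recommendation Model Integrating User Personalization and Real-time Intent" (Grant No. 15CTQ019).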
Cite this article:
Sisi Gui, Wei Lu, Xiaojuan Zhang. Temporal Intent Classification with Query Expression Feature[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 66-75.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0550      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I3/66