Data Analysis and Knowledge Discovery, 2024, Vol. 8, Issue 1: 90-103. DOI: 10.11925/infotech.2096-3467.2022.1281
Identifying User Satisfaction Levels and Evolution Patterns in Exploratory Search
Zhao Yiming (1, 2, 3), Chen Zhan (2, 3, 4), Zhang Fan (2, 5; corresponding author)
1 The Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
2 School of Information Management, Wuhan University, Wuhan 430072, China
3 Big Data Institute, Wuhan University, Wuhan 430072, China
4 National Demonstration Center for Experimental Library and Information Science Education, Wuhan University, Wuhan 430072, China
5 Center for Science, Technology & Education Assessment, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper identifies user satisfaction levels in exploratory search and reveals how user satisfaction and query reformulation patterns interact and evolve. [Methods] First, we extracted features of user queries and of their temporal sequences. Second, we used four supervised learning algorithms to predict user satisfaction levels. Third, we identified the interaction between user satisfaction and query reformulations. Finally, we developed new query reformulation recommendation strategies for intelligent exploratory search assistance. [Results] We evaluated the proposed model on an open benchmark dataset. Its prediction accuracy reached 74% (74.03% for a random forest using query, temporal, and user features), surpassing existing baseline models. User satisfaction and query reformulation patterns are significantly associated. [Limitations] User satisfaction captures only one perspective on search. Future research should build a comprehensive, unified system for describing and classifying users in exploratory search. [Conclusions] The proposed model improves user satisfaction prediction and provides theoretical support for intelligent search assistance strategies.

Key words: Exploratory Search; User Satisfaction; User Satisfaction Prediction; Query Reformulation
Received: 03 December 2022      Published: 28 March 2023
ZTFLH (Chinese Library Classification): TP393; G250
Fund: National Natural Science Foundation of China (72274146, 71874130, 71921002)
Corresponding Author: Zhang Fan, ORCID: 0000-0003-0831-7371, E-mail: fan.zhang@whu.edu.cn

Cite this article:

Zhao Yiming, Chen Zhan, Zhang Fan. Identifying User Satisfaction Levels and Evolution Patterns in Exploratory Search. Data Analysis and Knowledge Discovery, 2024, 8(1): 90-103.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1281     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I1/90

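| Feature | Description |
|---|---|
| query_length | Length of the query |
| relation | Relation between the current query and the previous query |
| query_complexity | Query complexity (number of unique terms after word segmentation, deduplication, and stop-word removal) |
| query_time | Total query duration; browsing duration |
| SERP_num | Number of search engine result pages (SERPs) viewed |
| open_link_num | Number of result pages opened |
| open_link_rank_avg | Average rank of the opened result pages |
| open_link_rank_max | Deepest rank among the opened result pages |
| open_link_dwell_avg | Average dwell time on opened result pages |
| open_link_dwell_max | Maximum dwell time on an opened result page |
| open_link_dwell_total | Total dwell time on opened result pages |
| mouse_move_num | Number of mouse movements |
| mouse_move_time | Mouse movement time |
| mouse_scroll_num | Number of mouse scrolls |
| mouse_scroll_dist | Total mouse scroll distance |

Query Features in the Satisfaction Prediction Model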
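A minimal sketch of how a few of these query features could be computed from one logged query. It assumes lower-cased whitespace tokenization and a tiny English stop-word list as stand-ins for the word segmentation and stop-word list the study would actually use; `OpenedResult`, `STOP_WORDS`, and the zero-fill for queries with no clicks are hypothetical choices, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical stop-word list; a real pipeline would use a full list for the query language.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "for", "and", "to"}


@dataclass
class OpenedResult:
    rank: int          # 1-based rank of the opened result on its SERP
    dwell_secs: float  # time spent on the opened result page


def query_complexity(query: str) -> int:
    """Number of unique terms after tokenization, deduplication and stop-word removal."""
    # Assumption: lower-cased whitespace tokens stand in for word segmentation.
    terms = {token.lower() for token in query.split()}
    return len(terms - STOP_WORDS)


def open_link_features(opened: list[OpenedResult]) -> dict[str, float]:
    """Click-based query features; all zero when no result was opened (an assumption)."""
    if not opened:
        names = ("open_link_num", "open_link_rank_avg", "open_link_rank_max",
                 "open_link_dwell_avg", "open_link_dwell_max", "open_link_dwell_total")
        return {name: 0.0 for name in names}
    ranks = [r.rank for r in opened]
    dwells = [r.dwell_secs for r in opened]
    return {
        "open_link_num": float(len(opened)),
        "open_link_rank_avg": float(mean(ranks)),
        "open_link_rank_max": float(max(ranks)),  # deepest rank the user reached
        "open_link_dwell_avg": mean(dwells),
        "open_link_dwell_max": max(dwells),
        "open_link_dwell_total": sum(dwells),
    }


print(query_complexity("the history of the Silk Road"))  # 3
```

How to treat queries without any click (zero, missing, or a separate indicator) is a design decision the table does not settle.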
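| Feature | Description |
|---|---|
| session_index | Position of the current query within the session |
| last_query_length | Length of the previous query |
| last_sim | Similarity between the previous query and the current query (shared unique terms / all unique terms) |
| new_term | Number of new unique terms in the current query relative to the previous query |
| last_query_time | Total duration of the previous query |
| last_SERP_num | Number of SERPs viewed for the previous query |
| last_open_link_num | Number of result pages opened for the previous query |
| last_open_link_rank_avg | Average rank of the result pages opened for the previous query |
| last_open_link_rank_max | Deepest rank of the result pages opened for the previous query |
| last_open_link_dwell_avg | Average dwell time on result pages for the previous query |
| last_open_link_dwell_max | Maximum dwell time on result pages for the previous query |
| last_open_link_dwell_total | Total dwell time on result pages for the previous query |
| last_mouse_move_num | Number of mouse movements during the previous query |
| last_mouse_move_time | Mouse movement time during the previous query |
| last_mouse_scroll_num | Number of mouse scrolls during the previous query |
| last_mouse_scroll_dist | Total mouse scroll distance during the previous query |
| last_query_time_ratio | Ratio of the previous query's total duration to the current query's |
| last_SERP_num_ratio | Ratio of the previous query's number of SERPs viewed to the current query's |
| last_open_link_num_ratio | Ratio of the previous query's number of opened result pages to the current query's |
| last_open_link_rank_avg_ratio | Ratio of the previous query's average rank of opened result pages to the current query's |
| last_open_link_rank_max_ratio | Ratio of the previous query's deepest rank of opened result pages to the current query's |
| last_open_link_dwell_avg_ratio | Ratio of the previous query's average dwell time on result pages to the current query's |
| last_open_link_dwell_max_ratio | Ratio of the previous query's maximum dwell time on result pages to the current query's |
| last_open_link_dwell_total_ratio | Ratio of the previous query's total dwell time on result pages to the current query's |
| last_mouse_move_num_ratio | Ratio of the previous query's number of mouse movements to the current query's |
| last_mouse_move_time_ratio | Ratio of the previous query's mouse movement time to the current query's |
| last_mouse_scroll_num_ratio | Ratio of the previous query's number of mouse scrolls to the current query's |
| last_mouse_scroll_dist_ratio | Ratio of the previous query's total mouse scroll distance to the current query's |
| history_query_time_ratio | Ratio of the session's average total query duration to the current query's |
| history_SERP_num_ratio | Ratio of the session's average number of SERPs viewed to the current query's |
| history_open_link_num_ratio | Ratio of the session's average number of opened result pages to the current query's |
| history_open_link_rank_avg_ratio | Ratio of the average rank of result pages opened in the session to the current query's |
| history_open_link_rank_max_ratio | Ratio of the deepest rank of result pages opened in the session to the current query's |
| history_open_link_dwell_avg_ratio | Ratio of the average dwell time on result pages in the session to the current query's |
| history_open_link_dwell_max_ratio | Ratio of the maximum dwell time on result pages in the session to the current query's |
| history_open_link_dwell_total_ratio | Ratio of the session's average total dwell time on result pages to the current query's |
| history_mouse_move_num_ratio | Ratio of the session's average number of mouse movements to the current query's |
| history_mouse_move_time_ratio | Ratio of the session's average mouse movement time to the current query's |
| history_mouse_scroll_num_ratio | Ratio of the session's average number of mouse scrolls to the current query's |
| history_mouse_scroll_dist_ratio | Ratio of the session's average total mouse scroll distance to the current query's |

Temporal Features in the Satisfaction Prediction Model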
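The temporal features combine a term-overlap similarity (`last_sim` is shared unique terms over all unique terms, i.e. the Jaccard coefficient of the two term sets), a count of new terms, and many "X to current" ratios. The sketch below assumes whitespace tokens stand in for segmented terms, that a zero denominator yields 0.0, and that the session average covers only earlier queries in the session; `term_set`, `safe_ratio`, and `history_ratio` are hypothetical helper names.

```python
def term_set(query: str) -> set[str]:
    # Assumption: lower-cased whitespace tokens stand in for segmented, deduplicated terms.
    return {token.lower() for token in query.split()}


def last_sim(prev_query: str, query: str) -> float:
    """Shared unique terms / all unique terms of the two queries (Jaccard coefficient)."""
    prev, cur = term_set(prev_query), term_set(query)
    union = prev | cur
    return len(prev & cur) / len(union) if union else 0.0


def new_term(prev_query: str, query: str) -> int:
    """Unique terms in the current query that the previous query did not contain."""
    return len(term_set(query) - term_set(prev_query))


def safe_ratio(numerator: float, denominator: float) -> float:
    # Hypothetical convention: return 0.0 when the current query's value is zero.
    return numerator / denominator if denominator else 0.0


def history_ratio(earlier_values: list[float], current: float) -> float:
    """history_*_ratio: mean over earlier queries in the session / current query's value."""
    if not earlier_values:
        return 0.0
    return safe_ratio(sum(earlier_values) / len(earlier_values), current)


# Example: last_query_time_ratio for a 30 s previous query and a 60 s current query.
print(safe_ratio(30.0, 60.0))  # 0.5
```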
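| Feature | Description |
|---|---|
| user_experience | User's domain expertise level |
| user_NIT | User's computer proficiency level |
| user_query_time_avg | User's historical average query duration |
| user_SERP_num_avg | User's historical average number of SERPs viewed |
| user_open_link_num_avg | User's historical average number of result pages opened |
| user_open_link_rank_avg | User's historical average rank of opened result pages |
| user_open_link_rank_max_avg | User's historical average of the deepest rank of opened result pages |
| user_open_link_dwell_avg | User's historical average dwell time on result pages |
| user_open_link_dwell_max_avg | User's historical average maximum dwell time on result pages |
| user_open_link_dwell_total_avg | User's historical average total dwell time on result pages |
| user_mouse_move_num_avg | User's historical average number of mouse movements |
| user_mouse_move_time_avg | User's historical average mouse movement time |
| user_mouse_scroll_num_avg | User's historical average number of mouse scrolls |
| user_mouse_scroll_dist_avg | User's historical average total mouse scroll distance |
| user_query_time_avg_ratio | Ratio of the user's historical average query duration to the current query's |
| user_SERP_num_avg_ratio | Ratio of the user's historical average number of SERPs viewed to the current query's |
| user_open_link_num_avg_ratio | Ratio of the user's historical average number of opened result pages to the current query's |
| user_open_link_rank_avg_ratio | Ratio of the user's historical average rank of opened result pages to the current query's |
| user_open_link_rank_max_avg_ratio | Ratio of the user's historical average deepest rank of opened result pages to the current query's |
| user_open_link_dwell_avg_ratio | Ratio of the user's historical average dwell time on result pages to the current query's |
| user_open_link_dwell_max_avg_ratio | Ratio of the user's historical average maximum dwell time on result pages to the current query's |
| user_open_link_dwell_total_avg_ratio | Ratio of the user's historical average total dwell time on result pages to the current query's |
| user_mouse_move_num_avg_ratio | Ratio of the user's historical average number of mouse movements to the current query's |
| user_mouse_move_time_avg_ratio | Ratio of the user's historical average mouse movement time to the current query's |
| user_mouse_scroll_num_avg_ratio | Ratio of the user's historical average number of mouse scrolls to the current query's |
| user_mouse_scroll_dist_avg_ratio | Ratio of the user's historical average total mouse scroll distance to the current query's |

User Features in the Satisfaction Prediction Model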
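The user features average a behavior over the user's history and compare that average with the current query. A minimal sketch for one of them (`user_query_time_avg` and its ratio), assuming "history" means the same user's queries issued strictly before the current one, so that no future behavior leaks into the feature; the `QueryLog` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryLog:
    user: str
    timestamp: float   # when the query was issued
    query_time: float  # total duration of the query (seconds)


def user_query_time_avg(logs: list[QueryLog], current: QueryLog) -> Optional[float]:
    """Average query_time over the same user's earlier queries (None if there is no history)."""
    history = [q.query_time for q in logs
               if q.user == current.user and q.timestamp < current.timestamp]
    return sum(history) / len(history) if history else None


def user_query_time_avg_ratio(logs: list[QueryLog], current: QueryLog) -> Optional[float]:
    """Ratio of the user's historical average to the current query's value."""
    avg = user_query_time_avg(logs, current)
    if avg is None or current.query_time == 0:
        return None
    return avg / current.query_time
```

Returning `None` for users without history keeps that case distinguishable from a genuine zero; a model would still need an imputation rule for it.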
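| relation value | Relation between the current query and the previous query |
|---|---|
| 0 | Initial query |
| 1 | Same topic, keywords substituted |
| 2 | Same topic, keywords added |
| 3 | Same topic, keywords deleted |
| 4 | Subtopic of the previous query |
| 5 | Different subtopic under the same topic |
| 6 | New topic related to the previous topic |

Query Reformulation Patterns (QRP)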
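Codes 1-3 differ only in how the term set changes, while codes 4-6 require a topical judgment that term overlap cannot make. The sketch below encodes the scheme as an enum and labels only the lexical cases with a hypothetical heuristic that assumes both queries already share a topic; it returns `None` wherever topical (likely manual) coding would be needed. It is an illustration, not the annotation procedure used in the study.

```python
from enum import IntEnum
from typing import Optional


class QRP(IntEnum):
    INITIAL = 0
    SUBSTITUTE = 1          # same topic, keywords substituted
    ADD = 2                 # same topic, keywords added
    DELETE = 3              # same topic, keywords deleted
    SUBTOPIC_OF_PREV = 4    # subtopic of the previous query
    SIBLING_SUBTOPIC = 5    # different subtopic under the same topic
    RELATED_NEW_TOPIC = 6   # new topic related to the previous topic


def lexical_qrp(prev_query: Optional[str], query: str) -> Optional[QRP]:
    """Hypothetical heuristic for codes 0-3, assuming both queries share a topic.

    Codes 4-6 need topical judgment, so the function returns None for them.
    """
    if prev_query is None:
        return QRP.INITIAL
    prev = set(prev_query.lower().split())
    cur = set(query.lower().split())
    if cur == prev:
        return None             # repeated query: not covered by the coding scheme
    if cur > prev:
        return QRP.ADD          # strict superset: terms were only added
    if cur < prev:
        return QRP.DELETE       # strict subset: terms were only removed
    if cur & prev:
        return QRP.SUBSTITUTE   # some terms kept, some replaced
    return None                 # no shared terms: leave to topical coding
```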
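| Feature combination | KNN accuracy (%) | Decision tree accuracy (%) | Random forest accuracy (%) | SVM accuracy (%) |
|---|---|---|---|---|
| Query features | 49.65 | 52.28 | 55.23 | 51.62 |
| Query + temporal features | 52.22 | 60.06 | 62.52 | 59.20 |
| Query + user features | 60.18 | 63.52 | 67.89 | 64.58 |
| Query + temporal + user features | 62.46 | 70.81 | 74.03 | 67.18 |

Accuracy of Four Prediction Models with Different Feature Combinations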
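The accuracy table supports a simple arithmetic reading: which model is best with all features, and how many percentage points each model gains over query features alone. The snippet below only re-reads the published numbers (the short keys Q, T, U for query, temporal, and user features are hypothetical labels); it does not retrain any model.

```python
# Accuracy (%) copied from the table above: model -> accuracy per feature combination.
ACCURACY = {
    "KNN":           {"Q": 49.65, "Q+T": 52.22, "Q+U": 60.18, "Q+T+U": 62.46},
    "Decision tree": {"Q": 52.28, "Q+T": 60.06, "Q+U": 63.52, "Q+T+U": 70.81},
    "Random forest": {"Q": 55.23, "Q+T": 62.52, "Q+U": 67.89, "Q+T+U": 74.03},
    "SVM":           {"Q": 51.62, "Q+T": 59.20, "Q+U": 64.58, "Q+T+U": 67.18},
}


def gain(model: str, base: str = "Q", full: str = "Q+T+U") -> float:
    """Percentage-point gain from the base feature set to the full one."""
    return round(ACCURACY[model][full] - ACCURACY[model][base], 2)


best = max(ACCURACY, key=lambda m: ACCURACY[m]["Q+T+U"])
print(best, ACCURACY[best]["Q+T+U"])  # Random forest 74.03
for model in ACCURACY:
    print(f"{model}: +{gain(model):.2f} pp")
```

Read this way, every model improves when temporal and user features are added, and for each model the query + user combination outperforms query + temporal.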
Feature Importance in the Prediction Models
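| Source of variation | Sum of squares | df | Mean square | F | p |
|---|---|---|---|---|---|
| Between groups | 867.482 | 4 | 216.870 | 83.789 | 0.000 |
| Within groups | 6,990.995 | 2,701 | 2.588 | | |
| Total | 7,858.476 | 2,705 | | | |

ANOVA Results Between Last Satisfaction and QRP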
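The quantities in the ANOVA table follow the textbook one-way ANOVA decomposition: between-group and within-group sums of squares, their degrees of freedom (here 5 last_satisfaction levels give df = 4), mean squares, and F = MS_between / MS_within. A minimal standard-library sketch, not the authors' code; the last lines only recheck the reported F from the table's own sums of squares.

```python
from statistics import mean


def one_way_anova(groups: list[list[float]]) -> dict[str, float]:
    """Textbook one-way ANOVA returning the quantities reported in the summary table."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within,
            "ms_between": ms_between, "ms_within": ms_within,
            "F": ms_between / ms_within}


# Consistency check on the reported table: F = (SS_b / df_b) / (SS_w / df_w).
f_reported = (867.482 / 4) / (6990.995 / 2701)
print(round(f_reported, 2))
```

The p-value additionally requires the F(df_between, df_within) distribution, which the standard library does not provide.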
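| Variable | Levene test based on | Levene statistic | df1 | df2 | p |
|---|---|---|---|---|---|
| query_relation | Mean | 24.058 | 4 | 2,701 | 0.000 |
| | Median | 16.554 | 4 | 2,701 | 0.000 |
| | Median, with adjusted df | 16.554 | 4 | 2,584.938 | 0.000 |
| | Trimmed mean | 24.401 | 4 | 2,701 | 0.000 |

Homogeneity-of-Variance Test Results Between Last Satisfaction and QRP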
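Levene's statistic is a one-way ANOVA F computed on absolute deviations from each group's center; centering on the median gives the Brown-Forsythe variant shown in the "median" rows. The significant results (p = 0.000) indicate unequal variances, which is why a post hoc test that does not assume equal variances (Tamhane's T2) is used next. A minimal sketch of the mean and median variants only; the adjusted-df and trimmed-mean rows are not reproduced, and `levene_w` is a hypothetical name.

```python
from statistics import mean, median


def levene_w(groups: list[list[float]], center: str = "mean") -> float:
    """Levene's W: one-way ANOVA F on absolute deviations from each group's center.

    center="mean" is the original Levene test; center="median" is the
    Brown-Forsythe variant. Raises ZeroDivisionError if all deviations are constant.
    """
    center_fn = mean if center == "mean" else median
    centers = [center_fn(g) for g in groups]
    # Absolute deviations z_ij = |x_ij - center_i|.
    z = [[abs(x - c) for x in g] for g, c in zip(groups, centers)]
    k = len(z)
    n_total = sum(len(g) for g in z)
    z_means = [mean(g) for g in z]
    z_grand = mean([v for g in z for v in g])
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return (n_total - k) / (k - 1) * between / within
```

W is then compared with the F(k - 1, N - k) distribution, matching the df1 and df2 columns in the table.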
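| (I) last_satisfaction | (J) last_satisfaction | Mean difference (I-J) | Std. error | p | 95% CI lower bound | 95% CI upper bound |
|---|---|---|---|---|---|---|
| 0 | 1 | -0.166 | 0.180 | 0.988 | -0.670 | 0.340 |
| 0 | 2 | -0.623* | 0.159 | 0.001 | -1.070 | -0.170 |
| 0 | 3 | -1.426* | 0.147 | 0.000 | -1.840 | -1.010 |
| 0 | 4 | -1.673* | 0.149 | 0.000 | -2.090 | -1.250 |
| 1 | 0 | 0.166 | 0.180 | 0.988 | -0.340 | 0.670 |
| 1 | 2 | -0.456* | 0.139 | 0.011 | -0.850 | -0.070 |
| 1 | 3 | -1.260* | 0.124 | 0.000 | -1.610 | -0.910 |
| 1 | 4 | -1.507* | 0.127 | 0.000 | -1.860 | -1.150 |
| 2 | 0 | 0.623* | 0.159 | 0.001 | 0.170 | 1.070 |
| 2 | 1 | 0.456* | 0.139 | 0.011 | 0.070 | 0.850 |
| 2 | 3 | -0.803* | 0.092 | 0.000 | -1.060 | -0.540 |
| 2 | 4 | -1.051* | 0.095 | 0.000 | -1.320 | -0.780 |
| 3 | 0 | 1.426* | 0.147 | 0.000 | 1.010 | 1.840 |
| 3 | 1 | 1.260* | 0.124 | 0.000 | 0.910 | 1.610 |
| 3 | 2 | 0.803* | 0.092 | 0.000 | 0.540 | 1.060 |
| 3 | 4 | -0.247* | 0.073 | 0.007 | -0.450 | -0.040 |
| 4 | 0 | 1.673* | 0.149 | 0.000 | 1.250 | 2.090 |
| 4 | 1 | 1.507* | 0.127 | 0.000 | 1.150 | 1.860 |
| 4 | 2 | 1.051* | 0.095 | 0.000 | 0.780 | 1.320 |
| 4 | 3 | 0.247* | 0.073 | 0.007 | 0.040 | 0.450 |

Note: * marks mean differences significant at the 0.05 level.

Multiple Comparison (Tamhane's T2) Results Between Last Satisfaction and QRP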
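Tamhane's T2 compares each pair of groups with a Welch t statistic (unpooled standard error, Welch-Satterthwaite degrees of freedom) and controls the family-wise error with a Šidák adjustment over all m = k(k - 1)/2 pairs, which is 10 for the five last_satisfaction levels. The sketch computes those ingredients only; turning the t statistic into an unadjusted p-value needs a Student t CDF, which the Python standard library lacks. Helper names are hypothetical.

```python
import math


def n_comparisons(k: int) -> int:
    """Number of distinct pairwise comparisons among k groups."""
    return k * (k - 1) // 2


def welch_se_df(var1: float, n1: int, var2: float, n2: int) -> tuple[float, float]:
    """Unpooled standard error of a mean difference and Welch-Satterthwaite df."""
    a, b = var1 / n1, var2 / n2
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return se, df


def sidak_adjust(p: float, m: int) -> float:
    """Šidák-adjusted p-value for m simultaneous comparisons, capped at 1."""
    return min(1.0, 1.0 - (1.0 - p) ** m)
```

For example, an unadjusted p of 0.01 among 10 comparisons becomes about 0.0956 after adjustment.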
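| Source of variation | Sum of squares | df | Mean square | F | p |
|---|---|---|---|---|---|
| Between groups | 141.627 | 5 | 28.325 | 23.788 | 0.000 |
| Within groups | 3,215.011 | 2,700 | 1.191 | | |
| Total | 3,356.637 | 2,705 | | | |

ANOVA Results Between QRP and Satisfaction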
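| Variable | Levene test based on | Levene statistic | df1 | df2 | p |
|---|---|---|---|---|---|
| satisfaction | Mean | 27.267 | 5 | 2,700 | 0.000 |
| | Median | 11.488 | 5 | 2,700 | 0.000 |
| | Median, with adjusted df | 11.488 | 5 | 2,537.416 | 0.000 |
| | Trimmed mean | 24.955 | 5 | 2,700 | 0.000 |

Homogeneity-of-Variance Test Results Between QRP and Satisfaction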
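| (I) query_relation | (J) query_relation | Mean difference (I-J) | Std. error | p | 95% CI lower bound | 95% CI upper bound |
|---|---|---|---|---|---|---|
| 1 | 2 | -0.160 | 0.087 | 0.641 | -0.410 | 0.090 |
| 1 | 3 | 0.094 | 0.148 | 1.000 | -0.350 | 0.540 |
| 1 | 4 | -0.462* | 0.079 | 0.000 | -0.690 | -0.230 |
| 1 | 5 | -0.403* | 0.069 | 0.000 | -0.610 | -0.200 |
| 1 | 6 | -0.731* | 0.081 | 0.000 | -0.970 | -0.490 |
| 2 | 1 | 0.160 | 0.087 | 0.641 | -0.090 | 0.410 |
| 2 | 3 | 0.254 | 0.148 | 0.751 | -0.190 | 0.690 |
| 2 | 4 | -0.303* | 0.078 | 0.002 | -0.530 | -0.070 |
| 2 | 5 | -0.243* | 0.069 | 0.006 | -0.440 | -0.040 |
| 2 | 6 | -0.571* | 0.080 | 0.000 | -0.810 | -0.340 |
| 3 | 1 | -0.094 | 0.148 | 1.000 | -0.540 | 0.350 |
| 3 | 2 | -0.254 | 0.148 | 0.751 | -0.690 | 0.190 |
| 3 | 4 | -0.556* | 0.143 | 0.003 | -0.980 | -0.130 |
| 3 | 5 | -0.497* | 0.138 | 0.008 | -0.910 | -0.080 |
| 3 | 6 | -0.825* | 0.144 | 0.000 | -1.260 | -0.390 |
| 4 | 1 | 0.462* | 0.079 | 0.000 | 0.230 | 0.690 |
| 4 | 2 | 0.303* | 0.078 | 0.002 | 0.070 | 0.530 |
| 4 | 3 | 0.556* | 0.143 | 0.003 | 0.130 | 0.980 |
| 4 | 5 | 0.060 | 0.058 | 0.996 | -0.110 | 0.230 |
| 4 | 6 | -0.269* | 0.072 | 0.003 | -0.480 | -0.060 |
| 5 | 1 | 0.403* | 0.069 | 0.000 | 0.200 | 0.610 |
| 5 | 2 | 0.243* | 0.069 | 0.006 | 0.040 | 0.440 |
| 5 | 3 | 0.497* | 0.138 | 0.008 | 0.080 | 0.910 |
| 5 | 4 | -0.060 | 0.058 | 0.996 | -0.230 | 0.110 |
| 5 | 6 | -0.328* | 0.061 | 0.000 | -0.510 | -0.150 |
| 6 | 1 | 0.731* | 0.081 | 0.000 | 0.490 | 0.970 |
| 6 | 2 | 0.571* | 0.080 | 0.000 | 0.340 | 0.810 |
| 6 | 3 | 0.825* | 0.144 | 0.000 | 0.390 | 1.260 |
| 6 | 4 | 0.269* | 0.072 | 0.003 | 0.060 | 0.480 |
| 6 | 5 | 0.328* | 0.061 | 0.000 | 0.150 | 0.510 |

Note: * marks mean differences significant at the 0.05 level.

Multiple Comparison (Tamhane's T2) Results Between QRP and Satisfaction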
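| Source of variation | Sum of squares | df | Mean square | F | p |
|---|---|---|---|---|---|
| Between groups | 105.914 | 1 | 105.914 | 88.100 | 0.000 |
| Within groups | 3,250.724 | 2,704 | 1.202 | | |
| Total | 3,356.637 | 2,705 | | | |

ANOVA Results Between QRP (Two Categories) and Satisfaction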
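The six-category and two-category QRP analyses share the same total sum of squares (3,356.637), so they can be compared directly from their summary rows. The sketch recomputes F and, as an illustrative addition not reported in the tables, eta squared (SS_between / SS_total, the share of satisfaction variance associated with the grouping). `anova_from_summary` is a hypothetical helper; the inputs are copied from the two ANOVA tables for satisfaction.

```python
def anova_from_summary(ss_between: float, df_between: int,
                       ss_within: float, df_within: int) -> dict[str, float]:
    """F statistic and eta squared recomputed from an ANOVA summary table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {"F": ms_between / ms_within,
            "eta_sq": ss_between / (ss_between + ss_within)}


six = anova_from_summary(141.627, 5, 3215.011, 2700)  # six QRP categories
two = anova_from_summary(105.914, 1, 3250.724, 2704)  # QRP collapsed to two categories
for name, result in (("six categories", six), ("two categories", two)):
    print(f"{name}: F = {result['F']:.3f}, eta^2 = {result['eta_sq']:.4f}")
```

Under this reading, the two-category split retains most of the between-group variance of the six-category coding with a single degree of freedom, which is why its F is larger.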
QRPs under Different Levels of Last Satisfaction
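| relation | Mean satisfaction | Mean satisfaction of previous query | Change in satisfaction |
|---|---|---|---|
| 1 Substitution | 2.52 | 2.16 | 0.36 |
| 2 Addition | 2.68 | 2.36 | 0.32 |
| 3 Deletion | 2.42 | 2.10 | 0.32 |
| 4 Subtopic of the previous query | 2.98 | 3.07 | -0.09 |
| 5 Subtopic under the same topic | 2.92 | 2.97 | -0.05 |
| 6 Related new topic | 3.24 | 3.25 | -0.01 |

Satisfaction Changes Caused by Different QRPs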
Changes of User Satisfaction under Different QRPs
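In the satisfaction-change table, the change column equals mean satisfaction minus the mean satisfaction of the previous query within each QRP group (for substitution, 2.52 - 2.16 = 0.36). A minimal sketch of that group-by, assuming each record is a hypothetical `(qrp_code, last_satisfaction, satisfaction)` triple.

```python
from collections import defaultdict
from statistics import mean


def satisfaction_change_by_qrp(records):
    """records: iterable of (qrp_code, last_satisfaction, satisfaction) triples.

    Returns {qrp_code: (mean satisfaction, mean last satisfaction, change)}.
    """
    groups = defaultdict(list)
    for code, last_sat, sat in records:
        groups[code].append((last_sat, sat))
    result = {}
    for code, pairs in sorted(groups.items()):
        current = mean(sat for _, sat in pairs)
        previous = mean(prev_sat for prev_sat, _ in pairs)
        result[code] = (current, previous, current - previous)
    return result
```

A positive change (codes 1-3 in the table) means queries reformulated within the same topic followed less satisfying queries on average, and the reformulated query scored higher.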
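| From \ To | 1 Substitution | 2 Addition | 3 Deletion | 4 Subtopic of the previous query | 5 Subtopic under the same topic | 6 Related new topic |
|---|---|---|---|---|---|---|
| 1 Substitution | 32.6% | 13.4% | 3.6% | 12.7% | 26.1% | 11.7% |
| 2 Addition | 20.7% | 18.5% | 8.5% | 13.7% | 26.2% | 12.5% |
| 3 Deletion | 13.8% | 31.0% | 6.9% | 13.8% | 20.7% | 13.8% |
| 4 Subtopic of the previous query | 11.6% | 6.3% | 2.8% | 24.8% | 46.4% | 8.2% |
| 5 Subtopic under the same topic | 10.2% | 8.9% | 1.2% | 13.1% | 60.9% | 5.7% |
| 6 Related new topic | 10.5% | 13.3% | 1.6% | 9.0% | 16.8% | 48.8% |

Transition Probability Matrix among Different QRPs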
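Each row of the transition matrix sums to about 100%, so it reads as a first-order estimate: the share of reformulations of type j that immediately follow a reformulation of type i. A minimal sketch of that estimate from per-session QRP sequences, assuming code 0 (the initial query) has already been dropped, since the table covers codes 1-6 only.

```python
from collections import Counter, defaultdict


def transition_matrix(sessions: list[list[int]]) -> dict[int, dict[int, float]]:
    """Row-normalized first-order transition probabilities between consecutive QRP codes.

    Each session is the sequence of QRP codes of its reformulated queries
    (assumption: code 0, the initial query, is excluded beforehand).
    """
    counts: dict[int, Counter] = defaultdict(Counter)
    for seq in sessions:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in sorted(row.items())}
        for src, row in sorted(counts.items())
    }
```

The diagonal of the published matrix (e.g. 60.9% for staying on subtopic-under-the-same-topic) corresponds to `counts[i][i]` dominating row i in this estimate.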
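| From \ To | 1 Substitution | 2 Addition | 3 Deletion | 4 Subtopic of the previous query | 5 Subtopic under the same topic | 6 Related new topic |
|---|---|---|---|---|---|---|
| 1 Substitution | 1.81 | 2.64 | 3.00 | 2.93 | 2.80 | 2.80 |
| 2 Addition | 2.19 | 2.58 | 2.82 | 2.94 | 2.85 | 3.33 |
| 3 Deletion | 2.00 | 2.83 | 2.00 | 2.50 | 3.00 | 3.33 |
| 4 Subtopic of the previous query | 3.00 | 2.88 | 2.50 | 3.13 | 3.00 | 3.25 |
| 5 Subtopic under the same topic | 2.58 | 3.29 | 2.75 | 3.00 | 2.93 | 3.33 |
| 6 Related new topic | 2.67 | 3.50 | 2.50 | 3.00 | 3.33 | 3.32 |

Average Values of User Satisfaction under Different Transition Paths of QRPs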
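The second matrix averages satisfaction per transition path instead of counting paths. A minimal sketch, assuming each record is a hypothetical `(previous_qrp, current_qrp, satisfaction)` triple in which the satisfaction belongs to the query reached by the transition.

```python
from collections import defaultdict
from statistics import mean


def satisfaction_by_path(records):
    """records: iterable of (previous_qrp, current_qrp, satisfaction) triples.

    Returns the mean satisfaction for every observed (previous, current) QRP path.
    """
    groups = defaultdict(list)
    for prev_code, code, sat in records:
        groups[(prev_code, code)].append(sat)
    return {path: mean(values) for path, values in sorted(groups.items())}
```

Because rare transitions (for example, paths into deletion) contribute few observations, cell means for those paths are best read together with the corresponding transition frequencies.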