Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 4: 9-19. https://doi.org/10.11925/infotech.2096-3467.2017.04.02
Research Paper
Analyzing Dynamic Informational, Navigational and Transactional Online Queries*
Zhang Xiaojuan
School of Computer and Information Science, Southwest University, Chongqing 400715, China
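Abstract

[Objective] To analyze how informational, navigational and transactional queries change over time on the Web, and so provide evidence for optimizing search engine performance. [Methods] Relevant evaluation metrics are used to examine how queries of each intent category change over time from three perspectives: query dynamics, document content dynamics and information need dynamics. For each intent category, the study further analyzes how document content and information needs change under different query popularity characteristics. [Results] In terms of popularity distribution, informational queries usually contain a peak, transactional queries are more likely to contain multiple peaks and to be periodic, and navigational queries usually follow a flat trend. Both the Web page content and the information needs of informational queries change more over time than those of the other two categories. [Limitations] The observation period covers only 29 days. Popularity distributions with no peak or with multiple peaks were not further classified by peak type, and peaks were not identified automatically. [Conclusions] For informational queries, search engines should present results that are as diverse as possible. For navigational queries, they need to keep the relevant authoritative pages near the top of the results. For transactional queries tied to user interaction behavior, the ranking of relevant pages should be kept stable over long periods. For some entertainment-related transactional queries, page freshness should be taken into account in ranking.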

English Abstract

[Objective] This paper analyzes how informational, navigational and transactional online queries change over time, in order to provide a basis for optimizing search engine performance. [Methods] First, the author examined queries of each intent category from three perspectives: query dynamics, Web document content dynamics and information need dynamics. Second, for each intent category, the study investigated how Web documents and information needs change under different query popularity patterns. [Results] Informational, transactional and navigational queries showed different popularity distributions. Both the Web documents and the information needs of informational queries changed more than those of the other two types. [Limitations] The data cover only 29 days. More research is needed to classify and automatically identify the peaks in query popularity distributions. [Conclusions] Search engines should diversify the results of informational queries and keep the relevant authoritative pages near the top for navigational queries. For transactional queries related to user interaction, they should keep the ranking of relevant pages stable, while for entertainment-related transactional queries they should favor novel results.

Key words: Informational Query; Transactional Query; Navigational Query; Query Dynamics; Information Need Dynamics; Document Content Dynamics
Received: 2016-11-07    Published: 2017-05-24
Chinese Library Classification (CLC) number: G353.4
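Funding: *This work is one of the research outcomes of the National Social Science Fund of China Youth Project "Research on a Query Recommendation Model Integrating Users' Personalized and Real-Time Intents" (Grant No. 15CTQ019) and the Southwest University Doctoral Start-up Fund project "Research on Automatic Classification and Analysis of Query Intent" (Grant No. SWU114093).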
Cite this article:
Zhang Xiaojuan. Analyzing Dynamic Informational, Navigational and Transactional Online Queries. Data Analysis and Knowledge Discovery, 2017, 1(4): 9-19.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.04.02      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I4/9
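Figure: Peak categories in query popularity distributions
Figure: Peak shapes in query popularity distributions
Figure: Overall trends in query popularity distributions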
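The popularity curves in these figures track how often a query is issued on each day of the observation window. The following is a minimal sketch of how such a series could be built from (day, query) records. It assumes two things the paper does not state here: the granularity is one day, and popularity is the query's share of all queries issued that day. The function name `daily_popularity` is hypothetical.

```python
from collections import Counter, defaultdict

def daily_popularity(records, query, days):
    """Return, for each day in `days`, the share of that day's queries equal to `query`.

    records: iterable of (day, query_string) pairs, e.g. parsed from a query log.
    days: ordered list of day labels defining the observation window.
    Normalizing by each day's total query volume is an assumption of this sketch.
    """
    totals = Counter()          # all queries issued per day
    hits = defaultdict(int)     # occurrences of the target query per day
    for day, q in records:
        totals[day] += 1
        if q == query:
            hits[day] += 1
    # Days with no traffic at all get a popularity of 0.0
    return [hits[d] / totals[d] if totals[d] else 0.0 for d in days]
```

Normalizing by daily volume keeps a general rise or fall in overall traffic from being mistaken for a peak or trend in one query. Raw counts would work the same way if that effect is not a concern.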
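Table: Sample of the Sogou query log data format

| User access time | User ID | Query | Rank of the clicked URL in the returned results | User click order | Clicked URL |
|---|---|---|---|---|---|
| 00:00:03 | 35804326352621896 | [免费取名] ("free name picking") | 3 | 1 | http://huaxia.wangzhan8.com/ |
| 00:00:03 | 07321773511158924 | [欧美金发女郎] ("Western blonde women") | 2 | 4 | http://a.se2222.com/Html/OPIC/index.html |
| 00:00:03 | 43080219994871455 | [google] | 1 | 1 | http://www.google.com/ |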
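The sample rows follow a fixed field order: time, user ID, bracketed query, rank, click order, URL. Below is a minimal parser sketch for lines in that layout. It assumes fields are separated by whitespace (tabs or spaces), that the query is wrapped in square brackets, and that the query text may itself contain spaces. `LogRecord` and `parse_line` are hypothetical names, not part of any Sogou tool.

```python
import re
from typing import NamedTuple, Optional

class LogRecord(NamedTuple):
    time: str         # user access time, HH:MM:SS
    user_id: str      # kept as a string so leading zeros (e.g. 0732...) survive
    query: str        # query text without the surrounding brackets
    rank: int         # rank of the clicked URL in the returned results
    click_order: int  # sequence number of this click for the user
    url: str          # clicked URL

# Assumed layout: time, user ID, [query], rank, click order, URL, whitespace-separated.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[(.*)\]\s+(\d+)\s+(\d+)\s+(\S+)\s*$")

def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    t, uid, q, rank, order, url = m.groups()
    return LogRecord(t, uid, q, int(rank), int(order), url)
```

Returning `None` for malformed lines lets a caller count and skip them instead of aborting on the first bad record in a large log.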
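Table: Proportions of informational, navigational and transactional queries in each category of query popularity distribution

| Popularity feature | Category | Informational | Navigational | Transactional |
|---|---|---|---|---|
| Number of peaks | No peak | 32% | 90% | 36% |
| | One peak | 59% | 10% | 36% |
| | Multiple peaks | 9% | 0% | 28% |
| Periodicity | No | 8% | 0% | 18% |
| | Yes | 1% | 0% | 10% |
| Peak shape | Castle | 2% | 0% | 1% |
| | Left sail | 6% | 0% | 7% |
| | Right sail | 38% | 8% | 3% |
| | Wedge | 13% | 0% | 28% |
| Overall trend | Downward | 25% | 0% | 22% |
| | Flat | 10% | 68% | 23% |
| | Upward | 20% | 17% | 45% |
| | Rise-then-fall | 45% | 15% | 10% |

The periodicity rows subdivide the multiple-peak queries (8% + 1% = 9% for informational and 18% + 10% = 28% for transactional queries).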
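Each percentage above is the share of one intent class's queries that fall into a given category. Within each feature block, the shares for a class sum to 100%; for example, the informational peak counts give 32% + 59% + 9% = 100%. The sketch below shows that tabulation. It assumes a per-query label already exists for each feature, since how the paper assigned those labels is not shown here. `category_shares` is a hypothetical name.

```python
from collections import Counter, defaultdict

def category_shares(labels):
    """labels: iterable of (intent_class, category) pairs, one per query,
    for a single feature block (e.g. number of peaks).
    Returns {intent_class: {category: whole-number percentage}}."""
    counts = defaultdict(Counter)
    for intent, category in labels:
        counts[intent][category] += 1
    shares = {}
    for intent, c in counts.items():
        total = sum(c.values())
        # Rounded to whole percentages, as in the table; rounding can make a
        # block sum to 99% or 101% in general.
        shares[intent] = {cat: round(100 * n / total) for cat, n in c.items()}
    return shares
```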
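Table: AvgClickEntropy of informational, navigational and transactional queries

| Query class | AvgClickEntropy |
|---|---|
| Informational | 3.31 |
| Navigational | 1.78 |
| Transactional | 1.17 |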
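AvgClickEntropy is not defined in this excerpt. A common definition of click entropy for a query q is H(q) = -Σ_u p(u|q) log2 p(u|q), where p(u|q) is the fraction of the query's clicks that land on URL u. Higher values mean clicks are spread over more URLs, which matches informational queries scoring highest above. The sketch below assumes AvgClickEntropy averages this quantity over daily time slices. The daily slicing, the base-2 logarithm and the function names are all assumptions, not the paper's confirmed method.

```python
import math
from collections import Counter, defaultdict

def click_entropy(urls):
    """Entropy in bits of the click distribution over URLs for one query."""
    counts = Counter(urls)
    total = sum(counts.values())
    # p * log2(1/p) summed over URLs; equivalent to -sum(p * log2 p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def avg_click_entropy(clicks):
    """clicks: iterable of (day, clicked_url) pairs for a single query.
    Averages per-day click entropy over days with at least one click
    (the daily slicing is an assumption of this sketch)."""
    by_day = defaultdict(list)
    for day, url in clicks:
        by_day[day].append(url)
    if not by_day:
        return 0.0
    return sum(click_entropy(urls) for urls in by_day.values()) / len(by_day)
```

For example, a day whose clicks split evenly between two URLs contributes 1 bit, and a day with every click on one URL contributes 0 bits, so those two days average to 0.5.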
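Table: Differences in information-need change between informational, navigational and transactional queries

| Query class pair | Observed t statistic |
|---|---|
| Informational vs. navigational | 32.64* |
| Navigational vs. transactional | 1.04 |
| Informational vs. transactional | 21.21* |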
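The excerpt reports observed t statistics but does not say which t-test produced them. As an illustration only, the sketch computes Welch's two-sample t statistic, which does not assume equal variances. The inputs would be per-query values for two intent classes, such as their AvgClickEntropy values. Whether the paper used Welch's test, Student's pooled test or another variant is not confirmed here. `welch_t` is a hypothetical name. SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples a and b.

    Uses the unbiased sample variance (n - 1 denominator) of each sample.
    Requires at least two values per sample and a non-zero combined variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se
```

The sign depends on argument order. Passing the class with the larger mean first gives a positive value, as in the table.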
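Table: Mean $ContentChange(q)$ of informational, navigational and transactional queries

| Query class | Mean (TF-IDF) | Mean (ShDiff) |
|---|---|---|
| Informational | 0.46 | 0.34 |
| Navigational | 0.23 | 0.19 |
| Transactional | 0.32 | 0.25 |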
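$ContentChange(q)$ is not defined in this excerpt. One plausible reading of its TF-IDF variant, offered purely as an assumption, is the cosine distance between TF-IDF vectors of two snapshots of the same result page, averaged over the pages returned for q. The sketch makes several further assumptions. It uses a plain whitespace tokenizer, computes a smoothed IDF over the snapshots passed in, and aggregates by a simple mean. All function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document string to a sparse TF-IDF vector {term: weight}.

    Uses raw term counts and a smoothed IDF, log((1 + N) / (1 + df)) + 1, so terms
    shared by every document keep a non-zero weight. Tokenization is a lowercase
    whitespace split (an assumption; Chinese text would need a word segmenter).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def content_change_tfidf(snapshot_pairs):
    """Average TF-IDF cosine distance over (old_text, new_text) snapshot pairs,
    e.g. the result pages of one query at two time points (hypothetical aggregation)."""
    if not snapshot_pairs:
        return 0.0
    docs = [text for pair in snapshot_pairs for text in pair]
    vecs = tfidf_vectors(docs)
    dists = [cosine_distance(vecs[2 * i], vecs[2 * i + 1]) for i in range(len(snapshot_pairs))]
    return sum(dists) / len(dists)
```

A value near 0 means a page's weighted vocabulary barely changed between snapshots, and 1.0 means the two snapshots share no terms.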
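Table: Differences in Web page content change over time between informational, navigational and transactional queries

| Query class pair | TF-IDF | ShDiff |
|---|---|---|
| Informational vs. navigational | 23.10* | 13.40* |
| Navigational vs. transactional | 0.25* | 0.44* |
| Informational vs. transactional | 2.45* | 5.23* |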
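The ShDiff measure in these tables is not defined in this excerpt either. Suppose it is a shingle-based difference in the spirit of the shingling technique of Broder et al. (1997). Under that assumption, each page snapshot can be represented by its set of contiguous w-word shingles, and the change is reported as 1 minus the Jaccard resemblance of the two sets. The shingle width w = 4 and the function names are hypothetical choices for this sketch.

```python
def shingles(text, w=4):
    """Set of contiguous w-word shingles; a text shorter than w words yields one shingle."""
    tokens = text.lower().split()
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def shingle_diff(old_text, new_text, w=4):
    """1 - Jaccard resemblance of the two shingle sets (0 = identical, 1 = disjoint)."""
    a, b = shingles(old_text, w), shingles(new_text, w)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

Unlike a bag-of-words distance, shingles are sensitive to word order, so a reordered paragraph registers as a change even when its vocabulary stays the same.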
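Table: Mean AvgClickEntropy of informational, navigational and transactional queries under different query dynamics

| Popularity feature | Category | Informational | Navigational | Transactional |
|---|---|---|---|---|
| Number of peaks | No peak | 0.02 | 0.11 | 0.23 |
| | One peak | 1.74 | 0.81 | 1.01 |
| | Multiple peaks | 3.52 | - | 2.34 |
| Periodicity | Yes | 5.51 | - | 3.28 |
| | No | 3.52 | 3.24 | 2.34 |
| Peak shape | Castle | 0.09 | 1.54 | 0.09 |
| | Left sail | 1.52 | - | 1.52 |
| | Right sail | 1.52 | 1.48 | 1.50 |
| | Wedge | 3.12 | - | 2.24 |
| Overall trend | Downward | 4.45 | - | 4.35 |
| | Upward | 2.53 | 1.70 | 2.31 |
| | Flat | 1.12 | 0.71 | 1.13 |
| | Rise-then-fall | 5.24 | 2.09 | 4.03 |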
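Each cell above is read here as the mean of a per-query value over all queries that share an intent class and a popularity category. A "-" cell plausibly marks a combination with no queries. An example is navigational queries with multiple peaks, which make up 0% of navigational queries. The sketch below shows that grouping under those assumptions; `mean_by_group` is a hypothetical name.

```python
from collections import defaultdict

def mean_by_group(rows):
    """rows: iterable of (intent_class, popularity_category, value) triples, one per query.

    Returns {(intent_class, category): mean value}. Combinations with no queries
    are simply absent, which would correspond to a '-' cell in the table.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for intent, category, value in rows:
        sums[(intent, category)] += value
        counts[(intent, category)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```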
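Table: Web page content change ($ContentChange(q)$) of informational, navigational and transactional queries under different query popularity features

| Popularity feature | Category | TF-IDF: Informational | TF-IDF: Navigational | TF-IDF: Transactional | ShDiff: Informational | ShDiff: Navigational | ShDiff: Transactional |
|---|---|---|---|---|---|---|---|
| Number of peaks | No peak | 0.10 | 0.09 | 0.20 | 0.41 | 0.18 | 0.35 |
| | One peak | 0.42 | 0.19 | 0.30 | 0.44 | 0.32 | 0.43 |
| | Multiple peaks | 0.49 | - | 0.41 | 0.52 | - | 0.44 |
| Periodicity | Yes | 0.44 | 0.32 | 0.34 | 0.43 | 0.20 | 0.27 |
| | No | 0.49 | - | 0.45 | 0.57 | - | 0.38 |
| Peak shape | Castle | 0.30 | 0.21 | 0.41 | 0.43 | 0.42 | 0.33 |
| | Left sail | 0.38 | - | 0.42 | 0.35 | - | 0.40 |
| | Right sail | 0.36 | 0.38 | 0.38 | 0.34 | 0.35 | 0.38 |
| | Wedge | 0.52 | - | 0.54 | 0.48 | - | 0.52 |
| Overall trend | Flat | 0.54 | 0.45 | 0.52 | 0.61 | 0.41 | 0.57 |
| | Downward | 0.52 | - | 0.52 | 0.52 | - | 0.52 |
| | Upward | 0.32 | 0.27 | 0.31 | 0.42 | 0.30 | 0.42 |
| | Rise-then-fall | 0.20 | 0.19 | 0.21 | 0.29 | 0.19 | 0.28 |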