New Technology of Library and Information Service  2009, Vol. 25 Issue (4): 50-56    DOI: 10.11925/infotech.1003-3513.2009.04.10
The POS Mining Study on Search Engine’s Query Log
Lai Maosheng  Qu Peng
(Department of Information Management, Peking University, Beijing 100871, China)
Abstract  

This paper analyzes query logs from the Sogou search engine for March 2007. Part-of-speech (POS) tagging is applied to the queries to characterize the high-frequency POS results. Web users rely primarily on nouns in their queries, with verbs playing a complementary role; other parts of speech seldom appear. Function words of natural language, such as “的”, rarely appear in the high-frequency POS results. Web queries differ from natural language in syntax to a certain degree, yet they also share some characteristics with it. Web users use nouns to perform concept-focused retrieval, and keywords remain the primary means of searching the Web. The high-frequency POS tagging results partially follow Zipf’s law.
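As a rough illustration of the kind of analysis the abstract describes, the following minimal sketch counts POS sequences over already-tagged queries and estimates how closely a ranked frequency list follows Zipf’s law (a slope near -1 on a log-log plot of frequency against rank). The function names, the sample queries, and the tag labels `n` and `v` are hypothetical and are not taken from the paper; the tagger itself is assumed to have been run beforehand.

```python
import math
from collections import Counter


def pos_pattern_counts(tagged_queries):
    """Count how often each POS sequence (e.g. 'n' or 'v+n') occurs.

    Each query is a list of (word, tag) pairs produced by some POS tagger.
    """
    return Counter("+".join(tag for _, tag in query) for query in tagged_queries)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    A value close to -1 is consistent with Zipf's law.
    """
    freqs = sorted(frequencies, reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two frequencies to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical, hand-tagged sample queries (illustrative only).
sample_queries = [
    [("北京", "n"), ("天气", "n")],
    [("下载", "v"), ("音乐", "n")],
    [("地图", "n")],
    [("北京", "n"), ("大学", "n")],
]
counts = pos_pattern_counts(sample_queries)
```

On a real log, the values of `counts` would be passed to `zipf_slope` to see how far the fitted slope departs from -1; the four sample queries above are far too few for a meaningful fit.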

Key words: Log mining; Part-of-speech tagging; Language behavior; POS distribution; Query syntax
Received: 16 February 2009      Published: 25 April 2009
CLC number: G352

Corresponding Authors: Qu Peng     E-mail: pqu@pku.edu.cn
About author: Lai Maosheng, Qu Peng

Cite this article:

Lai Maosheng, Qu Peng. The POS Mining Study on Search Engine’s Query Log. New Technology of Library and Information Service, 2009, 25(4): 50-56.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.04.10     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V25/I4/50
