This paper analyzes the March 2007 query logs of the Sogou search engine. POS tagging is applied to the queries to identify the characteristics of the high-frequency POS results. Web users rely primarily on nouns in their queries, with verbs in a complementary role; other parts of speech seldom appear. Function words in natural language, such as “的”, rarely appear in the high-frequency POS results. Web queries differ from natural language in syntax to a certain degree, yet the two also share some characteristics. Web users use nouns for concept-focused retrieval, and keywords remain the primary way to search the Web. The high-frequency POS tagging results partially obey Zipf's law.
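As a rough illustration of the kind of analysis summarized above, the following sketch counts POS patterns over already-tagged queries and estimates how closely their rank-frequency distribution follows Zipf's law. It is not the authors' procedure: the input format (each query as a list of `(word, tag)` pairs), the function names `pos_pattern_counts` and `zipf_slope`, the tag labels, and the toy queries are all assumptions made for the example.

```python
import math
from collections import Counter


def pos_pattern_counts(tagged_queries):
    """Count how often each POS pattern (e.g. 'n' or 'v+n') occurs.

    tagged_queries is a hypothetical input format: one list of
    (word, tag) pairs per query, produced by some POS tagger.
    """
    return Counter(
        "+".join(tag for _, tag in query) for query in tagged_queries
    )


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what an idealized Zipf distribution would give.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct patterns")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy, hand-made queries (assumed data, not taken from the Sogou logs).
toy_queries = [
    [("音乐", "n")],
    [("电影", "n")],
    [("小说", "n")],
    [("游戏", "n")],
    [("下载", "v"), ("音乐", "n")],
    [("下载", "v"), ("电影", "n")],
    [("免费", "a"), ("小说", "n")],
]

toy_counts = pos_pattern_counts(toy_queries)
toy_slope = zipf_slope(toy_counts)
```

On real logs the fit would normally be inspected on a log-log plot as well, since a single slope cannot show where the distribution departs from Zipf's law.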
Lai Maosheng, Qu Peng. The POS Tagging and Mining Study on Search Engine’s Query Log [J]. New Technology of Library and Information Service, 2009, 25(4): 50-56.