Research on the Framework of a User Intent-oriented Intelligent Search Engine
Zheng Wei1, Liang Zhanping1,2, Liang Jian3
1 Department of Information Management, Peking University, Beijing 100871, China;
2 Institute of Scientific & Technical Information of China, Beijing 100038, China;
3 Information Center of Ministry of Science and Technology, Beijing 100038, China
Abstract [Objective] This paper proposes a framework for an intent-oriented intelligent search engine system and studies its key content ranking algorithm in detail. [Methods] The search engine algorithms are redesigned around user search intent in three aspects: content storage, content retrieval, and content ranking. The content ranking algorithm considers multiple factors, including the relevance, reliability, variety, and hotness of the content. [Results] Experiments indicate that the relevance of the search results from the intent-based intelligent search algorithm is consistently better than that of the traditional keyword-based algorithm. [Limitations] Building an intelligent search engine is complicated, and many technical and engineering problems remain to be resolved. Many more experiments are needed to further verify and improve the content ranking algorithm. [Conclusions] This research lays a foundation for building the next generation of intent-oriented intelligent search engines.
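The abstract names four ranking factors but does not state how they are combined. As an illustration only, the sketch below assumes a simple weighted sum over per-document factor scores in [0, 1]. The weights, the `ScoredDoc` type, and the function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """A candidate result with one score per ranking factor (each assumed in [0, 1])."""
    doc_id: str
    relevance: float
    reliability: float
    variety: float
    hotness: float


# Hypothetical weights chosen for illustration; the paper's actual
# combination rule and parameters are not given in the abstract.
WEIGHTS = {"relevance": 0.5, "reliability": 0.2, "variety": 0.15, "hotness": 0.15}


def combined_score(doc, weights=WEIGHTS):
    """Weighted sum of the four factor scores (an assumed combination rule)."""
    return sum(w * getattr(doc, name) for name, w in weights.items())


def rank(docs, weights=WEIGHTS):
    """Return documents sorted by combined score, highest first."""
    return sorted(docs, key=lambda d: combined_score(d, weights), reverse=True)


docs = [
    ScoredDoc("b", relevance=0.2, reliability=1.0, variety=1.0, hotness=1.0),
    ScoredDoc("a", relevance=0.9, reliability=0.5, variety=0.5, hotness=0.5),
]
ranked_ids = [d.doc_id for d in rank(docs)]
```

With these assumed weights, relevance dominates, so a highly relevant document can outrank one that scores higher on every other factor. A different weighting would change that trade-off.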
Received: 29 September 2013
Published: 15 April 2014