New Technology of Library and Information Service, 2015, Vol. 31, Issue 4: 10-17. DOI: 10.11925/infotech.1003-3513.2015.04.02
Research on Query Topic Classification Method
Liu Feng¹, Li Yu², Lv Xueqiang², Li Zhuo²
¹ First Research Institute of the Ministry of Public Security of P.R.C., Beijing 100048, China;
² Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China

[Objective] Expand queries in order to identify their topics. [Methods] Obtain query expansion text using pseudo-relevance feedback, extract text features, and combine them using the proposed partial matching rules and vector space compression algorithm. Finally, classify query topics using the cosine included-angle method and an SVM. [Results] The method reaches a precision of 90.34%, a recall of 89.34%, an F-value of 89.67%, and an accuracy of 89.24%. [Limitations] Online processing efficiency is low because queries are expanded using search results. [Conclusions] The proposed method is effective for query topic classification. The machine learning method (SVM) achieves better experimental results than the cosine included-angle method, which is significant for improving the quality of search engines.
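The abstract names the pipeline without detail. The following is a minimal Python sketch, under stated assumptions, of two of its steps: pseudo-relevance feedback expansion (the top-ranked result snippets are assumed relevant, and their most frequent terms are appended to the query) and classification by the cosine of the included angle between the expanded query vector and each topic vector. The topic vectors, feedback snippets, whitespace tokenization, and the function names `expand_query`, `cosine`, and `classify` are hypothetical. The paper's partial matching rules and vector space compression algorithm are not described in the abstract and are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical topic vectors: in practice these would be built from labeled
# training text; here they are hand-written bags of words for illustration.
TOPICS = {
    "sports": Counter("football match team goal league score player".split()),
    "finance": Counter("stock market price bank fund investment share".split()),
}

# Hypothetical top-ranked result snippets for the query "arsenal"; pseudo
# feedback treats these as relevant without asking the user.
FEEDBACK = [
    "Arsenal football team wins league match",
    "Arsenal player scores goal in match",
]


def expand_query(query, feedback_docs, top_terms=5):
    """Append the most frequent feedback terms that are not already in the query."""
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc.lower().split())  # whitespace tokenization (assumption)
    query_terms = query.lower().split()
    extra = [t for t, _ in counts.most_common() if t not in query_terms]
    return query_terms + extra[:top_terms]


def cosine(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(terms, topic_vectors):
    """Return the topic with the smallest included angle, or None if no overlap."""
    query_vec = Counter(terms)
    scores = {topic: cosine(query_vec, vec) for topic, vec in topic_vectors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(classify(["arsenal"], TOPICS))  # None: no shared terms with any topic
    expanded = expand_query("arsenal", FEEDBACK)
    print(expanded)  # ['arsenal', 'match', 'football', 'team', 'wins', 'league']
    print(classify(expanded, TOPICS))  # sports
```

The short query `arsenal` shares no terms with either topic vector and stays unclassified, while its expanded form is assigned to `sports`; this is the sparsity problem that expansion is meant to address. For the SVM variant, the nearest-angle step could be replaced by a trained linear classifier (for example scikit-learn's `LinearSVC`) over the same vectors; that substitution is an assumption, not the paper's reported configuration.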

Key words: Query topic classification; Pseudo feedback; Query expansion; Vector space compression algorithm
Received: 19 September 2014      Published: 21 May 2015
CLC Number: TP391

Cite this article:

Liu Feng, Li Yu, Lv Xueqiang, Li Zhuo. Research on Query Topic Classification Method. New Technology of Library and Information Service, 2015, 31(4): 10-17.



[1] Huang Mingxuan, Ma Ruixing, Lan Huihong. Query Expansion Oriented Algorithm of Feature-words Frequent Itemsets Mining[J]. 现代图书情报技术, 2011, 27(4): 48-51.
[2] Feng Ping, Huang Mingxuan. Query Expansion of Pseudo Relevance Feedback Based on Feature Terms Extraction and Correlation Fusion[J]. 现代图书情报技术, 2011, 27(1): 52-56.
[3] Yang Jing,Wang Yamin. P2P Search Approach Based on Query Expansion and Node Aggregation[J]. 现代图书情报技术, 2009, (9): 51-56.
[4] Zhang Yulian ,Liu Juan,Qi Feng ,Zhou Xinglin. Mobile Query Expansion Based on Related Word Co-occurrence of Abstract and Log[J]. 现代图书情报技术, 2009, (10): 40-44.
[5] Zhang Kezhuang,Liu Youhua,Huang Fang,Li Yin . A Semantic and Personalized Query Expansion Method Based on Users’Interests[J]. 现代图书情报技术, 2008, 24(8): 48-52.
[6] Zeng Xinhong, LIN Weiming, Ming Zhong. Implementing Retrieval to OntoThesaurus and Research on Its Terminology Service[J]. 现代图书情报技术, 2008, 24(2): 8-13.
[7] Chen Yanhong,Huang Mingxuan. Query Expansion of Local Feedback Based on Improved Apriori Algorithm[J]. 现代图书情报技术, 2007, 2(9): 84-87.
[8] Nie Hui . Query Expansion & Standardization Based on Ontology[J]. 现代图书情报技术, 2007, 2(3): 35-38.
[9] Huang Mingxuan,Chen Yanhong,Zhang Shichao. Study on Query Expansion Model Based on Association Rules Mining[J]. 现代图书情报技术, 2007, 2(10): 47-51.
[10] Hang Yueqin,Yao Ying,Shen Jie . Towards Context Query Information Extraction Based on Single Document[J]. 现代图书情报技术, 2006, 1(10): 30-33.
[11] Chen Dingquan. User Relevance Feedback for Information Retrieval System[J]. 现代图书情报技术, 2002, 18(4): 33-35.