Expanding Queries Based on Word Embedding and Expansion Terms
Huang Mingxuan1,2, Jiang Caoqing1,2, Lu Shoudong2
1 Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China
2 School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China
[Objective] This paper proposes a query expansion model based on the intersection of word embedding and expansion terms, aiming to reduce word mismatch in information retrieval. [Methods] First, we trained word embeddings on the retrieved documents to obtain the Word Embedding Candidate Expansion Term set. Then, we mined association rules to generate the Mining Candidate Expansion Term set. Finally, we merged the two candidate sets into the final expansion term set and used it to expand the queries. [Results] The MAP and P@5 of the proposed model were higher than those of the benchmark models. Compared with similar query expansion methods developed in recent years, the average increases in MAP and P@5 were 0.96%-31.24% and 1.07%-13.55%, respectively. [Limitations] The proposed model still needs to be evaluated on real-world information retrieval systems. [Conclusions] The proposed model improves the quality of expansion terms and the performance of information retrieval systems, and it reduces query topic drift and word mismatch.
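The three steps in the Methods part of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Hand-written toy vectors stand in for embeddings trained on the retrieved documents, and simple pairwise support/confidence rules stand in for the paper's association rule mining, whose details the abstract does not give. All function names, thresholds, and data below are hypothetical. Because the objective speaks of an intersection while the Methods speak of merging, the sketch exposes both ways of combining the two candidate sets.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embedding_candidates(query_terms, vectors, top_k):
    """Assumed stand-in for the Word Embedding Candidate Expansion Term set:
    the top_k non-query terms most similar to any query term."""
    scores = {}
    for term, vec in vectors.items():
        if term in query_terms:
            continue
        scores[term] = max(
            (cosine(vec, vectors[q]) for q in query_terms if q in vectors),
            default=0.0,
        )
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:top_k])


def rule_candidates(query_terms, docs, min_support, min_confidence):
    """Assumed stand-in for the Mining Candidate Expansion Term set:
    consequents t of rules q -> t that pass support and confidence thresholds."""
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)
    vocab = set().union(*doc_sets) - set(query_terms)
    found = set()
    for q in query_terms:
        q_count = sum(1 for d in doc_sets if q in d)
        if q_count == 0:
            continue
        for t in vocab:
            both = sum(1 for d in doc_sets if q in d and t in d)
            if both / n >= min_support and both / q_count >= min_confidence:
                found.add(t)
    return found


def expand_query(query_terms, emb_set, rule_set, mode="intersection"):
    """Append the combined candidate terms to the original query."""
    final = emb_set & rule_set if mode == "intersection" else emb_set | rule_set
    return list(query_terms) + sorted(final)


# Toy data (hypothetical): 2-d vectors and four tokenized documents.
vectors = {
    "car": (1.0, 0.0),
    "automobile": (0.9, 0.1),
    "vehicle": (0.8, 0.3),
    "engine": (0.7, 0.5),
    "banana": (0.0, 1.0),
}
docs = [
    ["car", "automobile", "engine"],
    ["car", "automobile", "engine"],
    ["car", "vehicle", "automobile"],
    ["banana", "fruit"],
]
query = ["car"]

emb_set = embedding_candidates(query, vectors, top_k=2)
rule_set = rule_candidates(query, docs, min_support=0.5, min_confidence=0.6)
expanded = expand_query(query, emb_set, rule_set, mode="intersection")
```

On this toy data the embedding step proposes "automobile" and "vehicle", the rule step proposes "automobile" and "engine", and only "automobile" survives the intersection. Taking the intersection keeps terms supported by both semantic similarity and co-occurrence statistics, which is one plausible reading of how the model limits topic drift.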
Huang Mingxuan, Jiang Caoqing, Lu Shoudong. Expanding Queries Based on Word Embedding and Expansion Terms[J]. Data Analysis and Knowledge Discovery, 2021, 5(6): 115-125.