Expanding Queries Based on Word Embedding and Expansion Terms
Huang Mingxuan 1,2, Jiang Caoqing 1,2, Lu Shoudong 2
1 Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China
2 School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China

Abstract [Objective] This paper proposes a query expansion model based on the intersection of word embedding and expansion terms, aiming to reduce word mismatch in information retrieval. [Methods] First, we trained word embeddings on the retrieved documents to obtain the Word Embedding Candidate Expansion Term set. Then, we mined association rules to generate the Mining Candidate Expansion Term set. Finally, we built the final expansion term set by merging the two candidate sets and used it to expand the queries. [Results] The MAP and P@5 of the proposed model were higher than those of the benchmark methods. Compared with similar query expansion methods developed in recent years, the average increases in MAP and P@5 were 0.96%-31.24% and 1.07%-13.55%, respectively. [Limitations] The proposed model still needs to be evaluated on real-world information retrieval systems. [Conclusions] The proposed model can improve the quality of expansion terms and the performance of information retrieval systems, and it reduces query topic drift and word mismatch.
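
The three steps in the Methods summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy vectors and documents, the thresholds, and the use of plain cosine similarity and unweighted support/confidence are all assumptions made for illustration. Because the abstract speaks of both an "intersection" and a "merging" of the two candidate sets, the sketch exposes both options rather than picking one.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_candidates(query_terms, vectors, min_sim):
    """Hypothetical step 1: terms whose vector is close to some query term."""
    found = set()
    for term, vec in vectors.items():
        if term in query_terms:
            continue
        if any(cosine(vec, vectors[q]) >= min_sim
               for q in query_terms if q in vectors):
            found.add(term)
    return found


def mining_candidates(query_terms, documents, min_support, min_confidence):
    """Hypothetical step 2: consequents t of rules q -> t that pass both thresholds."""
    doc_sets = [set(d) for d in documents]
    n = len(doc_sets)
    vocab = set().union(*doc_sets) - set(query_terms)
    found = set()
    for q in query_terms:
        q_count = sum(1 for d in doc_sets if q in d)
        if q_count == 0:
            continue
        for t in vocab:
            both = sum(1 for d in doc_sets if q in d and t in d)
            if both / n >= min_support and both / q_count >= min_confidence:
                found.add(t)
    return found


def expand_query(query_terms, emb, mined, mode="union"):
    """Hypothetical step 3: combine the two candidate sets and append to the query."""
    extra = emb & mined if mode == "intersection" else emb | mined
    return list(query_terms) + sorted(extra)


# Toy data (assumed, for illustration only).
vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "engine": [0.7, 0.7],
    "banana": [0.0, 1.0],
}
documents = [
    ["car", "automobile", "engine"],
    ["car", "automobile"],
    ["car", "engine", "road"],
    ["banana", "fruit"],
]
query = ["car"]

emb = embedding_candidates(query, vectors, min_sim=0.9)
mined = mining_candidates(query, documents, min_support=0.5, min_confidence=0.6)
print(expand_query(query, emb, mined, mode="union"))
print(expand_query(query, emb, mined, mode="intersection"))
```

On this toy data the union keeps a term found by only one of the two steps, while the intersection keeps only terms that both steps agree on, which trades recall for a lower risk of topic drift.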

Received: 30 December 2020
Published: 06 July 2021
Fund: National Natural Science Foundation of China (61762006)
Corresponding Author: Huang Mingxuan, E-mail: mingxh05@163.com