Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining
Huang Mingxuan
Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China; Department of Computer Science, Guangxi University of Finance and Economics, Nanning 530003, China
[Objective] This paper addresses the query drift problem in cross-language information retrieval and proposes a new model for retrieving Chinese documents with Indonesian queries. [Methods] The new model integrates algorithms for matrix-weighted association pattern mining and query expansion, as well as user click and download behaviors. [Results] On the NTCIR-5 CLIR data set, the R_prec, p@10 and p@20 values of the proposed model exceeded the 60% benchmark of monolingual retrieval. These results were 37% higher than the cross-language retrieval baseline and 28% higher than existing algorithms based on pseudo-relevance feedback. [Limitations] The proposed model was examined only in a cross-language retrieval system built on the vector space model; it still needs to be tested with real-world search engines. [Conclusions] The proposed model can effectively reduce query drift in cross-language retrieval and retrieve more relevant Chinese documents with long Indonesian queries.
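For readers unfamiliar with the evaluation measures named in the results, the following is a minimal sketch of how precision at k (p@10, p@20) and R-precision (R_prec) are conventionally computed. It is not taken from the paper: the function names, document IDs and relevance judgments are hypothetical and serve only as illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0  # convention assumed here for queries with no relevant documents
    return precision_at_k(ranked, relevant, r)


# Hypothetical ranking and relevance judgments for a single query.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

print(precision_at_k(ranked, relevant, 5))  # two of the top five are relevant
print(r_precision(ranked, relevant))        # one of the top three is relevant
```

In an evaluation such as the one summarized above, these per-query values would typically be averaged over all test queries.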
Huang Mingxuan. Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining[J]. Data Analysis and Knowledge Discovery, 2017, 1(1): 26-36.