Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (1): 26-36    DOI: 10.11925/infotech.2096-3467.2017.01.04
Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining
Mingxuan Huang
Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China
Department of Computer Science, Guangxi University of Finance and Economics, Nanning 530003, China
[Objective] This paper addresses the query drift problem in cross language information retrieval and proposes a new model for retrieving Chinese documents with Indonesian queries. [Methods] The new model integrates matrix-weighted association patterns mining, query expansion, and user click-download behaviors. [Results] On the CLIR NTCIR-5 data set, the R_prec, p@10 and p@20 values of the proposed model were higher than the 60% benchmark of the monolingual retrieval. These results were 37% higher than the cross language retrieval baseline and 28% higher than the existing algorithms based on pseudo relevance feedback. [Limitations] The proposed model was only examined in a cross language retrieval system built with the vector space model; it still needs to be tested with real-world search engines. [Conclusions] The proposed model could effectively reduce query drift in cross language retrieval and retrieve more relevant Chinese documents with long Indonesian queries.

Key words: Click Behavior; Association Patterns Mining; Indonesian-Chinese Cross Language Retrieval Model; Cross Language Information Retrieval; Matrix-weighted Association Rule
Received: 18 September 2016; Published: 22 February 2017

Cite this article:

Mingxuan Huang. Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining. Data Analysis and Knowledge Discovery, 2017, 1(1): 26-36.

