Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (1): 26-36    DOI: 10.11925/infotech.2096-3467.2017.01.04
Original Article
Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining
Mingxuan Huang
Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China
Department of Computer Science, Guangxi University of Finance and Economics, Nanning 530003, China

[Objective] This paper addresses the query drift problem in cross language information retrieval and proposes a new model for retrieving Chinese documents with Indonesian queries. [Methods] The model integrates matrix-weighted association pattern mining, query expansion, and user click and download behaviors. [Results] On the NTCIR-5 CLIR data set, the R_prec, p@10 and p@20 values of the proposed model were higher than the 60% benchmark of monolingual retrieval. These results were 37% higher than the cross language retrieval baseline and 28% higher than existing algorithms based on pseudo relevance feedback. [Limitations] The proposed model was examined only in a cross language retrieval system built on the vector space model; it still needs to be tested with real-world search engines. [Conclusions] The proposed model could effectively reduce query drift in cross language retrieval and retrieve more relevant Chinese documents with long Indonesian queries.
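The evaluation measures named in the results (R_prec, p@10, p@20) follow standard information retrieval definitions. The following is a minimal sketch of how they are usually computed for a single query; the function names, document identifiers, and relevance judgments are hypothetical and are not taken from the paper or from the NTCIR-5 data.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return precision_at_k(ranked, relevant, r)


# Hypothetical ranked result list and relevance judgments for one query.
ranked_docs = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant_docs = {"d3", "d1", "d2", "d5"}

p10 = precision_at_k(ranked_docs, relevant_docs, 10)
r_prec = r_precision(ranked_docs, relevant_docs)
print(p10, r_prec)
```

In an evaluation over many queries, these per-query values would typically be averaged; p@20 is obtained the same way with `k=20`.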

Key words: Click Behavior; Association Patterns Mining; Indonesian-Chinese Cross Language Retrieval Model; Cross Language Information Retrieval; Matrix-weighted Association Rule
Received: 18 September 2016      Published: 22 February 2017

Cite this article:

Mingxuan Huang. Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining. Data Analysis and Knowledge Discovery, 2017, 1(1): 26-36.

