Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (1): 26-36    DOI: 10.11925/infotech.2096-3467.2017.01.04
Original Article
Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining
Huang Mingxuan
Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China
Department of Computer Science, Guangxi University of Finance and Economics, Nanning 530003, China
Abstract  

[Objective] This paper addresses the query drift problem in cross-language information retrieval by proposing a new model that retrieves Chinese documents with Indonesian queries. [Methods] The model integrates matrix-weighted association pattern mining, query expansion, and user click and download behavior. [Results] On the NTCIR-5 CLIR data set, the R_prec, p@10 and p@20 values of the proposed model exceeded 60% of the monolingual retrieval baseline. They were 37% higher than the cross-language retrieval baseline and 28% higher than existing algorithms based on pseudo relevance feedback. [Limitations] The model was evaluated only in a cross-language retrieval system built on the vector space model and still needs to be tested with real-world search engines. [Conclusions] The proposed model effectively reduces query drift in cross-language retrieval and retrieves more relevant Chinese documents for long Indonesian queries.

Key words: Click Behavior; Association Patterns Mining; Indonesian-Chinese Cross Language Retrieval Model; Cross Language Information Retrieval; Matrix-weighted Association Rule
Received: 18 September 2016      Published: 22 February 2017
CLC Number: TP311

Cite this article:

Huang Mingxuan. Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining. Data Analysis and Knowledge Discovery, 2017, 1(1): 26-36.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.01.04     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I1/26

Query type  Evaluation type  Metric  MRB     CLRB    CLRB as % of MRB  CLR_PRF  CLR_PRF as % of MRB  CLR_PRF gain over CLRB (%)
TITLE       Relax            R_prec  0.258   0.1313  50.89             0.1278   49.53                -2.67
TITLE       Relax            p@10    0.2292  0.0792  34.55             0.1083   47.25                36.74
TITLE       Relax            p@20    0.1542  0.0625  40.53             0.0792   51.36                26.72
TITLE       Rigid            R_prec  0.1919  0.1442  75.14             0.1113   58.00                -22.82
TITLE       Rigid            p@10    0.1417  0.0458  32.32             0.0625   44.11                36.46
TITLE       Rigid            p@20    0.0979  0.0333  34.01             0.0479   48.93                43.84
DESC        Relax            R_prec  0.227   0.1205  53.08             0.0354   15.59                -70.62
DESC        Relax            p@10    0.2375  0.1333  56.13             0.0958   40.34                -28.13
DESC        Relax            p@20    0.1667  0.1     59.99             0.0979   58.73                -2.10
DESC        Rigid            R_prec  0.1867  0.1226  65.67             0.0587   31.44                -52.12
DESC        Rigid            p@10    0.15    0.0542  36.13             0.0458   30.53                -15.50
DESC        Rigid            p@20    0.1063  0.0458  43.09             0.0521   49.01                13.76
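The rows of the table above report R_prec, p@10 and p@20. The following is a minimal sketch of how these metrics are conventionally computed for a single query. The function names, document IDs and relevance judgments are hypothetical and are not taken from the paper, whose evaluation tooling is not described here.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return precision_at_k(ranked, relevant, r)


# Hypothetical ranking and relevance judgments, for illustration only.
ranked_docs = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant_docs = {"d3", "d1", "d2", "d5"}

p_at_10 = precision_at_k(ranked_docs, relevant_docs, 10)
r_prec = r_precision(ranked_docs, relevant_docs)
```

A table entry would then be the mean of such per-query values over all test queries, computed once per set of relevance judgments.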
Query type  Evaluation type  Metric  Proposed model  Proposed model as % of MRB  Gain over CLRB (%)  Gain over CLR_PRF (%)
TITLE       Relax            R_prec  0.2355          91.28                       79.36               84.27
TITLE       Relax            p@10    0.1410          61.52                       78.03               30.19
TITLE       Relax            p@20    0.1056          68.46                       68.91               33.33
TITLE       Rigid            R_prec  0.2176          113.39                      50.90               95.51
TITLE       Rigid            p@10    0.0903          63.70                       97.09               44.48
TITLE       Rigid            p@20    0.0653          66.67                       96.00               36.33
DESC        Relax            R_prec  0.2383          104.99                      97.79               573.16
DESC        Relax            p@10    0.1882          79.24                       41.19               96.45
DESC        Relax            p@20    0.1424          85.41                       42.38               45.45
DESC        Rigid            R_prec  0.2321          124.32                      89.31               295.40
DESC        Rigid            p@10    0.0896          59.72                       65.28               95.63
DESC        Rigid            p@20    0.0764          71.87                       66.81               46.64
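The percentage columns in the table above appear to be simple ratios against the baseline values in the first table. The sketch below checks this for the TITLE / Relax / R_prec row. The helper names `percent_of` and `percent_gain` are hypothetical, and the formulas are an assumption inferred from the tabulated numbers rather than stated in the source.

```python
def percent_of(value: float, reference: float) -> float:
    """Value expressed as a percentage of a reference score."""
    return round(100.0 * value / reference, 2)


def percent_gain(value: float, baseline: float) -> float:
    """Relative improvement of value over a baseline, in percent."""
    return round(100.0 * (value - baseline) / baseline, 2)


# TITLE / Relax / R_prec scores as reported in the tables.
mrb, clrb, clr_prf = 0.258, 0.1313, 0.1278
proposed = 0.2355

share_of_mrb = percent_of(proposed, mrb)         # proposed model as % of MRB
gain_over_clrb = percent_gain(proposed, clrb)    # gain over CLRB (%)
gain_over_prf = percent_gain(proposed, clr_prf)  # gain over CLR_PRF (%)
```

Under these assumed formulas the three results match the tabulated 91.28, 79.36 and 84.27. Other rows may differ in the last digit because the published scores are themselves rounded.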
Query type  Evaluation type  Metric  Proposed model  Proposed model as % of MRB  Gain over CLRB (%)  Gain over CLR_PRF (%)
TITLE       Relax            R_prec  0.2351          91.14                       79.09               83.99
TITLE       Relax            p@10    0.1392          60.72                       75.73               28.51
TITLE       Relax            p@20    0.1021          66.21                       63.36               28.91
TITLE       Rigid            R_prec  0.2433          126.78                      68.72               118.60
TITLE       Rigid            p@10    0.0867          61.16                       89.21               38.66
TITLE       Rigid            p@20    0.0633          64.70                       90.21               32.23
DESC        Relax            R_prec  0.2295          101.09                      90.44               548.25
DESC        Relax            p@10    0.1842          77.55                       38.17               92.25
DESC        Relax            p@20    0.1371          82.23                       37.08               40.02
DESC        Rigid            R_prec  0.2133          114.24                      73.96               263.34
DESC        Rigid            p@10    0.0942          62.77                       73.73               105.59
DESC        Rigid            p@20    0.0767          72.14                       67.42               47.18
Query type  Evaluation type  Metric  Matrix-weighted support (ms)
                                     0.5     0.55    0.6     0.65    0.7     0.75
TITLE       Relax            R_prec  0.2359  0.2361  0.234   0.2328  0.2318  0.2424
TITLE       Relax            p@10    0.1417  0.1625  0.1417  0.1417  0.1417  0.1167
TITLE       Relax            p@20    0.1042  0.1104  0.1021  0.1021  0.1000  0.1146
TITLE       Rigid            R_prec  0.2443  0.2443  0.2032  0.202   0.2008  0.211
TITLE       Rigid            p@10    0.0875  0.1083  0.0875  0.0875  0.0875  0.0833
TITLE       Rigid            p@20    0.0646  0.0708  0.0625  0.0625  0.0604  0.0708
DESC        Relax            R_prec  0.2399  0.2376  0.2367  0.2371  0.2332  0.2455
DESC        Relax            p@10    0.1875  0.1917  0.1792  0.1875  0.1875  0.1958
DESC        Relax            p@20    0.1396  0.1438  0.1458  0.1438  0.1396  0.1417
DESC        Rigid            R_prec  0.2443  0.2421  0.2413  0.242   0.2056  0.2173
DESC        Rigid            p@10    0.0958  0.0917  0.0875  0.0875  0.0833  0.0917
DESC        Rigid            p@20    0.0771  0.0771  0.0792  0.0771  0.0729  0.075
Query type  Evaluation type  Metric  Matrix-weighted confidence (mc)
                                     0.008   0.01    0.05    0.08    0.1
TITLE       Relax            R_prec  0.2362  0.2359  0.2349  0.2345  0.2342
TITLE       Relax            p@10    0.1417  0.1417  0.1417  0.1375  0.1333
TITLE       Relax            p@20    0.1042  0.1042  0.1021  0.1     0.1
TITLE       Rigid            R_prec  0.2445  0.2443  0.2434  0.2425  0.2418
TITLE       Rigid            p@10    0.0875  0.0875  0.0875  0.0875  0.0833
TITLE       Rigid            p@20    0.0646  0.0646  0.0625  0.0625  0.0625
DESC        Relax            R_prec  0.2399  0.2394  0.2401  0.2156  0.2124
DESC        Relax            p@10    0.1875  0.1875  0.1875  0.1792  0.1792
DESC        Relax            p@20    0.1396  0.1375  0.1396  0.1354  0.1333
DESC        Rigid            R_prec  0.2443  0.1402  0.2444  0.2204  0.2171
DESC        Rigid            p@10    0.0958  0.0958  0.0958  0.0917  0.0917
DESC        Rigid            p@20    0.0771  0.0771  0.0771  0.0771  0.075