Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (1): 26-36     https://doi.org/10.11925/infotech.2096-3467.2017.01.04
Research Paper
Indonesian-Chinese Cross-Language Information Retrieval Model Based on Matrix-Weighted Association Patterns*
English title: Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining
Huang Mingxuan
Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China
Department of Computer Science, Guangxi University of Finance and Economics, Nanning 530003, China
Abstract

[Objective] To address query drift in cross-language information retrieval (CLIR), this paper proposes an Indonesian-Chinese CLIR model that combines user click and download behavior with matrix-weighted association pattern mining. [Methods] The model integrates matrix-weighted association pattern mining, query expansion, and user click-download behavior. Its key components are a matrix-weighted association pattern mining algorithm for CLIR, a cross-language query expansion model, and an Indonesian-Chinese CLIR algorithm. [Results] On the NTCIR-5 CLIR data set, the model's R_prec, p@10 and p@20 values all reach more than 60% of the monolingual retrieval baseline, exceed the cross-language retrieval baseline by more than 37%, and exceed the existing cross-language retrieval algorithm based on pseudo-relevance feedback by more than 28%. [Limitations] The experiments were run in a cross-language retrieval system built on the vector space model; how the model applies in real-world search engines remains to be studied. [Conclusions] The model effectively reduces query drift in cross-language retrieval and improves Indonesian-Chinese retrieval performance, with better results for long queries, and has practical application value.

Key words: Click Behavior; Association Patterns Mining; Indonesian-Chinese Cross Language Retrieval Model; Cross Language Information Retrieval; Matrix-weighted Association Rule
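The abstract reports three measures, R_prec, p@10 and p@20. The sketch below shows how these are conventionally computed for one query in information retrieval; it is not taken from the paper, and the document IDs and relevance judgments are hypothetical. The Relax and Rigid settings in the result tables presumably differ only in which documents count as relevant, so they would change the `relevant` set passed in, not the formulas.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return precision_at_k(ranked, relevant, r)


# Hypothetical ranking and judgments, for illustration only.
ranked_docs = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant_docs = {"d3", "d1", "d2", "d8"}

print(precision_at_k(ranked_docs, relevant_docs, 5))  # 2 hits in the top 5
print(r_precision(ranked_docs, relevant_docs))        # R = 4, 2 hits in the top 4
```

Scores reported for a test collection are normally the average of these per-query values over all topics.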
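Received: 2016-09-18      Published: 2017-02-22
CLC number (ZTFLH): TP311
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Cross-Language Query Expansion Based on All-Weighted Positive and Negative Pattern Mining for ASEAN Languages" (No. 61262028), the open project of the School of Information and Statistics, Guangxi University of Finance and Economics, "Research on Vietnamese-Chinese-English Cross-Language Information Retrieval Based on Matrix-Weighted Association Pattern Mining" (No. 2015XK01), and the 2016 academic research project of the Master of Applied Statistics degree program, Guangxi University of Finance and Economics, "Research on Chinese-English Cross-Language Pseudo-Relevance Feedback Expansion Based on All-Weighted Association Pattern Mining" (No. 2016TJYB05).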
Cite this article:
Huang Mingxuan. Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining. Data Analysis and Knowledge Discovery, 2017, 1(1): 26-36.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.01.04      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I1/26
Figure: Indonesian-Chinese cross-language information retrieval model based on matrix-weighted association pattern mining
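| Query type | Judgment | Metric | MRB | CLRB | CLRB / MRB (%) | CLR_PRF | CLR_PRF / MRB (%) | CLR_PRF gain over CLRB (%) |
|---|---|---|---|---|---|---|---|---|
| TITLE | Relax | R_prec | 0.258 | 0.1313 | 50.89 | 0.1278 | 49.53 | -2.67 |
| TITLE | Relax | p@10 | 0.2292 | 0.0792 | 34.55 | 0.1083 | 47.25 | 36.74 |
| TITLE | Relax | p@20 | 0.1542 | 0.0625 | 40.53 | 0.0792 | 51.36 | 26.72 |
| TITLE | Rigid | R_prec | 0.1919 | 0.1442 | 75.14 | 0.1113 | 58.00 | -22.82 |
| TITLE | Rigid | p@10 | 0.1417 | 0.0458 | 32.32 | 0.0625 | 44.11 | 36.46 |
| TITLE | Rigid | p@20 | 0.0979 | 0.0333 | 34.01 | 0.0479 | 48.93 | 43.84 |
| DESC | Relax | R_prec | 0.227 | 0.1205 | 53.08 | 0.0354 | 15.59 | -70.62 |
| DESC | Relax | p@10 | 0.2375 | 0.1333 | 56.13 | 0.0958 | 40.34 | -28.13 |
| DESC | Relax | p@20 | 0.1667 | 0.1 | 59.99 | 0.0979 | 58.73 | -2.10 |
| DESC | Rigid | R_prec | 0.1867 | 0.1226 | 65.67 | 0.0587 | 31.44 | -52.12 |
| DESC | Rigid | p@10 | 0.15 | 0.0542 | 36.13 | 0.0458 | 30.53 | -15.50 |
| DESC | Rigid | p@20 | 0.1063 | 0.0458 | 43.09 | 0.0521 | 49.01 | 13.76 |

Table: Cross-language retrieval results of the three baseline algorithms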
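The derived percentage columns in the baseline table can be reproduced from the raw scores. The sketch below assumes that MRB, CLRB and CLR_PRF denote the monolingual retrieval baseline, the cross-language retrieval baseline and the pseudo-relevance-feedback cross-language algorithm named in the abstract; the helper names `share_of` and `relative_gain` are illustrative, not from the paper.

```python
def share_of(value: float, reference: float) -> float:
    """Return value as a percentage of reference."""
    return 100.0 * value / reference


def relative_gain(value: float, baseline: float) -> float:
    """Return the percentage improvement of value over baseline."""
    return 100.0 * (value - baseline) / baseline


# TITLE queries, Relax judgments, R_prec row of the baseline table.
mrb, clrb, clr_prf = 0.258, 0.1313, 0.1278

clrb_share = round(share_of(clrb, mrb), 2)        # CLRB as % of MRB
prf_share = round(share_of(clr_prf, mrb), 2)      # CLR_PRF as % of MRB
prf_gain = round(relative_gain(clr_prf, clrb), 2) # CLR_PRF vs CLRB
print(clrb_share, prf_share, prf_gain)  # 50.89 49.53 -2.67
```

A negative gain, as in this row, means pseudo-relevance feedback scored below the plain cross-language baseline.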
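| Query type | Judgment | Metric | Proposed model | Proposed / MRB (%) | Gain over CLRB (%) | Gain over CLR_PRF (%) |
|---|---|---|---|---|---|---|
| TITLE | Relax | R_prec | 0.2355 | 91.28 | 79.36 | 84.27 |
| TITLE | Relax | p@10 | 0.1410 | 61.52 | 78.03 | 30.19 |
| TITLE | Relax | p@20 | 0.1056 | 68.46 | 68.91 | 33.33 |
| TITLE | Rigid | R_prec | 0.2176 | 113.39 | 50.90 | 95.51 |
| TITLE | Rigid | p@10 | 0.0903 | 63.70 | 97.09 | 44.48 |
| TITLE | Rigid | p@20 | 0.0653 | 66.67 | 96.00 | 36.33 |
| DESC | Relax | R_prec | 0.2383 | 104.99 | 97.79 | 573.16 |
| DESC | Relax | p@10 | 0.1882 | 79.24 | 41.19 | 96.45 |
| DESC | Relax | p@20 | 0.1424 | 85.41 | 42.38 | 45.45 |
| DESC | Rigid | R_prec | 0.2321 | 124.32 | 89.31 | 295.40 |
| DESC | Rigid | p@10 | 0.0896 | 59.72 | 65.28 | 95.63 |
| DESC | Rigid | p@20 | 0.0764 | 71.87 | 66.81 | 46.64 |

Table: Retrieval performance of the proposed model versus the baseline algorithms as the support threshold varies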
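The proposed-model scores in this comparison appear to be means over the six support thresholds (ms = 0.5 to 0.75) listed in the per-threshold support table further down the page. This is an inference from the numbers, not something the page states. A minimal check for the TITLE/Relax rows:

```python
from statistics import mean

# Proposed-model scores for TITLE queries under Relax judgments at
# ms = 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, copied from the per-threshold table.
scores_by_ms = {
    "R_prec": [0.2359, 0.2361, 0.2340, 0.2328, 0.2318, 0.2424],
    "p@10": [0.1417, 0.1625, 0.1417, 0.1417, 0.1417, 0.1167],
    "p@20": [0.1042, 0.1104, 0.1021, 0.1021, 0.1000, 0.1146],
}

# Average each metric over the thresholds and round to four decimals,
# the precision used in the comparison table.
averaged = {metric: round(mean(vals), 4) for metric, vals in scores_by_ms.items()}
print(averaged)  # {'R_prec': 0.2355, 'p@10': 0.141, 'p@20': 0.1056}
```

The three means match the 0.2355, 0.1410 and 0.1056 reported for the proposed model in the TITLE/Relax rows.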
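| Query type | Judgment | Metric | Proposed model | Proposed / MRB (%) | Gain over CLRB (%) | Gain over CLR_PRF (%) |
|---|---|---|---|---|---|---|
| TITLE | Relax | R_prec | 0.2351 | 91.14 | 79.09 | 83.99 |
| TITLE | Relax | p@10 | 0.1392 | 60.72 | 75.73 | 28.51 |
| TITLE | Relax | p@20 | 0.1021 | 66.21 | 63.36 | 28.91 |
| TITLE | Rigid | R_prec | 0.2433 | 126.78 | 68.72 | 118.60 |
| TITLE | Rigid | p@10 | 0.0867 | 61.16 | 89.21 | 38.66 |
| TITLE | Rigid | p@20 | 0.0633 | 64.70 | 90.21 | 32.23 |
| DESC | Relax | R_prec | 0.2295 | 101.09 | 90.44 | 548.25 |
| DESC | Relax | p@10 | 0.1842 | 77.55 | 38.17 | 92.25 |
| DESC | Relax | p@20 | 0.1371 | 82.23 | 37.08 | 40.02 |
| DESC | Rigid | R_prec | 0.2133 | 114.24 | 73.96 | 263.34 |
| DESC | Rigid | p@10 | 0.0942 | 62.77 | 73.73 | 105.59 |
| DESC | Rigid | p@20 | 0.0767 | 72.14 | 67.42 | 47.18 |

Table: Retrieval performance of the proposed model versus the baseline algorithms as the confidence threshold varies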
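| Query type | Judgment | Metric | ms = 0.5 | ms = 0.55 | ms = 0.6 | ms = 0.65 | ms = 0.7 | ms = 0.75 |
|---|---|---|---|---|---|---|---|---|
| TITLE | Relax | R_prec | 0.2359 | 0.2361 | 0.234 | 0.2328 | 0.2318 | 0.2424 |
| TITLE | Relax | p@10 | 0.1417 | 0.1625 | 0.1417 | 0.1417 | 0.1417 | 0.1167 |
| TITLE | Relax | p@20 | 0.1042 | 0.1104 | 0.1021 | 0.1021 | 0.1000 | 0.1146 |
| TITLE | Rigid | R_prec | 0.2443 | 0.2443 | 0.2032 | 0.202 | 0.2008 | 0.211 |
| TITLE | Rigid | p@10 | 0.0875 | 0.1083 | 0.0875 | 0.0875 | 0.0875 | 0.0833 |
| TITLE | Rigid | p@20 | 0.0646 | 0.0708 | 0.0625 | 0.0625 | 0.0604 | 0.0708 |
| DESC | Relax | R_prec | 0.2399 | 0.2376 | 0.2367 | 0.2371 | 0.2332 | 0.2455 |
| DESC | Relax | p@10 | 0.1875 | 0.1917 | 0.1792 | 0.1875 | 0.1875 | 0.1958 |
| DESC | Relax | p@20 | 0.1396 | 0.1438 | 0.1458 | 0.1438 | 0.1396 | 0.1417 |
| DESC | Rigid | R_prec | 0.2443 | 0.2421 | 0.2413 | 0.242 | 0.2056 | 0.2173 |
| DESC | Rigid | p@10 | 0.0958 | 0.0917 | 0.0875 | 0.0875 | 0.0833 | 0.0917 |
| DESC | Rigid | p@20 | 0.0771 | 0.0771 | 0.0792 | 0.0771 | 0.0729 | 0.075 |

Table: Retrieval performance of the proposed cross-language retrieval model as the matrix-weighted support ms varies (mc = 0.01)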
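| Query type | Judgment | Metric | mc = 0.008 | mc = 0.01 | mc = 0.05 | mc = 0.08 | mc = 0.1 |
|---|---|---|---|---|---|---|---|
| TITLE | Relax | R_prec | 0.2362 | 0.2359 | 0.2349 | 0.2345 | 0.2342 |
| TITLE | Relax | p@10 | 0.1417 | 0.1417 | 0.1417 | 0.1375 | 0.1333 |
| TITLE | Relax | p@20 | 0.1042 | 0.1042 | 0.1021 | 0.1 | 0.1 |
| TITLE | Rigid | R_prec | 0.2445 | 0.2443 | 0.2434 | 0.2425 | 0.2418 |
| TITLE | Rigid | p@10 | 0.0875 | 0.0875 | 0.0875 | 0.0875 | 0.0833 |
| TITLE | Rigid | p@20 | 0.0646 | 0.0646 | 0.0625 | 0.0625 | 0.0625 |
| DESC | Relax | R_prec | 0.2399 | 0.2394 | 0.2401 | 0.2156 | 0.2124 |
| DESC | Relax | p@10 | 0.1875 | 0.1875 | 0.1875 | 0.1792 | 0.1792 |
| DESC | Relax | p@20 | 0.1396 | 0.1375 | 0.1396 | 0.1354 | 0.1333 |
| DESC | Rigid | R_prec | 0.2443 | 0.1402 | 0.2444 | 0.2204 | 0.2171 |
| DESC | Rigid | p@10 | 0.0958 | 0.0958 | 0.0958 | 0.0917 | 0.0917 |
| DESC | Rigid | p@20 | 0.0771 | 0.0771 | 0.0771 | 0.0771 | 0.075 |

Table: Retrieval performance of the proposed cross-language retrieval model as the matrix-weighted confidence mc varies (ms = 0.5)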
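The thresholds ms and mc varied in these experiments play the role that minimum support and minimum confidence play in classical association rule mining. The matrix-weighted definitions used by the paper are not given on this page, so the sketch below uses the classical unweighted definitions instead, on a hypothetical four-document collection, only to illustrate how the two thresholds prune candidate rules and how surviving rules can supply expansion terms for a query term.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical "transactions": each is the set of terms in one document.
docs: List[FrozenSet[str]] = [
    frozenset({"trade", "tariff", "export"}),
    frozenset({"trade", "export"}),
    frozenset({"trade", "tariff"}),
    frozenset({"export", "port"}),
]


def support(itemset: FrozenSet[str]) -> float:
    """Fraction of documents that contain every term in itemset."""
    return sum(1 for d in docs if itemset <= d) / len(docs)


def mine_pair_rules(min_sup: float, min_conf: float) -> Dict[Tuple[str, str], float]:
    """Return single-term rules a -> b passing both thresholds, with confidence."""
    terms = sorted(set().union(*docs))
    rules: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(terms, 2):
        pair_sup = support(frozenset({a, b}))
        if pair_sup < min_sup:
            continue  # prune infrequent pairs
        for lhs, rhs in ((a, b), (b, a)):
            # Classical confidence: support(lhs and rhs) / support(lhs).
            conf = pair_sup / support(frozenset({lhs}))
            if conf >= min_conf:
                rules[(lhs, rhs)] = conf
    return rules


rules = mine_pair_rules(min_sup=0.5, min_conf=0.6)
# Expansion candidates for the query term "trade": consequents of trade -> x.
expansion = sorted(rhs for (lhs, rhs) in rules if lhs == "trade")
print(expansion)  # ['export', 'tariff']
```

Raising either threshold leaves fewer rules and therefore fewer expansion terms, which is the trade-off the two parameter sweeps explore.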