Re-rank Retrieval Results Through Subject Indexing

doi:10.11925/infotech.1003-3513.2014.07.07

New Technology of Library and Information Service

2014, Vol. 30

Issue (7): 48-55 https://doi.org/10.11925/infotech.1003-3513.2014.07.07

Knowledge Organization and Knowledge Management


Re-rank Retrieval Results Through Subject Indexing

Mao Jin¹, Li Gang¹, Cao Yujie²

1. Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China;
2. NetEase (Hangzhou) Network Co., Ltd., Hangzhou 310052, China


Abstract

[Objective] This paper re-ranks search results with the help of subject indexing during pseudo-relevance feedback. [Methods] Using a language modeling approach, the association between user queries and subject terms is mined, so that each user query is represented as a probability distribution over subject terms. Generative language models are also built for the subject terms and used to estimate the weight of each subject term in a document. On this basis, the scores of the documents returned by the first retrieval are recalculated and the documents are re-ranked accordingly. [Results] The proposed method builds suitable generative language models for subject terms and yields appropriate weights of subject terms in documents. The re-ranked results show a general improvement in performance over the initial retrieval. [Limitations] Different methods of mining the associations between subject terms and documents are not compared. The approach has not been tested on data sets of different sizes or in different languages. [Conclusions] Re-ranking that exploits the associations between user queries and subject terms, and between subject terms and documents, can improve retrieval precision.
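The re-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so everything below is an assumption made for illustration: the function name `rerank`, the linear interpolation controlled by `lam`, the min-max normalisation of first-pass scores, and the toy inputs. It only shows how a query's distribution over subject terms and the subject-term weights of each document might be combined with an initial retrieval score; it is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def rerank(initial_scores: Dict[str, float],
           p_subject_given_query: Dict[str, float],
           subject_weight_in_doc: Dict[str, Dict[str, float]],
           lam: float = 0.5) -> List[Tuple[str, float]]:
    """Re-rank first-pass results with subject-term evidence (illustrative only).

    initial_scores: first-pass retrieval score per document id.
    p_subject_given_query: assumed P(subject term | query).
    subject_weight_in_doc: assumed weight of each subject term in each document.
    lam: hypothetical interpolation weight between the two sources of evidence.
    """
    # Min-max normalise the first-pass scores so both parts lie in [0, 1].
    lo, hi = min(initial_scores.values()), max(initial_scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal

    rescored = []
    for doc_id, score in initial_scores.items():
        weights = subject_weight_in_doc.get(doc_id, {})
        # Subject-based score: sum over subject terms of P(s | q) * weight(s, d).
        subject_score = sum(p * weights.get(s, 0.0)
                            for s, p in p_subject_given_query.items())
        final = (1 - lam) * (score - lo) / span + lam * subject_score
        rescored.append((doc_id, final))

    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy data (invented for this example, not taken from the paper).
initial = {"d1": -4.0, "d2": -5.0, "d3": -6.0}
p_sq = {"Hypertension": 0.7, "Diet Therapy": 0.3}
weights = {
    "d1": {"Hypertension": 0.1},
    "d2": {"Hypertension": 0.8, "Diet Therapy": 0.2},
    "d3": {"Diet Therapy": 0.2},
}

ranked = rerank(initial, p_sq, weights, lam=0.5)
print([doc_id for doc_id, _ in ranked])
```

With these toy values, `d2` moves ahead of `d1` because its subject terms match the query's subject-term distribution more closely. Setting `lam=0.0` reproduces the first-pass order.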

Key words: Language model; Information retrieval; Subject heading; Subject indexing; Re-rank results

Received: 2014-04-09; Published: 2014-10-20

CLC number: TP391.3

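Funding:

This work is one of the research outcomes of the Major Project of the National Social Science Foundation of China, "Research on the Construction of an Intelligence System for Emergency Decision-Making in Smart Cities" (Grant No. 13&ZD173).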

Corresponding author: Mao Jin, E-mail: danveno@163.com

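Author contributions: Mao Jin designed the research plan and conducted the experiments; Li Gang proposed the research idea and drafted the paper; Cao Yujie took part in designing the research plan and revised the final version of the paper.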

Cite this article:

Mao Jin, Li Gang, Cao Yujie. Re-rank Retrieval Results Through Subject Indexing[J]. New Technology of Library and Information Service, 2014, 30(7): 48-55.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.07.07 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I7/48

