[Objective] This paper re-ranks search results by exploiting subject indexing during pseudo-relevance feedback. [Methods] User queries are represented as probability distributions over subject terms by mining query–subject term associations with language modeling. The weights of subject terms in documents are calculated by incorporating generative language models for the subject terms. The documents returned by the initial retrieval are then re-scored and re-ranked according to the new scores. [Results] The proposed method constructs generative language models for subject terms and appropriately estimates the weights of subject terms in documents. The re-ranked results consistently improve over the initial retrieval. [Limitations] Different methods of mining the associations between subject terms and documents are not compared, and the approach is not tested on data sets of different scales or in different languages. [Conclusions] The re-ranking approach, which exploits the associations among user queries, documents and subject terms, improves retrieval precision.
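The re-scoring step described above can be illustrated with a minimal sketch. This is not the authors' exact estimation procedure: the interpolation weight `alpha`, the function name `rerank`, and the toy data structures (a query distribution over subject terms and per-document subject-term weights) are all illustrative assumptions; the paper derives these quantities from generative language models, which are elided here.

```python
def rerank(initial_results, query_term_dist, doc_term_weights, alpha=0.6):
    """Re-rank first-pass retrieval results using subject-term evidence.

    initial_results:  list of (doc_id, score) from the initial retrieval.
    query_term_dist:  dict mapping subject term -> P(term | query).
    doc_term_weights: dict mapping doc_id -> {subject term -> weight in doc}.
    alpha:            interpolation weight between the original retrieval
                      score and the subject-term match score (illustrative).
    """
    reranked = []
    for doc_id, score in initial_results:
        term_weights = doc_term_weights.get(doc_id, {})
        # Subject-term score: expectation of the document's subject-term
        # weights under the query's distribution over subject terms.
        subject_score = sum(p * term_weights.get(term, 0.0)
                            for term, p in query_term_dist.items())
        reranked.append((doc_id, alpha * score + (1 - alpha) * subject_score))
    # Sort by the combined score, highest first.
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked
```

A document that matched the query's subject terms poorly in the initial retrieval but carries heavily weighted matching subject terms can thus overtake a higher-ranked document after interpolation.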
毛进, 李纲, 操玉杰. 利用主题标引进行查询重排序[J]. 现代图书情报技术, 2014, 30(7): 48-55.
Mao Jin, Li Gang, Cao Yujie. Re-rank Retrieval Results Through Subject Indexing. New Technology of Library and Information Service, 2014, 30(7): 48-55.
[1] Furnas G W, Landauer T K, Gomez L M, et al. The Vocabulary Problem in Human-system Communication[J]. Communications of the ACM, 1987, 30(11): 964-971.
[2] PubMed[EB/OL].[2013-12-09]. http://www.ncbi.nlm.nih.gov/pubmed/.
[3] Lu Z Y, Kim W, Wilbur W J. Evaluation of Query Expansion Using MeSH in PubMed[J]. Information Retrieval, 2009, 12(1): 69-80.
[4] Shin K, Han S Y. Improving Information Retrieval in MEDLINE by Modulating MeSH Term Weights[C]. In: Proceedings of the 9th International Conference on Applications of Natural Languages to Information Systems, NLDB 2004, Salford, UK. Berlin: Springer, 2004: 388-394.
[5] Jalali V, Borujerdi M R M. Information Retrieval with Concept-based Pseudo-relevance Feedback in MEDLINE[J]. Knowledge and Information Systems, 2011, 29(1): 237-248.
[6] Meij E, De Rijke M. Integrating Conceptual Knowledge into Relevance Models: A Model and Estimation Method[C]. In: Proceedings of International Conference on the Theory of Information Retrieval (ICTIR 2007). 2007.
[7] Meij E, Trieschnigg D, De Rijke M, et al. Conceptual Language Models for Domain-specific Retrieval[J]. Information Processing and Management, 2010, 46(4): 448-469.
[8] Croft W B. What Do People Want from Information Retrieval?[J]. D-Lib Magazine, 1995, 1(5). http://www.dlib.org/dlib/november95/11croft.html.
[9] Krestel R, Fankhauser P. Reranking Web Search Results for Diversity[J]. Information Retrieval, 2012, 15(5): 458-477.
[10] Santos R L, Macdonald C, Ounis I. On the Role of Novelty for Search Result Diversification[J]. Information Retrieval, 2012, 15(5): 478-502.
[11] Yan X, Li X, Song D. Document Re-ranking by Generality in Bio-medical Information Retrieval[A].//Web Information Systems Engineering-WISE 2005[M]. New York: Springer, 2005: 376-389.
[12] Yin X, Huang X, Li Z. Towards a Better Ranking for Biomedical Information Retrieval Using Context[C]. In: Proceedings of 2009 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2009), Washington, DC, USA. Washington, DC: IEEE, 2009: 344-349.
[13] Sakai T, Manabe T, Koyama M. Flexible Pseudo-relevance Feedback via Selective Sampling[J]. ACM Transactions on Asian Language Information Processing, 2005, 4(2): 111-135.
[14] 周博,岑荣伟,刘奕群,等. 一种基于文档相似度的检索结果重排序方法[J]. 中文信息学报, 2010, 24(3): 19-23, 36. (Zhou Bo, Cen Rongwei, Liu Yiqun, et al. A Document Relevance Based Search Result Re-Ranking[J]. Journal of Chinese Information Processing, 2010, 24(3): 19-23, 36.)
[15] 原福永,郭丽娜,毛伟伟. 基于内部文档比较的重排序算法[J]. 现代图书情报技术,2009(11): 49-52. (Yuan Fuyong, Guo Lina, Mao Weiwei. Re-ranking Algorithm Based on the Inter-Documents Comparison[J]. New Technology of Library and Information Service, 2009(11): 49-52.)
[16] Diaz F. Regularizing Query-based Retrieval Scores[J]. Information Retrieval, 2007, 10(6): 531-562.
[17] Kurland O. Re-ranking Search Results Using Language Models of Query-specific Clusters[J]. Information Retrieval, 2009, 12(4): 437-460.
[18] Croft W B, Metzler D, Strohman T. Search Engines: Information Retrieval in Practice[M]. Reading, MA: Addison-Wesley, 2010.
[19] Kamps J. Improving Retrieval Effectiveness by Reranking Documents Based on Controlled Vocabulary[A].//Advances in Information Retrieval[M]. Berlin: Springer, 2004: 283-295.
[20] Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval[M]. Cambridge: Cambridge University Press, 2008.
[21] PubMed Tutorial[EB/OL].[2013-07-28]. http://www.nlm.nih.gov/bsd/disted/pubmedtutorial/015_030.html.
[22] Kent A, Lancour H, Daily J E. Encyclopedia of Library and Information Science[M]. Boca Raton: CRC Press, 1978.
[23] Zhang H, Smith L C, Twidale M, et al. Seeing the Wood for the Trees: Enhancing Metadata Subject Elements with Weights[J]. Information Technology and Libraries, 2011, 30(2): 75-80.
[24] Wolfram D, Zhang J. The Influence of Indexing Practices and Weighting Algorithms on Document Spaces[J]. Journal of the American Society for Information Science and Technology, 2008, 59(1): 3-11.
[25] Moens M F. Automatic Indexing and Abstracting of Document Texts[M]. Berlin: Springer, 2000.
[26] Chung E, Miksa S, Hastings S K. A Framework of Automatic Subject Term Assignment for Text Categorization: An Indexing Conception-based Approach[J]. Journal of the American Society for Information Science and Technology, 2010, 61(4): 688-699.
[27] Lu K, Mao J. Automatically Infer Subject Terms and Documents Associations Through Text Mining[C]. In: Proceedings of the 76th Annual Conference of Association for Information Science and Technology (ASIST 2013). Montreal: ASIS&T, 2013.
[28] OHSUMED Test Collection[EB/OL].[2012-12-01]. http://ir.ohsu.edu/ohsumed/ohsumed.html.
[29] The Lemur Project[EB/OL].[2012-10-13]. http://www.lemurproject.org/.