New Technology of Library and Information Service  2014, Vol. 30 Issue (7): 48-55    DOI: 10.11925/infotech.1003-3513.2014.07.07
Re-rank Retrieval Results Through Subject Indexing
Mao Jin1, Li Gang1, Cao Yujie2
1. Center for the Studies of Information Resources, Wuhan University, Wuhan 430072, China;
2. NetEase Hangzhou Inc., Hangzhou 310052, China
Abstract  

[Objective] This paper re-ranks search results with the help of subject indexing during pseudo-relevance feedback. [Methods] User queries are represented as probability distributions over subject terms by mining the associations between queries and subject terms with language modeling. The weights of subject terms in documents are calculated by incorporating generative language models for the subject terms. The scores of the documents returned by the initial retrieval are then recalculated, and the documents are re-ranked according to the new scores. [Results] The proposed method constructs generative language models for subject terms and mines appropriate weights for subject terms in documents. The re-ranked results are broadly improved over the initial retrieval. [Limitations] Different methods of mining the associations between subject terms and documents are not compared, and the approach is not tested on data sets of different scales or in different languages. [Conclusions] By exploiting the associations among user queries, documents, and subject terms, the re-ranking approach can improve retrieval precision.
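
The abstract gives no formulas, so the following is only a minimal sketch of the general idea and not the authors' actual model. It assumes that the query's distribution over subject terms is estimated by averaging subject-term weights over the top-k documents of the initial retrieval, and that the new score is a linear interpolation of the initial score and a subject-term match score. The function names, the parameters `k` and `alpha`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def query_subject_distribution(feedback_docs, doc_subject_weights):
    """Estimate P(s | q) from pseudo-feedback documents.

    Assumption: the distribution is the normalized sum of subject-term
    weights over the feedback documents (not taken from the paper).
    """
    totals = defaultdict(float)
    for doc_id in feedback_docs:
        for term, weight in doc_subject_weights.get(doc_id, {}).items():
            totals[term] += weight
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {term: value / norm for term, value in totals.items()}


def rerank(initial_scores, doc_subject_weights, k=2, alpha=0.5):
    """Re-score and re-rank documents from the initial retrieval.

    Assumption: new score = alpha * initial score
                          + (1 - alpha) * sum_s P(s | q) * w(s | d),
    with initial scores already on a scale comparable to [0, 1].
    """
    ranked = sorted(initial_scores, key=initial_scores.get, reverse=True)
    p_s_q = query_subject_distribution(ranked[:k], doc_subject_weights)
    new_scores = {}
    for doc_id, score in initial_scores.items():
        weights = doc_subject_weights.get(doc_id, {})
        subject_score = sum(p * weights.get(t, 0.0) for t, p in p_s_q.items())
        new_scores[doc_id] = alpha * score + (1 - alpha) * subject_score
    order = sorted(new_scores, key=new_scores.get, reverse=True)
    return order, new_scores


# Hypothetical data: initial retrieval scores and per-document
# subject-term weights (each document's weights sum to 1).
initial_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.75, "d4": 0.6}
doc_subject_weights = {
    "d1": {"Hypertension": 1.0},
    "d2": {"Hypertension": 0.5, "Diet": 0.5},
    "d3": {"Neoplasms": 1.0},
    "d4": {"Hypertension": 1.0},
}

order, scores = rerank(initial_scores, doc_subject_weights, k=2, alpha=0.5)
print(order)
```

In this toy example, document d4 shares its subject term with the top-ranked feedback documents, so it moves ahead of d2 and d3 even though its initial score is lower. How the weights of subject terms in documents are actually derived from the generative language models is not specified in the abstract and is simply taken as input here.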

Key words: Language model; Information retrieval; Subject heading; Subject indexing; Re-rank results
Received: 09 April 2014      Published: 20 October 2014
CLC number: TP391.3

Cite this article:

Mao Jin, Li Gang, Cao Yujie. Re-rank Retrieval Results Through Subject Indexing. New Technology of Library and Information Service, 2014, 30(7): 48-55.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.07.07     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I7/48

