New Technology of Library and Information Service, 2010, Vol. 26, Issue 6: 33-41. DOI: 10.11925/infotech.1003-3513.2010.06.06
Survey of Multilingual Document Representation
Liu Sa, Zhang Chengzhi
(Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China)

This article surveys document representation in multilingual information processing. It first describes the process of multilingual document representation, introduces the different methods in detail, and compares their strengths and weaknesses. It then summarizes the characteristics of multilingual document representation and points out existing problems. Finally, it outlines development trends in multilingual document representation.

Key words: Multilingual document representation; Cross-language information retrieval; Latent semantic analysis; Explicit semantic analysis
Received: 26 May 2010      Published: 26 July 2010


Corresponding Author: Liu Sa

Cite this article:

Liu Sa, Zhang Chengzhi. Survey of Multilingual Document Representation. New Technology of Library and Information Service, 2010, 26(6): 33-41.
