New Technology of Library and Information Service  2010, Vol. 26 Issue (6): 33-41    DOI: 10.11925/infotech.1003-3513.2010.06.06
Knowledge Organization and Knowledge Management
Survey of Multilingual Document Representation*
Liu Sa, Zhang Chengzhi
(Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract

This article discusses the problem of document representation in multilingual information processing. Building on an analysis of the models and process of monolingual document representation, it describes the process of multilingual document representation, classifies and explains the methods involved in detail, and compares them. It then summarizes the characteristics of multilingual document representation, points out the problems that remain, and discusses development trends in the field.

Key words: Multilingual document representation; Cross-language information retrieval; Latent semantic analysis; Explicit semantic analysis
Received: 2010-05-26
CLC number: TP391
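Funding:

*This work is one of the research outcomes of the National Natural Science Foundation of China project "Multilingual Text Clustering Based on Comparable Corpora" (Grant No. 70903032) and the Ministry of Education Humanities and Social Sciences Research General Project "Automatic Construction of Multilingual Domain Ontologies" (Grant No. 08JC870007).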

Corresponding author: Liu Sa     E-mail: liusa321@163.com
Cite this article:
Liu Sa, Zhang Chengzhi. Survey of Multilingual Document Representation[J]. New Technology of Library and Information Service, 2010, 26(6): 33-41. DOI:10.11925/infotech.1003-3513.2010.06.06.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2010.06.06

