New Technology of Library and Information Service  2009, Vol. Issue (10): 7-13    DOI: 10.11925/infotech.1003-3513.2009.10.02
Survey on Bilingual Terminology Extraction from Comparable Corpora
Kang Xiaoli (1), Zhang Chengzhi (1, 2), Wang Huilin (1)
(1) Institute of Scientific & Technical Information of China, Beijing 100038, China
(2) Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094, China

By comparison with bilingual terminology extraction from parallel corpora, this paper describes the value of extracting bilingual terminology from comparable corpora. It summarizes the main extraction method and the methods used to optimize its implementation, and proposes some perspectives and prospects for bilingual terminology extraction based on comparable corpora.

Key words: Bilingual terminology extraction; Comparable corpora; Context vector; Vector similarity computation
Received: 22 September 2009      Published: 25 October 2009


Corresponding author: Kang Xiaoli
About authors: Kang Xiaoli, Zhang Chengzhi, Wang Huilin

Cite this article:

Kang Xiaoli, Zhang Chengzhi, Wang Huilin. Survey on Bilingual Terminology Extraction from Comparable Corpora. New Technology of Library and Information Service, 2009, (10): 7-13.


