New Technology of Library and Information Service  2009, Vol. Issue (10): 7-13     https://doi.org/10.11925/infotech.1003-3513.2009.10.02
Digital Library
Survey on Bilingual Terminology Extraction from Comparable Corpora*
Kang Xiaoli1   Zhang Chengzhi1,2   Wang Huilin1
1(Institute of Scientific & Technical Information of China, Beijing 100038, China)
2(Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094, China)
Abstract

This paper compares bilingual terminology extraction from parallel corpora with extraction from comparable corpora and explains the research significance and practical value of the latter. It summarizes the main methods for extraction from comparable corpora and their optimizations, identifies open problems, and outlines directions for future research.

Key words Bilingual terminology extraction    Comparable corpora    Context vector    Vector similarity computation
Received: 2009-09-22      Published: 2009-10-25
CLC number: TP391

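Funding:

*This work is one of the research outcomes of the Key Project of the National Science and Technology Support Program of the 11th Five-Year Plan, "Research on Key Technologies for Multilingual Information Service Environments" (Grant No. 2006BAH03B02); the China Postdoctoral Science Foundation Special Funded Project, "Research on Multilingual Domain Ontology Learning" (Grant No. 200801105); and the China Postdoctoral Science Foundation General Funded Project, "Research on Key Technologies of Multilingual Domain Ontology Learning" (Grant No. 20080430463).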

Corresponding author: Kang Xiaoli     E-mail: kangli0810@163.com
About the authors: Kang Xiaoli, Zhang Chengzhi, Wang Huilin
Cite this article:
Kang Xiaoli, Zhang Chengzhi, Wang Huilin. Survey on Bilingual Terminology Extraction from Comparable Corpora[J]. New Technology of Library and Information Service, 2009, (10): 7-13.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2009.10.02      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2009/V/I10/7
