New Technology of Library and Information Service  2013, Vol. 29 Issue (9): 23-29     https://doi.org/10.11925/infotech.1003-3513.2013.09.04
Knowledge Organization and Knowledge Management
Decoding Optimization in Tree Transducer based Translation Model
Shi Chongde, Qiao Xiaodong, Wang Huilin
Institute of Scientific & Technical Information of China, Beijing 100038, China
Abstract: This paper studies rule binarization and decoding in the tree transducer based translation model. A binarization method that converts synchronous transducer rules into four kinds of binary rules reduces the intermediate items produced by lexicalized rules, and the RR-CKY decoding algorithm checks the validity of intermediate items during decoding so that redundant items are not generated. Experiments show that the two methods reduce the number of intermediate items, speed up decoding, and improve translation quality to some extent.
Key words: Machine translation; Tree transducer based translation model; Parsing; RR-CKY algorithm
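The "intermediate items" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain monolingual grammar, a simple left-to-right binarization, and a textbook CKY recognizer; it is not the paper's four-kind binarization or its RR-CKY algorithm, whose details are not given on this page. The names `binarize`, `cky`, `rules`, and `lexicon` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def binarize(rules):
    """Left-to-right binarization: A -> B C D becomes <B+C> -> B C and A -> <B+C> D."""
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            # Virtual nonterminal covering the first two symbols (illustrative naming).
            virtual = "<" + "+".join(rhs[:2]) + ">"
            out.append((virtual, rhs[:2]))
            rhs = (virtual,) + rhs[2:]
        out.append((lhs, rhs))
    return out


def cky(words, lexicon, binary_rules):
    """Textbook CKY recognition; chart[i, k] holds the labels that span words[i:k]."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[i, i + 1] |= lexicon.get(word, set())
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for lhs, rhs in binary_rules:
                    if len(rhs) != 2:
                        continue  # this sketch only handles binary rules
                    left, right = rhs
                    if left in chart[i, j] and right in chart[j, k]:
                        chart[i, k].add(lhs)
    return chart


# Toy grammar and sentence, invented for illustration only.
rules = [("S", ("NP", "V", "NP"))]
lexicon = {"she": {"NP"}, "reads": {"V"}, "books": {"NP"}}

chart = cky(["she", "reads", "books"], lexicon, binarize(rules))

# Items labelled with a virtual nonterminal exist only because of binarization;
# they play the role of the intermediate (temporary) items a decoder must store.
virtual_items = [(span, label) for span, labels in chart.items()
                 for label in labels if label.startswith("<")]
print(len(virtual_items), "intermediate item(s); sentence recognized:", "S" in chart[0, 3])
```

In this toy setting the only cost of binarization is one virtual item over the first two words. With a large rule set, many such items are built that never contribute to a complete derivation, which is the kind of overhead the abstract says the two proposed methods reduce.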
Received: 2013-06-19      Published: 2013-09-27
Chinese Library Classification: TP391.2
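Funding: This work is one of the research outcomes of the Institute of Scientific and Technical Information of China key work project "Construction and Application of a Multilingual Semantic Association Network for Scientific and Technical Information" (No. ZD2012-3-3) and the Institute of Scientific and Technical Information of China discipline construction project "Natural Language Processing" (No. XK2012-6).
Corresponding author: Shi Chongde     E-mail: shicd@istic.ac.cn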
Cite this article:
Shi Chongde, Qiao Xiaodong, Wang Huilin. Decoding Optimization in Tree Transducer based Translation Model[J]. New Technology of Library and Information Service, 2013, 29(9): 23-29.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.09.04      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2013/V29/I9/23