Decoding Optimization in Tree Transducer based Translation Model
Shi Chongde, Qiao Xiaodong, Wang Huilin
Institute of Scientific & Technical Information of China, Beijing 100038, China

Abstract: This paper proposes two methods to improve the efficiency of rule binarization and decoding in tree-transducer-based translation models. The authors convert synchronous transducer rules into four kinds of binary rules to reduce the number of temporary items, and propose the RR-CKY decoding algorithm, which avoids generating some of the redundant items during decoding. Experiments show that both methods reduce the number of temporary items and speed up decoding, and that they also improve translation quality.
Received: 19 June 2013
Published: 27 September 2013