Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (3): 76-82    DOI: 10.11925/infotech.2096-3467.2018.0684
Vocabulary Optimization of Neural Machine Translation for Scientific and Technical Document
Qingmin Liu1, Changqing Yao1, Chongde Shi1, Xiaojie Wen2, Yueying Sun1
1Institute of Scientific and Technical Information of China, Beijing 100038, China
2Faculty of Linguistic Sciences, Beijing Language and Culture University, Beijing 100032, China
[Objective] This paper optimizes the vocabulary of Neural Machine Translation (NMT) in the scientific and technical domain to address the limited-vocabulary problem and improve translation performance. [Methods] Based on word formation and Point-wise Mutual Information (PMI), the paper proposes a method that optimizes the vocabulary while preserving the semantic integrity of words, which reduces the number of unknown words. [Results] The NTCIR-2010 corpus and abstracts of journal articles in the automation and computer domains were selected for the experiments. The results were compared with those of the word segmentation method and the sub-word method, and the comparison confirmed the effectiveness of the proposed method. [Limitations] This paper does not cover the optimization of non-Chinese characters. [Conclusions] The experiments show that, in the scientific and technical domain, the vocabulary optimization algorithm based on scientific word formation achieves better translation performance.
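As a rough illustration of the PMI measure named in the Methods, the sketch below scores adjacent token pairs in a toy sequence. It is not the authors' algorithm: the function name `pmi_scores`, the toy tokens, and the choice of simple relative-frequency estimates with a base-2 logarithm are all assumptions made for this example. A high score suggests that two units co-occur more often than chance and could be kept together as one vocabulary entry.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (log base 2) for every adjacent token pair.

    Hypothetical helper for illustration only; probabilities are
    plain relative frequencies with no smoothing.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi          # joint probability of the pair
        p_x = unigrams[x] / n_uni    # marginal probability of x
        p_y = unigrams[y] / n_uni    # marginal probability of y
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy sequence (assumed data, not from the paper's corpora).
tokens = ["a", "b", "a", "b", "c", "d"]
scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy input, the pair ("c", "d") scores highest because each unit appears only once and always together, while ("b", "a") scores lowest because both units are frequent elsewhere.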

Key words: Neural Machine Translation; Scientific and Technical Document; Out of Vocabulary
Received: 28 June 2018      Published: 17 April 2019

Cite this article:

Qingmin Liu, Changqing Yao, Chongde Shi, Xiaojie Wen, Yueying Sun. Vocabulary Optimization of Neural Machine Translation for Scientific and Technical Document. Data Analysis and Knowledge Discovery, 2019, 3(3): 76-82.

