Coherent Neural Machine Translation Methods and Systems
TANDO is a research project whose main objective is the research, development and validation of context-aware coherent neural machine translation systems.
Machine translation has achieved major successes in recent years, with significant gains in translation quality thanks to systems based on artificial neural networks and deep learning. Despite these advances, current systems usually translate at the sentence level, i.e. each sentence is translated independently, without access to the wider context in which it appears. This limitation leads to systematic errors, because linguistic phenomena that depend on the surrounding sentences, such as pronoun reference or the consistent translation of terms across a document, cannot be modelled.
The project focuses on developing methods that reduce translation errors caused by a lack of contextual coherence, improving translation quality through context representations and adaptations of the neural machine translation process. To this end, the project covers the following main aspects:
- Research and development of neural translation architectures and algorithms to significantly improve coherence and overall translation quality.
- Creation of advanced, high-quality neural machine translation systems that process contextual information.
- Creation and preparation of datasets for developing and validating coherent generic translation for Basque-Spanish and Basque-French.
- Automatic and human evaluation of the systems developed.
TANDO is a project subsidised by the Basque Government and ERDF funds through SPRI's ELKARTEK 2020 call for proposals. It is carried out by the following consortium: Ametzagaña (Coordinator), Vicomtech (Scientific coordination), the IXA group of the University of the Basque Country (EHU), Elhuyar and ISEA.
In addition to its role as scientific coordinator, Vicomtech contributes to all aspects of the research, development and evaluation of methods for improving the processing of contextual coherence in neural machine translation. To this end, Vicomtech takes part in the design and integration of advanced methods based on neural modelling of contextual information, as well as in the preparation of suitable resources to train and validate contextually coherent neural machine translation models. The methods explored within TANDO cover the main approaches in the field: extending the context used to train neural models, using encoders dedicated to handling context, and applying contextual correction methods. These activities aim both to advance the state of the art in machine translation and to determine the efficiency of different contextual translation methods in real-world settings, with the goal of increasing the quality and usability of current machine translation systems.
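To illustrate the first of these approaches, extending the context seen during training, the following is a minimal sketch of the widely used "concatenation" strategy, in which each source sentence is prefixed with the preceding sentences of the same document. It is an assumption for illustration only, not the method implemented in TANDO: the function name `concat_with_context`, the `context_size` parameter and the `<SEP>` separator token are hypothetical.

```python
from typing import List, Tuple

# Hypothetical separator token marking the boundary between
# context sentences and the sentence to be translated.
SEP = "<SEP>"


def concat_with_context(
    doc: List[Tuple[str, str]], context_size: int = 1, sep: str = SEP
) -> List[Tuple[str, str]]:
    """Build context-augmented training pairs from one document.

    `doc` is an ordered list of (source, target) sentence pairs.
    Each source sentence is prefixed with up to `context_size`
    preceding source sentences from the same document, while the
    target remains the translation of the current sentence only.
    """
    examples = []
    for i, (src, tgt) in enumerate(doc):
        # Preceding source sentences, truncated at the start of the document.
        context = [s for s, _ in doc[max(0, i - context_size):i]]
        # Join the context sentences and the current sentence with the separator.
        augmented_src = f" {sep} ".join(context + [src])
        examples.append((augmented_src, tgt))
    return examples


if __name__ == "__main__":
    # Toy Basque-Spanish document (illustrative data, not from TANDO).
    doc = [
        ("Ana etxera joan da.", "Ana se ha ido a casa."),
        ("Nekatuta dago.", "Está cansada."),
    ]
    for src, tgt in concat_with_context(doc, context_size=1):
        print(src, "=>", tgt)
```

With `context_size=1`, the second training pair becomes `"Ana etxera joan da. <SEP> Nekatuta dago."` paired with `"Está cansada."`. Basque does not mark grammatical gender, so the isolated sentence *Nekatuta dago* does not tell the model whether Spanish requires *cansada* or *cansado*; the prepended context sentence makes the subject, Ana, visible. Keeping only the current sentence on the target side is one design choice; variants that also concatenate the preceding target sentences exist as well.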