Statistical modeling and deep learning for high-quality machine translation
MODELA
Duration:
01.04.2016 - 31.12.2017
The main objective of MODELA is to research, develop, and validate advanced language engineering techniques, based on Deep Learning and statistical modeling, for building high-quality machine translation systems in the field of news generation.
This objective is broken down into scientific-technological objectives and scope-and-impact objectives. Work in 2016 focused on conceptualizing the project and specifying its requirements so that the initially defined objectives could be met. Accordingly, the items below are partial objectives achieved in 2016, whose development will be completed in 2017. Each point states an initial objective, followed by the partial objectives defined within it for 2016.
Scientific-Technological Objectives
1. Development of techniques and tools for the collection of linguistic resources from heterogeneous sources.
a. Identification of linguistic resource sources: The websites of Argia, EiTB, and Consumer will be analyzed for news content.
b. Development of techniques and tools for resource creation.
2. Configuration of a statistical machine translation system for Basque, English, and Spanish.
a. Definition of application areas and language pairs: The news and legal-administrative domains will be addressed. For the former, the Spanish-Basque language pair will be used, and for the latter, the English-Spanish pair.
b. Development of a basic statistical machine translation system.
3. Research and development of techniques and tools based on Deep Learning for integration with statistical machine translation systems.
a. Analysis of the state of the art for the development of neural machine translation systems based on Deep Learning.
b. Definition of requirements for neural translation systems: The appropriate hardware and software requirements for the development of these systems have been defined.
4. Development of a prototype system for the automatic translation of Basque, English, and Spanish, integrating Deep Learning techniques.
a. Definition of the prototype's scope: It has been agreed that work package 4 will be dedicated to developing a hybrid prototype that combines statistical and neural systems, called SMT++, and that work package 5 will be dedicated to developing a purely neural system, called PNMT.


