Vicomtech-IK4, Euskal Irrati Telebista and MondragonLingua make progress on automatic translation into Basque

09.01.2017

They have shared a corpus of more than half a million sentences

With this development, the three organizations have created the first bilingual news corpus in Basque and Spanish. It will be an essential resource for the further development of automatic translation systems between the two languages.

The diversity of subjects that a news corpus covers, together with its volume, will significantly improve the quality of automatic translation into and from Basque. The corpus consists of more than half a million sentence pairs in both languages, covering subjects such as national and international politics, culture, and sports, among others.

This resource was created using innovative methods that automatically search for similar news items in both languages, and it was developed within a framework programme of R&D projects financed by the Department of Competitiveness and Development of the Basque Government (GAITEK and HAZITEK programmes). The corpus has also been shared on META-SHARE, the European network for linguistic resources.

MondragonLingua, EiTB and Vicomtech-IK4 are particularly interested in sharing this result with society in order to promote research and development in automatic translation into Basque.

Vicomtech

Parque Científico y Tecnológico de Gipuzkoa,
Paseo Mikeletegi 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230
