SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles

Authors: Volha V. Petukhova, Rodrigo Agerri, Mark Fishel, Panayota Georgakopoulou, Sergio Penkale, Arantza del Pozo, Mirjam Sepesy Maucec, Martin Volk and Andy Way

Date: 23.05.2012



Abstract

This paper describes the data collection and parallel corpus compilation activities carried out in the FP7 EU-funded SUMAT project, which aims to develop an online subtitle translation service for nine European languages combined into 14 different language pairs. The collected data provide bilingual and monolingual training material for statistical machine translation engines intended to semi-automate the subtitle translation processes of subtitling companies on a large scale.

BibTeX

@Article{petukhova2012sumat,
author = {Volha V. Petukhova and Rodrigo Agerri and Mark Fishel and Panayota Georgakopoulou and Sergio Penkale and Arantza del Pozo and Mirjam Sepesy Maucec and Martin Volk and Andy Way},
title = {SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles},
pages = {21-28},
keywds = {
parallel multilingual corpora, statistical machine translation, subtitle translation service
},
abstract = {
This paper describes the data collection and parallel corpus compilation activities carried out in the FP7 EU-funded SUMAT project, which aims to develop an online subtitle translation service for nine European languages combined into 14 different language pairs. The collected data provide bilingual and monolingual training material for statistical machine translation engines intended to semi-automate the subtitle translation processes of subtitling companies on a large scale.
},
isbn = {978-2-9517408-7-7},
date = {2012-05-23},
year = {2012},
}
Vicomtech

Gipuzkoa Science and Technology Park,
Mikeletegi Pasealekua 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230

Zorrotzaurreko Erribera 2, Deusto,
48014 Bilbao (Spain)
