SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles

Authors: Volha V. Petukhova and Rodrigo Agerri and Mark Fishel and Panayota Georgakopoulou and Sergio Penkale and Arantza del Pozo and Mirjam Sepesy Maucec and Martin Volk and Andy Way

Date: 23.05.2012



Abstract

This paper describes the data collection and parallel corpus compilation activities carried out in the FP7 EU-funded SUMAT project. This project aims to develop an online subtitle translation service for nine European languages combined into 14 different language pairs. This data provides bilingual and monolingual training data for statistical machine translation engines which will semi-automate the subtitle translation processes of subtitling companies on a large scale.

BibTeX

@Article {
author = {Volha V. Petukhova and Rodrigo Agerri and Mark Fishel and Panayota Georgakopoulou and Sergio Penkale and Arantza del Pozo and Mirjam Sepesy Maucec and Martin Volk and Andy Way},
title = {SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles},
pages = {21--28},
keywords = {
parallel multilingual corpora, statistical machine translation, subtitle translation service
},
abstract = {
This paper describes the data collection and parallel corpus compilation activities carried out in the FP7 EU-funded SUMAT project. This project aims to develop an online subtitle translation service for nine European languages combined into 14 different language pairs. This data provides bilingual and monolingual training data for statistical machine translation engines which will semi-automate the subtitle translation processes of subtitling companies on a large scale.
},
isbn = {978-2-9517408-7-7},
date = {2012-05-23},
year = {2012},
}