SUMAT: An online service for SUbtitling by MAchine Translation

Authors: Arantza del Pozo, Lindsay Bywood, Mark Fishel, Panayota Georgakopoulou, Gerard Van Loenhout, Volha V. Petukhova, Mirjam Sepesy Maucec, Dimitris Spiliotopoulos, Anja Turner, Martin Volk and Andy Way

Date: 28.05.2012


Abstract

SUMAT aims to increase the efficiency and productivity of the European subtitling industry while enhancing the quality of its results through the effective introduction of statistical machine translation (SMT) technologies into subtitling processes. To achieve this, we will develop an online subtitle translation service covering nine European languages combined into seven language pairs, each translated in both directions for a total of 14 translation directions: English-German; English-French; English-Spanish; English-Dutch; English-Swedish; English-Portuguese; Slovenian-Serbian. During the first year of the project, the consortium’s subtitling companies have provided large amounts of professionally produced parallel and monolingual subtitle data, which have been processed into a form suitable for training SMT systems. Baseline SMT systems are being created using the Moses SMT training scripts and decoder and the IRSTLM toolkit. In the near future, subtitles will be enriched with linguistic information and the baseline SMT systems will be extended by: augmenting language models with extra monolingual target data and improved use of linguistic information; enhancing translation models through the use of POS-tagged data and factored models; using compound splitters, named entity recognizers and additional lexica to deal with unknown words; and investigating hierarchical decoding to make use of syntactic dependencies.

BibTeX

@Article {
author = {Arantza del Pozo and Lindsay Bywood and Mark Fishel and Panayota Georgakopoulou and Gerard Van Loenhout and Volha V. Petukhova and Mirjam Sepesy Maucec and Dimitris Spiliotopoulos and Anja Turner and Martin Volk and Andy Way},
title = {SUMAT: An online service for SUbtitling by MAchine Translation},
pages = {203},
abstract = {
SUMAT aims to increase the efficiency and productivity of the European subtitling industry while enhancing the quality of its results through the effective introduction of statistical machine translation (SMT) technologies into subtitling processes. To achieve this, we will develop an online subtitle translation service covering nine European languages combined into seven language pairs, each translated in both directions for a total of 14 translation directions: English-German; English-French; English-Spanish; English-Dutch; English-Swedish; English-Portuguese; Slovenian-Serbian. During the first year of the project, the consortium’s subtitling companies have provided large amounts of professionally produced parallel and monolingual subtitle data, which have been processed into a form suitable for training SMT systems. Baseline SMT systems are being created using the Moses SMT training scripts and decoder and the IRSTLM toolkit. In the near future, subtitles will be enriched with linguistic information and the baseline SMT systems will be extended by: augmenting language models with extra monolingual target data and improved use of linguistic information; enhancing translation models through the use of POS-tagged data and factored models; using compound splitters, named entity recognizers and additional lexica to deal with unknown words; and investigating hierarchical decoding to make use of syntactic dependencies.
},
date = {2012-05-28},
year = {2012},
}
