From Subtitles to Parallel Corpora

Authors: Mark Fishel and Panayota Georgakopoulou and Sergio Penkale and Volha V. Petukhova and Matej Rojc and Martin Volk and Andy Way

Date: 28.05.2012


Abstract

We describe the preparation of parallel corpora based on professional quality subtitles in seven European language pairs. The main focus is the effect of the processing steps on the size and quality of the final corpora.

BIB_text

@Article {
author = {Mark Fishel and Panayota Georgakopoulou and Sergio Penkale and Volha V. Petukhova and Matej Rojc and Martin Volk and Andy Way},
title = {From Subtitles to Parallel Corpora},
pages = {3--6},
abstract = {We describe the preparation of parallel corpora based on professional quality subtitles in seven European language pairs. The main focus is the effect of the processing steps on the size and quality of the final corpora.},
date = {2012-05-28},
year = {2012},
}