Improving a Long Audio Aligner through Phone-Relatedness Matrices for English, Spanish and Basque

Authors: Aitor Álvarez, Pablo Ruiz, Haritz Arzelus

Date: 12 September 2014


Abstract

A multilingual long audio alignment system for the automatic subtitling domain is presented, supporting English, Spanish and Basque. Pre-recorded content is recognized at the phoneme level by language-dependent, triphone-based decoders. In addition, the transcripts are phonetically transcribed using grapheme-to-phoneme converters. An optimized version of Hirschberg's algorithm then aligns the two phoneme sequences to find matches. The correctly aligned phonemes, together with the time codes obtained in the recognition step, serve as the reference for producing near-perfectly aligned subtitles. The performance of the alignment algorithm is evaluated with different non-binary scoring matrices based on phone confusion pairs from each decoder, on phonological similarity and on human perception errors. This system is an evolution of our previous successful long audio alignment system.
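The alignment step can be illustrated with a minimal Python sketch. This is not the authors' optimized Hirschberg variant. It uses the textbook linear-space form: Needleman-Wunsch scores are computed forwards and backwards, and the problem is then split by divide and conquer. In place of the paper's confusion-, similarity- or perception-based matrices, it uses a small hypothetical phone-relatedness table (`RELATED`) with hypothetical match, mismatch and gap scores. The decoder output format `(phone, start, end)` in seconds is also an assumption, as are the names `score`, `hirschberg` and `anchor_times`. Only exact phone matches receive time codes, mirroring the abstract's use of correctly aligned phonemes as reference points.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical scores, NOT the paper's matrices. RELATED stands in for a
# non-binary phone-relatedness table (e.g. one derived from decoder confusion
# pairs, phonological similarity or perceptual errors).
MATCH, MISMATCH, GAP = 2.0, -1.0, -1.0
RELATED: Dict[frozenset, float] = {
    frozenset(("b", "p")): 1.0,
    frozenset(("d", "t")): 1.0,
    frozenset(("e", "i")): 0.5,
}

Pair = Tuple[Optional[str], Optional[str]]  # None marks a gap


def score(x: str, y: str) -> float:
    """Substitution score: exact match > related phones > unrelated phones."""
    if x == y:
        return MATCH
    return RELATED.get(frozenset((x, y)), MISMATCH)


def nw_last_row(a: Sequence[str], b: Sequence[str]) -> List[float]:
    """Last row of the Needleman-Wunsch score matrix in O(len(b)) memory."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + score(a[i - 1], b[j - 1]),
                         prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev


def nw_full(a: Sequence[str], b: Sequence[str]) -> List[Pair]:
    """Plain Needleman-Wunsch with traceback, used for tiny sub-problems."""
    n, m = len(a), len(b)
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * GAP
    for j in range(1, m + 1):
        s[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          s[i - 1][j] + GAP, s[i][j - 1] + GAP)
    out: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + GAP:
            out.append((a[i - 1], None))
            i -= 1
        else:
            out.append((None, b[j - 1]))
            j -= 1
    return out[::-1]


def hirschberg(a: Sequence[str], b: Sequence[str]) -> List[Pair]:
    """Linear-space global alignment (textbook Hirschberg divide and conquer)."""
    if not a:
        return [(None, y) for y in b]
    if not b:
        return [(x, None) for x in a]
    if len(a) == 1 or len(b) == 1:
        return nw_full(a, b)
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b)
    right = nw_last_row(a[mid:][::-1], b[::-1])
    # Split b where the forward and backward half-scores sum to the optimum.
    k = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    return hirschberg(a[:mid], b[:k]) + hirschberg(a[mid:], b[k:])


def anchor_times(transcript: List[str],
                 recognized: List[Tuple[str, float, float]]
                 ) -> Dict[int, Tuple[float, float]]:
    """Map transcript phone index -> (start, end) for exactly matched phones."""
    aln = hirschberg(transcript, [p for p, _, _ in recognized])
    times: Dict[int, Tuple[float, float]] = {}
    i = j = 0  # current positions in transcript and recognized sequences
    for x, y in aln:
        if x is not None and y is not None and x == y:
            times[i] = (recognized[j][1], recognized[j][2])
        if x is not None:
            i += 1
        if y is not None:
            j += 1
    return times
```

Consider the transcript phones `p a t a` aligned against recognized `b a d a`. With the hypothetical scores above, /p/-/b/ and /t/-/d/ are paired as related substitutions rather than opening gaps. `anchor_times` then returns time codes only for the two exactly matched /a/ phones, at transcript indices 1 and 3. Trying a different relatedness table only requires changing `RELATED` (or `score`).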

BibTeX

@Article {
author = {Aitor Álvarez and Pablo Ruiz and Haritz Arzelus},
title = {Improving a Long Audio Aligner through Phone-Relatedness Matrices for English, Spanish and Basque},
pages = {473--480},
volume = {8655},
keywords = {Long audio alignment, automatic subtitling, phonological similarity matrices, perceptual confusion matrices},
abstract = {A multilingual long audio alignment system for the automatic subtitling domain is presented, supporting English, Spanish and Basque. Pre-recorded content is recognized at the phoneme level by language-dependent, triphone-based decoders. In addition, the transcripts are phonetically transcribed using grapheme-to-phoneme converters. An optimized version of Hirschberg's algorithm then aligns the two phoneme sequences to find matches. The correctly aligned phonemes, together with the time codes obtained in the recognition step, serve as the reference for producing near-perfectly aligned subtitles. The performance of the alignment algorithm is evaluated with different non-binary scoring matrices based on phone confusion pairs from each decoder, on phonological similarity and on human perception errors. This system is an evolution of our previous successful long audio alignment system.},
isbn = {978-3-319-10815-5},
date = {2014-09-12},
year = {2014},
}