Weighted Set-Theoretic Alignment of Comparable Sentences

Date: 03.08.2017



Abstract

This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.
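As a rough illustration of what a weighted set-theoretic similarity over bags of words can look like, the sketch below computes a weighted Jaccard ratio between two token sets. It is not the STACCw formulation: the function name `weighted_jaccard`, the `weights` mapping, and the `default_weight` parameter are assumptions made for this example, and the cross-lingual step needed to compare sentences in different languages is left out entirely.

```python
from typing import Dict, Iterable


def weighted_jaccard(
    source_tokens: Iterable[str],
    target_tokens: Iterable[str],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Weighted Jaccard similarity between two token sets.

    Hypothetical sketch: each token contributes its weight (instead of 1)
    to the intersection and union totals. Tokens missing from `weights`
    fall back to `default_weight`.
    """
    source_set = set(source_tokens)
    target_set = set(target_tokens)

    def total(tokens: Iterable[str]) -> float:
        # Sum of per-token weights over a set of tokens.
        return sum(weights.get(token, default_weight) for token in tokens)

    union_weight = total(source_set | target_set)
    if union_weight == 0:
        # Both sets empty, or every token weighted zero: no evidence of overlap.
        return 0.0
    return total(source_set & target_set) / union_weight
```

With an empty `weights` mapping and the default weight of 1.0, this reduces to the plain Jaccard index over the two sets; giving a shared token a higher weight raises the score, which is the general effect a weighting scheme of this kind is meant to have.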

BIB_text

@Article {
title = {Weighted Set-Theoretic Alignment of Comparable Sentences},
pages = {41-45},
keywds = {BUCC 2017, Comparable Corpora, Sentence Alignment},
abstract = {This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.},
isbn = {978-1-5108-4575-6},
date = {2017-08-03},
year = {2017},
}
