Weighted Set-Theoretic Alignment of Comparable Sentences

Authors: Andoni Azpeitia Zaldua, Thierry Etchegoyhen, Eva Martínez García

Date: 2017-08-03



Abstract

This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.

BibTeX

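@Article {azpeitia2017weighted,
author = {Azpeitia Zaldua, Andoni and Etchegoyhen, Thierry and Martínez García, Eva},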
title = {Weighted Set-Theoretic Alignment of Comparable Sentences},
pages = {41--45},
keywds = {BUCC 2017, Comparable Corpora, Sentence Alignment},
abstract = {This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.},
isbn = {978-1-5108-4575-6},
date = {2017-08-03},
year = {2017},
}