Balanced training of a hybrid ensemble method for imbalanced datasets: a case of emergency department readmission prediction

Authors: Arkaitz Artetxe Vallejo, Manuel Graña, Andoni Beristain Iraola, Sebastián Ríos

Date: 01.05.2020

Neural Computing and Applications


Abstract

Dealing with imbalanced datasets is a recurrent issue in health-care data processing. Most of the literature deals with small academic datasets, so results often do not extrapolate to large real-life datasets or have little real-life validity. When minority class sample generation by interpolation is meaningless, undersampling the majority class becomes mandatory in order to reach acceptable results. Ensembles of classifiers provide the advantage of the diversity of their members, which may allow adaptation to the imbalanced distribution. In this paper, we present a pipeline method combining random undersampling with bootstrap aggregation (bagging) for a hybrid ensemble of extreme learning machines and decision trees, whose diversity improves adaptation to the imbalanced-class dataset. The approach is demonstrated on a realistic, greatly imbalanced dataset of emergency department patients from a Chilean hospital, targeting the prediction of patient readmission. Computational experiments show that our approach outperforms other well-known classification algorithms.
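The abstract describes a pipeline that trains each ensemble member on a balanced bootstrap sample (random undersampling of the majority class plus bagging), with members drawn from two model families. The sketch below is not the authors' implementation; it illustrates the general idea under simple assumptions: a minimal extreme learning machine (random hidden layer with a least-squares readout), `scikit-learn` decision trees, and majority voting. All function and class names here are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SimpleELM:
    """Minimal extreme learning machine: fixed random hidden layer,
    least-squares linear readout trained on +/-1 targets."""
    def __init__(self, n_hidden=50, rng=None):
        self.n_hidden = n_hidden
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def fit(self, X, y):
        # Random, untrained input-to-hidden weights (the defining ELM trait).
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)            # hidden activations
        # Closed-form readout: solve H @ beta ~= targets in least squares.
        self.beta, *_ = np.linalg.lstsq(H, 2.0 * y - 1.0, rcond=None)
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta > 0).astype(int)

def balanced_bootstrap(X, y, rng):
    """Bootstrap sample in which the majority class (label 0) is randomly
    undersampled to the minority-class size."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n = len(minority)
    idx = np.concatenate([rng.choice(minority, n, replace=True),
                          rng.choice(majority, n, replace=True)])
    return X[idx], y[idx]

def fit_hybrid_ensemble(X, y, n_members=10, seed=0):
    """Alternate decision trees and ELMs, each on its own balanced bootstrap."""
    rng = np.random.default_rng(seed)
    members = []
    for i in range(n_members):
        Xb, yb = balanced_bootstrap(X, y, rng)
        model = (DecisionTreeClassifier(random_state=i) if i % 2 == 0
                 else SimpleELM(rng=rng))
        members.append(model.fit(Xb, yb))
    return members

def predict_ensemble(members, X):
    """Aggregate by majority vote over the members' hard predictions."""
    votes = np.mean([m.predict(X) for m in members], axis=0)
    return (votes >= 0.5).astype(int)
```

Undersampling inside each bootstrap (rather than once, up front) preserves more majority-class information across the ensemble, which is one of the usual arguments for combining undersampling with bagging.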

BibTeX

@Article{Artetxe2020,
  title = {Balanced training of a hybrid ensemble method for imbalanced datasets: a case of emergency department readmission prediction},
  author = {Arkaitz Artetxe Vallejo and Manuel Graña and Andoni Beristain Iraola and Sebastián Ríos},
  journal = {Neural Computing and Applications},
  pages = {5735-5744},
  volume = {10},
  keywords = {Class imbalance, Hospital readmission, Ensemble learning, Extreme learning machine},
  abstract = {Dealing with imbalanced datasets is a recurrent issue in health-care data processing. Most of the literature deals with small academic datasets, so results often do not extrapolate to large real-life datasets or have little real-life validity. When minority class sample generation by interpolation is meaningless, undersampling the majority class becomes mandatory in order to reach acceptable results. Ensembles of classifiers provide the advantage of the diversity of their members, which may allow adaptation to the imbalanced distribution. In this paper, we present a pipeline method combining random undersampling with bootstrap aggregation (bagging) for a hybrid ensemble of extreme learning machines and decision trees, whose diversity improves adaptation to the imbalanced-class dataset. The approach is demonstrated on a realistic, greatly imbalanced dataset of emergency department patients from a Chilean hospital, targeting the prediction of patient readmission. Computational experiments show that our approach outperforms other well-known classification algorithms.},
  doi = {10.1007/s00521-017-3242-y},
  date = {2020-05-01},
}
Vicomtech

Parque Científico y Tecnológico de Gipuzkoa,
Paseo Mikeletegi 57,
20009 Donostia / San Sebastián (España)

+(34) 943 309 230
