Privacy Mechanisms and Evaluation Metrics for Synthetic Data Generation: A Systematic Review

Authors: Pablo Alberto Osorio Marulanda, Gorka Epelde Unanue, Mikel Hernández Jiménez, Imanol Isasa Reinoso, Nicolás Moreno, Andoni Beristain Iraola

Date: 01.01.2024

IEEE Access


Abstract

The growth of data publishing, sharing, and mining mechanisms in various fields of industry and science has led to an increase in the flow of data, making it an important asset that needs to be protected and managed effectively. To this end, different mechanisms have been used across different domains, including Privacy Enhancing Technologies like Synthetic Data Generation, which aim to protect user-sensitive data and prevent misuse. Synthetic data has been used not only to augment datasets and balance classes but also in data analysis applications that aim to provide useful insights while preserving the privacy of sensitive data. Still, there is a gap in the conceptual and state-of-the-art understanding of the level of privacy synthetic data generators can provide and how they affect various industries and fields. This systematic review addresses how privacy has been assessed and measured in the framework of synthetic data generation and identifies which metrics have been used to evaluate those mechanisms. We provide an overview of 105 recent studies in this field, selected through a screening process, and identify open research directions. The main findings include a high prevalence of differential privacy as a privacy-preserving technique and privacy budget cost as a trade-off metric, with a high percentage of GAN-based model implementations and mainly healthcare applications. Our systematic review covers multiple privacy domains and can be understood as a general framework for privacy measurement applied in Synthetic Data Generation.

BIB_text

@Article {
title = {Privacy Mechanisms and Evaluation Metrics for Synthetic Data Generation: A Systematic Review},
journal = {IEEE Access},
pages = {8808-88074},
volume = {12},
keywds = {
Anonymization; confidentiality; privacy; privacy metrics; privacy-preserving big data analytics; synthetic data; synthetic data generation
},
abstract = {

The growth of data publishing, sharing, and mining mechanisms in various fields of industry and science has led to an increase in the flow of data, making it an important asset that needs to be protected and managed effectively. To this end, different mechanisms have been used across different domains, including Privacy Enhancing Technologies like Synthetic Data Generation, which aim to protect user-sensitive data and prevent misuse. Synthetic data has been used not only to augment datasets and balance classes but also in data analysis applications that aim to provide useful insights while preserving the privacy of sensitive data. Still, there is a gap in the conceptual and state-of-the-art understanding of the level of privacy synthetic data generators can provide and how they affect various industries and fields. This systematic review addresses how privacy has been assessed and measured in the framework of synthetic data generation and identifies which metrics have been used to evaluate those mechanisms. We provide an overview of 105 recent studies in this field, selected through a screening process, and identify open research directions. The main findings include a high prevalence of differential privacy as a privacy-preserving technique and privacy budget cost as a trade-off metric, with a high percentage of GAN-based model implementations and mainly healthcare applications. Our systematic review covers multiple privacy domains and can be understood as a general framework for privacy measurement applied in Synthetic Data Generation.


},
doi = {10.1109/ACCESS.2024.3417608},
date = {2024-01-01},
}