Securing multimedia-based personal data: towards a methodology for automated anonymization risk assessment seeking GDPR compliance

Abstract

Anonymizing personal data in multimedia content (image, audio and text) has become crucial for secure data-sharing while adhering to the rigorous data compliance requirements of the European Union (EU) General Data Protection Regulation (GDPR). Given the substantial volume of data involved, manual verification of anonymization accuracy is not feasible: it is prone to human error and does not scale. Consequently, automated or semi-automated processes are indispensable. However, these processes cannot guarantee absolute anonymization, and may lead to inadvertent disclosure of personal information with the associated legal and privacy implications. Therefore, when dealing with extensive multimedia datasets, a comprehensive anonymization risk assessment is strongly advised. In response to this challenge, we introduce a novel methodology that quantitatively evaluates the effectiveness and reliability of anonymization techniques by generating metrics from which risk indicators are calculated. The methodology builds on de-identification techniques that protect personal data while preserving data integrity. Our approach relies on an algorithmic framework that helps humans inspect the anonymized dataset, supporting higher data privacy and security. The methodology automatically detects non-anonymized personal data within an extensive dataset. This is achieved by extracting characteristics related to personal data during the anonymization process and correlating attributes from the surrounding data using AI-driven analysis. Afterwards, a rule-based algorithm is applied to the characteristics extracted from both processes to identify and qualitatively assess the anonymization risk.
We demonstrate the applicability and effectiveness of our methodology through a focused application to license plate and face anonymization, using a dataset of non-annotated vehicle and human images. By offering a scalable way to evaluate anonymization risk when sharing data, our methodology represents a step towards GDPR compliance and compliant processing practices, facilitating safer data-sharing environments across industries.
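
The rule-based step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the record schema (`Detection`, `ImageRecord`), the function `assess_risk`, the 0.5 confidence threshold and the three risk levels are all hypothetical choices made for illustration. The sketch only assumes that the anonymizer reports which regions it masked, and that an independent context model counts persons and vehicles in the same image.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One personal-data region reported by the anonymizer (hypothetical schema)."""
    kind: str          # "face" or "license_plate"
    confidence: float  # detector score in [0, 1]
    anonymized: bool   # True if the region was blurred or masked


@dataclass
class ImageRecord:
    """Anonymizer output plus counts from an independent context model."""
    image_id: str
    detections: List[Detection]
    context_persons: int = 0   # persons seen by the context model
    context_vehicles: int = 0  # vehicles seen by the context model


def assess_risk(rec: ImageRecord, low_conf: float = 0.5) -> Tuple[str, List[str]]:
    """Return a qualitative risk level and the rules that fired."""
    reasons: List[str] = []
    masked_faces = sum(d.kind == "face" and d.anonymized for d in rec.detections)
    masked_plates = sum(d.kind == "license_plate" and d.anonymized for d in rec.detections)

    # Rules that suggest personal data may still be visible.
    if any(not d.anonymized for d in rec.detections):
        reasons.append("detected region left unmasked")
    if rec.context_persons > masked_faces:
        reasons.append("more persons in context than masked faces")
    if rec.context_vehicles > masked_plates:
        reasons.append("more vehicles in context than masked plates")
    severe = bool(reasons)

    # Weaker rule: the region was masked, but the detector was unsure about it.
    if any(d.anonymized and d.confidence < low_conf for d in rec.detections):
        reasons.append("masked region with low detector confidence")

    if severe:
        return "high", reasons
    return ("medium" if reasons else "low"), reasons


# Example: two persons in the scene, but only one face was masked.
record = ImageRecord("img_001", [Detection("face", 0.92, True)], context_persons=2)
level, reasons = assess_risk(record)
print(level, reasons)
```

The count-mismatch rules are deliberately conservative: a person facing away or a vehicle without a visible plate would also trigger them, so in this sketch a "high" result means "route to human inspection", not "confirmed leak".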

BIB_text

@Article {
title = {Securing multimedia-based personal data: towards a methodology for automated anonymization risk assessment seeking GDPR compliance},
pages = {132060C},
keywords = {anonymization quality; anonymization risk assessment; automatic anonymization; data-sharing; de-identification; GDPR; personal data; privacy preservation},
abstract = {
Anonymizing personal data in multimedia content (image, audio and text) has become crucial for secure data-sharing while adhering to the rigorous data compliance requirements of the European Union (EU) General Data Protection Regulation (GDPR). Given the substantial volume of data involved, manual verification of anonymization accuracy is not feasible: it is prone to human error and does not scale. Consequently, automated or semi-automated processes are indispensable. However, these processes cannot guarantee absolute anonymization, and may lead to inadvertent disclosure of personal information with the associated legal and privacy implications. Therefore, when dealing with extensive multimedia datasets, a comprehensive anonymization risk assessment is strongly advised. In response to this challenge, we introduce a novel methodology that quantitatively evaluates the effectiveness and reliability of anonymization techniques by generating metrics from which risk indicators are calculated. The methodology builds on de-identification techniques that protect personal data while preserving data integrity. Our approach relies on an algorithmic framework that helps humans inspect the anonymized dataset, supporting higher data privacy and security. The methodology automatically detects non-anonymized personal data within an extensive dataset. This is achieved by extracting characteristics related to personal data during the anonymization process and correlating attributes from the surrounding data using AI-driven analysis. Afterwards, a rule-based algorithm is applied to the characteristics extracted from both processes to identify and qualitatively assess the anonymization risk.
We demonstrate the applicability and effectiveness of our methodology through a focused application to license plate and face anonymization, using a dataset of non-annotated vehicle and human images. By offering a scalable way to evaluate anonymization risk when sharing data, our methodology represents a step towards GDPR compliance and compliant processing practices, facilitating safer data-sharing environments across industries.
},
isbn = {978-151068120-0},
date = {2024-09-17},
}
Vicomtech

