Multilingual Anonymisation toolkit for Public Administrations (MAPA)
The MAPA Project (Multilingual Anonymisation toolkit for Public Administrations) will develop an open-source toolkit to anonymise data in the medical and legal domains, and will deploy it at several Public Administrations in Europe.
At its core, the MAPA anonymisation toolkit will use Named-Entity Recognition and Classification (NERC) techniques based on Deep Learning neural networks.
In addition, thanks to the transfer learning capabilities shown by new types of Deep Learning models, new systems can be trained using relatively small datasets of manually labelled data. The knowledge acquired for a given domain or language can be transferred and re-used across languages or domains. MAPA will be trained to detect named entities that involve sensitive information.
MAPA will be feature-rich, and the NERC approach will be complemented with other configurable mechanisms, such as pattern detection based on regular expressions (passport or ID numbers, telephone numbers, street addresses, blood groups, age, sex, marital status, email addresses, bank accounts, etc.).
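As an illustration of regular-expression pattern detection, the following is a minimal Python sketch and not MAPA's actual implementation. The names PATTERNS and mask_patterns, the label strings, and the three patterns themselves are assumptions made for this example; real formats for telephone numbers and similar identifiers vary by country and would need patterns tuned to each case.

```python
import re

# Hypothetical patterns, for illustration only.
# Real-world formats differ between countries and institutions.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # International-style number: "+", country code, then 2 to 4 digit groups.
    "PHONE": re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}"),
    # ABO blood group followed by a Rh sign, e.g. "AB+" or "O-".
    "BLOOD_GROUP": re.compile(r"\b(?:AB|A|B|O)[+-](?!\w)"),
}


def mask_patterns(text, patterns=PATTERNS):
    """Replace every match of each pattern with its label in angle brackets."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"<{label}>", text)
    return text


sample = "Contact: ana@example.org, tel. +34 912 345 678, blood group AB+."
print(mask_patterns(sample))
```

Keeping the patterns in a dictionary keyed by label is one simple way to make the mechanism configurable: entries can be added, removed, or replaced without touching the masking function.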
User-definable dictionaries will also cover entity names that are known in advance for a particular application.
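A dictionary lookup of this kind could be sketched in Python as follows. This is an assumption-laden illustration, not the MAPA design: build_dictionary_matcher and mask_terms are hypothetical helper names, the matching is case-insensitive and whole-word by choice, and the example terms are invented.

```python
import re


def build_dictionary_matcher(terms):
    """Compile one case-insensitive regex that matches any listed term as a whole word."""
    if not terms:
        raise ValueError("at least one dictionary term is required")
    # Try longer terms first so "Hospital La Paz" wins over "La Paz".
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Lookarounds stop a term from matching inside a longer word.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def mask_terms(text, matcher, label="ENTITY"):
    """Replace every dictionary hit with the given label in angle brackets."""
    return matcher.sub(f"<{label}>", text)


# Hypothetical user-supplied dictionary for one application.
matcher = build_dictionary_matcher(["Hospital La Paz", "La Paz"])
print(mask_terms("Admitted to hospital la paz yesterday.", matcher, "ORG"))
```

Sorting the terms by length before building the alternation means an overlapping shorter entry does not split a longer one into a partial match.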