Data anonymization: a challenge to face!
Since it became applicable in May 2018, the GDPR has promoted the privacy of European users by requiring companies to protect the personal data they collect. Protecting citizens' first names and surnames, as well as their biometric data, is a matter of preserving their private lives.
There are several ways to protect data, and anonymization is one of the most important.
What is data anonymization?
Data anonymization refers to an irreversible transformation of data to prevent the identification of a particular individual. Irreversible means that it must be impossible to re-identify the person in question, directly or indirectly.
An alternative is pseudonymization. Like anonymization, it transforms personal data to prevent the identification of an individual. However, it only aims to prevent direct identification: it is still possible to re-identify a person using additional information. Pseudonymization is therefore a weaker form of protection than anonymization, since it does not completely and permanently prevent the identification of a person.
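To make the difference concrete, here is a minimal Python sketch of pseudonymization, assuming a keyed hash (HMAC-SHA256) is used to replace names. The key, the records, and the function name are hypothetical and only serve the illustration. The name disappears from the output, but the same person always receives the same pseudonym, and anyone holding the key can recompute the mapping, which is why re-identification remains possible.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored apart from the dataset.
SECRET_KEY = b"example-key-kept-apart-from-the-dataset"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (a pseudonym)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    # Keep only the first 16 hex characters to get a short pseudonym.
    return digest.hexdigest()[:16]


# Made-up records for illustration only.
records = [
    {"name": "Alice Martin", "station": "Chatelet"},
    {"name": "Bob Durand", "station": "Nation"},
    {"name": "Alice Martin", "station": "Bastille"},
]

# The direct identifier is dropped and replaced by the pseudonym.
pseudonymized = [
    {"pseudonym": pseudonymize(r["name"]), "station": r["station"]}
    for r in records
]
```

The first and third records end up with the same pseudonym, so the two trips can still be linked to one person even though the name is gone.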
Methods of anonymization
There are two main techniques for anonymizing data:
- Randomization: altering the data so that it can no longer be attributed to a real person.
- Generalization: generalizing the data so that it is common to a group of people rather than specific to one person.
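The two techniques can be sketched in a few lines of Python. This is only an illustration under simple assumptions: randomization is shown as bounded random noise added to an age, and generalization as replacing an age with a range and truncating a postcode. The function names, the noise bound, and the range width are hypothetical choices, not recommended settings.

```python
import random


def randomize_age(age: int, rng: random.Random, max_shift: int = 3) -> int:
    """Randomization: add bounded random noise so the exact value is no longer reliable."""
    return max(0, age + rng.randint(-max_shift, max_shift))


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with the range it falls into."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_postcode(postcode: str, keep: int = 2) -> str:
    """Generalization: keep only the first characters of a postcode."""
    return postcode[:keep] + "*" * (len(postcode) - keep)
```

For example, `generalize_age(34)` returns `"30-39"` and `generalize_postcode("75011")` returns `"75***"`, values shared by many people rather than by one.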
The challenges of anonymization
However, companies have to manage a growing volume of data that is increasingly difficult to handle. As a result, anonymization tends to become less effective: the more data is available, the easier it is to re-identify a person by cross-checking information.
To ensure effective anonymization, the CNIL(1) considers that a set of anonymized data must meet 3 main criteria:
- Individualization: is it still possible to isolate an individual? In other words, it must not be possible to single out a particular individual in the dataset.
- Correlation: is it possible to link separate data sets for the same individual? In other words, the cross-checking of information must not be possible.
- Inference: is it possible to deduce information about an individual? No inference that would allow a person to be identified should be possible.
A dataset that fails any one of these three criteria is poorly anonymized.
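As an illustration of the first criterion, the sketch below counts how many rows share each combination of attributes, in the spirit of a k-anonymity check. This is an assumption made for the example and not a method prescribed by the CNIL; the column names and rows are made up. A row whose combination is unique can be isolated, so the dataset fails the individualization criterion.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group; 1 means at least one person can be isolated."""
    sizes = group_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def singled_out(rows, quasi_identifiers):
    """Return the rows whose combination of values is unique in the dataset."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[c] for c in quasi_identifiers)] == 1
    ]


# Made-up, already generalized rows for illustration only.
rows = [
    {"age": "30-39", "postcode": "75***", "line": "M1"},
    {"age": "30-39", "postcode": "75***", "line": "M4"},
    {"age": "40-49", "postcode": "92***", "line": "M1"},
]
unique_rows = singled_out(rows, ["age", "postcode"])
```

Here the third row is the only one with its age range and postcode prefix, so it is singled out. A check like this covers individualization only; correlation and inference need to be assessed separately.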
Case study: RATP and the anonymization of video streams
Data anonymization solutions must be built on a case-by-case basis, because each company and each industry has different needs and requirements. Let's consider the following example to illustrate this point.
Deepomatic has set up a tailor-made system for RATP to anonymize the video streams from its video surveillance cameras. More specifically, anonymization is achieved through an automatic blurring module that operates in real time. This solution allows rapid image processing and protects the personal data (in this case, biometric data) of users of the Paris transport network.
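The article does not describe how the blurring module works, so the following is only a toy Python sketch of the general idea and not Deepomatic's implementation. It assumes a grayscale frame stored as a list of rows of integers and a bounding box that would, in a real system, come from a detection model; here the box is hard-coded. The region is pixelated by replacing each small block with its mean value, which is one simple way to obscure it.

```python
def pixelate_region(image, box, block=4):
    """Return a copy of a grayscale image with the given box pixelated.

    image: list of rows, each a list of integer pixel values.
    box: (top, left, bottom, right), with bottom and right exclusive.
    block: side length of the squares that are averaged together.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]  # leave the input frame untouched
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


# Hypothetical 8x8 frame and a hard-coded box standing in for a detected face.
frame = [[y * 8 + x for x in range(8)] for y in range(8)]
face_box = (2, 2, 6, 6)
blurred = pixelate_region(frame, face_box)
```

Pixels outside the box are left unchanged, while every pixel inside it takes the value of its block, so fine detail in that region is lost.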
If you also have a project to anonymize images or video streams, do not hesitate to contact us.