Machine Learning and the GDPR


In October, the EU’s Article 29 Data Protection Working Party (WP29) released its guidelines on automated decision-making and profiling under the GDPR. The guidelines were received with mixed feelings, leaving machine learning enthusiasts worried: for one thing, their provisions are considerably wider than those of the GDPR itself. As a result, many consider them harmful to companies that rely on such tools.

What is machine learning?

Machine learning is a field of computer science closely related to computational statistics. Its many applications include adaptive websites, bioinformatics, computer networks, computer vision (including object recognition), information retrieval, internet fraud detection, marketing, online advertising, search engines and more.

Even before the GDPR came into play, machine learning raised plenty of ethical questions. From those worried that humans will lose their jobs once AI takes over, to those who say there is no way we can trust machines to make decisions, its critics have always let their voices be heard. To that we can now add the paragraphs the GDPR dedicates to automated decision-making and profiling. The new regulation does not prohibit these activities, but it does restrict them somewhat.

What does the GDPR say?

Article 22 addresses automated decision-making specifically, stating:

the data subject shall have the right not to be subjected to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her

There are three exemptions from this explained in the same article and they apply when the processing:

(a) is necessary for entering into, or performance of, a contract between the data subject and a data controller;

(b) is authorised by Union or Member State law to which the controller is subject and which also lays down suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests; or

(c) is based on the data subject’s explicit consent.

Article 13 also gives data subjects the right to meaningful information about the logic involved. In short, the GDPR does not forbid profiling. It requires transparency of all operations, appropriate statistical procedures and accuracy of data. A strong emphasis is also placed on the right to opt out, something the GDPR enforces in all areas where consent is involved, not just profiling.

What does the WP29 say?

Where the GDPR merely talks about a right to opt out, the WP29 takes it one step further, turning it into a full prohibition:

as a rule, there is a prohibition on fully automated individual decision-making, including profiling that has a legal or similarly significant effect

Many companies today rely on machine learning for advertising, but machine learning is a tool that can be used for a much wider variety of purposes. Consent from the data subject is definitely required, yet under this reading it is no longer sufficient on its own as a lawful basis. At first glance, the worry is that these provisions will eliminate all forms of online advertising offered in exchange for platform use, such as on Facebook and other social media.

The bad

We should not forget the ugly part of profiling, which the GDPR aims to reduce. Profiling can easily discriminate and lead to some consumers being offered less attractive deals. The simplest example is prices that differ based on geographical location, but the discrimination can be taken even further. One such example is provided in the WP29 guidelines: a company that places people in categories like “young, single parents”, “rural and barely making it” and more, then scores those consumers on financial vulnerability. These profiles are used to target the consumers with deals and other financial services. It is easy to see why forbidding such practices is a good thing for data subjects.
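To make the WP29 example concrete, here is a minimal sketch of the kind of segmentation-and-scoring pipeline such a company might run. Every field name, threshold and weight below is an illustrative assumption, not a description of any real system.

```python
# Illustrative sketch of the profiling practice the WP29 guidelines warn
# about: assigning consumers to segments and scoring financial
# vulnerability. All fields, thresholds and weights are made up.

from dataclasses import dataclass

@dataclass
class Consumer:
    age: int
    dependents: int
    partner: bool
    income: float  # assumed monthly income
    rural: bool

def segment(c: Consumer) -> str:
    """Assign a coarse marketing segment (labels from the WP29 example)."""
    if c.age < 35 and c.dependents > 0 and not c.partner:
        return "young, single parents"
    if c.rural and c.income < 1500:
        return "rural and barely making it"
    return "general"

def vulnerability_score(c: Consumer) -> float:
    """Score financial vulnerability; higher = more vulnerable (made-up weights)."""
    score = 0.0
    score += 0.4 if c.income < 1500 else 0.0
    score += 0.2 * min(c.dependents, 3)
    score += 0.2 if not c.partner else 0.0
    return round(score, 2)

c = Consumer(age=29, dependents=2, partner=False, income=1200.0, rural=False)
print(segment(c), vulnerability_score(c))  # young, single parents 1.0
```

The point of the sketch is how little data such scoring needs: a handful of attributes is enough to single out vulnerable consumers, which is exactly why the guidelines treat it as a significant effect.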

Storage limitation is another provision that may conflict with machine learning. Article 5 of the GDPR states that personal data cannot be kept for longer than is necessary for the purposes for which it is processed. Machine learning algorithms, however, process large volumes of data and build correlations, and the longer the storage period, the better the results that can be expected from an algorithm. Under the new provisions, storing data for long periods of time might conflict with these proportionality considerations.

The good

Despite worries that the new guidelines will hinder the progress made with machine learning and disrupt the work of those who use the technique for research, there are many positive aspects. The rights of data subjects are at the center of the GDPR, and all the provisions surrounding automated decision-making and profiling reinforce those rights. Less discrimination and less invasion of privacy are just two of the benefits, and a lot of pointless data collection should be stopped by these provisions.

For example, when you buy from an online retailer, you will often need to create an account and enter at least your credit card details and delivery address. That is all well, as this data is necessary for the contract. If you consent to the retailer saving your data for further processing, again, no problem. However, many retailers now go one step further and build a profile of each customer from their purchases. If you later visit the website while logged into your account, the profile may be updated simply because you searched for certain products.

While some people like this, as they receive customized newsletters as a result, this data collection and profiling is not necessary for the performance of the contract. The GDPR, if applied correctly, will stop this type of activity, so that only truly necessary data is collected and discrimination is reduced, if not completely eliminated.
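The profile-building described above can be sketched in a few lines. The categories, weights and update rule are illustrative assumptions about how such a retailer might work, not any particular shop's system:

```python
# Hedged sketch of retailer profiling: an interest profile updated from
# purchases and from mere logged-in browsing. Weights are assumptions.

from collections import Counter

class CustomerProfile:
    def __init__(self):
        self.interests = Counter()

    def record_purchase(self, category: str):
        self.interests[category] += 3  # purchases weighted more heavily

    def record_search(self, category: str):
        # simply searching while logged in also feeds the profile --
        # the behaviour the paragraph above calls unnecessary for the contract
        self.interests[category] += 1

    def top_interest(self):
        return self.interests.most_common(1)[0][0] if self.interests else None

p = CustomerProfile()
p.record_purchase("running shoes")
p.record_search("fitness trackers")
p.record_search("fitness trackers")
print(p.top_interest())  # running shoes
```

Note that none of this data is needed to fulfil any order; it exists only to target the customer later, which is precisely what Article 22 and the WP29 guidelines scrutinise.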

As a conclusion

There is still a lot to say about machine learning. It is an interesting field, and the GDPR is certainly not meant to stop research in this area, as some currently fear. It should stop discrimination and unfair practices, which can only be a positive thing. Consent is key, as always, but this time it certainly isn't enough: transparency is also a necessity when using profiling. If you do use profiling or automated decision-making algorithms, make sure you respect all the provisions of the GDPR and always put the rights of the data subject first.



About the author

Laura Vegh is the Chief Security Officer at, a passwordless security solution. She has a PhD in Systems Engineering, focused on cyber-physical systems security.