For companies, putting an Artificial Intelligence application into production is the Holy Grail. Yet a simple and well-known phenomenon threatens projects once they are in production: a natural erosion of their performance over time, caused by “data drift”.
You’ve heard it many times before: nothing in life, except diamonds, lasts forever. Your smartphone gets slower and slower (and you have to take your charger everywhere with you), and, as the second law of thermodynamics tells us, building on Sadi Carnot’s work, over time things naturally tend towards disorder. Even your grandparents’ fridge, the illustrious and resilient one, has finally given up. The modern, digital world of machine learning is no exception to this law. Applications in production lose their performance (think of an antivirus that no longer protects you, or a spam filter that, as time goes by, lets everything through).
In short, whatever models are developed, their performance weakens over time and the applications built on them become obsolete, unfortunately faster than a fridge (and often even faster than a smartphone).
The phenomenon may seem insignificant, but when applications are in production in a company, serving workers and consumers, it becomes serious and puts a strain on the relationship between the digital department and the business departments. The beneficiaries of the applications see their technological assets shrink, and blame the technology.
So why does the performance of machine learning models decrease over time? Very often the cause is the same, well known and pernicious: data drift (along with its close cousin, concept drift, which has similar effects). Let me explain, and give you an example.
“Data Drift”, what’s this all about?
Pending the widespread adoption of unsupervised learning, almost all models in production are supervised models: they are trained on a fixed database representing a set of situations over a given period of time. In other words, they have seen and digested many examples of situations that systematically lead to this or that result in a specific context; step outside that context, and the model is lost. Yet the context changes over time, and gradually the application, if it is not retrained, loses performance.
Let’s take an example from the field of image recognition. Suppose you are a manufacturer of barrier-free motorway tolls. You film a vehicle passing at 120 km/h on the motorway and try to determine its nature in order to charge the right price: does the car have a trailer, and if so, how big is it? Is the truck loaded? How many of its axles touch the ground? All these elements are needed to determine the price.
Let’s say you use thermal cameras and have successfully trained the model in summer (or in Portugal). A few months later, you realize that its performance drops drastically in winter (or in Canada). The radically different climatic conditions transform the input images from one season or country to the next, so they no longer look like what the model is used to. If the phenomenon has not been anticipated before going into production, the model will be completely disrupted and vehicles will be charged the wrong prices; customers will be outraged, and the motorway company will lose considerable sums of money while giving users free access to the motorway for as long as it takes to solve the problem.
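The toll scenario can be caricatured in a few lines: a toy classifier is trained on a hypothetical "summer" distribution of a single thermal-intensity feature, then evaluated on "winter" data where the whole distribution has shifted. All numbers here are invented for illustration; real thermal images are obviously far richer than one feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_data(n, shift=0.0):
    # Hypothetical 1-D "thermal intensity" feature for two vehicle classes.
    cars = rng.normal(0.0 + shift, 1.0, n)
    trucks = rng.normal(3.0 + shift, 1.0, n)
    X = np.concatenate([cars, trucks]).reshape(-1, 1)
    y = np.array([0] * n + [1] * n)
    return X, y

# Train under "summer" conditions.
X_train, y_train = make_data(1000, shift=0.0)
clf = LogisticRegression().fit(X_train, y_train)

# Evaluate on summer-like data, then on drifted "winter" data where
# cold weather has shifted every intensity reading upwards.
X_summer, y_summer = make_data(1000, shift=0.0)
X_winter, y_winter = make_data(1000, shift=4.0)

summer_acc = clf.score(X_summer, y_summer)  # high: data matches training
winter_acc = clf.score(X_winter, y_winter)  # degraded: distribution shifted
print(summer_acc, winter_acc)
```

The model itself has not changed; only the inputs have, and that is enough to make its decision boundary meaningless in winter.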
We still have a few questions to answer: How does concept drift differ from data drift? How can we anticipate, detect and solve these problems? Why is data drift particularly strong in the field of image recognition? You will find out in the second part of this article.
An article originally posted on L’Usine Nouvelle