Published on 04/10/2024

Understanding AI Performance in Quality Control Automation

Visual AI and automation are now well integrated into the quality control processes of Field Ops departments. Companies value their ability to streamline operations, improve accuracy, and reduce human error. Together, they are a game changer for field management.

However, to reap the full benefits of this technology, companies must understand the AI metrics used to gauge operational effectiveness. Accuracy, precision, and recall are all crucial metrics that help you keep your finger on the pulse of performance.

After all, what can’t be measured can’t be improved. Therefore, it’s critical that field operators identify ways to measure and improve the performance of visual AI and automation.

In this blog series, we’ll unpack these metrics, explain their significance, and show how to use them to evaluate and optimize AI-driven visual automation systems for better decision-making.

What are the key metrics to look out for to measure AI performance?

There are a variety of metrics to keep an eye on. Here are the most important ones to monitor.

Accuracy

Accuracy is the percentage of cases in which the AI system is right. This metric is appropriate when all outcomes have equal business impact. It is useful, for example, when the system classifies the type of equipment it is looking at.
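As a minimal sketch of the calculation, assuming a hypothetical set of equipment labels and predictions (the label names below are illustrative, not taken from a real dataset), accuracy is the share of predictions that match the ground truth:

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the ground-truth label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical equipment labels, for illustration only.
y_true = ["router", "splitter", "router", "cabinet", "splitter"]
y_pred = ["router", "splitter", "cabinet", "cabinet", "splitter"]
print(accuracy(y_true, y_pred))  # 4 correct out of 5 -> 0.8
```

Every mistake weighs the same in this calculation, which is why accuracy fits cases where all outcomes have equal business impact.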

Recall

Recall is the percentage of actual positives that the system correctly predicts. For instance, predicting safety-critical defects typically requires high recall, meaning that nearly all defects are caught, even if this creates false positives in the process. Those false positives would then be handled by a back-office team.
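In terms of counts, recall is the number of true positives divided by all actual positives (true positives plus false negatives). The sketch below assumes hypothetical figures for a defect detection check; the numbers are illustrative only:

```python
def recall(true_positives, false_negatives):
    """Fraction of actual positives that the system caught."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positives


# Hypothetical: 50 photos show a real defect; the system flags 48 and misses 2.
print(recall(true_positives=48, false_negatives=2))  # 0.96
```

False positives do not appear in the formula at all, which is why a system can be tuned for high recall at the cost of extra alerts for the back office to review.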

Precision

Precision is the percentage of predicted positives that are actually positive. Take fraud detection, a case where high precision is important: blocking users or telling them they’ve done something wrong when that is not the case can badly hurt adoption.
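In terms of counts, precision is the number of true positives divided by everything the system flagged as positive (true positives plus false positives). The figures in this sketch are hypothetical and only illustrate the arithmetic:

```python
def precision(true_positives, false_positives):
    """Fraction of predicted positives that are actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("precision is undefined when nothing was predicted positive")
    return true_positives / predicted_positives


# Hypothetical: the system flags 40 cases as fraud; 36 really are, 4 are not.
print(precision(true_positives=36, false_positives=4))  # 0.9
```

Here, 4 of the 40 flagged users would be wrongly accused, which is the cost that high precision aims to keep low.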

F1-Score

The F1-score combines precision and recall into a single number (their harmonic mean) to give a balanced view of the AI system’s effectiveness. It reflects how well the system detects relevant instances while minimizing false alarms. This makes it especially handy in situations where both underreporting and overreporting issues could have significant consequences.
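The standard definition of the F1-score is the harmonic mean of precision and recall. The sketch below uses hypothetical precision and recall values to show how the harmonic mean penalizes an imbalance between the two more than a plain average would:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: a precise system that still misses many positives.
p, r = 0.9, 0.6
print(round(f1_score(p, r), 2))  # 0.72
print(round((p + r) / 2, 2))     # 0.75 (arithmetic mean, for comparison)
```

The F1-score only gets close to 1 when both precision and recall are high, so a system cannot score well by excelling at one and neglecting the other.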

Beyond a single metric: measuring the performance of the whole AI system

To understand AI performance, we have to start with the notions of models and workflows, since Quality Control Automation powered by AI requires both.

Deep learning models perform tasks such as:

classification (attributing one or several tags to an image),
object detection (localizing specific objects with bounding boxes),
segmentation (classifying each pixel of the image / providing precise masks of various object instances),
OCR (extracting text from an image).

Workflows assemble several models combined with other algorithms and business rules. Thanks to workflows, it is possible to offer real-time feedback to field workers and produce the quality control data. A typical workflow runs 10 models.
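To make the idea concrete, here is a minimal sketch of a workflow that chains three model calls and applies business rules to their outputs. Everything in it is an assumption made for illustration: the model names, the stub functions standing in for real deep learning models, and the rules themselves are hypothetical and do not describe an actual product API.

```python
from typing import Callable, Dict, List, NamedTuple


class Feedback(NamedTuple):
    passed: bool
    messages: List[str]


def run_workflow(image: bytes, models: Dict[str, Callable]) -> Feedback:
    """Chain several model calls, then apply business rules to their outputs."""
    equipment = models["classify_equipment"](image)  # classification
    objects = models["detect_objects"](image)        # object detection
    serial = models["read_serial"](image)            # OCR, may return ""

    messages = []
    # Hypothetical business rules for one quality control checkpoint.
    if equipment != "street_cabinet":
        messages.append("Photo does not show a street cabinet")
    if "label" not in objects:
        messages.append("Identification label not visible")
    if not serial:
        messages.append("Serial number unreadable")
    return Feedback(passed=not messages, messages=messages)


# Stub models standing in for real deep learning models.
stub_models = {
    "classify_equipment": lambda image: "street_cabinet",
    "detect_objects": lambda image: ["cable"],
    "read_serial": lambda image: "SN-0042",
}
result = run_workflow(b"fake-image-bytes", stub_models)
print(result.passed, result.messages)
```

In this sketch the messages play the role of the real-time feedback sent to the field worker, while the pass/fail flag is the quality control data kept for reporting.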

Models and workflows are 100% customized to each client and answer different use cases.

In some cases, it can be beneficial to capitalize on off-the-shelf solutions. Here is why:

Off-the-shelf models are trained on large datasets of images labeled by Deepomatic, so that they work on a wider range of cases than customer-specific models. They are elementary blocks that can be reused in any customer workflow, usually without requiring any fine-tuning. Off-the-shelf models help reduce the time and effort (data collection, AI training) required to set up the solution and accelerate the time-to-value.

Off-the-shelf workflows work the same way as customer-specific ones and make it possible to implement an entire quality control checkpoint. The only customization required is on the last layer of business logic. Off-the-shelf workflows help significantly reduce the time needed to implement some control points from scratch.

Coming back to the performance question, it can be considered at the level of deep learning models (the building blocks of a workflow) or at the workflow level (including the business logic). Ultimately, what matters most is the performance of the workflow, as it determines the business outcomes. Generally, better models yield better workflows, which is why we put more effort into improving the performance of the models. In other words, improving the components is how the whole system comes to deliver better results.

Bear in mind that defining performance is also tightly related to what counts as a positive outcome, and this needs to be business-driven, not data-science-driven. For example, in the case of detecting unplugged cables in a street cabinet, it is more important to be extra precise when there are only a few unplugged cables in the cabinet, while it does not really matter when more than 100 cables are unplugged. Defining this notion of positive outcome is usually what we spend time on with our customers in the first phase of the project.
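One way to picture a business-driven definition of a correct outcome is a rule that demands an exact cable count when few cables are unplugged and tolerates a margin when many are. The sketch below is only an illustration: the function name, the exact-match cutoff of 5 cables, and the 20% tolerance are hypothetical values, not ones given in this article.

```python
def count_is_acceptable(true_count, predicted_count,
                        exact_up_to=5, relative_tolerance=0.2):
    """Decide whether a predicted number of unplugged cables counts as correct.

    Small counts must match exactly; large counts only need to fall within
    a relative margin of the true value. Both thresholds are hypothetical.
    """
    if true_count <= exact_up_to:
        return predicted_count == true_count
    return abs(predicted_count - true_count) <= relative_tolerance * true_count


print(count_is_acceptable(2, 3))      # False: off by one matters at this scale
print(count_is_acceptable(120, 110))  # True: off by ten is within the margin
```

Metrics such as accuracy or recall can then be computed on top of this rule, so that they measure what the business actually considers a success.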

In the second article, we will walk you through the Real-World Strategies for Field Services to continuously refine AI performance. We will address the common challenges in achieving high AI performance.