People rely on vision every day to avoid obstacles, reach for objects, and perform hundreds of other tasks. This has inspired scientists to try to give computers a way to “see” things too.

It is what we call Computer Vision.

You may have heard this term before, as it is often associated with artificial intelligence, but do you really know what it means and how it works? Here are the fundamentals of computer vision.

What is Computer Vision?

Computer vision (CV) is a subfield of Computer Science and Artificial Intelligence. It is a set of methods and technologies that make it possible to automate a specific task starting from an image: a machine detects, analyzes, and interprets one or more elements of the image in order to make a decision and perform an action.

Computer Vision extracts information from images and recognizes specific concepts. It can therefore perform a variety of tasks such as recognizing faces or characters in an image, detecting the location of an object in an image, or classifying images. The most common CV tasks are object detection and image classification. 

Object detection consists of searching for a particular element and locating it within an image, usually by drawing a rectangular “box” around it. There is also a more elaborate method, accurate to the pixel, that outlines the element with a polygon; it is called polygon segmentation.
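To make the difference between a box and a polygon concrete, here is a minimal sketch in Python. The function names, the coordinate convention (x_min, y_min, x_max, y_max) and the triangular outline are all assumptions chosen for illustration, not the format of any particular tool.

```python
def bounding_box(polygon):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box):
    """Area covered by a box annotation."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


def polygon_area(polygon):
    """Area enclosed by a simple polygon, computed with the shoelace formula."""
    total = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Hypothetical object outline: a triangle given as (x, y) pixel coordinates.
outline = [(0, 0), (4, 0), (0, 4)]
box = bounding_box(outline)

print(box)                    # box that would be drawn around the object
print(box_area(box))          # pixels covered by the box
print(polygon_area(outline))  # pixels covered by the polygon
```

In this example the box covers twice the area of the polygon, which is why a polygon describes the object more precisely than a box does.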

As for image classification, it identifies which category an image belongs to based on its content, i.e. it identifies the main subject of the image. When an image should be associated with more than one category, a closely related operation called tagging assigns several labels to it.
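One common way to picture the difference between classification and tagging is shown in the sketch below. It assumes a model has already produced one raw score per label; the labels, scores and the 0.5 threshold are made up for the example, and real systems may use other decision rules.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (labels compete)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Turn one raw score into an independent probability between 0 and 1."""
    return 1 / (1 + math.exp(-score))


def classify(scores, labels):
    """Classification: keep only the single most probable label."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]


def tag(scores, labels, threshold=0.5):
    """Tagging: keep every label whose own probability passes the threshold."""
    return [label for label, s in zip(labels, scores) if sigmoid(s) >= threshold]


# Hypothetical raw scores for a photo of a cat lying on a sofa.
labels = ["cat", "dog", "sofa"]
scores = [2.0, -1.0, 1.5]

print(classify(scores, labels))  # a single category
print(tag(scores, labels))       # possibly several categories
```

With these scores, classification returns only "cat", while tagging returns both "cat" and "sofa".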

How does Computer Vision work?


In most cases, computer vision is based on Deep Learning (DL), a subfield of Machine Learning.

Deep Learning is a set of machine learning techniques that rely on artificial neural networks, which are loosely inspired by the human brain. For images, the most widely used type is the convolutional neural network (CNN). A neural network is made up of several successive layers of neurons. Depending on the architecture chosen, the output of each layer feeds into one or more of the other layers.
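The idea of successive layers can be sketched with three tiny layers applied one after the other: a convolution, an activation and a pooling step. This is only an illustration under simplifying assumptions: the image is a small grid of numbers, and the filter weights are picked by hand to respond to a dark-to-light vertical edge, whereas a real network learns its weights during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Activation layer: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map):
    """Pooling layer: keep the strongest response in the whole map."""
    return max(max(row) for row in feature_map)


# Hypothetical 4x5 grayscale image: dark on the left, light on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hand-picked 3x3 filter that reacts to a dark-to-light vertical edge.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Each layer takes the output of the previous one as its input.
feature_map = relu(conv2d(image, vertical_edge))
strongest = global_max_pool(feature_map)

print(feature_map)
print(strongest)
```

The feature map is zero where the image is uniformly dark and positive where the filter overlaps the edge, so the later layers receive a summary of where that pattern occurs rather than the raw pixels.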

Before a computer vision algorithm can recognize the content of an image, its neural network must be trained. To do this, it is provided with a visual database that has first been annotated manually according to the type of information to be extracted.
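The principle of learning from annotated examples can be sketched with a much simpler model than a deep network. The example below is a stand-in, not how production systems are trained: it uses a single artificial neuron (a perceptron), and each "image" is reduced to one hypothetical feature, its average brightness, annotated by hand as 0 (dark) or 1 (bright).

```python
def predict(weight, bias, brightness):
    """The neuron answers 1 (bright) if its activation is positive, else 0 (dark)."""
    return 1 if weight * brightness + bias > 0 else 0


def train(examples, epochs=100, learning_rate=1.0):
    """Perceptron rule: adjust the parameters each time a prediction is wrong."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for brightness, label in examples:
            error = label - predict(weight, bias, brightness)
            if error != 0:
                weight += learning_rate * error * brightness
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every annotated example is now predicted correctly
            break
    return weight, bias


# Hypothetical manually annotated dataset: (average brightness, label).
annotated = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]

weight, bias = train(annotated)

# After training, the model can label images it has never seen.
print(predict(weight, bias, 0.05))
print(predict(weight, bias, 0.95))
```

The quality of the annotations matters here just as it does at scale: if the labels in the dataset were wrong, the model would faithfully learn the wrong rule.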



What is Computer Vision used for?

Examples of applications


Computer vision can be used in a very wide range of industries, such as construction, automotive, oil & gas, or telecommunications.

In practice, computer vision helps to automate business processes. It assists humans in detecting specific objects, behaviors, or situations, which saves time and considerably reduces the rate of human error. In addition, breakthrough innovations such as the autonomous car or connected objects are now possible thanks to CV.



What can computer vision recognize?


There are different categories of Computer Vision, such as image processing (including image recognition), facial recognition, optical character recognition, or iris recognition. This variety means that CV can be useful in many different industries and for a wide range of practical use cases.

Key Steps of Computer Vision Evolution


David Hubel & Torsten Wiesel, two neurophysiologists, recorded the electrical activity of neurons in a cat’s visual cortex. They analyzed how the cat’s brain reacted to the different types of images presented. From their recordings, they concluded that the primary visual cortex is composed of simple and complex neurons and that visual processing starts with simple structures such as straight edges.


Russell Kirsch, an American engineer, developed the first digital image scanner. It was a drum scanner that captured image information and transformed it into a series of 0s and 1s, the “binary language” that computers are able to understand.


Lawrence Roberts described a process for deriving 3D information about solid objects from 2D photographs. This work on 3D reconstruction was a milestone in computer vision research.


Raymond Kurzweil, an American inventor and engineer, developed omni-font Optical Character Recognition (OCR), which is able to recognize text in virtually any font that has standard character shapes. The final goal was to create a machine able to read text out loud for blind people.


David Marr, a British neuroscientist, wrote new algorithms based on Hubel & Wiesel’s work, to enable computers to detect shapes such as edges, curves or corners.

In the meantime, a Japanese computer scientist called Kunihiko Fukushima introduced a pattern recognition system, the Neocognitron (also inspired by Hubel & Wiesel’s model). This neural network model is convolutional and multilayered.


Since the early 2000s, researchers have been focusing on object recognition. In 2006, an AI scientist, Fei-Fei Li, started working on ImageNet, which is a large visual database. Her ambition was to improve the volume and the quality of data available to train AI algorithms. Since 2010, ImageNet has been accessible to everyone.

To raise the project’s visibility, a contest called ILSVRC (ImageNet Large Scale Visual Recognition Challenge) was held annually from 2010 to 2017 to evaluate algorithms for object detection and image classification on a large scale. In 2012, the winning model, AlexNet, designed by the researcher Alex Krizhevsky and his colleagues, achieved a top-5 error rate of 15.3%, which was a real breakthrough at the time. This success highlighted Computer Vision and its enormous potential.



Deepomatic offers a unique solution to automate your business processes through a computer vision platform.
With Deepomatic Studio® & Deepomatic Run®, you’ll be able to build your project and put it into production at industrial scale.