By now you should have at least a rough idea of how a deep learning model for computer vision operates. However, if you’ve tried your hand at it before, you know that going from theory to practice is another story. There’s no better way to learn than by doing, but a few heads-ups never hurt. That’s why we’ve summarized the ten challenges that you will be confronted with sooner or later.
Build your dataset
Data is the backbone of any model. The more data you have, the better. And when it comes to computer vision, data is composed of two distinct elements: the images and their labels.
1. Take out your camera
It all starts with images. If you’re lucky, you’ve been silently storing images over the years. Most likely you have a few pictures here and there, but you’re nowhere near a full dataset. Whether it means installing image acquisition systems in factories or scraping the internet, you have to set up an automatic acquisition system. The other lesson to take from this is to start storing every picture you have. Today, storage is cheap, and you never know when those images will prove valuable!
2. Grab a cup of coffee and put on some music
Okay, you have thousands of images. Unfortunately, this is only half the battle. Now comes labeling. Labeling is the act of manually looking at each picture and recording the information it contains. Sometimes you can be smart about it and, with a bit of scripting, automatically extract information that was embedded in the image when it was stored. However, even with this trick, the labels retrieved rarely cover all of your dataset. Practically, this means spending hours in front of your computer executing the same classification task over and over again. It can be very grueling and time-consuming. This is why a lot of people outsource it entirely, and why so many companies specializing in labeling have emerged recently.
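As a minimal sketch of the scripting trick mentioned above, assume the images were stored under a hypothetical layout `<root>/<label>/<file>.jpg`, so that the parent folder name carries the label. The layout, the function name `labels_from_paths` and the list of extensions are all assumptions made for illustration; your own storage convention will likely differ.

```python
from pathlib import PurePosixPath

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of image formats


def labels_from_paths(paths):
    """Split paths into (labeled, unlabeled) using the parent folder as label.

    Assumes a hypothetical storage layout: <root>/<label>/<file>.<ext>.
    Images sitting directly under the root carry no label.
    """
    labeled, unlabeled = {}, []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip non-image files
        if len(path.parts) >= 3:  # at least root / label / file
            labeled[raw] = path.parent.name
        else:
            unlabeled.append(raw)
    return labeled, unlabeled
```

Whatever ends up in the `unlabeled` list is what still has to be labeled by hand, which is usually a good first estimate of how much coffee you will need.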
3. Iterate, iterate, iterate
The traditional way of doing data science is to define the problem you’re trying to solve, build a dataset, test different models, select the best-performing one and deploy it into production. What people sometimes miss is that fiddling with your model is not the only lever at your disposal to improve performance. It turns out that focusing on dataset quality and balance can have a tremendous impact. Besides, you can’t predict all the cases you’ll encounter, especially outliers. You need a system with a feedback loop to analyze and visualize your dataset, with a human at the center of it.
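To make the point about dataset balance concrete, here is a small sketch that reports each class’s share of the dataset and flags the classes that look underrepresented. The function names and the `min_share` threshold of 10% are assumptions chosen for illustration, not a recommended value.

```python
from collections import Counter


def class_balance(labels):
    """Return (class, share of dataset) pairs, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(name, count / total) for name, count in counts.most_common()]


def underrepresented(labels, min_share=0.1):
    """List classes whose share falls below min_share (an assumed threshold)."""
    return [name for name, share in class_balance(labels) if share < min_share]
```

Running a check like this after every labeling batch is one cheap way to decide which images to collect or label next.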
Train your model
Once you have the first version of your dataset, you can start playing with neurons, activation functions, objective losses, and other hidden layers. Here are some tips and tricks you might want to keep in the back of your head.
4. Don’t reinvent the wheel
Deep learning is a very hot subject right now. Tech giants are centering their strategy around it, VCs are pouring millions of dollars into it and billionaires are arguing about it. What can you do? You can take advantage of it! The community is vibrant, and most research papers are published within months. All tech giants, with the exception of Apple, release dozens of papers every year with their latest breakthroughs. That’s not all! Tools are being built at an impressive speed to make the job easier. Take deep learning frameworks such as TensorFlow (Google) or Caffe2 (Facebook), for instance. You would need an extraordinary reason to develop your own. Before starting a new project or a new initiative, take a good look at what has already been, and is being, done. How do you fit into this picture?
5. Slow and steady wins the race
With all this research at your fingertips, your first instinct might be to find out the latest advancements and focus your efforts on them. However, until you have a proven track record of delivering AI projects that provide real business value and you’ve established a working AI policy in your company, you might want to refrain from it. Start small with something that works and then build on top of it. With computer vision, this means transfer learning: using a pre-trained model and fine-tuning it to the task at hand.
6. Build for the future
The first steps to any industrial computer vision system are pretty much always the same:
- Gather a small dataset.
- Download several pre-trained networks.
- Fine-tune them using a small set of hyper-parameters.
- Assess system performance.
- Identify improvement options.
No matter the project, you will start with the same rough steps, which is why going through them swiftly and efficiently can save you crucial development time. If you haven’t started already, you should develop a system that lets you try the most famous architectures (VGG, ResNet, GoogLeNet, etc.) quickly with each new use case. It will allow you to get a sense of what works best and what you will need to focus on. It will also help you capitalize on all the time you spent transforming spaghetti research code into enterprise software.
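One possible shape for such a system, sketched under heavy assumptions: a registry of model factories, a small hyper-parameter grid, and a single loop that tries every combination. The `train_and_score` callback is hypothetical and stands in for your own fine-tuning and validation code; nothing here is tied to a particular framework.

```python
from itertools import product


def run_experiments(builders, grid, train_and_score):
    """Try every architecture with every hyper-parameter combination.

    builders: dict mapping an architecture name to a zero-argument factory.
    grid: dict mapping a hyper-parameter name to a list of candidate values.
    train_and_score: hypothetical callback (model, params) -> validation score.
    """
    names = list(grid)
    results = []
    for arch, build in builders.items():
        for values in product(*(grid[name] for name in names)):
            params = dict(zip(names, values))
            score = train_and_score(build(), params)
            results.append({"arch": arch, "params": params, "score": score})
    # Best-scoring run first, assuming a higher score is better.
    return sorted(results, key=lambda run: run["score"], reverse=True)
```

Keeping the factories and the training callback as plain arguments means a new use case only has to supply its own dataset and scoring logic, while the loop itself stays the same from project to project.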
Deploy into production
You’ve trained your shiny new neural network. You’ve reached performance levels you could only dream of. All you need to do now is to release it into the wild. The thing is, this simple sentence is anything but… simple.
7. Build the right team
Going from a fixed model to serving millions of images per day is an entirely different job. Different job. Different people. In broad strokes, where you needed a researcher with okay software development skills to implement and train the model, you now need a talented software developer with an okay understanding of deep learning. Today, with resources such as fast.ai or deeplearning.ai, it’s far easier to give the necessary knowledge to a proper software developer than to turn someone from an academic background into an excellent software developer!
There’s a massive gap between running a neural network on a single GPU and making predictions in real time with multiple GPUs on several distant servers. Throw in the need to run the same algorithm on embedded devices such as an Nvidia Jetson, a Movidius or a custom board and you’ve got yourself a hefty amount of work! At the same time, you will also need to integrate it with the rest of the infrastructure and ensure it meets your robustness and reliability requirements. Hardware, neural network optimization and load balancing are all things you should think about from the start if you don’t want to be caught off guard further down the line.
9. Nothing is written in stone
Your model is now in production and ready to bear the load. The go-live was a success and you’re pouring the champagne. Congratulations! You’re already thinking of your next projects. Beware, you need to keep an eye on your performance!
- Your business use case will evolve with time. Sometimes it’s just a minor tweak, but still, you have to incorporate it if you want your algorithm to stay relevant.
- Even if the use case remains the same, for instance identifying a car brand, your dataset will need to reflect the changes and trends happening in your underlying population.
- Breakthroughs will go from cutting-edge research to being production-ready, and you will want to integrate them, or at least test them, in your setup.
- People will abuse your system. As with every system, there are always people who try to game it. Deep learning is a very young field, but adversarial networks, networks explicitly designed to fool other networks, already have to be taken seriously, and the threat will only grow as neural networks become more and more present in our everyday lives.
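Keeping an eye on performance can start very simply. The sketch below assumes that a sample of production predictions gets verified by a human, and tracks accuracy over the most recent ones. The class name `AccuracyMonitor` and the default `window` and `threshold` values are assumptions to tune for your own use case.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent human-verified predictions."""

    def __init__(self, window=500, threshold=0.9):
        # window and threshold are assumed defaults, not recommendations
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one verified prediction was correct."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing verified yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold
```

A drop flagged by a monitor like this does not tell you which of the causes above is at play, but it does tell you when to go back to your dataset and look.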
10. Tying it all together
You may have noticed that all the tips in the list are use-case-agnostic. What this means is that you will need to go through them no matter which image recognition system you’re trying to implement. Plus, they all benefit from being integrated together: you need to be able to train and deploy a first model to know how to improve your dataset, which will then help you redefine your tasks, and so on. What you should strive for, in the end, is to have all those elemental parts communicate at all times rather than treating every step as a separate stage in the development process.
Build one platform to unite them all.