Image recognition and cancer detection

by Vincent Delaitre

Image recognition in cancer detection.

Reduce unnecessary and invasive treatments thanks to deep learning.

Unfortunately, everybody knows someone who has been diagnosed with cancer. Patient survival chances improve immensely when cancer is detected and treated early. Effective screening is, therefore, the key. However, the evaluation of screenings can be cumbersome, time-consuming, and prone to errors. In addition, it may often result in unnecessary and invasive treatments with possible painful side effects. This is where AI comes in.

Image recognition for cancer detection

At Deepomatic, we believe AI can assist doctors in their patients' diagnoses with real-time screening analyses that are just as accurate as those performed by a trained professional. Automatic analysis would save time and enable diagnoses in remote places that lack trained personnel.

Based on this premise, Deepomatic embarked on a very promising project with Light for Life Technologies (LLTech), a French start-up that is developing a scanner for in-depth microscopic tissue imaging. The technology can capture images from a tissue sample within minutes, dramatically reducing the waiting time for results. In addition, there is no need for any kind of tissue preparation, modification, or staining, allowing the reuse of the biopsy for further analyses.

Let’s take a closer look at how we used our image recognition platform to understand the implications of deep learning for cancer diagnosis.

Create a dataset of labeled cancer images

LLTech provided us with 18 images of biopsies containing cancerous cells and 122 images of biopsies without any abnormalities. Using our annotation tool, we marked the abnormal regions in the 18 cancerous images with polygon annotations.

Annotation of cancerous regions on LLTech’s prostate cancer image.

Since the abnormal regions were very small (<1% of the image area), we split the images into 256×256-pixel sub-images to improve training performance. This approach also let us remove pure background sub-images (without any tissue) from the dataset and identify abnormal tissue regions without a trained detector algorithm. Then, we grouped the sub-images into two categories: healthy and cancer.
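The splitting step can be sketched in a few lines of NumPy. This is only an illustration, not the code we ran: the function names are hypothetical, partial tiles at the image border are simply dropped, and the background is assumed to be darker than an arbitrary intensity threshold.

```python
import numpy as np

TILE = 256  # sub-image side length in pixels, as stated in the article


def split_into_tiles(image, tile=TILE):
    """Yield (top, left, sub_image) for every full tile of a 2-D image.

    Assumption: partial tiles at the right and bottom edges are discarded.
    """
    height, width = image.shape[:2]
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield top, left, image[top:top + tile, left:left + tile]


def tissue_fraction(sub_image, background_level=10):
    """Fraction of pixels brighter than an assumed background intensity.

    The threshold of 10 is a placeholder, not a value from the project.
    """
    return float(np.mean(sub_image > background_level))


def keep_tissue_tiles(image, min_tissue=0.0):
    """Keep only the tiles whose tissue fraction exceeds min_tissue.

    With the default of 0.0, this drops pure background tiles only.
    """
    return [(top, left, sub) for top, left, sub in split_into_tiles(image)
            if tissue_fraction(sub) > min_tissue]
```

On a 512×768 image, `split_into_tiles` yields six tiles, and `keep_tissue_tiles` keeps only those that contain at least one pixel above the background threshold.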

Splitting the image into sub-images: the white areas correspond to the cancer annotations and the red percentages to the overlap on the corresponding sub-image.

For the healthy category, we kept only the sub-images from the 122 healthy biopsies that contained more than 50% tissue. For the cancer category, we used the sub-images from the 18 abnormal biopsies in which cancer annotations covered more than 40% of the area. The resulting dataset was very unbalanced, so to avoid biased training we augmented the cancer category by randomly cropping sub-images in the annotated regions, and we removed sub-images from the healthy category.
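The two selection rules can be written as a small function. This is a minimal sketch under stated assumptions: the names are hypothetical, the cancer annotation is assumed to be available as a boolean mask, and sub-images that miss their threshold are assumed to be discarded.

```python
import numpy as np

MIN_TISSUE = 0.50  # healthy sub-images need more than 50% tissue
MIN_CANCER = 0.40  # cancer sub-images need more than 40% annotated cancer


def cancer_overlap(mask_tile):
    """Fraction of a sub-image covered by the boolean cancer annotation mask."""
    return float(np.mean(mask_tile))


def assign_category(tissue_frac, cancer_frac, from_abnormal_biopsy):
    """Return 'healthy', 'cancer', or None when the sub-image is discarded.

    Assumption: healthy sub-images come only from the healthy biopsies and
    cancer sub-images only from the abnormal ones.
    """
    if from_abnormal_biopsy:
        return "cancer" if cancer_frac > MIN_CANCER else None
    return "healthy" if tissue_frac > MIN_TISSUE else None
```

For example, a sub-image from an abnormal biopsy whose mask is half covered by annotations is labeled `cancer`, while one with 10% coverage is left out.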

Augmenting the cancer dataset by randomly cropping sub-images in the cancer annotation region.
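One way to implement this kind of augmentation is to pick random annotated pixels and cut a sub-image around each of them. The sketch below assumes a boolean annotation mask with at least one annotated pixel and an image at least 256 pixels on each side; `random_cancer_crops` is a hypothetical name, and the exact sampling strategy we used may differ.

```python
import numpy as np


def random_cancer_crops(image, mask, n_crops, tile=256, seed=0):
    """Sample tile-sized crops centred on randomly chosen annotated pixels."""
    rng = np.random.default_rng(seed)
    height, width = mask.shape
    rows, cols = np.nonzero(mask)  # coordinates of annotated cancer pixels
    crops = []
    for idx in rng.integers(0, len(rows), size=n_crops):
        # Clamp the crop so that it stays fully inside the image.
        top = int(np.clip(rows[idx] - tile // 2, 0, height - tile))
        left = int(np.clip(cols[idx] - tile // 2, 0, width - tile))
        crops.append(image[top:top + tile, left:left + tile])
    return crops
```

Because every crop is centred on (or clamped near) an annotated pixel, each new sub-image overlaps the cancer region; a stricter version could also re-check the 40% overlap rule before keeping a crop.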

The final dataset contained 5,319 sub-images across the healthy and cancer categories. We set aside 25% of them, i.e. 1,330 randomly chosen sub-images, to test the algorithm’s performance.
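A random hold-out split of this kind takes only a few lines. The sketch below is illustrative: `train_test_split` here is a hypothetical helper written with the standard library, and the fixed seed is an assumption made for reproducibility.

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle the items and hold out a fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]
```

With 5,319 items and a 25% test fraction, the test set holds 1,330 items and the training set the remaining 3,989.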

Train a custom model to diagnose cancerous tissue

After uploading the pre-processed images, assigning them the labels described above, and splitting them into training and test datasets, we chose a pre-trained GoogLeNet architecture and fine-tuned it on that dataset. Without any other pre-processing or tweaking of the algorithm, we obtained accuracies of 89% and 93% for classifying healthy and cancerous tissue, respectively. We obtained these results after only 10 training epochs. In the latest version of our software, the user can choose between different pre-trained architectures (AlexNet, ResNet, GoogLeNet) and modify their hyper-parameters, such as the learning rate policy and decay rate.
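Reporting one accuracy per category, rather than a single overall figure, shows whether the model favors one class. A minimal sketch of that computation follows, assuming each figure is the fraction of test sub-images of a category that were classified correctly; the function name and the toy labels in the usage note are illustrative only.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Return, for each true label, the fraction of items predicted correctly."""
    correct, total = {}, {}
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] = total.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == predicted)
    return {label: correct[label] / total[label] for label in total}
```

For instance, with four healthy test items of which three are predicted correctly, and two cancer items of which one is predicted correctly, the function returns 0.75 for `healthy` and 0.5 for `cancer`.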

Modification of pre-trained algorithms.
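To give an idea of what a learning rate policy and a decay rate control, here is a sketch of one common policy, step decay, in which the learning rate is multiplied by a fixed factor every fixed number of iterations. It is an example of the concept only: the function name and the values are assumptions, not the settings exposed by our software.

```python
def step_decay_lr(base_lr, decay_rate, step_size, iteration):
    """Learning rate under a step policy.

    The rate starts at base_lr and is multiplied by decay_rate once every
    step_size iterations.
    """
    return base_lr * decay_rate ** (iteration // step_size)
```

With a base rate of 0.01, a decay rate of 0.1, and a step size of 1,000 iterations, the rate stays at 0.01 for the first 1,000 iterations and then drops by a factor of ten at each subsequent step.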

Learn to detect cancer, one image at a time

When training models to recognize particular concepts, it can often be frustrating if you don’t meet your target performance rate. What’s even more frustrating is when you don’t know why. Fortunately, our tool allows users to easily identify the reasons behind unsatisfactory performance rates. In the cancer classification case, we could easily see that:

  • Annotations were not precise and informative enough.
  • The background was an important feature for the algorithm, and we needed to eliminate it.
  • The dataset was not diverse enough: it drew on only 18 images, of which less than 1% of the sub-images were used for training.

By controlling these variables, simply doubling the number of images in the dataset would drastically raise performance rates and make automatic detection a reliable and convenient option. What’s more, the major advantage of LLTech’s medical revolution is that the data is processed in real time. Therefore, it can be sent by remote transmission directly to an artificial intelligence system to help with the diagnosis. Doctors could receive biopsy results in the operating room, thus avoiding a second intervention. There is no doubt that medical technology powered by deep learning will have a revolutionary impact on the diagnosis of cancers.


This solution is, of course, accompanied by challenges still to overcome. Annotation requires an expert’s time, so obtaining high-quality annotated datasets will remain a costly challenge for years. Also, to be more accurate and useful, we’ll need to train several classes to classify the cancer stages. The next step of our collaboration with LLTech is to train new algorithms on their new Dynamic Cell Imaging (DCI), which can provide sub-cellular contrast that complements the existing imaging. Watch this space.

Optical biopsies for the detection of cancer.

LLTech systems perform optical biopsies for the detection of cancer. Their microscopes simultaneously exploit two optical technologies to generate colored images in real time. The yellow and green/blue areas correspond to cancer and normal cells, respectively.

