Road Damage Detection
Detecting different types of road damage, such as cracks and potholes, from a given image or video of the road.
Table of Contents
- Problem statement
- Dataset
- Deep learning for object detection
- EDA
- State of the art techniques
- Models
- Future work
- References
1. Problem Statement:
Road infrastructure is a crucial public asset: it contributes to economic development and growth while bringing critical social benefits. Road surface inspection is primarily based on visual observation by humans and quantitative analysis using expensive machines. A better alternative to these methods would be a smart detector that detects damage from recorded images or video.
Apart from infrastructure maintenance, a road damage detector would also be useful in self-driving cars, which could detect potholes or other disturbances in their path and try to avoid them.
2. Dataset
The dataset used in this project was collected from here.
The dataset contains road images from three countries: Japan, India, and the Czech Republic. The label annotations accompanying the images are in XML files, i.e. in PASCAL VOC format.
Since the majority of the images are from Japan (previous versions of the dataset contained only Japanese images), the labels were defined according to Japanese road damage guidelines, per the data sources.
But the latest dataset now contains images from other countries, so to generalize across damage types we considered only the following labels:
- D00: Vertical cracks
- D10: Horizontal cracks
- D20: Alligator cracks
- D40: Potholes
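Since the labels live in PASCAL VOC XML files, they can be read with the standard library. A minimal sketch of parsing one annotation and keeping only the four classes above; the sample XML and helper here are hypothetical, not part of the project code:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample annotation in PASCAL VOC format, mimicking the
# dataset's XML files (real files have one <object> per damage instance).
SAMPLE_XML = """
<annotation>
  <filename>Japan_000001.jpg</filename>
  <object><name>D00</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object><name>D20</name>
    <bndbox><xmin>50</xmin><ymin>60</ymin><xmax>150</xmax><ymax>160</ymax></bndbox>
  </object>
  <object><name>D43</name>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>30</xmax><ymax>30</ymax></bndbox>
  </object>
</annotation>
"""

KEEP = {"D00", "D10", "D20", "D40"}

def count_labels(xml_text):
    """Count damage labels in one annotation, keeping only the four classes."""
    root = ET.fromstring(xml_text)
    names = [obj.findtext("name") for obj in root.iter("object")]
    return Counter(n for n in names if n in KEEP)

print(count_labels(SAMPLE_XML))  # Counter({'D00': 1, 'D20': 1}) — D43 is filtered out
```

Running this over every XML file in the dataset gives the per-class counts shown in the EDA section.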
3. Deep learning for Object Detection
CNNs, or convolutional neural networks, are the building blocks of all computer vision tasks. In object detection too, the convolution operation is used to extract object patterns from an image into a feature map (basically a matrix with smaller dimensions than the image).
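The idea that convolution turns an image into a smaller feature map can be sketched in a few lines of NumPy. This is a toy "valid" convolution, not the project's model code:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and
    sum elementwise products, producing a smaller feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge_kernel = np.array([[1., 0., -1.]] * 3)       # simple vertical-edge filter
fmap = conv2d_valid(image, edge_kernel)
print(fmap.shape)  # (4, 4) — smaller than the 6x6 input
```

A real CNN stacks many such learned filters, so the feature map has one channel per filter.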
Over the past couple of years there has been tremendous research on object detection, producing a good number of state-of-the-art algorithms and methods; I have explained some of them in a nutshell below.
4. EDA
Total number of images in dataset: 26620
Distribution of labels
Count for each class:
- D00: 6592
- D10: 4446
- D20: 8381
- D40: 5627
Distribution of labels in each country (whole-data analysis):

| category    | Czech | India | Japan |
|-------------|------:|------:|------:|
| # of images |  2829 |  7706 | 10506 |
| D00         |   988 |  1555 |  4049 |
| D10         |   399 |    68 |  3979 |
| D20         |   161 |  2021 |  6199 |
| D40         |   197 |  3187 |  2243 |
| # of labels |  1745 |  6831 | 16470 |
Distribution of label sizes in images
- Minimum size of a label: 0x1
- Maximum size of a label: 704x492
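These sizes come straight from the VOC box coordinates. A minimal sketch with hypothetical box values (not the actual dataset annotations):

```python
# Hypothetical (xmin, ymin, xmax, ymax) boxes as parsed from the XML files.
boxes = [(10, 20, 110, 220), (300, 40, 304, 41), (0, 0, 704, 492)]

# Label size is simply (xmax - xmin) x (ymax - ymin).
sizes = [(xmax - xmin, ymax - ymin) for xmin, ymin, xmax, ymax in boxes]
smallest = min(sizes, key=lambda wh: wh[0] * wh[1])
largest = max(sizes, key=lambda wh: wh[0] * wh[1])
print(smallest, largest)  # (4, 1) (704, 492)
```

A minimum like 0x1 in the real data suggests at least one degenerate annotation where xmin equals xmax.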
5. State of the art techniques
Object detection is now a vast topic, equivalent to a semester-long subject, and consists of many algorithms. To keep it short: object detection algorithms are categorized into groups such as region-based algorithms (RCNN, Fast RCNN, Faster RCNN), two-stage detectors, and one-stage detectors. Region-based algorithms are themselves two-stage detectors, but since I explain them in a nutshell below, I mention them explicitly.
Let's start with RCNN (region-based convolutional neural network).
The basic architecture of an object detection algorithm consists of two parts. The first part is a CNN that converts the original image into feature maps; in the second part, different algorithms apply their own techniques. RCNN uses selective search to get ROIs (regions of interest), i.e. regions that have a chance of containing an object, extracting around 2000 regions from each image. Using those ROIs, it classifies the labels and predicts the object locations using two different models. That is why these models are called two-stage detectors.
RCNN has some limitations, and Fast RCNN was introduced to overcome them. RCNN's computation time is high because each region is passed through the CNN separately, and it uses three different models to make predictions. In Fast RCNN, each image is passed through the CNN only once and feature maps are extracted; selective search is applied on these maps to generate proposals, and the three models used in RCNN are combined into one.
But Fast RCNN still uses selective search, which is slow, so computation time remains high. And, as its name suggests, the next version was Faster RCNN. Faster RCNN replaces selective search with a region proposal network (RPN), which makes the algorithm much faster.
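An RPN works by tiling candidate boxes (anchors) over every feature-map cell and then scoring them. A minimal sketch of anchor generation, with illustrative scales and ratios (not the values any particular implementation uses):

```python
import numpy as np

def make_anchors(fmap_h, fmap_w, stride, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (cx, cy, w, h) at every feature-map cell,
    the way an RPN tiles candidate regions over the image."""
    anchors = []
    for y in range(fmap_h):
        for x in range(fmap_w):
            # Cell center in image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Width/height chosen so the anchor area stays s*s.
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)

anchors = make_anchors(4, 4, stride=16)
print(anchors.shape)  # (144, 4): 4*4 cells times 3 scales times 3 ratios
```

The RPN then predicts, for each anchor, an objectness score and box offsets, replacing selective search with a learned, fully convolutional step.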
Now let's move on to some one-stage detectors. YOLO and SSD are very popular object detection models, as they provide a very good trade-off between speed and accuracy.
YOLO (You Only Look Once): a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
SSD (Single Shot Detector): the SSD approach discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios and scales per feature-map location. The network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.
6. Models
As a newbie to deep learning, or let's say computer vision to be precise, I tried some of the basic and quick-to-implement algorithms on this dataset to learn the fundamentals:
- Efficientdet_d0
- SSD_mobilenet_v2
- YOLOv3
For the first and second models I used the TensorFlow model zoo, and for training YOLOv3 I referred to this.
To train a TensorFlow object detection model on your own custom dataset, you can go through this.
For evaluation, mAP (mean average precision) is used.
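At the heart of mAP is the IoU (intersection over union) test that decides whether a detection matches a ground-truth label, typically at a threshold like 0.5. A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes —
    the overlap measure mAP uses to match detections to labels."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

mAP then averages, over classes, the area under each class's precision-recall curve built from these matches.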
I got very low mAP with efficientdet_d0 and ssd_mobilenet_v2, maybe because I didn't change some of the default config settings for learning rate, optimizer, and data augmentation.
Results
Inference using efficientdet_d0
Inference using SSD_mobilenet_v2
Same code as for efficientdet
Inference on YOLOv3
7. Future work
Try different configurations of learning rate, optimizers, and data augmentation with TensorFlow object detection models, and implement newer versions of models such as YOLOv4 and YOLOv5.
8. References
- https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/
- https://pjreddie.com/darknet/yolo/
- https://medium.com/analytics-vidhya/evolution-of-object-detection-582259d2aa9b
- https://machinelearningmastery.com/object-recognition-with-deep-learning/
- https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html
- https://www.appliedaicourse.com/
Check the GitHub repo for the whole code
LinkedIn account