Road Damage Detection
Detecting different types of road damage, such as cracks and potholes, from a given image or video of the road.
Table of Contents
- Problem statement
- Dataset
- Deep learning for object detection
- EDA
- State of the art techniques
- Models
- Future work
- References
1. Problem Statement:
Road infrastructure is a crucial public asset: it contributes to economic development and growth while bringing critical social benefits. Road surface inspection is primarily based on visual observation by humans and quantitative analysis using expensive machines. A better alternative to these methods would be a smart detector that detects damage from recorded images or video.
Apart from infrastructure maintenance, a road damage detector would also be useful in self-driving cars, which could detect potholes or other disturbances in their path and try to avoid them.
2. Dataset
The dataset used in this project was collected from here.
The dataset contains road images from three countries: Japan, India, and the Czech Republic. The label annotations accompanying the images are in XML files, i.e. in PASCAL VOC format.
Since the majority of the images are from Japan (previous versions of the dataset contained only Japanese images), the labels were defined according to Japanese road damage guidelines, per the data sources.
But the latest dataset now contains images from other countries, so to generalize across damage types we considered only the following labels:
- D00: Vertical cracks
- D10: Horizontal cracks
- D20: Alligator cracks
- D40: Potholes
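Since the labels live in PASCAL VOC XML files, they can be read with the standard library. A minimal sketch of parsing one annotation and keeping only the four classes above; the sample XML and helper here are hypothetical, not part of the project code:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample annotation in PASCAL VOC format, mimicking the
# dataset's XML files (real files have one <object> per damage instance).
SAMPLE_XML = """
<annotation>
  <filename>Japan_000001.jpg</filename>
  <object><name>D00</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object><name>D20</name>
    <bndbox><xmin>50</xmin><ymin>60</ymin><xmax>150</xmax><ymax>160</ymax></bndbox>
  </object>
  <object><name>D43</name>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>30</xmax><ymax>30</ymax></bndbox>
  </object>
</annotation>
"""

KEEP = {"D00", "D10", "D20", "D40"}

def count_labels(xml_text):
    """Count damage labels in one annotation, keeping only the four classes."""
    root = ET.fromstring(xml_text)
    names = [obj.findtext("name") for obj in root.iter("object")]
    return Counter(n for n in names if n in KEEP)

print(count_labels(SAMPLE_XML))  # Counter({'D00': 1, 'D20': 1}) — D43 is filtered out
```

Running this over every XML file in the dataset gives the per-class counts shown in the EDA section.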
3. Deep learning for Object Detection
CNNs, or convolutional neural networks, are the building blocks of all computer vision tasks. In object detection too, the convolution operation is used to extract object patterns from an image into a feature map (basically a matrix with smaller dimensions than the image).
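The idea that convolution turns an image into a smaller feature map can be sketched in a few lines of NumPy. This is a toy "valid" convolution, not the project's model code:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and
    sum elementwise products, producing a smaller feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge_kernel = np.array([[1., 0., -1.]] * 3)       # simple vertical-edge filter
fmap = conv2d_valid(image, edge_kernel)
print(fmap.shape)  # (4, 4) — smaller than the 6x6 input
```

A real CNN stacks many such learned filters, so the feature map has one channel per filter.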
Over the past couple of years there has been tremendous research on object detection, producing a good number of state-of-the-art algorithms and methods; I have explained some of them in a nutshell below.
4. EDA
Total number of images in dataset: 26620
Distribution of labels
Count for each class:
- D00: 6592
- D10: 4446
- D20: 8381
- D40: 5627
Distribution of labels in each country (whole-data analysis):

| category    | Czech | India | Japan |
|-------------|------:|------:|------:|
| # of images |  2829 |  7706 | 10506 |
| D00         |   988 |  1555 |  4049 |
| D10         |   399 |    68 |  3979 |
| D20         |   161 |  2021 |  6199 |
| D40         |   197 |  3187 |  2243 |
| # of labels |  1745 |  6831 | 16470 |
Distribution of label sizes in images
- Minimum size of a label: 0x1
- Maximum size of a label: 704x492
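These sizes come straight from the VOC box coordinates. A minimal sketch with hypothetical box values (not the actual dataset annotations):

```python
# Hypothetical (xmin, ymin, xmax, ymax) boxes as parsed from the XML files.
boxes = [(10, 20, 110, 220), (300, 40, 304, 41), (0, 0, 704, 492)]

# Label size is simply (xmax - xmin) x (ymax - ymin).
sizes = [(xmax - xmin, ymax - ymin) for xmin, ymin, xmax, ymax in boxes]
smallest = min(sizes, key=lambda wh: wh[0] * wh[1])
largest = max(sizes, key=lambda wh: wh[0] * wh[1])
print(smallest, largest)  # (4, 1) (704, 492)
```

A minimum like 0x1 in the real data suggests at least one degenerate annotation where xmin equals xmax.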
5. State of the art techniques
Object detection is now a vast topic, equivalent to a semester-long subject, and consists of many algorithms. To keep it short: object detection algorithms are categorized into groups such as region-based algorithms (RCNN, Fast RCNN, Faster RCNN), two-stage detectors, and one-stage detectors. Region-based algorithms are themselves two-stage detectors, but since I explain them in a nutshell below, I mention them explicitly.
Let's start with RCNN (region-based convolutional neural network).
The basic architecture of an object detection algorithm consists of two parts. The first part is a CNN that converts the original image into feature maps; in the second part, different algorithms apply their own techniques. RCNN uses selective search to get ROIs (regions of interest), i.e. regions that have a chance of containing an object, extracting around 2000 regions from each image. Using those ROIs, it classifies the labels and predicts the object locations using two different models. That is why these models are called two-stage detectors.
RCNN has some limitations, and Fast RCNN was introduced to overcome them. RCNN's computation time is high because each region is passed through the CNN separately, and it uses three different models to make predictions. In Fast RCNN, each image is passed through the CNN only once and feature maps are extracted; selective search is applied on these maps to generate proposals, and the three models used in RCNN are combined into one.
But Fast RCNN still uses selective search, which is slow, so computation time remains high. And, as its name suggests, the next version was Faster RCNN. Faster RCNN replaces selective search with a region proposal network (RPN), which makes the algorithm much faster.
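An RPN works by tiling candidate boxes (anchors) over every feature-map cell and then scoring them. A minimal sketch of anchor generation, with illustrative scales and ratios (not the values any particular implementation uses):

```python
import numpy as np

def make_anchors(fmap_h, fmap_w, stride, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (cx, cy, w, h) at every feature-map cell,
    the way an RPN tiles candidate regions over the image."""
    anchors = []
    for y in range(fmap_h):
        for x in range(fmap_w):
            # Cell center in image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Width/height chosen so the anchor area stays s*s.
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)

anchors = make_anchors(4, 4, stride=16)
print(anchors.shape)  # (144, 4): 4*4 cells times 3 scales times 3 ratios
```

The RPN then predicts, for each anchor, an objectness score and box offsets, replacing selective search with a learned, fully convolutional step.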
Now let's move on to some one-stage detectors. YOLO and SSD are very popular object detection models, as they provide a very good trade-off between speed and accuracy.
YOLO (You Only Look Once): a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
SSD (Single Shot Detector): the SSD approach discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios and scales per feature-map location. The network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.
6. Models
As a newbie to deep learning, or let's say computer vision to be precise, I tried some of the basic and quick-to-implement algorithms on this dataset to learn the fundamentals:
- Efficientdet_d0
- SSD_mobilenet_v2
- YOLOv3
For the first and second models I used the TensorFlow model zoo, and for training YOLOv3 I referred to this.
To train a TensorFlow object detection model on your own custom dataset, you can go through this.
For evaluation, mAP (mean average precision) is used.
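At the heart of mAP is the IoU (intersection over union) test that decides whether a detection matches a ground-truth label, typically at a threshold like 0.5. A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes —
    the overlap measure mAP uses to match detections to labels."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

mAP then averages, over classes, the area under each class's precision-recall curve built from these matches.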
I got very low mAP with efficientdet_d0 and ssd_mobilenet_v2, maybe because I didn't change some of the default config settings for learning rate, optimizer, and data augmentation.
Results
Inference using efficientdet_d0
Inference using SSD_mobilenet_v2
Same code as for efficientdet
Inference on YOLOv3
7. Future work
Try different configurations of learning rate, optimizers, and data augmentation with TensorFlow object detection models, and implement newer versions of models such as YOLOv4 and YOLOv5.
8. References
- https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/
- https://pjreddie.com/darknet/yolo/
- https://medium.com/analytics-vidhya/evolution-of-object-detection-582259d2aa9b
- https://machinelearningmastery.com/object-recognition-with-deep-learning/
- https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html
- https://www.appliedaicourse.com/
Check the GitHub repo for the whole code
LinkedIn account