After playing with lane line detection and traffic sign recognition, it's time to feed a Deep Neural Network with video recorded during human driving. The model is then trained to act "humanlike" and predict correct steering angles while driving autonomously. This procedure is called Behavioral Cloning. For simplicity (and public safety 🙂 ) I will collect video data and test the resulting DNN in a game-like simulator developed by Udacity. This project is the third assignment in the "Self-Driving Car Engineer" course.
This method is widely used by companies in the industry. Google's Waymo reported having collected 4 million miles of data by November 2017. This huge and still growing database makes it possible to produce high-quality models such as Deep Neural Networks, whose goal is to generalize and behave appropriately in any situation on the road, even previously unseen ones. Some companies, like drive.ai, rely almost entirely on such a holistic deep-learning approach. This contrasts with the traditional robotics approach, or with using smaller DNNs for different system components. Of course, processing such an enormous amount of data is a great challenge, which drives the need for faster GPUs and new DNN architectures. Here you can find an informative analysis of the effort it takes to train on such data in a real production environment. It's worth noting that recorded data usually doesn't come only from cameras: LiDARs and radars are also used, and fused together they create a complete machine vision system.
The project goals and pipeline
The main task of the project is to build a Convolutional Neural Network using the Keras library in Python. A very important step is to collect data properly in the simulator by recording both correct driving and recovery driving. After preprocessing the data and training the model, we check whether it can drive the car without leaving the road.
Steps of this project:
- Collect video data of driving behavior using simulator
- Preprocess the data
- Choose a CNN model and create it in Keras
- Train and validate the model
- Verify that the model predicts correct angles during autonomous driving
Below is a short excerpt from my data collection process. Following recommendations from the Udacity course and other participants, it consists of two driving modes:
- Good driving - teaching the model how to drive properly around the track
- Recovery driving - teaching the model how to recover when the car is at the side of the road
The recorded data used to feed the neural network consists of a set of video frames and the corresponding steering angles.
After training, the model should be able to predict the steering angle given an image as its input.
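For illustration, here is a minimal sketch of reading such recordings. It assumes the simulator's usual output format: a CSV log (typically driving_log.csv) in which each row holds the center, left and right camera image paths followed by steering, throttle, brake and speed. The names DrivingSample and load_driving_log are hypothetical, and only the image paths and the steering angle are kept.

```python
import csv
import io
from typing import List, NamedTuple


class DrivingSample(NamedTuple):
    """One row of the (assumed) simulator log: three image paths plus the steering angle."""
    center: str
    left: str
    right: str
    steering: float


def load_driving_log(lines) -> List[DrivingSample]:
    """Parse rows assumed to be: center, left, right, steering, throttle, brake, speed."""
    samples = []
    for row in csv.reader(lines):
        if not row or row[0].strip() == "center":  # skip blank lines and an optional header
            continue
        center, left, right = (path.strip() for path in row[:3])
        samples.append(DrivingSample(center, left, right, float(row[3])))
    return samples


# Usage with an in-memory example; in practice you would pass open("driving_log.csv").
log = io.StringIO(
    "IMG/center_1.jpg, IMG/left_1.jpg, IMG/right_1.jpg, 0.0, 0.5, 0, 30.1\n"
    "IMG/center_2.jpg, IMG/left_2.jpg, IMG/right_2.jpg, -0.15, 0.5, 0, 30.2\n"
)
samples = load_driving_log(log)
print(samples[1].steering)  # -0.15
```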
It turned out that the data obtained in recovery mode is even more important for the Behavioral Cloning project. In the "good driving" dataset, most samples have a steering angle of 0. The "recovery driving" data provides valuable information about non-zero steering angles: with it, the model can learn to quickly apply a larger steering angle when the vehicle suddenly drifts off the track. Even if the model is not perfectly tuned and the car wobbles left and right in autonomous mode, there is a better chance of getting through challenging turns on the track.
Below are random frames from the training set with their corresponding steering angles. The steering angle is normalized to the [-1, 1] range.
Additionally, the Udacity simulator provides images from three cameras. They are "mounted" in a row, some distance apart, so their images differ slightly. Below, you can see frames from the left and right cameras taken at the same moment. The idea is to collect more samples and diversify them. In autonomous mode, however, the images fed into the model come from the central camera only. This means we should apply a certain angle shift to the frames from the left and right cameras.
In the example shown, the vehicle is turning left. For the image from the right camera, we should subtract some value from the recorded angle. The reason is that if the central camera sees such an image in autonomous mode, the car is too far to the right, so we expect the model to answer: turn a bit more to the left. For the left camera images, it works the opposite way and the value is added.
For each recorded sample, I randomly picked one of the three camera images, applying the relevant angle shift when needed. In total, 8 laps of "good driving" and 2 laps of "recovery driving" were used for further processing.
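The camera selection and angle shift described above can be sketched as follows. The correction value of 0.2 and the function name pick_camera are hypothetical placeholders, since the exact shift used is not given here; the sign logic follows the convention that negative angles mean steering left.

```python
import random
from typing import Optional, Tuple

# Hypothetical correction added to (left) or subtracted from (right) the recorded angle.
# The exact value used in the project is not stated; 0.2 is only a placeholder.
CAMERA_CORRECTION = 0.2


def pick_camera(center: str, left: str, right: str, steering: float,
                correction: float = CAMERA_CORRECTION,
                rng: Optional[random.Random] = None) -> Tuple[str, float]:
    """Randomly choose one of the three camera images and shift the angle accordingly.

    Negative angles mean steering left, so:
    - left camera  -> add the correction (steer a bit more to the right)
    - right camera -> subtract it (steer a bit more to the left)
    """
    rng = rng or random.Random()
    camera = rng.choice(("center", "left", "right"))
    if camera == "left":
        return left, steering + correction
    if camera == "right":
        return right, steering - correction
    return center, steering


# Usage: the returned path and angle depend on which camera was drawn.
image_path, angle = pick_camera("c.jpg", "l.jpg", "r.jpg", -0.1, rng=random.Random(0))
```

Drawing one camera per sample, rather than using all three, keeps the dataset size unchanged while still exposing the model to off-center viewpoints.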
As in any Machine Learning project, it's crucial to preprocess the data, e.g. by data augmentation, cleaning, transformation or dimensionality reduction, since this can significantly help training. Let's visualize some properties of our dataset, clean it a bit and then produce more samples with the desired properties. Generating new samples from existing ones helps when the dataset isn't large enough. It can also help prevent overfitting, because while generating new samples we can add image shifts, translations, shadows etc. The data becomes more diverse, which can help the model cope with "unseen" situations.
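As an example of the kind of augmentation mentioned above, here is a sketch of a random horizontal shift that also adjusts the steering angle. This is only one possible approach, not necessarily what was used in this project; the shift range, the 0.004 angle change per pixel, the zero fill and the name random_horizontal_shift are all assumptions.

```python
import numpy as np


def random_horizontal_shift(image, angle, max_shift=20, angle_per_pixel=0.004, rng=None):
    """Shift an image left/right by up to max_shift pixels and adjust the steering angle.

    Assumptions: the shift range, the angle change per pixel, and filling uncovered
    pixels with zeros. A positive dx moves the scene to the right, as if the car were
    further left on the road, so the angle is increased (steer more to the right).
    """
    rng = rng or np.random.default_rng()
    dx = int(rng.integers(-max_shift, max_shift + 1))  # upper bound is exclusive
    shifted = np.zeros_like(image)
    if dx > 0:
        shifted[:, dx:] = image[:, :-dx]
    elif dx < 0:
        shifted[:, :dx] = image[:, -dx:]
    else:
        shifted[:] = image
    return shifted, angle + dx * angle_per_pixel
```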
After drawing a histogram of steering angles for all collected samples, one conclusion is immediate: despite the "recovery driving", there is still a huge number of samples with angles close to 0. This could bias the model towards predicting a 0 angle. To mitigate this, I rejected about 85% of the samples whose steering angle is very close to 0. Below are histograms of the training samples before and after this operation:
There are still many 0 angles among the samples, but significantly fewer than before.
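A minimal sketch of this rejection step, assuming NumPy. What counts as "very close to 0" and how the rejected samples were selected are not specified, so the 0.01 threshold and the random per-sample selection are assumptions, as is the helper name drop_near_zero. The usage part also shows how before/after histograms can be computed with np.histogram.

```python
import numpy as np


def drop_near_zero(angles, threshold=0.01, drop_fraction=0.85, rng=None):
    """Return indices of samples to keep after randomly dropping most near-zero angles.

    threshold and the random selection are assumptions; samples with
    |angle| >= threshold are always kept.
    """
    rng = rng or np.random.default_rng()
    angles = np.asarray(angles, dtype=float)
    near_zero = np.abs(angles) < threshold
    # Each near-zero sample survives with probability 1 - drop_fraction.
    keep = ~near_zero | (rng.random(angles.shape) >= drop_fraction)
    return np.flatnonzero(keep)


# Usage: synthetic angles dominated by zeros, then histograms before and after.
rng = np.random.default_rng(42)
angles = np.concatenate([np.zeros(1000), rng.uniform(-1, 1, 200)])
kept = drop_near_zero(angles, rng=rng)
counts_before, edges = np.histogram(angles, bins=21, range=(-1, 1))
counts_after, _ = np.histogram(angles[kept], bins=edges)
```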
In the second histogram, we can observe that most of the larger steering angles (say, |angle| > 0.25) are on the left side of the chart. This means that during data collection the vehicle turned left more often than right, which is expected since the track in the simulator runs anti-clockwise. Again, we don't want to bias the model, this time towards negative angles. The remedy is to flip some of the samples: flipping means mirroring the image horizontally and changing the sign of the corresponding steering angle. I did this for a randomly chosen 50% of samples.
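The flipping operation can be sketched like this, assuming the images are stored in a NumPy array of shape (N, rows, cols, channels). The helper name random_flip is hypothetical; flip_fraction=0.5 matches the 50% mentioned above.

```python
import numpy as np


def random_flip(images, angles, flip_fraction=0.5, rng=None):
    """Mirror a random subset of images horizontally and negate their steering angles.

    images: array of shape (N, rows, cols, channels); angles: shape (N,).
    Returns new arrays; the inputs are not modified.
    """
    rng = rng or np.random.default_rng()
    images = np.array(images, copy=True)
    angles = np.array(angles, dtype=float, copy=True)
    flip = rng.random(len(angles)) < flip_fraction
    images[flip] = images[flip, :, ::-1]  # reversing the column axis = horizontal mirror
    angles[flip] = -angles[flip]
    return images, angles
```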
The final histogram depicts samples and their angles after the flipping operation:
Finally, I rescaled all images in the dataset from 160x320 pixels (rows x columns) to 160x100. Then, using the Keras Cropping2D layer, I cropped 40 rows from the top and 20 rows from the bottom of each image. This removed unnecessary information: the sky and trees at the top and the car hood at the bottom. The final image size was 100x100. The square shape was intentional, as square input images are often regarded as easier for a CNN to work with.
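The size arithmetic above can be checked with a small NumPy sketch. The resizing method actually used isn't stated, so resize_columns_nearest is a simple nearest-neighbour stand-in (a library function such as OpenCV's cv2.resize would normally be used instead); crop_rows reproduces what Cropping2D does with the stated 40/20-row cropping. Both helper names are hypothetical.

```python
import numpy as np

# Sizes taken from the text: 160x320 (rows x cols) -> 160x100, then crop 40 rows at the
# top and 20 at the bottom, giving 100x100.
ORIGINAL_ROWS, ORIGINAL_COLS = 160, 320
RESIZED_COLS = 100
CROP_TOP, CROP_BOTTOM = 40, 20


def resize_columns_nearest(image, new_cols):
    """Nearest-neighbour resize along the width only (a stand-in for e.g. cv2.resize)."""
    cols = image.shape[1]
    col_idx = (np.arange(new_cols) * cols) // new_cols
    return image[:, col_idx]


def crop_rows(image, top, bottom):
    """Same crop as Cropping2D(cropping=((top, bottom), (0, 0))), for a single image."""
    return image[top:image.shape[0] - bottom]


frame = np.zeros((ORIGINAL_ROWS, ORIGINAL_COLS, 3), dtype=np.uint8)
resized = resize_columns_nearest(frame, RESIZED_COLS)  # shape (160, 100, 3)
cropped = crop_rows(resized, CROP_TOP, CROP_BOTTOM)    # shape (100, 100, 3)
```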
Throughout the Behavioral Cloning project I tested 3 models: LeNet, the model proposed by Comma.ai, and the model reported in Nvidia's paper on end-to-end learning for self-driving cars. My first step was a convolutional neural network similar to LeNet. I thought this model might be appropriate because it had worked well in previous projects using CNNs, e.g. for handwritten digit recognition. After testing these models and adding or removing some layers, I observed that the Nvidia model worked best overall. It has become a well-known CNN architecture; in the configuration described in the paper, it has about 27 million connections and 250 thousand parameters.
To gauge how well the model was working, I split my image and steering angle data into a training and a validation set. I found that sometimes the model had a low mean squared error on the training set but a high mean squared error on the validation set, which implied overfitting. To combat this, I modified the model so that each convolutional layer is followed by a dropout layer. Also, to add nonlinearity, ELU activations were used in every layer of the model except the output layer. I also tried reducing the model complexity, but it didn't help much. It also seemed important not to collect too much data: about 10 laps in total were just enough.
The model used the Adam optimizer, which adapts the effective step size for each parameter during training, so the learning rate did not need manual tuning. I tuned the batch size a bit and finally settled on 128. I also experimented with the number of epochs between 2 and 10; in the end, 3 epochs of training were sufficient.
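The train/validation split and batching described above could look like the following sketch. The 20% validation share and the function names are assumptions; the batch size of 128 is taken from the text. The generator reshuffles the training indices on every pass, so each epoch sees the batches in a different order.

```python
import numpy as np


def train_validation_split(n_samples, validation_fraction=0.2, rng=None):
    """Shuffle sample indices and split them; the 20% validation share is an assumption."""
    rng = rng or np.random.default_rng()
    indices = rng.permutation(n_samples)
    n_val = int(round(n_samples * validation_fraction))
    return indices[n_val:], indices[:n_val]


def batch_generator(images, angles, indices, batch_size=128, rng=None):
    """Yield (X, y) batches forever, reshuffling at the start of every pass (epoch)."""
    rng = rng or np.random.default_rng()
    indices = np.asarray(indices)
    while True:
        order = rng.permutation(indices)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            yield images[batch], angles[batch]
```

In recent Keras versions, a generator like this can be passed directly to model.fit together with steps_per_epoch (the number of batches per pass, e.g. ceil(n_train / 128)) and epochs=3.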
Final Model Architecture
The final model architecture is a convolutional neural network with the following layers and layer sizes. The total number of parameters (all trainable) is 297019.
The following Python code, using the Keras library, builds the chosen CNN for the Behavioral Cloning project.
```python
from keras.models import Sequential
from keras.layers import Cropping2D, Lambda, Conv2D, Dropout, Flatten, Dense
from keras.optimizers import Adam

# Image dimensions described above: 160x100 after rescaling, 100x100 after cropping
IMAGE_ROWS_BEFORE_CROP = 160
IMAGE_COLS = 100
CHANNELS = 3
IMAGE_CROP_TOP = 40
IMAGE_CROP_BOTTOM = 20


def build_nvidia_model(dropout=0.4):
    model = Sequential()
    input_shape_before_crop = (IMAGE_ROWS_BEFORE_CROP, IMAGE_COLS, CHANNELS)
    # trim image to only see section with the road
    model.add(Cropping2D(cropping=((IMAGE_CROP_TOP, IMAGE_CROP_BOTTOM), (0, 0)),
                         input_shape=input_shape_before_crop))
    # pixel normalization using a Lambda layer: [0, 255] -> [-1, 1]
    model.add(Lambda(lambda x: x / 127.5 - 1))
    # five convolutional layers, each followed by dropout
    model.add(Conv2D(24, (5, 5), activation='elu', strides=(2, 2)))
    model.add(Dropout(dropout))
    model.add(Conv2D(36, (5, 5), activation='elu', strides=(2, 2)))
    model.add(Dropout(dropout))
    model.add(Conv2D(48, (5, 5), activation='elu', strides=(2, 2)))
    model.add(Dropout(dropout))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Dropout(dropout))
    model.add(Conv2D(64, (3, 3), activation='elu'))
    model.add(Dropout(dropout))
    # fully connected layers
    model.add(Flatten())
    model.add(Dense(100, activation='elu'))
    model.add(Dense(50, activation='elu'))
    model.add(Dense(10, activation='elu'))
    model.add(Dense(1))  # single output: the steering angle
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse')
    return model
```
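As a sanity check, the total of 297019 parameters can be reproduced by hand from the layer definitions in build_nvidia_model. The sketch below (function names hypothetical) counts weights and biases for each Conv2D and Dense layer with 'valid' padding; the cropping, normalization, dropout and flatten layers have no parameters.

```python
def conv_output_size(size, kernel, stride):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1


def count_parameters(rows=100, cols=100, channels=3):
    """Count trainable parameters of the conv + dense stack for a given (cropped) input."""
    # (filters, kernel, stride) for the five convolutional layers in build_nvidia_model
    conv_layers = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]
    total = 0
    for filters, kernel, stride in conv_layers:
        total += kernel * kernel * channels * filters + filters  # weights + biases
        rows = conv_output_size(rows, kernel, stride)
        cols = conv_output_size(cols, kernel, stride)
        channels = filters
    units = rows * cols * channels  # size after Flatten (5 * 5 * 64 = 1600 for 100x100)
    for dense in (100, 50, 10, 1):
        total += units * dense + dense
        units = dense
    return total


print(count_parameters())         # 297019
print(count_parameters(66, 200))  # 252219
```

With the 66x200 input used in the Nvidia paper, the same layer stack gives 252219 parameters, consistent with the paper's "about 250 thousand"; the larger 100x100 input here mainly enlarges the first dense layer (1600 instead of 1152 inputs).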
The whole code implementing all described processes can be found here.
Testing and final result
The final step was to run the Udacity simulator in autonomous mode using my CNN model. It was a lot of fun, and very exciting to see how well the car drove around the track on its own. There were a few spots where the vehicle frequently left the track, especially the turn after the bridge. This sometimes ended in spectacular crashes or even with the car sinking 🙂 To improve the driving behavior in these cases, I tuned the model for better generalization, as discussed above. Sometimes I also added new samples covering just the difficult turns to emphasize these situations in the dataset. When it finally worked and the vehicle drove autonomously around the track without leaving the road, I was really satisfied!
Have a look at the final result!
The predicted steering angle is displayed in the upper left corner.