Today we are going to work through a project to find lane lines in images and videos using Python. For this project, we will take a manual approach. Even though it's true we can get better results using technologies such as Deep Learning, it is also important to learn the concepts and how things work at a basic level, so that when we build more advanced models we can apply the knowledge we already have. Some of the steps presented here may also be required when using Deep Learning.
The steps we are going to take are the following:
- Compute the camera calibration and resolve distortions.
- Apply a perspective transform to rectify the binary image (“birds-eye view”).
- Use color transforms, gradients, etc., to create a thresholded binary image.
- Detect lane pixels and fit to find the lane boundary.
- Determine the curvature of the lane and vehicle position with respect to the center.
- Warp the detected lane boundaries back onto the original image.
- Output visual display of the lane boundaries and numerical estimation of lane curvature and vehicle position.
All the code and explanation can be found on our GitHub.
Compute the camera calibration
Today’s cheap pinhole cameras introduce a lot of distortion to images. Two major distortions are radial distortion and tangential distortion.
Due to radial distortion, straight lines will appear curved. The effect becomes stronger as we move away from the center of the image. For example, in the image shown below, two edges of a chessboard are marked with red lines. You can see that the borders are not straight and do not match the red lines; all the expected straight lines bulge outwards. Visit Distortion (optics) for more details.
To solve this problem we will use the OpenCV Python library, together with sample images of a chessboard taken by the target camera. Why a chessboard? In a chessboard image we can easily measure the distortion because we know exactly what the object should look like: we can calculate the distance from the source points to the target points and use it to compute the distortion coefficients, which we can then use to correct the image.
The next image shows an example of an output image from the camera and the undistorted resulting image:
(All this magic happens in the file lib/camera.py), but how does it work? The process consists of 3 steps:
Sample an image:
In this step, we identify the corners that define the chessboard grid. If we cannot find the board, or the board is incomplete, we discard the sample image.
# first we convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Find the chessboard corners
ret, corners = cv2.findChessboardCorners(gray, (9, 6), None)
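To feed the calibration step, the corners found in every valid sample are accumulated into two lists: the 3D reference coordinates of the board corners (identical for every image, since the board itself never changes) and the 2D coordinates detected in each image. Here is a minimal sketch of that collection loop, assuming a 9x6 board and a hypothetical camera_cal/ folder of sample images (the repository code may organize this differently):
import glob
import cv2
import numpy as np

# Reference 3D corner coordinates for a 9x6 board, with z = 0
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

obj_points = []  # 3D points in the chessboard's own coordinate system
img_points = []  # 2D corner positions detected in each sample image

for path in glob.glob("camera_cal/*.jpg"):  # hypothetical sample folder
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if ret:  # discard samples where the full board was not found
        obj_points.append(objp)
        img_points.append(corners)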
Calibrate:
In this step we find the camera intrinsic and extrinsic parameters from several views of a calibration pattern, which we can then use to produce the resulting image.
img_size = (self._valid_images[0].shape[1], self._valid_images[0].shape[0])
# calibrateCamera returns the camera matrix, the distortion coefficients,
# and the per-view rotation and translation vectors
ret, self._mtx, self._dist, rvecs, tvecs = cv2.calibrateCamera(
    self._obj_points, self._img_points, img_size, None, None)
Undistort:
In this final step we actually produce the resulting image by compensating for lens distortion based on the parameters detected during the calibration step.
cv2.undistort(img, self._mtx, self._dist, None, self._mtx)
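Putting the three steps together, typical usage could look like the following sketch (the method names here are illustrative only and may not match the exact API in lib/camera.py):
# Hypothetical usage -- method names are illustrative, not the repository's exact API
camera = Camera()
camera.sample(chessboard_images)  # find the chessboard corners in each sample
camera.calibrate()                # compute the camera matrix and distortion coefficients
undistorted = camera.undistort(road_image)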
Apply a perspective transform to rectify the binary image (“birds-eye view”).
The next step in the process is to change the perspective of the image, from the regular view of a camera mounted on the front of the car to a top view, also called a “bird's-eye view”. Here is what it looks like:
(All this magic happens in the file lib/image_processor.py)
This transformation is very simple: we take four points on the image that we know, and we translate those into the desired positions on the screen. Let's review it in more detail using the example image above.

In the picture we see a green shape drawn on top. This shape uses the four source points as its corners and overlaps what would be, for the camera, a regular straight stretch of road. It cuts off around the center of the image, which, because of the perspective, is where the road would normally end and give way to the sky.

Now we take those points and move them to the desired positions on the screen, which transforms the green area into a rectangle going from 0 to the height of the picture. Here are the source and destination points we will use in our code:
height, width, color = img.shape
src = np.float32([
[210, height],
[1110, height],
[580, 460],
[700, 460]
])
dst = np.float32([
[210, height],
[1110, height],
[210, 0],
[1110, 0]
])
Once the points are identified it’s as simple as using OpenCV to do its magic once more:
src, dst = self._calc_warp_points(img)
if self._M is None:
    self._M = cv2.getPerspectiveTransform(src, dst)
    self._M_inv = cv2.getPerspectiveTransform(dst, src)
return cv2.warpPerspective(img, self._M, (width, height), flags=cv2.INTER_LINEAR)
Use color transforms, gradients, etc., to create a thresholded binary image.
Now that we have the image in place, we need to start discarding all the irrelevant information from it and keep only the lines. For this we will apply a series of changes which we will detail next:
Convert to grey scale
Convert the color image to greyscale
return cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
Enhance the image
Do some minor but important enhancements by smoothing the image with a Gaussian blur and then blending the original image with the smoothed one, which sharpens the edges:
dst = cv2.GaussianBlur(img, (0, 0), 3)
out = cv2.addWeighted(img, 1.5, dst, -0.5, 0)
Threshold on the horizontal gradient using Sobel.
Calculate the derivative of the intensity function along the X axis and apply a threshold to keep only strong intensity changes which, since we are working on a greyscale image, correspond to edges.
sobel = cv2.Sobel(img, cv2.CV_64F, 1, 0)
abs_sobel = np.absolute(sobel)
scaled_sobel = np.uint8(255 * abs_sobel / np.max(abs_sobel))
return (scaled_sobel >= 20) & (scaled_sobel <= 220)
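In the full pipeline this check is wrapped in a small helper that takes the orientation and thresholds as parameters (it is called later as self._sobel_gradient_condition(grey, 'x', 20, 220)). A minimal sketch of such a helper, which may differ slightly from the repository's exact signature:
def _sobel_gradient_condition(self, img, orientation='x', thresh_min=20, thresh_max=220):
    # derivative along x or y depending on the requested orientation
    dx, dy = (1, 0) if orientation == 'x' else (0, 1)
    sobel = cv2.Sobel(img, cv2.CV_64F, dx, dy)
    abs_sobel = np.absolute(sobel)
    scaled_sobel = np.uint8(255 * abs_sobel / np.max(abs_sobel))
    return (scaled_sobel >= thresh_min) & (scaled_sobel <= thresh_max)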
Gradient direction threshold so that only edges closer to vertical are detected, using Sobel.
Now we calculate the direction of the gradient and apply a threshold to it:
# Calculate the x and y gradients (sobel_kernel is the kernel size passed into the method)
sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=sobel_kernel)
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=sobel_kernel)
# Take the absolute value of the x and y gradients and compute the gradient direction
gradient_direction = np.arctan2(np.absolute(sobel_y), np.absolute(sobel_x))
return (gradient_direction >= np.pi/6) & (gradient_direction <= np.pi*5/6)
Next, we combine both into a single gradient condition:
# combine the gradient and direction thresholds.
gradient_condition = ((sx_condition == 1) & (dir_condition == 1))
Color threshold
This filter applies to the original image, where we try to keep only those pixels that are yellowish/white (as road lines are):
r_channel = img[:, :, 0]
g_channel = img[:, :, 1]
return (r_channel > thresh) & (g_channel > thresh)
HLS threshold on the L and S channels
For this task it's necessary to change color spaces; in particular, we will use the HLS color space, as it has interesting characteristics for the kind of images we are working with.
def _hls_condition(self, img, channel, thresh=(220, 255)):
    channels = {
        "h": 0,
        "l": 1,
        "s": 2
    }
    hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
    hls = hls[:, :, channels[channel]]
    return (hls > thresh[0]) & (hls <= thresh[1])
And finally we combine all of it into a final image:
grey = self._to_greyscale(img)
grey = self._enhance(grey)
# apply gradient threshold on the horizontal gradient
sx_condition = self._sobel_gradient_condition(grey, 'x', 20, 220)
# apply gradient direction threshold so that only edges closer to vertical are detected.
dir_condition = self._directional_condition(grey, thresh=(np.pi/6, np.pi*5/6))
# combine the gradient and direction thresholds.
gradient_condition = ((sx_condition == 1) & (dir_condition == 1))
# and color threshold
color_condition = self._color_condition(img, thresh=200)
# now let's take the HSL threshold
l_hls_condition = self._hls_condition(img, channel='l', thresh=(120, 255))
s_hls_condition = self._hls_condition(img, channel='s', thresh=(100, 255))
combined_condition = (l_hls_condition | color_condition) & (s_hls_condition | gradient_condition)
result = np.zeros_like(color_condition)
result[combined_condition] = 1
Our new image now looks something like the following:
(Example output images: Example 1 and Example 2)
Awesome! Can you already see the lines forming there?
Detect lane pixels and fit to find the lane boundary.
Up until now, we have been able to create a bird's-eye-view image that contains only the lane features (at least for the most part; we still have some noise). With this new image we can start doing some calculations to transform the image into actual values we can use, like lane position and curvature.
Let's work first on identifying the lane pixels in the image and fitting a polynomial that represents each lane line. How are we planning on doing so? It turns out there is a very clever method using a histogram of the bottom half of the image. Here is an example of what the histogram looks like:
The peaks in the histogram help us identify the starting positions of the left and right lane lines. Here is how building the histogram looks in code:
# Take a histogram of the bottom half of the image
histogram = np.sum(binary_warped[binary_warped.shape[0] // 2:, :], axis=0)
# Find the peak of the left and right halves of the histogram
# These will be the starting point for the left and right lines
midpoint = int(histogram.shape[0] // 2)
left_x_base = np.argmax(histogram[:midpoint])
right_x_base = np.argmax(histogram[midpoint:]) + midpoint
But you may ask, why the bottom half only? Well… the answer is that we want to focus only on the segments immediately in front of the car, as the lane may curve further ahead, which would affect our histogram. Once we find the position of the lane lines closest to the car, we can use a sliding window approach to find the rest, as we show in the next picture:
Here is what it looks like in code:
# Identify the x and y positions of all nonzero pixels in the warped binary image
nonzero = binary_warped.nonzero()
nonzero_y = np.array(nonzero[0])
nonzero_x = np.array(nonzero[1])
# Choose the number of sliding windows
num_windows = 9
# Set the width of the windows +/- margin
margin = 50
# Set minimum number of pixels found to recenter window
min_pix = 100
# Set height of windows - based on num_windows above and image shape
window_height = int(binary_warped.shape[0] // num_windows)
# Current positions to be updated later for each window in nwindows
left_x_current = left_x_base
right_x_current = right_x_base
# Create empty lists to receive left and right lane pixel indices
left_lane_inds = []
right_lane_inds = []
# Step through the windows one by one
for window in range(num_windows):
    # Identify window boundaries in x and y (and right and left)
    win_y_low = binary_warped.shape[0] - (window + 1) * window_height
    win_y_high = binary_warped.shape[0] - window * window_height
    win_x_left_low = left_x_current - margin
    win_x_left_high = left_x_current + margin
    win_x_right_low = right_x_current - margin
    win_x_right_high = right_x_current + margin
    if self._debug:
        # Draw the windows on the visualization image (out_img is a copy used only for debugging)
        cv2.rectangle(out_img, (win_x_left_low, win_y_low),
                      (win_x_left_high, win_y_high), (0, 255, 0), 2)
        cv2.rectangle(out_img, (win_x_right_low, win_y_low),
                      (win_x_right_high, win_y_high), (0, 255, 0), 2)
    # Identify the nonzero pixels in x and y within the window
    good_left_inds = ((nonzero_y >= win_y_low) & (nonzero_y < win_y_high) &
                      (nonzero_x >= win_x_left_low) & (nonzero_x < win_x_left_high)).nonzero()[0]
    good_right_inds = ((nonzero_y >= win_y_low) & (nonzero_y < win_y_high) &
                       (nonzero_x >= win_x_right_low) & (nonzero_x < win_x_right_high)).nonzero()[0]
    # Append these indices to the lists
    left_lane_inds.append(good_left_inds)
    right_lane_inds.append(good_right_inds)
    # If we found > min_pix pixels, recenter the next window on their mean position
    if len(good_left_inds) > min_pix:
        left_x_current = int(np.mean(nonzero_x[good_left_inds]))
    if len(good_right_inds) > min_pix:
        right_x_current = int(np.mean(nonzero_x[good_right_inds]))
# Concatenate the arrays of indices (previously a list of lists of pixels)
try:
    left_lane_inds = np.concatenate(left_lane_inds)
    right_lane_inds = np.concatenate(right_lane_inds)
except ValueError:
    # Avoids an error if the above is not implemented fully
    pass
This process is very compute-intensive, so when processing video there are a few things we can adjust: we do not always need to start from scratch, since the calculations made for the previous frame give us a window of where the lanes are likely to be next, which makes them easier to find. All of that is implemented in the final code in the repository, feel free to take a look.
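With the window indices collected, the last step before fitting is to map those indices back to actual pixel coordinates. Using the same variable names as above:
# Extract the left and right lane pixel positions from the collected indices
left_x = nonzero_x[left_lane_inds]
left_y = nonzero_y[left_lane_inds]
right_x = nonzero_x[right_lane_inds]
right_y = nonzero_y[right_lane_inds]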
Once we have all the windows, we can build the polynomial from the identified points; each line (left and right) is fitted independently as follows:
left_fit = np.polyfit(left_y, left_x, 2)
right_fit = np.polyfit(right_y, right_x, 2)
The number 2 indicates that we are fitting a second-order polynomial.
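To draw the fitted lines we can then evaluate each polynomial over the full height of the warped image, for example:
# Evaluate both second-order fits at every row of the warped image
plot_y = np.linspace(0, binary_warped.shape[0] - 1, binary_warped.shape[0])
left_fit_x = left_fit[0] * plot_y ** 2 + left_fit[1] * plot_y + left_fit[2]
right_fit_x = right_fit[0] * plot_y ** 2 + right_fit[1] * plot_y + right_fit[2]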
Determine the curvature of the lane and vehicle position with respect to the center.
Now that we know where the lines are in the image, and we know the position of the car (at the center of the camera), we can do some interesting calculations to determine the curvature of the lane and the position of the car with respect to the center of the lane.
Curvature of the lane
The curvature of the lane is a simple calculation over the polynomial.
fit_cr = np.polyfit(self.all_y * self._ym_per_pix, self.all_x * self._xm_per_pix, 2)
plot_y = np.linspace(0, 720 - 1, 720)
y_eval = np.max(plot_y)
curve = ((1 + (2 * fit_cr[0] * y_eval * self._ym_per_pix + fit_cr[1]) **2)** 1.5) / np.absolute(2 * fit_cr[0])
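For reference, for a second-order fit x = A*y² + B*y + C the radius of curvature at a point y is R = (1 + (2*A*y + B)²)^(3/2) / |2*A|, which is exactly what the line above computes, with the fit done in meters rather than pixels.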
There is an important consideration though: for this step we can't work in pixels, we need a way to convert pixels to meters. So we introduce two variables, _ym_per_pix and _xm_per_pix, which are pre-defined values. We won't go into much detail about them here; you can take these values as presented. If you want to dig deeper, there are procedures to identify these values using algorithms and camera information.
self._xm_per_pix = 3.7 / 1280
self._ym_per_pix = 30 / 720
Vehicle position with respect to the center
Very simple: calculate the position of the middle of the lane and compare it to the center of the image, like this:
lane_center = (self.left_lane.best_fit[-1] + self.right_lane.best_fit[-1]) / 2
car_center = img.shape[1] / 2
dx = (car_center - lane_center) * self._xm_per_pix
All Done!
Now you have all the information you need, and the polynomial to represent the lane. Your final result should look like the following:
And a sample video:
Or maybe not exactly… Remember we warped the image to the bird's-eye view? Well, you need to revert that effect to render the polynomial onto the original image. I'll leave that as homework for you, or you can just check it out in my code.
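If you want to try it yourself, the idea is to draw the lane polygon on a blank warped-space image, warp it back with the inverse matrix we stored earlier as self._M_inv, and blend it over the original frame. A rough sketch, assuming undist is the undistorted frame, m_inv is that inverse matrix, and plot_y, left_fit_x and right_fit_x come from the fitted polynomials:
# Build a polygon covering the area between the two fitted lines (warped space)
warp_zero = np.zeros_like(binary_warped).astype(np.uint8)
color_warp = np.dstack((warp_zero, warp_zero, warp_zero))
pts_left = np.array([np.transpose(np.vstack([left_fit_x, plot_y]))])
pts_right = np.array([np.flipud(np.transpose(np.vstack([right_fit_x, plot_y])))])
pts = np.hstack((pts_left, pts_right))
cv2.fillPoly(color_warp, np.int_([pts]), (0, 255, 0))

# Warp the lane area back to the original perspective and overlay it
new_warp = cv2.warpPerspective(color_warp, m_inv, (undist.shape[1], undist.shape[0]))
result = cv2.addWeighted(undist, 1, new_warp, 0.3, 0)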
Remember, as mentioned, all the code is available on Github.
Thanks!