Augmented Reality using OpenCV Python



First we will import the required packages. We will use OpenCV and NumPy.

import cv2
import numpy as np

Import files

The next step is to import all the media. In this case we will use 3 different images.

  1. Webcam Image: In which we will find the target.
  2. Video Image: Image from the video that we will overlay once the target is found.
  3. Target Image: The image that we will try to locate in the webcam image.
cap = cv2.VideoCapture(0)
myVid = cv2.VideoCapture("video.mp4")
imgTarget = cv2.imread("TargetImage.jpg")

Grabbing the first frame

To get the image from our video we will use the read method. This will allow us to get the first frame. This is required so that we can resize it to the target image, since we will be overlaying it on the target image in the webcam.

success, imgVideo = myVid.read()
hT, wT, cT = imgTarget.shape
imgVideo = cv2.resize(imgVideo, (wT, hT))

ORB Detector

Now we will initialize our detector. There are many types of detectors available; some are free and some require a license for commercial use. These detectors find features in an image and describe them in their own words, known as descriptors. You can check out more detail about features in the OpenCV documentation. The most common detectors include ORB, SIFT and SURF. We will be using the ORB detector since it is free to use.
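To build intuition for what a descriptor is, here is a toy sketch (the byte values are made up, not real ORB output): ORB descriptors are binary vectors packed into bytes, and two descriptors are compared by their Hamming distance, i.e. the number of bits in which they differ.

```python
import numpy as np

# Toy 2-byte "descriptors" (real ORB descriptors are 32 bytes)
d1 = np.array([0b10110010, 0b01100101], dtype=np.uint8)
d2 = np.array([0b10110011, 0b01100101], dtype=np.uint8)

# XOR leaves a 1 bit wherever the descriptors differ; count those bits
hamming = int(np.unpackbits(d1 ^ d2).sum())
print(hamming)  # 1 differing bit, so these two would be a very close match
```

A small Hamming distance means the two features look alike, which is exactly the distance the matcher uses later.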

Now we will create the ORB Detector with 1000 features. If you would like to know more about the features and detectors you can check out the tutorial that was done earlier on this topic.

orb = cv2.ORB_create(nfeatures=1000)

Once the detector is initialized we are going to find the key points and the descriptors for the Target Image.

kp1, des1 = orb.detectAndCompute(imgTarget, None)
[Image: Detected Features]

The While Loop

A similar process is applied to find the key points and descriptors for the latest webcam image.

while True:
    success, imgWebcam = cap.read()
    imgAug = imgWebcam.copy()
    kp2, des2 = orb.detectAndCompute(imgWebcam, None)

Find Matches

Once we have the descriptors for both images we can match them. We will use the Brute Force Matcher with k-nearest neighbors (knnMatch). Once we have all the matches we will filter them to keep only the good ones.

    bf = cv2.BFMatcher()
    matches = bf.knnMatch(des1, des2, k=2)
    good = []
    for m, n in matches:
        if m.distance < 0.75 * n.distance:
            good.append(m)

If we print out the length of the good matches when the target image is present we get 51 matches, whereas when the target image is not present we get only 13. So we can clearly see a difference between the target being present or not.
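The filtering loop above is Lowe's ratio test. A minimal sketch with made-up (best, second-best) distance pairs shows how it keeps only unambiguous matches, which is why the count of good matches drops when the target is absent:

```python
# Illustrative only: invented (best, second-best) match distances
matches = [(20.0, 60.0), (50.0, 55.0), (30.0, 100.0), (48.0, 50.0)]

# Keep a match only if the best distance clearly beats the runner-up
good = [m for m in matches if m[0] < 0.75 * m[1]]
print(len(good))  # only the two unambiguous matches survive
```

When the target is not in view, most best/second-best distances are similar (random clutter), so few matches pass the test.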

To display the good matches we can use the drawMatches function.

    imgFeatures = cv2.drawMatches(imgTarget, kp1, imgWebcam, kp2, good, None, flags=2)
[Image: Feature matching]

Video Tutorial – Part 1


Given a few points in our train image and the location of the same points in the query image, we can find the relationship between them. Using this relationship we can find the points in the second image given the points of the first image. This relationship is basically a matrix, and the process of finding it is known as Homography. So to find the bounding box we will use the key points we have in our query image (the webcam frame) and the key points of the Target image. This will give us the transformation matrix that we can use to determine the bounding box.
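To see what the homography matrix actually does, here is a small sketch with a hand-picked matrix (a pure translation by (50, 30), chosen for illustration rather than estimated from matches). Mapping a point means multiplying in homogeneous coordinates and dividing by the last component:

```python
import numpy as np

# A hand-picked homography: translate every point by (50, 30)
H = np.array([[1.0, 0.0, 50.0],
              [0.0, 1.0, 30.0],
              [0.0, 0.0, 1.0]])

# Corner points of a 100x100 square
corners = np.array([[0.0, 0.0], [0.0, 100.0], [100.0, 100.0], [100.0, 0.0]])

# Apply H in homogeneous coordinates, then divide by w to get pixels
ones = np.ones((4, 1))
mapped = (H @ np.hstack([corners, ones]).T).T
mapped = mapped[:, :2] / mapped[:, 2:]
print(mapped[0])  # the origin lands at (50, 30)
```

findHomography estimates such a matrix from the matched key points (a real one also encodes rotation and perspective), and perspectiveTransform performs exactly this mapping for us.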

But before we do that we have to make sure we have enough good matches to begin with. Therefore we will require the list of good matches to contain at least 15 entries before proceeding further. This value of 15 can be varied depending on the case.

        if len(good)>15:
            src_pts = np.float32([ kp1[m.queryIdx].pt for m in good]).reshape(-1,1,2)
            dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good]).reshape(-1,1,2)
            matrix, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)

Finding the Bounding Box

So now that we have our transformation matrix we can define the points we are looking for. In this case we want the edges, so we define the corner points. Using the perspectiveTransform function we can find the destination points by inputting the corner points and the matrix as our arguments.

Then we can simply draw these points using polylines.

            pts = np.float32([ [0,0],[0,hT],[wT,hT],[wT,0] ]).reshape(-1,1,2)
            dst = cv2.perspectiveTransform(pts,matrix)
            img2 = cv2.polylines(imgWebcam,[np.int32(dst)],True,(255,0,255),3, cv2.LINE_AA) # draw polylines
[Image: Bounding box]

Augmented Reality with Image

Since we already have the transformation matrix, we will use it to warp our video image so that it sits in the correct spot and is ready to be augmented. We will use the warpPerspective function for this.

            imgWarp = cv2.warpPerspective(imgVideo, matrix, (img2.shape[1], img2.shape[0]))

Video Tutorial – Part 2

Creating the Mask

In order to add our images we could simply use the addWeighted function. But this function blends the images rather than overlaying one on top of the other, so the final result is not very realistic.

The other method is to remove the area in the webcam image where we want to overlay our target image and then add the images together. This can be done using masking. We already have one image ready, which is the warped image we created in the previous part. Now we will create the other image that we will add to it.

            maskNew = np.zeros((imgAug.shape[0],imgAug.shape[1]),np.uint8) # create blank image of Augmentation size
            cv2.fillPoly(maskNew,[np.int32(dst)],(255,255,255)) # fill the detected area with white pixels to get mask
            maskInv = cv2.bitwise_not(maskNew) # get inverse mask
            imgAug = cv2.bitwise_and(imgAug, imgAug, mask=maskInv) # make augmentation area black in final image

So first we create a mask based on the location of the found target. Then we use the bitwise NOT to find its negative. If we combine the inverse mask and the webcam image, we get a new image where all the webcam information is shown except where the image is supposed to be augmented. The black area can be thought of as an empty space where we can add our image.
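The mask logic can be sketched on tiny 4x4 arrays with hypothetical pixel values, using NumPy equivalents of the OpenCV bitwise calls used above:

```python
import numpy as np

webcam = np.full((4, 4), 7, dtype=np.uint8)     # stand-in webcam image
warp = np.zeros((4, 4), dtype=np.uint8)
warp[1:3, 1:3] = 9                              # warped video covers the centre
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255                            # white where the target was found

maskInv = np.bitwise_not(mask)                  # like cv2.bitwise_not
# like cv2.bitwise_and(img, img, mask=maskInv): keep pixels only where maskInv is set
holed = np.where(maskInv > 0, webcam, 0).astype(np.uint8)
final = np.bitwise_or(holed, warp)              # like cv2.bitwise_or

print(final[0, 0], final[1, 1])  # 7 outside the target area, 9 inside it
```

Because the holed image is exactly zero where the warped frame has content, the bitwise OR drops the video frame cleanly into place instead of blending it.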

So once we have imgAug, which is our new masked image, and imgWarp, we can simply add them up using the bitwise OR operator.

            imgAug = cv2.bitwise_or(imgWarp, imgAug) # add final image with warped image
[Image: Final Augmented Image]

Stacking Images

We are dealing with a lot of images here so we can use the stacking function to put them all in one window.

Stacking Function

def stackImages(imgArray, scale, lables=[]):
    rows = len(imgArray)
    cols = len(imgArray[0])
    rowsAvailable = isinstance(imgArray[0], list)
    if rowsAvailable:
        sizeW = imgArray[0][0].shape[1]
        sizeH = imgArray[0][0].shape[0]
        for x in range(0, rows):
            for y in range(0, cols):
                imgArray[x][y] = cv2.resize(imgArray[x][y], (sizeW, sizeH), None, scale, scale)
                if len(imgArray[x][y].shape) == 2:  # convert grayscale to BGR so stacking works
                    imgArray[x][y] = cv2.cvtColor(imgArray[x][y], cv2.COLOR_GRAY2BGR)
        hor = [None] * rows
        for x in range(0, rows):
            hor[x] = np.hstack(imgArray[x])
        ver = np.vstack(hor)
    else:
        sizeW = imgArray[0].shape[1]
        sizeH = imgArray[0].shape[0]
        for x in range(0, rows):
            imgArray[x] = cv2.resize(imgArray[x], (sizeW, sizeH), None, scale, scale)
            if len(imgArray[x].shape) == 2:
                imgArray[x] = cv2.cvtColor(imgArray[x], cv2.COLOR_GRAY2BGR)
        ver = np.hstack(imgArray)
    if len(lables) != 0:
        eachImgWidth = int(ver.shape[1] / cols)
        eachImgHeight = int(ver.shape[0] / rows)
        for d in range(0, rows):
            for c in range(0, cols):
                cv2.rectangle(ver, (c * eachImgWidth, eachImgHeight * d),
                              (c * eachImgWidth + len(lables[d][c]) * 13 + 27, 30 + eachImgHeight * d),
                              (255, 255, 255), cv2.FILLED)
                cv2.putText(ver, lables[d][c], (eachImgWidth * c + 10, eachImgHeight * d + 20),
                            cv2.FONT_HERSHEY_COMPLEX, 0.7, (255, 0, 255), 2)
    return ver

Calling the stacking function

Now we can call the stacking function with the array of images and the scale as input arguments.

StackedImages = stackImages(([imgWebcam, imgTarget, imgVideo, imgFeatures],
                             [imgWarp, maskNew, maskInv, imgAug]), 0.5)
[Image: Stacked Images]

Augmented Reality with Video

So far we have used a single frame from our video to display on the Target Image. But the real fun part is to have an augmented reality video using OpenCV Python. Now we will run a video on it instead. The process is quite simple. During the while loop, if we have detected the target we will keep fetching the next frame from the video. If the target is lost we will reset the frame count. We will also check whether the complete video has been played, so that we can start it again. This way it will keep looping.
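The looping logic described above can be sketched in plain Python, with a pretend 3-frame video and a made-up sequence of detection results, before wiring it into the OpenCV loop:

```python
total_frames = 3          # pretend the video has 3 frames
frameCounter = 0
played = []               # which frame index is shown each iteration

for target_found in [False, True, True, True, True, False, True]:
    if not target_found:
        frameCounter = 0              # target lost: restart the video
    else:
        if frameCounter == total_frames:
            frameCounter = 0          # video finished: loop back to the start
        played.append(frameCounter)   # "display" this frame
        frameCounter += 1

print(played)  # [0, 1, 2, 0, 0]
```

The video plays through, wraps around when it ends, and restarts from frame 0 after the target is lost and found again, which is exactly the behavior the code below implements with VideoCapture's set method.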

First we will introduce a flag by the name 'detection'. Whenever the Target is found we will set it to True, and whenever it is lost we will set it to False. This way we can keep track. We will also add a frame counter to keep track of the number of frames we have already displayed.

detection = False
frameCounter = 0

Now in the while loop we will first add the condition for when the target is lost and we have to reset the frames and the counter. We can use the 'set' method to set the frame back to 0.

    if detection == False:
        myVid.set(cv2.CAP_PROP_POS_FRAMES, 0)
        frameCounter = 0

Now we will write the code for the True condition, when the Target is found. Here we will first check if the number of frames already played equals the total number of frames. If yes, we will reset the frame and the frame count as before so that the video can repeat.

        if frameCounter == myVid.get(cv2.CAP_PROP_FRAME_COUNT):
            myVid.set(cv2.CAP_PROP_POS_FRAMES, 0)
            frameCounter = 0

Now we just grab the frame from the video and resize it as we did at the beginning of the code.

        success, imgVideo = myVid.read()
        imgVideo = cv2.resize(imgVideo, (wT, hT))

Finally we will increment our frame counter inside the loop.

    frameCounter += 1


At last we can add the frames per second and the Target found information on the final image.

timer = cv2.getTickCount()

#### CODE ####

fps = cv2.getTickFrequency() / (cv2.getTickCount() - timer)
cv2.putText(StackedImages, 'FPS: {}'.format(int(fps)), (25, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (230, 20, 20), 3)
cv2.putText(StackedImages, 'Target Found: {}'.format(detection), (25, 80), cv2.FONT_HERSHEY_SIMPLEX, 1, (230, 20, 20), 3)

Video Tutorial – Part 3

In conclusion, augmented reality with OpenCV is a simple and easy way to get started with augmented reality. The techniques we have used with video can also be used with 3D models. But that we will keep for another post.

Complete Code

You can access the full code by enrolling in this free course.


