# Creating an AI-Based Mouse Controller Using Python and OpenCV
## Introduction
In this comprehensive guide, we will walk through the process of creating an AI-based mouse controller using Python and OpenCV. This innovative project leverages hand tracking technology to control your computer’s mouse cursor simply by moving your hand in front of a webcam. Additionally, you can perform clicks by bringing your index and middle fingers together.
This article is designed for individuals with a basic understanding of Python programming and computer vision concepts. By the end of this guide, you will have a functional AI-based mouse controller that you can customize to suit your needs.
---
## Prerequisites
Before diving into the project, ensure you have the following:
1. **Python installed** on your system.
2. **OpenCV (cv2)** installed for image processing.
3. **MediaPipe** installed for hand detection and tracking.
4. **AutoPy** installed for mouse control.
You can install these packages using pip:
```bash
pip install opencv-python mediapipe autopy
```
---
## Setting Up the Project
### Step 1: Create a New Python File
Open your preferred IDE (e.g., PyCharm, VS Code) and create a new Python file. We’ll name it `ai_mouse_controller.py`.
### Step 2: Import Necessary Libraries
Add the following imports at the top of your file:
```python
import cv2
import numpy as np
import mediapipe as mp
import time
import autopy
```
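If you want to confirm the installation before writing any logic, a quick check like the following will tell you whether OpenCV can open the camera and whether AutoPy can see the screen. This is a minimal sketch that assumes your webcam is at index 0:

```python
# Quick sanity check for the installed packages.
# Assumes the default webcam is at index 0; change it if you have several cameras.
cap = cv2.VideoCapture(0)
print("Webcam opened:", cap.isOpened())
print("Screen size:", autopy.screen.size())
cap.release()
```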
---
## Creating the Hand Tracking Module
### Step 3: Define a Class for Hand Detection and Tracking
Create a class `HandTracker` with methods to detect hands, find landmarks, check which fingers are up, and calculate distances between points.
```python
class HandTracker:
    def __init__(self, max_hands=1, detection_confidence=0.7):
        # MediaPipe's hand model handles both detection and landmark tracking
        self.hands = mp.solutions.hands.Hands(
            max_num_hands=max_hands,
            min_detection_confidence=detection_confidence)
        self.draw_utils = mp.solutions.drawing_utils
        self.results = None
        self.lm_list = []

    def find_hands(self, img, draw=True):
        # MediaPipe expects RGB input, while OpenCV captures frames in BGR
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        if draw and self.results.multi_hand_landmarks:
            for hand_lms in self.results.multi_hand_landmarks:
                self.draw_utils.draw_landmarks(
                    img, hand_lms, mp.solutions.hands.HAND_CONNECTIONS)
        return img

    def get_landmarks(self, img):
        self.lm_list = []
        if self.results and self.results.multi_hand_landmarks:
            hand = self.results.multi_hand_landmarks[0]
            h, w = img.shape[:2]
            # Landmarks are normalized to [0, 1]; scale them to pixel coordinates
            self.lm_list = [(lm.x * w, lm.y * h) for lm in hand.landmark]
        return self.lm_list

    def fingers_up(self):
        up_fingers = []
        if self.lm_list:
            # Image y grows downward, so an extended finger's tip sits above its base knuckle
            # Index finger: tip is landmark 8, base knuckle (MCP) is landmark 5
            up_fingers.append(1 if self.lm_list[8][1] < self.lm_list[5][1] else 0)
            # Middle finger: tip is landmark 12, base knuckle (MCP) is landmark 9
            up_fingers.append(1 if self.lm_list[12][1] < self.lm_list[9][1] else 0)
        return up_fingers

    def find_distance(self, p1, p2):
        # Euclidean distance between two landmarks, in pixels
        x1, y1 = self.lm_list[p1]
        x2, y2 = self.lm_list[p2]
        return np.hypot(x2 - x1, y2 - y1)
```
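Before wiring the tracker to the mouse, it helps to verify the module on its own. The loop below is a minimal sketch that draws the detected landmarks and prints the `[index, middle]` finger states; it assumes the class above is already defined in the same file:

```python
# Minimal standalone test of HandTracker: show the webcam feed with
# landmarks drawn and print which of the two tracked fingers are up.
tracker = HandTracker()
cap = cv2.VideoCapture(0)
while True:
    success, img = cap.read()
    if not success:
        break
    img = tracker.find_hands(img)
    if tracker.get_landmarks(img):
        print("Fingers up [index, middle]:", tracker.fingers_up())
    cv2.imshow("HandTracker test", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```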
---
## Implementing the Mouse Controller
### Step 4: Initialize Variables and the Webcam
In your main function, initialize variables for mouse control and set up the webcam:
```python
def main():
    tracker = HandTracker()
    cap = cv2.VideoCapture(0)
    cam_w, cam_h = 640, 480
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, cam_w)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, cam_h)
    screen_w, screen_h = autopy.screen.size()
    frame_margin = 100           # inset of the active region, in camera pixels
    p_x, p_y = 0, 0              # cursor position from the previous frame
    current_x, current_y = 0, 0
    smoothing_value = 5
    prev_time = 0
    while True:
        success, img = cap.read()
        if not success:
            break
        img = tracker.find_hands(img)
        lm_list = tracker.get_landmarks(img)
        if lm_list:
            x1, y1 = lm_list[8]  # index fingertip
            # Map the active camera region onto the full screen
            target_x = np.interp(x1, (frame_margin, cam_w - frame_margin), (0, screen_w))
            target_y = np.interp(y1, (frame_margin, cam_h - frame_margin), (0, screen_h))
            # Smoothing the values to reduce jitter
            current_x = p_x + (target_x - p_x) / smoothing_value
            current_y = p_y + (target_y - p_y) / smoothing_value
            # Moving the mouse; flip x so the cursor mirrors your hand, and
            # clamp so autopy never receives an off-screen point
            move_x = max(0, min(screen_w - current_x, screen_w - 1))
            move_y = max(0, min(current_y, screen_h - 1))
            autopy.mouse.move(move_x, move_y)
            # Clicking mechanism: index and middle fingers up and close together
            up_fingers = tracker.fingers_up()
            if up_fingers == [1, 1]:
                if tracker.find_distance(8, 12) < 40:
                    autopy.mouse.click()
            # Drawing a circle on the index fingertip for better visualization
            cv2.circle(img, (int(x1), int(y1)), 15, (100, 100, 255), cv2.FILLED)
            p_x, p_y = current_x, current_y
        # Displaying the actual processing FPS and the active region
        cur_time = time.time()
        fps = 1 / (cur_time - prev_time) if prev_time else 0
        prev_time = cur_time
        cv2.putText(img, str(int(fps)), (10, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
        cv2.rectangle(img, (frame_margin, frame_margin),
                      (cam_w - frame_margin, cam_h - frame_margin), (0, 255, 0), 2)
        cv2.imshow("AI Mouse Controller", img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```
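Save the file and run it from a terminal; press `q` in the preview window to quit:

```bash
python ai_mouse_controller.py
```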
---
## Explanation of the Code
### Hand Tracking Module (`HandTracker` Class)
- **Initialization**: The class initializes MediaPipe’s hand detection model.
- **Finding Hands**: Converts each frame to RGB (the format MediaPipe expects), runs the hand model on it, and stores the results for the other methods.
- **Getting Landmarks**: Scales the 21 normalized hand landmarks to pixel coordinates and stores them in `lm_list`.
- **Checking Fingers Up**: Determines whether the index and middle fingers are extended by comparing each fingertip's height against its base knuckle.
- **Finding Distance**: Calculates the Euclidean distance between two specified landmarks; a small worked example follows this list.
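For instance, here is the same calculation run on two hypothetical fingertip positions (the coordinates are invented for illustration):

```python
import numpy as np

index_tip = (320.0, 240.0)   # hypothetical landmark 8 position, in pixels
middle_tip = (350.0, 200.0)  # hypothetical landmark 12 position, in pixels

distance = np.hypot(middle_tip[0] - index_tip[0],
                    middle_tip[1] - index_tip[1])
print(distance)  # 50.0 -> above the 40-pixel click threshold, so no click fires
```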
### Main Function
- **Webcam Setup**: Captures video from the webcam and initializes the variables used for mouse control.
- **Mouse Movement**: Maps the index fingertip from the camera's active region to screen coordinates and moves the cursor there, with smoothing applied to reduce jitter (see the sketch after this list).
- **Click Mechanism**: Simulates a click when the index and middle fingers are both up and brought close together.
- **Visualization**: Draws a circle on the index fingertip, and displays the processing FPS and the active-region rectangle.
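To make the movement pipeline concrete, the sketch below traces one frame's fingertip position through the coordinate mapping and the smoothing step. All the numbers are hypothetical; the constants match the defaults used in `main()`:

```python
import numpy as np

cam_w, cam_h = 640, 480              # camera resolution
screen_w, screen_h = 1920.0, 1080.0  # hypothetical screen size
frame_margin, smoothing_value = 100, 5

x1, y1 = 400.0, 250.0    # hypothetical fingertip position, in camera pixels
p_x, p_y = 900.0, 500.0  # cursor position from the previous frame

# 1. Map the active camera region (margins removed) onto the full screen.
target_x = np.interp(x1, (frame_margin, cam_w - frame_margin), (0, screen_w))
target_y = np.interp(y1, (frame_margin, cam_h - frame_margin), (0, screen_h))

# 2. Move only 1/smoothing_value of the way toward the target each frame,
#    so sudden jumps in the detected fingertip are damped.
current_x = p_x + (target_x - p_x) / smoothing_value
current_y = p_y + (target_y - p_y) / smoothing_value
print(round(current_x, 1), round(current_y, 1))  # ~981.8 ~515.7
```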
---
## Conclusion
This project demonstrates how powerful computer vision libraries like OpenCV and MediaPipe can be used to create innovative applications. By following this guide, you’ve created an AI-based mouse controller that lets you control your computer with hand gestures. Feel free to experiment with the parameters (e.g., `smoothing_value`, the click distance threshold, `frame_margin`) to tune the system for your environment.
Remember to share your experience and modifications in the comments below!