**Introduction to GP4 OAT**
GP4 OAT (Open-Source Action Teaching) is an AI-based framework that generates Python code to recreate a user's actions from a step-by-step plan. It uses GPT-4o's vision capabilities to analyze screenshots of the user's interactions with their PC, and from that analysis produces customizable code that performs the recorded task.
**Functionality of GP4 OAT**
The GP4 OAT function has several key components. The generated code relies on OS-level controls: it drives the PC directly through the keyboard and mouse rather than through browser-automation libraries such as Selenium. Because the framework can reproduce fairly complex sequences of user actions this way, it is an attractive tool for software developers and testers; a sketch of the kind of script it produces follows below.
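The video does not show the generated script itself, so the snippet below is only a hypothetical illustration of the style of code this approach asks GPT-4o to produce, assuming `pyautogui` for the OS-level keyboard and mouse control (the source names no specific library):

```python
# Hypothetical illustration of the kind of script the framework generates:
# plain OS / keyboard / mouse automation, no Selenium or other browser libraries.
import time

import pyautogui  # assumed library; the source only mentions "OS, mouse and keyboard controls"

pyautogui.press("win")                       # open the Start menu
time.sleep(1)
pyautogui.typewrite("chrome", interval=0.1)  # type the application name
pyautogui.press("enter")                     # launch Chrome
time.sleep(3)                                # wait for the browser window to appear

pyautogui.typewrite("google.com", interval=0.05)
pyautogui.press("enter")                     # navigate to Google
```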
**Setting Parameters for GP4 OAT**
The GP4 OAT function exposes a few parameters that let users tailor it to their needs. These include an initial sleep time, which pauses the script before recording starts so the user has time to get ready, plus the screenshot interval and recording duration (for example, a 0.5-second interval over a 15-second window, i.e. two frames per second). The captured screenshots are then passed through an image-analysis step that uses GPT-4o to describe what is happening at each point in the interaction; a sketch of the capture and analysis functions follows below.
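As a rough sketch of these parameters in code: the function names, prompt text, and folder layout below are my own guesses, assuming `pyautogui` for screen capture and the OpenAI Python SDK for the GPT-4o calls, not the author's exact implementation.

```python
import base64
import time
from pathlib import Path

import pyautogui
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def take_screenshots(folder="training", interval=0.5, duration=15):
    """Capture a screenshot every `interval` seconds for `duration` seconds."""
    Path(folder).mkdir(exist_ok=True)
    for i in range(int(duration / interval)):
        pyautogui.screenshot(f"{folder}/frame_{i:03d}.png")
        time.sleep(interval)


def encode_image(path):
    """Return the base64 string GPT-4o expects for image inputs."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def analyze_image(path):
    """Ask GPT-4o to describe the user action visible in one screenshot."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the user action visible in this screenshot."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(path)}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```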
**Cleaning Generated Code**
GP4 OAT also cleans the generated code, stripping the markdown fences and other extraneous text that language models tend to wrap around their replies. This matters because the output is saved straight to a .py file and executed, so it has to be plain, runnable Python. The cleaning step is automated, leaving developers with code they can save, run, and refine over time; a minimal sketch follows below.
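A minimal sketch of what such a cleaning step might look like, assuming the main nuisance is the markdown code fence GPT-4o wraps around its reply (the author's actual function may do more):

```python
def clean_generated_code(raw: str) -> str:
    """Strip markdown fences so the model's reply can be saved and run as a .py file."""
    lines = raw.strip().splitlines()
    # Drop a leading ```python / ``` fence and a trailing ``` fence if present.
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines).strip()
```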
**Example Usage**
To demonstrate the framework, the author recorded a short example session: opening Chrome from the Start menu, going to google.com, searching for Taylor Swift's "Down Bad" music video, and clicking the YouTube result. GP4 OAT analyzed the screenshots, produced a step-by-step plan of those actions, and generated Python code that replayed them, letting the author verify that the actions had been captured accurately; a sketch of the plan- and code-generation calls follows below.
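The two GPT-4o calls behind this could look roughly like the following sketch. The prompts paraphrase what is shown in the video, while the function names and exact wording are illustrative, again assuming the OpenAI Python SDK.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK and OPENAI_API_KEY

client = OpenAI()


def generate_step_by_step_plan(analysis_results: str) -> str:
    """Turn the per-screenshot analysis into an ordered plan of the user's actions."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You summarize screen recordings into a numbered, "
                        "step-by-step plan of the actions the user took."},
            {"role": "user", "content": analysis_results},
        ],
    )
    return response.choices[0].message.content


def generate_action_code(plan: str) -> str:
    """Ask GPT-4o for Python that replays the plan with OS, mouse and keyboard control."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a professional software developer with expertise in "
                        "Python. Generate Python code that recreates the user's actions "
                        "from a step-by-step plan, using OS, mouse and keyboard controls "
                        "only (no Selenium or other browser libraries). You have control "
                        "of the user's PC. Return only the code, nothing else."},
            {"role": "user", "content": plan},
        ],
    )
    return response.choices[0].message.content
```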
**Clean Output**
The generated Python code was then passed through the cleaning function and saved to action.py in a readable format that developers can inspect and refine. Running the saved script successfully replayed the recorded steps, giving a working starting point for further development or testing; the orchestration that ties the pieces together is sketched below.
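Putting the earlier sketches together, one pass of the record-analyze-generate-replay loop might look like this. It is illustrative only: it relies on the hypothetical helper functions sketched above and executes the generated script directly, as the video does.

```python
import time
from pathlib import Path

# take_screenshots, analyze_image, generate_step_by_step_plan,
# generate_action_code and clean_generated_code are the sketches above.


def main():
    """One pass of the record-and-replay loop, using the helper sketches above."""
    time.sleep(5)                                   # give yourself time to get ready
    take_screenshots(folder="training", interval=0.5, duration=15)

    # Analyze the screenshots in order and join the descriptions into one report.
    analysis = "\n".join(
        analyze_image(p) for p in sorted(Path("training").glob("*.png"))
    )
    print(analysis)

    plan = generate_step_by_step_plan(analysis)
    print(plan)

    code = clean_generated_code(generate_action_code(plan))
    Path("action.py").write_text(code)              # save, then replay the actions
    exec(compile(code, "action.py", "exec"))        # executes model-generated code


if __name__ == "__main__":
    main()
```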
**Community Engagement**
The author emphasized the importance of community engagement in developing and improving GP4 OAT. Channel members get access to additional resources, including the GitHub repository and the community Discord. The author invited viewers to share their thoughts on the framework's capabilities and potential applications in the comments, encouraging collaboration and experimentation.
**Future Development**
The author hinted at future directions for GP4 OAT, suggesting that integrating voice commands and local vision models could extend its capabilities. Exploring these possibilities may unlock new levels of automation and productivity for users.
"WEBVTTKind: captionsLanguage: entoday I thought I could share a project I have been thinking about for a while now so basically I call it just a small action model uh thinking about these large action models that we heard around this rabbit and all of that mess right and basically since gp40 released the API I wanted to kind of test out the new vision capabilities by trying to implement this so I divided this into kind of two phases as you can see here we have the recording phase and this is uh we record screen shots while uh I do some actions on my computer so I set this to two frames per second that should be enough right and all of these screenshots will be saved to yeah just a folder and I kind of set the duration here as I will show you in the code and the next part is kind of what I call the execute phase this is where we can actually take advantage of the screenshots so we run them to the wish model with GPD 40 and we're going to analyze and we're going to try to understand the sequential order of the actions the user took right and we're going to feed that into gbd4 again we're going to generate a step-by-step plan to recreate the user's action and this plan is from this plan we're going to generate a code that again can try to recreate these user actions and then we just going to execute the code so the code gets saved into its own uh file name right so we can save it if you want to and re create these small actions so basically I just think we're going to go over to the code and I'm going to show you kind of go through it how I thought about this you got to do some few tests and yeah I thought it was pretty interesting and it's not going to be the longest video but uh yeah hope you find it interesting let's walk over through some of the function we actually need to make this happen so you can see the first function I want to highlight is kind of the screenshot function very simple setup we set the intervals remember I said two frames per second so this is kind of if I put one here that's one frames every second for 15 seconds and if I put 0.5 that is of course going to be two frames per second for 15 seconds so that means we have this 15c window to kind of record our actions right and that is just going to be saved to uh yeah I can just show you here so you can see this is my folder here we have 21 screenshots of some actions I have been taking uh but we get back to that we're going to this to Bas 64 and then we come into kind of analyze uh image function this is going to look at the folder I just showed you and use actually the GPT 40 model to uh analyze just what is happening in the image so we just going to do it as simple as that uh but remember we are going to go through every single one of these images to get kind of the full base 64 encoding so we can kind of put everything together to find out the sequential IAL order of this right uh so we want to kind of want to sort this and but I'm not going to spend too much time going through that and the next part is the generate step-by-step plan function of course we want to to generate a stepbystep plan to generate these actions and this is going to be based on the results we get right from the analysis from the sequential order of the images uh I also added a function to clean generated code because we all always get this weird stuff so I just wanted to remove this when we actually just so it's easier to save the code we get get to recreate our actions right we have a very simple GPT 40 shat function here uh so you can see 
you're professional software Deb with expertise in Python your task is gener python code to recreate the users's action from a step byep plan generate code uses uh OS controls the computer mouse keyboard uh you have contr user PC uh I don't want to use like selenium and these lips for the browser if you want to go into that because we have the option to kind of go to our Chrome right so you'll kind of see that action I just set some parameters here for our uh gp4 oat function and the main part is basically pretty easy we add some sleep at the beginning of the so we get time to kind of get ready to record our actions we're going to take the screenshots we're going to save them to a folder called training analyze the images print the analysis go into the step-by-step plan feed in the analysis results as args and print the plan then we're going to generate uh our python code using the gp4 oat function so you can see we feed in the plan here from yeah generate plan part from the US ACC a python code that recreates the same actions and I just uh yeah I don't want any extra only want the code so that seems to work pretty good and we kind of print out the yeah you can see we run the code through our clean code function and just print out a code and save it to action. pyite and that is basically it uh I found out that this is working pretty good so now let's just do a few examples and I kind of wanted to leave this code already open so people can work on this to kind of build on it right that thought I could be pretty interesting and if you want just access to this uh just become a member of the channel we had a few discussions today on the Discord about yeah around this capturing images video screenshots so a lot of good value in the community Discord that you will also get access to if you become a channel member uh additionally to the GitHub Community but yeah now let's do some different tests with this and see if it works okay so when we run this now of course I'm just going to do yeah some pretty simple stuff remember I set this to 15 seconds so we don't have all of that time so yeah I'm just going to run this just close this and just start doing something so let's go to the start menu let's do Chrome maybe okay let's go to google.com let's find a music video so Taylor Swift maybe down bad okay let's click on this okay better stop that and hopefully now this is going to create a code that can recreate the steps we just did so if we go back here and let's just let let this run for a while when we can actually see the plan and stuff now okay so here you can see we analyzed all of the images and you can kind of see went to Google typed in tayor Swift kind of down bad clicked on it got to the YouTube video here right and here we have the generated python code so you can see that we can also open the code here let's reload it so you can see it is in uh yeah kind of um good format here now let's try to run this okay so let's run this so you can see yeah I'm going to put my hands up here so the first step was to go to Google to go to Chrome right then it was google.com perfect then we put in our name of our YouTube video so Taylor with down bad and the last part it has to be to click on this video yes perfect so I guess this was a good pass and it kind of recreated all the steps we did in our recording phase right so yeah pretty cool so yeah that is what I wanted to share today I'm just going to leave this totally open if you want to try it out I haven't really explored too much what actually the 
capabilities of this actually are, and I am sure there are a ton of improvements you can make, but if you want to try it out, like I said, become a member of the channel and I will put this out there. It would be cool if you could report back in the comments with any ideas about what we can use it for. Anyway, that was my small action model, a fun project to be honest, and it is a framework we can build on when we get access to the voice features and maybe some local vision models that can do this, so that is going to be quite cool. Hopefully see you on Sunday, have a great day, and bye-bye.