Autonomous Synthetic Images with GPT Vision API + Dall-E 3 API Loop - WOW!

The GPT Vision Preview: A Journey of Synthetic Image Creation

As I sat down to work on this project, I decided to put it down to five seconds and think that would be good enough. However, I soon realized that there was something about the system that prevented me from running the preview too many times. This was due to a rate limit on the GPT Vision preview, which ensured that I couldn't run it excessively without consequences.

In order to proceed, I had to set up my reference image path. I went to Google and searched for famous images, eventually finding an Evo Gyma race flag image that I deemed suitable for use as a reference. I created a folder with the image, titled it "ref image," and was ready to move forward.

I decided to run the preview with Python 3.9. The script started executing, and I watched as the synthetic images began to appear in my folder. I stopped the process after running for five iterations, feeling that it had reached a satisfactory point. Upon examining the first synthetic image, I was pleased to see that it had turned out well, but I couldn't help thinking that the reference image was too famous and might detract from the overall effect.

I decided to try again with an alternative reference image, this time choosing a Breaking Bad Walter White profile picture. The script began running once more, and I watched as the synthetic images evolved over time. To my surprise, the result turned out to be quite impressive, with the gas mask transforming into various forms throughout the process.

The Evolution Process: A Journey Through Steampunk

As the script continued to run, it introduced elements of steampunk into the image, adding a unique twist to the character's appearance. I was struck by how seamlessly this transition had occurred, and how the final product had become a striking representation of Walter White's alter ego.

The Evolution Process: A Journey Through Retro Computing

Next, I decided to run another script that utilized an existing image from my collection. This time, I chose an illustration from the 1990s depicting a computer setup with a Python snake. The result was nothing short of remarkable, as the synthetic image evolved into a vibrant and colorful representation of retro computing.

The Limitations of the System

While I was pleased with the results, I couldn't help but acknowledge that there were some limitations to the system. In particular, the recognition of certain images was inconsistent, with some being identified correctly while others were not recognized at all.

Despite these limitations, I felt that the results had been impressive, and I was eager to continue experimenting with the script. By pushing the boundaries of what is possible with synthetic image creation, we can unlock new possibilities for artistic expression and innovation.

Conclusion

As I concluded this project, I couldn't help but feel a sense of accomplishment and excitement for the future possibilities that lay ahead. The GPT Vision preview had proven to be a powerful tool in its own right, capable of generating stunning synthetic images with relative ease.

In the coming days and weeks, I plan to continue refining the script and exploring new ideas for artistic expression. By doing so, we can unlock new levels of creativity and innovation in our field. And if you're interested in supporting my efforts, I invite you to become a member of my Patreon page, where you'll gain access to exclusive content, including my upcoming scripts and projects.

In the meantime, I bid you farewell, but not before leaving you with a link to my GitHub repository, where you can find the script used for this project. Stay tuned for future updates and adventures in synthetic image creation.

"WEBVTTKind: captionsLanguage: enin today's video we are going to take a look at the project I created yesterday that is basically we tried to combine the new GPT 4 wish API with the dolly3 API so basically what we want to do is to describe a reference image then try to either create a synthetic version of it or evolve it as you can see in the background here so I thought it was pretty cool and quite easy to set up so let's take a look let's start by looking at the flow shart for this system so you can see the first thing we need is a reference image this image is going to be fed into the GPT Vision API and from that we will generate a description and when we have that description we can actually feed that into the dly 3 API ASR prompt and from that description hopefully we will get a like a synthetic version of the image we describe with GPT Vis that was our reference image The Next Step then is going to be to take that original reference image and compare it with the synthetic version and try to use GPT Vision again to compare them and improve the prompt so we can feed that prompt back into dly tree with hopefully an improved prompt and from that improved prompt again we get a new synthetic image and so goes the loop so I have created like a 10 iteration Loop that means we will get 10 synthetic images and that is kind of my first version of this I also created an evolution version of this so it's basically the same only on the second Loop we instead of comparing the synthetic image to the reference image we kind of just compare the two synthetic images and from that we're going to generate a new prompt but for each prompt we're going to add a new style to the image so that means that we get like this evolution of each image and these are like going to be fed back in so we will kind of evolve from the reference image to yeah you will see a whole new style but it also goes back to the reference image but just different styles again we can run this in a 10 times Loop to get 10 images so yeah but now let's take a quick look at the python code for this before we run it let's take a look at some of the functions we have in this system so the first one is just a vision API describe image this is using the gp4 vision preview model uh yes we're going to take an image as an input here is kind of the prompt I created for this so describe the image in detail colors features team style Etc uh yeah we set the token to 300 Max I don't think we need more than that and this is going to return our description text that is that we are going to use going forward right uh next up we have just the doly generate image function pretty standard doly Tre model uh here you can see we kind of feed the description in here as a prompt 1024 * 1024 and we just going to return one image uh yeah and next up we have the vision API compare and describe so this is a bit different uh we used the same gp4 vision preview model but we take in you can see we take in the reference image and we take in the new created synthetic image from doly tree and the prompt here is describe both images in detail then compare them finally create a new and improved description prompt to match the reference images uh as close as possible reference image okay so let's save that and yeah that is basically it and this is going to return an improved description text and then we have basically a for Loop here that is going to do 10 iterations so this is just going to do what we described in the flowart it's going to take in like the newly created synthetic image it's going to look at the reference image and try to improve it I also put in like a sleep timer here I think I'm going to put it down to five seconds so I think that should be good yeah we have some kind of rate limit on the on the GPT Vision preview so we can't run it too many times so this is going to return descriptions and our synthetic image URL so we set our reference image path right this is basically our reference image so I just went to Google I just searched for famous images I found this Evo yima race flag image I put this in my folder and I call it like ref image right and then we can run this so that I think we're just going to run it now and see if we can create a synthetic version of the Evo gima famous image okay so let's just go python IL Loop 2. Pi so I'm going to leave the folder up here so you can kind of see the imagees popping in here as we go so I'm just going to start this I think we're going to do like five images and let's take a look Okay so I just stopped it here because I don't think it's going to get much better so if you take a look at our reference image right this one and let's take a look at the first synthetic image yeah pretty good right you can clearly see uh but we got to take into consideration that this is a very famous image uh I think this looks even better just compare these two so if you look at kind of the bottom here I think this one is much better so if it you can kind of see this the bottom here looks much better on this one yeah I think this looks great so yeah I got to say mission complete uh so I just stopped it I don't think we're going to get any more Improvement than this uh but now let's try to switch to the evolution version and let's try to evolve let's find other reference image and try to evolve it then maybe we can go back and try to create a more unknown image okay so the profile image I picked out was this Breaking Bad Walter White image I don't even know if we can run this because of like copyright and stuff but uh yeah let's try it wow that was so cool so let's take a look here so you see this is kind of the image we started with right then we evolve to this pretty cool it kind of changed the gas mask over to this part right and then we went to this to this this looks awesome I love this style right and to this I thought this one was very cool but then we started to get weird here so we added some kind of Steampunk think I think it already added steampunk but then we ended up with this so we kind of went from theall Walter White gas mask image to this but some of the images in between it just looks badass right this one and this one very cool I was so happy with this uh so I want to do one more of this Evolution style images Okay so let's try actually an image I have created before with doly 3 so this is just a retro 90s illustration of a computer setup with kind this python snake that kind of represents the programming language python so let's run this and see where this takes us I just ended it here and I think this turned out pretty cool you can clearly see we evolved this so if we start looking at the reference image we went to this original reference image so this is supposed to be like a copy pretty good then we evolve to this this and we see I kind of think this was very cute I love the mechanical keyboard style here and we just kept involving on that we ended up with some kind of looks like some kind of music stuff here we have like a keyboard but it's actually like a a musical keyboard and then we turn into I don't know even what this is and this and we ended up with this so we went from this to this h pretty cool some kind of old analog measuring devices or something pretty special right uh but again I think this was pretty cool yeah I think we're just going to call that uh I think this works to some degree I think there's a lot of improvement we can do with the prompts and stuff and there was some bugs with that they didn't recognize the image and stuff I can see in the code but uh yeah for our first try I'm very happy with this uh I'm going to be uploading this uh code to my GitHub so if you want to support me just become a member you can just go for the lowest tier and you will get access to the GitHub where I will be posting this script and future script I have a lot of good cool ideas so stay tuned for that uh I'm going to leave a link in the description uh but other than that thank you for tuning in have a great day and I'll see you again very soon with another cool projectin today's video we are going to take a look at the project I created yesterday that is basically we tried to combine the new GPT 4 wish API with the dolly3 API so basically what we want to do is to describe a reference image then try to either create a synthetic version of it or evolve it as you can see in the background here so I thought it was pretty cool and quite easy to set up so let's take a look let's start by looking at the flow shart for this system so you can see the first thing we need is a reference image this image is going to be fed into the GPT Vision API and from that we will generate a description and when we have that description we can actually feed that into the dly 3 API ASR prompt and from that description hopefully we will get a like a synthetic version of the image we describe with GPT Vis that was our reference image The Next Step then is going to be to take that original reference image and compare it with the synthetic version and try to use GPT Vision again to compare them and improve the prompt so we can feed that prompt back into dly tree with hopefully an improved prompt and from that improved prompt again we get a new synthetic image and so goes the loop so I have created like a 10 iteration Loop that means we will get 10 synthetic images and that is kind of my first version of this I also created an evolution version of this so it's basically the same only on the second Loop we instead of comparing the synthetic image to the reference image we kind of just compare the two synthetic images and from that we're going to generate a new prompt but for each prompt we're going to add a new style to the image so that means that we get like this evolution of each image and these are like going to be fed back in so we will kind of evolve from the reference image to yeah you will see a whole new style but it also goes back to the reference image but just different styles again we can run this in a 10 times Loop to get 10 images so yeah but now let's take a quick look at the python code for this before we run it let's take a look at some of the functions we have in this system so the first one is just a vision API describe image this is using the gp4 vision preview model uh yes we're going to take an image as an input here is kind of the prompt I created for this so describe the image in detail colors features team style Etc uh yeah we set the token to 300 Max I don't think we need more than that and this is going to return our description text that is that we are going to use going forward right uh next up we have just the doly generate image function pretty standard doly Tre model uh here you can see we kind of feed the description in here as a prompt 1024 * 1024 and we just going to return one image uh yeah and next up we have the vision API compare and describe so this is a bit different uh we used the same gp4 vision preview model but we take in you can see we take in the reference image and we take in the new created synthetic image from doly tree and the prompt here is describe both images in detail then compare them finally create a new and improved description prompt to match the reference images uh as close as possible reference image okay so let's save that and yeah that is basically it and this is going to return an improved description text and then we have basically a for Loop here that is going to do 10 iterations so this is just going to do what we described in the flowart it's going to take in like the newly created synthetic image it's going to look at the reference image and try to improve it I also put in like a sleep timer here I think I'm going to put it down to five seconds so I think that should be good yeah we have some kind of rate limit on the on the GPT Vision preview so we can't run it too many times so this is going to return descriptions and our synthetic image URL so we set our reference image path right this is basically our reference image so I just went to Google I just searched for famous images I found this Evo yima race flag image I put this in my folder and I call it like ref image right and then we can run this so that I think we're just going to run it now and see if we can create a synthetic version of the Evo gima famous image okay so let's just go python IL Loop 2. Pi so I'm going to leave the folder up here so you can kind of see the imagees popping in here as we go so I'm just going to start this I think we're going to do like five images and let's take a look Okay so I just stopped it here because I don't think it's going to get much better so if you take a look at our reference image right this one and let's take a look at the first synthetic image yeah pretty good right you can clearly see uh but we got to take into consideration that this is a very famous image uh I think this looks even better just compare these two so if you look at kind of the bottom here I think this one is much better so if it you can kind of see this the bottom here looks much better on this one yeah I think this looks great so yeah I got to say mission complete uh so I just stopped it I don't think we're going to get any more Improvement than this uh but now let's try to switch to the evolution version and let's try to evolve let's find other reference image and try to evolve it then maybe we can go back and try to create a more unknown image okay so the profile image I picked out was this Breaking Bad Walter White image I don't even know if we can run this because of like copyright and stuff but uh yeah let's try it wow that was so cool so let's take a look here so you see this is kind of the image we started with right then we evolve to this pretty cool it kind of changed the gas mask over to this part right and then we went to this to this this looks awesome I love this style right and to this I thought this one was very cool but then we started to get weird here so we added some kind of Steampunk think I think it already added steampunk but then we ended up with this so we kind of went from theall Walter White gas mask image to this but some of the images in between it just looks badass right this one and this one very cool I was so happy with this uh so I want to do one more of this Evolution style images Okay so let's try actually an image I have created before with doly 3 so this is just a retro 90s illustration of a computer setup with kind this python snake that kind of represents the programming language python so let's run this and see where this takes us I just ended it here and I think this turned out pretty cool you can clearly see we evolved this so if we start looking at the reference image we went to this original reference image so this is supposed to be like a copy pretty good then we evolve to this this and we see I kind of think this was very cute I love the mechanical keyboard style here and we just kept involving on that we ended up with some kind of looks like some kind of music stuff here we have like a keyboard but it's actually like a a musical keyboard and then we turn into I don't know even what this is and this and we ended up with this so we went from this to this h pretty cool some kind of old analog measuring devices or something pretty special right uh but again I think this was pretty cool yeah I think we're just going to call that uh I think this works to some degree I think there's a lot of improvement we can do with the prompts and stuff and there was some bugs with that they didn't recognize the image and stuff I can see in the code but uh yeah for our first try I'm very happy with this uh I'm going to be uploading this uh code to my GitHub so if you want to support me just become a member you can just go for the lowest tier and you will get access to the GitHub where I will be posting this script and future script I have a lot of good cool ideas so stay tuned for that uh I'm going to leave a link in the description uh but other than that thank you for tuning in have a great day and I'll see you again very soon with another cool project\n"