OpenAI Realtime API - The NEW ERA of Speech to Speech - TESTED

Making Like a Simple Game: A Spontaneous Experiment

In a burst of creativity, we decided to play a simple game to see if we could make it work. The game was straightforward: each player would respond with the name of the last letter from the previous name. For instance, if I said "John," you would have to say "Lally" because of the N. This rule-based game seemed like a fun way to pass the time, but unfortunately, it didn't quite work out as planned.

The first round went well, with each player responding correctly with the last letter from the previous name. I started with the name "Peter," and you responded with "Rebecca." The next player, "Angie," was followed by "Ethan Nelson Sophie Eric Oliver." However, things took a turn for the worse when someone said "Oliver." According to the rules, that would be the end of the game. Unfortunately for that player, they lost because the name "Oliver" does not have an L as its last letter.

Despite the loss, I was still determined to continue the game. So, I started again with a new name, "Eric," and you responded with "Sophia Anie Ethan Nick Rachel." However, it seems that even this round didn't go in our favor. You lost because your response did not follow the rules of the game.

I have to admit, I was not expecting the game to end so quickly. It seemed like a simple concept, but somehow we managed to mess it up. Nevertheless, I had fun trying out the game and exploring its limitations. Maybe one day, we'll be able to refine the rules and make the game work.

Ideas for Future Development

As our experiment came to an end, I couldn't help but think about all the other possibilities that could have worked with this simple game. One idea I had was to use a phone call to test the limits of the game. We could try calling a real company and see if we can get some simple information out of them using this game. Of course, there's always a risk that things might get too crazy, so we'd have to be prepared for that eventuality.

Another idea I had was to explore the world of AI agents. With the latest advancements in technology, I think it would be fascinating to try and create a voice-controlled AI agent that can respond to our inputs. We could use this game as a starting point and see if we can develop something more advanced using the real-time API.

Function Calling: A Future Experiment

As I delved deeper into my research, I discovered that function calling is another area of interest for me. With the latest advancements in technology, it seems like we're on the cusp of something new and exciting. I'm not sure if I'll be able to try out this feature right away, though, since it's still quite expensive.

Multimodal Features: A Future Experiment

One day, I might decide to explore multimodal features further. The real-time API seems to be incorporating more modalities like vision and voice over time. While this sounds exciting, I'm not sure if it's worth it for now. Maybe one day, we'll have enough resources to try out these new features.

Code and Development

The code behind our experiment was a bit of a mess, to be honest. It was my first time using web sockets with Open AI, and I didn't quite know what I was doing. However, it seems like it works for now, so we can leave it as is for the time being.

Conclusion

As our game came to an end, I couldn't help but feel a sense of disappointment. Despite having fun trying out something new, we still managed to mess up. Nevertheless, I'm glad we could explore some ideas and see where they might take us in the future.

If you're interested in trying out this experiment for yourself, be warned: it can get quite expensive right away. Maybe one day, we'll be able to refine the rules and make the game work. Until then, let me know if you tried it out in the comments below and see if you have any cool ideas of your own.

"WEBVTTKind: captionsLanguage: enPeter Rebecca Angie Ethan Nelson Sophie Eric Oliver then you lost because Oliver is not a c what you see here is the app I built with the real time API so I actually built a voice version and a text version and I also tried to combine them uh but I think I'm just going to show you how this works now so if we go to cursor we can start a back end right so let's just try it out now okay so let's refresh this hello there greetings how may I assist you can you start by switching to Persona 2 and find out what all about AI is sure I'll share a bit about all about AI it's a YouTube channel run by a fellon okay stop stop stop stop stop and find out uh what his email address is his email address is Chris at allabout ai.com how else can I help you okay that was pretty good you can go back to Persona one now and just find out uh what features or microservices Chris offers processing request Chris offers the following microservices engagement on your social media post for $3 a YouTube video shout out for $100 a 15minute conversation for $40 30-minute conversation for 75 uh now now you can talk like a parrot squawk talking like a parrot I am what's next matey swack okay that that was fine uh I think that's enough for now so let's talk again soon shall we acknowledged ready to assist whenever needed goodbye okay so that was kind of The Voice version yeah this was just a small sample size of what you can do but if we go into cursor now you can kind of see we have this Fields here where we can change up the Persona so I have kind of two personas here I have Persona one you're a robot act like a robot speak very fast hypers speed and we also have Persona 2 so you're Irish traveler with a broad Irish accent right speak with a strong Irish dialect and use Irish phrases and you can see I also added some knowledge base here so this is the context so in the context here I just put in some information about me and you can hear it can use that as content when I asked about yeah my YouTube channel and stuff so I kind of looked that up like it's not rag but it's in the prompt right so yeah I've been having a lot of fun with this uh it's a bit of a hassle actually to set up so I'm going to be uploading the code to my GitHub if you want to try this out but it's a bit of a spaghetti code so I'm going to have to work a bit more on it to figure out what works and what does not work but this is working okay now I guess uh but real time API is not only voice it's text to so I kind of want to show you the text version uh of kind of the same uh app here okay so if we switch it up here if you go here and we just do MPN run Dev right and we start up the backend so MPN start uh if we go back here now and we kind of refresh this you can see now uh we have the able to type in messages so I can do hello or hey you can see it's super fast because we have the web socket open right so it's basic basally the same as the voice we have the same web socket but now we're using text instead so if we do like write a long story right something like this uh we can actually interrupt it by pressing stop so that is kind of the first time I've kind of interrupted a chat model that is actually writing so you can see again if I did like uh write a long story right uh if I just type stop and boom it stopped right away and kind of answers like okay I'll stopped let me know so that I thought that was pretty cool uh I don't know if there's any good use case for it but it is something interesting at least uh and that was one thing I did uh so now I also try to combine these two so we can actually switch between Voice and text at the same time so let me show you how that went so if you go into this cursor and we do start and we kind of do our mpm run Dev this should now be kind of the combined version so if we refresh here now you can see we have a text and a voice uh part right so we can start with something like right uh P code for comp interest so and here you can see okay we get some python code it's not in markdown or anything but you can see at least we got some python code here uh yeah so it's pretty quick uh so if we switch to voice now right can you explain that in like a very short sentence how it works the code calculates the future value of an initial investment by applying compound interest over a set period taking into account the initial amount interest rate the frequency it's compounded and the number of years you're welcome if you have any more questions or need further assistance feel free to ask so I don't know if you C it there but the thing that's not working yet in this version is the interruption I haven't really figured out how to do it uh but I'm going to spend some time this weekend and kind of look at how we can kind of interrupt uh this version too uh but like I said I haven't spent too much time on it uh but one thing that is damn crazy about this real time API is the cost I don't know if I want to show you this uh but yeah let me just go over here and kind of show you so today while building this I spent $15 o uh just testing it out I haven't really used it for anything so you can see my bill is up to $38 and 15 of that is uh just a real time API so let me find a better overview here okay so I actually thought I could select the model for October but it seems doesn't work but anyway I spent like you can see here $17 actually on the real time AP I guess wasn't updated so that is pretty crazy if you ask me so you have to have a really good use case if you actually going to create an app or something with this uh but it is promising like I think it's pretty cool I had a lot of fun with it so there's a lot of stuff you can do uh I just wanted to dive a bit more into kind of the code and kind of have us set this up uh yeah just for fun so if you kind of look at the the voice part here that that is something I found most interesting uh yeah like I said the code is just a big mess to be honest uh but we are using a web socket of course uh so I kind of use just the template uh the open AI documentation have you can see we are using the real time preview model here uh we are yeah fitting in our API key but like I said I'm not going to spend too much time on the code today because it's a bit of a mess I kind of want to clean it up uh but like I said I'm am going to do that and put it up to the my GitHub if you want to actually try it out uh but I thought we can do some changes here so we have some parameters here so we can actually change the voice let's do a mail so we can do Echo right so let's change that and we can actually do some other stuff here uh so let me come up with kind of a new uh prompt here to try something else okay so I wanted to try to see how good it is at actually following instructions so let's do only answer with a word that rhymes with the last word from the user so let's see how good it is at actually following uh these instructions uh okay so let me just uh I got some bgs I have to kill a port uh let's run it again let's go back here and refresh how the partner Gardener what does that mean King how old are you two SU pine pine fine mine line sign sign I guess I lost there one lost gone that's synonyms fun fun one two true false Hal okay I think that did an okay job it kind of did exactly what we asked it to but it wasn't perfect I guess it sometimes did this synonym but that's fine but uh it did always answer in one word so that was at least promising uh I want to try to make like a simple game so let's see if we can do that okay so let's try you're in a game the game is as follows uh you must always respond with the name of the last letter from the previous name so if I say John then you have to say Lally because of the N right if the player does not respond uh with the last letter the game is over and that player has lost uh so let's see if it's going to follow these instructions here now Peter Rebecca Angie Ethan Nelson Sophie Eric Oliver then you lost because Oliver is not a c you're right well played would you like to play another sure all right let's do it you start Eric then Nelson Sophia Anie Ethan Nick Rachel no you lost again I'm sorry you suck at this looks like I need more practice by bye bye bye anything else you'd like to do goodbye take care goodbye okay that worked just barely uh but I don't know maybe I should have prompted it better but uh yeah that wasn't too good uh but semi fun I guess uh but uh I guess kind of I don't have any like great things I'm just going to show you a few ideas I have around this that we might do in the future but for now this is kind of all I came up with uh but it's early right and I'm going to explore more uh but the code is yeah it's just a mess to be honest uh so like I said if you want access just become a member I will try to upload this but like I said it's not a guarantee that it works great uh but I had fun today playing around with it and uh I think there there are some cool stuff we can actually do so I'm just going to go through a few ideas I have around this and yeah I think we just going to call this because I don't I don't have anything right now so yeah like you see here this hasn't really sparked too too many idas for me like I have some other in mind but the thing uh I want to try in maybe the next few weeks is actually to do a phone call so I'm going to kind of set it up with some system instructions that has a goal with the phone call uh so of course I'm going to just hang up if it gets uh really crazy but uh I want to see if this can actually make a phone call to like a real company here uh and just try to get some simple information it's not going to be something crazy and it's going to be really quick uh but that is something I want to try to figure out right uh and it's of course function calling that is something also I've been diving into AI agents again so that is also something I'm going to be doing so I'm going to try to set up some different tools that we can kind of have a voice controlled AI agent uh that is something I will probably be doing very soon and you kind of think H why aren't you trying out multimodal features so the reason for that is because what's next in the real time API is more modalities so they start with voice then we plan to add vision and voice uh over time so we could of course try to do like gp4 or wish uh feed the text input into the API and stuff but I think I'm just going to skip it I think I'm just going to wait till we kind of get that feature right because I don't know uh yeah I don't know if it's worth it when it's so expensive we'll see I might try it but uh for now I think this is the only ideas I have and hopefully uh maybe you have some more ideas we can try out in the comments but like I said there's this just been a few days so I have to think about it over the weekend maybe and see if some new ideas pop up if I see something online we're definitely going to try it out a bit more uh but I feel like if since it's so expensive I kind of feel like oh uh I'm not going to explore too much until like I think I have something that is pretty cool uh but yeah that was basically the video so like I said if you want access to the the code uh I will upload it but it's a bit of a spaghetti because I didn't yeah it's the first time I try this web socket uh setup with open AI so it was not the best code but I think it works for my use case for now uh so basically that's that is it uh I would advise you to go try it out but be careful because it gets really expensive right away so maybe just wait for my next video I don't know but let me know if you tried it out in the comments to see if you have some cool ideas anyway thank you for tuning in and yeah have a great weekendPeter Rebecca Angie Ethan Nelson Sophie Eric Oliver then you lost because Oliver is not a c what you see here is the app I built with the real time API so I actually built a voice version and a text version and I also tried to combine them uh but I think I'm just going to show you how this works now so if we go to cursor we can start a back end right so let's just try it out now okay so let's refresh this hello there greetings how may I assist you can you start by switching to Persona 2 and find out what all about AI is sure I'll share a bit about all about AI it's a YouTube channel run by a fellon okay stop stop stop stop stop and find out uh what his email address is his email address is Chris at allabout ai.com how else can I help you okay that was pretty good you can go back to Persona one now and just find out uh what features or microservices Chris offers processing request Chris offers the following microservices engagement on your social media post for $3 a YouTube video shout out for $100 a 15minute conversation for $40 30-minute conversation for 75 uh now now you can talk like a parrot squawk talking like a parrot I am what's next matey swack okay that that was fine uh I think that's enough for now so let's talk again soon shall we acknowledged ready to assist whenever needed goodbye okay so that was kind of The Voice version yeah this was just a small sample size of what you can do but if we go into cursor now you can kind of see we have this Fields here where we can change up the Persona so I have kind of two personas here I have Persona one you're a robot act like a robot speak very fast hypers speed and we also have Persona 2 so you're Irish traveler with a broad Irish accent right speak with a strong Irish dialect and use Irish phrases and you can see I also added some knowledge base here so this is the context so in the context here I just put in some information about me and you can hear it can use that as content when I asked about yeah my YouTube channel and stuff so I kind of looked that up like it's not rag but it's in the prompt right so yeah I've been having a lot of fun with this uh it's a bit of a hassle actually to set up so I'm going to be uploading the code to my GitHub if you want to try this out but it's a bit of a spaghetti code so I'm going to have to work a bit more on it to figure out what works and what does not work but this is working okay now I guess uh but real time API is not only voice it's text to so I kind of want to show you the text version uh of kind of the same uh app here okay so if we switch it up here if you go here and we just do MPN run Dev right and we start up the backend so MPN start uh if we go back here now and we kind of refresh this you can see now uh we have the able to type in messages so I can do hello or hey you can see it's super fast because we have the web socket open right so it's basic basally the same as the voice we have the same web socket but now we're using text instead so if we do like write a long story right something like this uh we can actually interrupt it by pressing stop so that is kind of the first time I've kind of interrupted a chat model that is actually writing so you can see again if I did like uh write a long story right uh if I just type stop and boom it stopped right away and kind of answers like okay I'll stopped let me know so that I thought that was pretty cool uh I don't know if there's any good use case for it but it is something interesting at least uh and that was one thing I did uh so now I also try to combine these two so we can actually switch between Voice and text at the same time so let me show you how that went so if you go into this cursor and we do start and we kind of do our mpm run Dev this should now be kind of the combined version so if we refresh here now you can see we have a text and a voice uh part right so we can start with something like right uh P code for comp interest so and here you can see okay we get some python code it's not in markdown or anything but you can see at least we got some python code here uh yeah so it's pretty quick uh so if we switch to voice now right can you explain that in like a very short sentence how it works the code calculates the future value of an initial investment by applying compound interest over a set period taking into account the initial amount interest rate the frequency it's compounded and the number of years you're welcome if you have any more questions or need further assistance feel free to ask so I don't know if you C it there but the thing that's not working yet in this version is the interruption I haven't really figured out how to do it uh but I'm going to spend some time this weekend and kind of look at how we can kind of interrupt uh this version too uh but like I said I haven't spent too much time on it uh but one thing that is damn crazy about this real time API is the cost I don't know if I want to show you this uh but yeah let me just go over here and kind of show you so today while building this I spent $15 o uh just testing it out I haven't really used it for anything so you can see my bill is up to $38 and 15 of that is uh just a real time API so let me find a better overview here okay so I actually thought I could select the model for October but it seems doesn't work but anyway I spent like you can see here $17 actually on the real time AP I guess wasn't updated so that is pretty crazy if you ask me so you have to have a really good use case if you actually going to create an app or something with this uh but it is promising like I think it's pretty cool I had a lot of fun with it so there's a lot of stuff you can do uh I just wanted to dive a bit more into kind of the code and kind of have us set this up uh yeah just for fun so if you kind of look at the the voice part here that that is something I found most interesting uh yeah like I said the code is just a big mess to be honest uh but we are using a web socket of course uh so I kind of use just the template uh the open AI documentation have you can see we are using the real time preview model here uh we are yeah fitting in our API key but like I said I'm not going to spend too much time on the code today because it's a bit of a mess I kind of want to clean it up uh but like I said I'm am going to do that and put it up to the my GitHub if you want to actually try it out uh but I thought we can do some changes here so we have some parameters here so we can actually change the voice let's do a mail so we can do Echo right so let's change that and we can actually do some other stuff here uh so let me come up with kind of a new uh prompt here to try something else okay so I wanted to try to see how good it is at actually following instructions so let's do only answer with a word that rhymes with the last word from the user so let's see how good it is at actually following uh these instructions uh okay so let me just uh I got some bgs I have to kill a port uh let's run it again let's go back here and refresh how the partner Gardener what does that mean King how old are you two SU pine pine fine mine line sign sign I guess I lost there one lost gone that's synonyms fun fun one two true false Hal okay I think that did an okay job it kind of did exactly what we asked it to but it wasn't perfect I guess it sometimes did this synonym but that's fine but uh it did always answer in one word so that was at least promising uh I want to try to make like a simple game so let's see if we can do that okay so let's try you're in a game the game is as follows uh you must always respond with the name of the last letter from the previous name so if I say John then you have to say Lally because of the N right if the player does not respond uh with the last letter the game is over and that player has lost uh so let's see if it's going to follow these instructions here now Peter Rebecca Angie Ethan Nelson Sophie Eric Oliver then you lost because Oliver is not a c you're right well played would you like to play another sure all right let's do it you start Eric then Nelson Sophia Anie Ethan Nick Rachel no you lost again I'm sorry you suck at this looks like I need more practice by bye bye bye anything else you'd like to do goodbye take care goodbye okay that worked just barely uh but I don't know maybe I should have prompted it better but uh yeah that wasn't too good uh but semi fun I guess uh but uh I guess kind of I don't have any like great things I'm just going to show you a few ideas I have around this that we might do in the future but for now this is kind of all I came up with uh but it's early right and I'm going to explore more uh but the code is yeah it's just a mess to be honest uh so like I said if you want access just become a member I will try to upload this but like I said it's not a guarantee that it works great uh but I had fun today playing around with it and uh I think there there are some cool stuff we can actually do so I'm just going to go through a few ideas I have around this and yeah I think we just going to call this because I don't I don't have anything right now so yeah like you see here this hasn't really sparked too too many idas for me like I have some other in mind but the thing uh I want to try in maybe the next few weeks is actually to do a phone call so I'm going to kind of set it up with some system instructions that has a goal with the phone call uh so of course I'm going to just hang up if it gets uh really crazy but uh I want to see if this can actually make a phone call to like a real company here uh and just try to get some simple information it's not going to be something crazy and it's going to be really quick uh but that is something I want to try to figure out right uh and it's of course function calling that is something also I've been diving into AI agents again so that is also something I'm going to be doing so I'm going to try to set up some different tools that we can kind of have a voice controlled AI agent uh that is something I will probably be doing very soon and you kind of think H why aren't you trying out multimodal features so the reason for that is because what's next in the real time API is more modalities so they start with voice then we plan to add vision and voice uh over time so we could of course try to do like gp4 or wish uh feed the text input into the API and stuff but I think I'm just going to skip it I think I'm just going to wait till we kind of get that feature right because I don't know uh yeah I don't know if it's worth it when it's so expensive we'll see I might try it but uh for now I think this is the only ideas I have and hopefully uh maybe you have some more ideas we can try out in the comments but like I said there's this just been a few days so I have to think about it over the weekend maybe and see if some new ideas pop up if I see something online we're definitely going to try it out a bit more uh but I feel like if since it's so expensive I kind of feel like oh uh I'm not going to explore too much until like I think I have something that is pretty cool uh but yeah that was basically the video so like I said if you want access to the the code uh I will upload it but it's a bit of a spaghetti because I didn't yeah it's the first time I try this web socket uh setup with open AI so it was not the best code but I think it works for my use case for now uh so basically that's that is it uh I would advise you to go try it out but be careful because it gets really expensive right away so maybe just wait for my next video I don't know but let me know if you tried it out in the comments to see if you have some cool ideas anyway thank you for tuning in and yeah have a great weekend\n"