How to Build a Real-Time Transcription Web App in Python using AssemblyAI and Streamlit

Creating a Real-Time Transcription App with AssemblyAI: A Step-by-Step Guide

Using the audio parameters specified above, PyAudio opens the audio stream; the start_listening and stop_listening callbacks were already explained. Once we have finished speaking to the application, clicking the Stop button triggers download_transcription, which displays the transcribed text as a downloadable file via st.download_button.
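The wiring described above can be sketched as follows. This is a minimal sketch, not the exact source: a plain dict stands in for st.session_state so the logic runs anywhere, and the function names simply mirror the description above.

```python
# Minimal sketch of the start/stop/download state logic. A plain dict stands
# in for st.session_state; in the real app, download_transcription would hand
# the text to st.download_button instead of returning it.

session_state = {"run": False, "text": "Listening..."}

def start_listening(state):
    # Called when the Start button is clicked.
    state["run"] = True

def stop_listening(state):
    # Called when the Stop button is clicked.
    state["run"] = False

def download_transcription(state, path="transcription.txt"):
    # Read the saved transcript so it can be offered as a .txt download.
    with open(path) as f:
        return f.read()
```

In the real app these callbacks are attached to the Start and Stop buttons via the `on_click` argument, so Streamlit reruns the script with the updated state each time a button is pressed.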

If we click Start and say something to the application, it begins transcribing our speech. When we hit Stop, the Download button appears a moment later, and the transcription can be saved as a .txt file.

Lines 48 to 67 build the top portion of the web application: everything from the "Real-Time Transcription App" header down to the Start and Stop buttons. On line 49, a microphone emoji is combined with the app name and displayed via st.title.

We use st.expander as an expandable container box labeled "About this app." Clicking it reveals the text underneath, which I have formatted with Markdown; it explains what each of the individual libraries does in this web application.

Next, we define two column variables using st.columns: column one holds the Start button and column two holds the Stop button, giving us a side-by-side layout for the two buttons.

The remaining block of code, lines 69 to 136, handles the audio input and output: it sends the audio signal to the AssemblyAI API, receives the transcribed text back, and displays it in the web application. This chunk of code was adapted from the GitHub repo created by Misra for a video on the AssemblyAI YouTube channel.

We modified that code slightly and added some visuals to the application. The first segment is the function that sends and receives the audio signal: it connects to AssemblyAI over a WebSocket, and the sample rate specified in the sidebar (16,000 by default) is substituted into the endpoint URL.
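The connection step can be sketched as below. The URL pattern and Authorization header match AssemblyAI's real-time API as used in the original tutorial; treat the details as assumptions rather than current documentation.

```python
def realtime_endpoint(sample_rate: int) -> str:
    # The sidebar's rate value (16000 by default) is interpolated into the
    # WebSocket URL that the app connects to.
    return f"wss://api.assemblyai.com/v2/realtime/ws?sample_rate={sample_rate}"

# In the app, the connection itself looks roughly like:
#   async with websockets.connect(
#       realtime_endpoint(16000),
#       extra_headers=(("Authorization", api_key),),
#       ping_interval=5,
#       ping_timeout=20,
#   ) as ws:
#       ...
```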

It authenticates with the API key provided in config.py and in secrets.toml, and uses asyncio to run the audio input and output concurrently: one block of code base64-encodes the audio chunks and sends them over the WebSocket, while another accepts the output.
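The sending side's encoding step can be sketched as a small helper. The "audio_data" field name follows AssemblyAI's real-time message format as used in the tutorial; treat it as an assumption.

```python
import base64
import json

def encode_chunk(raw_audio: bytes) -> str:
    # base64-encode one raw PCM chunk from PyAudio and wrap it in the JSON
    # message that the send coroutine ships over the WebSocket.
    payload = base64.b64encode(raw_audio).decode("utf-8")
    return json.dumps({"audio_data": payload})
```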

The final transcript arrives as JSON, which we read in, selectively extract the transcript text from, and print line by line in the web application. When we decide to stop the transcription, clicking Stop writes everything to a file; asyncio.run drives the concurrent processing of the audio input and output.
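The receiving side's JSON handling can be sketched like this. The "message_type" and "FinalTranscript" fields reflect AssemblyAI's real-time responses at the time of the tutorial and are assumptions here.

```python
import json
from typing import Optional

def extract_transcript(message: str) -> Optional[str]:
    # Parse one JSON message from the API and pull out the finished text;
    # partial (in-progress) transcripts are ignored, which is why each
    # finished chunk appears on its own line in the app.
    result = json.loads(message)
    if result.get("message_type") == "FinalTranscript":
        return result.get("text", "")
    return None
```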

After we click the Stop button, the Download button appears thanks to this block of code. After clicking Download, the transcribed text is also removed so that the next run of the application starts fresh. Congratulations, you have built a real-time transcription app using AssemblyAI.
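The stop-and-clear behavior, including concatenating the displayed lines into the single long paragraph that goes into the .txt file, can be sketched with a hypothetical helper (again, a plain dict stands in for st.session_state):

```python
def save_and_reset(state, path="transcription.txt"):
    # Join the accumulated transcript lines into one long paragraph, write
    # it to disk, then clear the state so the next run starts fresh.
    paragraph = " ".join(state.get("lines", []))
    with open(path, "w") as f:
        f.write(paragraph)
    state["lines"] = []
    return paragraph
```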

Various Use Cases for the Real-Time Transcription App

There are various use cases for this web app. For example, you could draft an essay or email just by speaking to the application; when you are done, click Stop, grab the transcribed text, and copy and paste it into any word processing application.

I hope you enjoyed the video. Let me know how you plan to modify this web application, and whether you found it useful. Thank you for watching until the end, and if you made it this far, please drop a balloon emoji so I know you're the real one.

If you're enjoying the video, please give it a thumbs up, subscribe if you haven't already, and make sure to hit the notification bell.
