Creating a Real-Time Transcription App with AssemblyAI: A Step-by-Step Guide
So it uses the parameters that we specified here: this is the audio stream, and the start listening and stop listening functions were already explained. After we've spoken to the application and the transcription has run, we click the Stop button. Clicking Stop triggers the download transcription function, which offers the transcription text as a downloadable file via st.download_button.
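To make the download step concrete, here is a minimal sketch of what such a function could look like. The file names and the helper read_transcription are assumptions for illustration, not the exact code from the video, and the Streamlit module is passed in as `st` so the file-reading part can be tried without a running app.

```python
from pathlib import Path


def read_transcription(path="transcription.txt"):
    """Return the saved transcript text, or an empty string if no file exists."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.is_file() else ""


def download_transcription(st, path="transcription.txt"):
    """Offer the transcript as a .txt download.

    `st` is the imported streamlit module. The file names used here are
    assumptions, not necessarily the ones used in the video.
    """
    text = read_transcription(path)
    return st.download_button(
        label="Download transcription",
        data=text,
        file_name="transcription_output.txt",
        mime="text/plain",
    )
```

In the app itself you would do `import streamlit as st` at the top and call `download_transcription(st)` once the Stop button has been clicked.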
As you can see, if we click Start and say something to the application, it starts transcribing our speech. Then I'll hit Stop, and in just a moment you're going to see the download button appear right here. Yeah, right here: the download section and the download transcription button. The text will be saved into a txt file.
Here's what we can see on lines 48 to 67, which is essentially the top portion of the web application. If I refresh, that goes from here, the real-time transcription app header, down to the Start and Stop buttons. Let's take a look at line 49: we use the microphone emoji here, and then the real-time transcription app title is displayed using st.title.
We're using st.expander right here as an expandable container box, and it's called "About this app," which is displayed here. If we click on it, all of the text underneath it is shown. I've already formatted that text using Markdown, and it explains what each of the individual libraries is doing in this particular web application.
You'll notice here that we define two column variables using st.columns. In column one we put the Start button, which is right here, and in column two we put the Stop button, which is right here. So you can see how we've laid out the two buttons side by side.
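The top portion described above can be sketched roughly like this. The wording of the About text and the function name render_header are assumptions for illustration; only st.title, st.expander, st.markdown, and st.columns come from the walkthrough. The Streamlit module is passed in as `st` so the layout logic can be exercised outside a running app.

```python
# Hypothetical About text; the real app describes its own libraries in Markdown.
ABOUT_TEXT = """
Libraries used in this app:
- `streamlit`: the web framework
- `websockets`: connection to the transcription service
- `asyncio`: concurrent audio input and output
"""


def render_header(st):
    """Draw the title, the About expander, and the Start/Stop buttons.

    `st` is the imported streamlit module. Returns the two button states.
    """
    st.title("🎙️ Real-Time Transcription App")

    # Expandable container that holds the Markdown description.
    with st.expander("About this app"):
        st.markdown(ABOUT_TEXT)

    # Two columns so the buttons sit side by side.
    col1, col2 = st.columns(2)
    start_clicked = col1.button("Start")
    stop_clicked = col2.button("Stop")
    return start_clicked, stop_clicked
```

In the app you would call `render_header(st)` after `import streamlit as st`. Each button returns True only on the rerun triggered by its click, which is why an app like this typically keeps its running state somewhere that survives reruns.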
The remaining block of code, from line 69 to line 136, is the audio input and output: it sends the audio signal to the AssemblyAI API, receives the transcribed text from AssemblyAI, and then displays it in the web application. This chunk of code was taken from the GitHub repo created by Misra in the video here on the AssemblyAI YouTube channel.
We've modified this code slightly and added some visuals to the application. Let me expand this a bit. So this is the entire function for sending and receiving the audio signal. It connects to AssemblyAI using this API, and we specify the sample rate, which is set here in the sidebar to 16,000 and gets substituted in here.
It authorizes via the API key, which is provided in config.py and also in the secrets.toml file. Then it uses asyncio to perform the concurrent input and output of the audio: this block of code sends the signal, encoding and decoding the audio along the way, and this function accepts the output.
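The concurrent send and receive pattern can be sketched on its own, without a microphone or a network connection. In this sketch an asyncio.Queue stands in for the websocket, so whatever is sent comes straight back; the "audio_data" field name and the base64 encoding are assumptions about the message format, not a confirmed description of the AssemblyAI API.

```python
import asyncio
import base64
import json


def encode_chunk(raw_audio: bytes) -> str:
    """Wrap raw audio bytes in a JSON message with base64-encoded audio.

    The "audio_data" key is an assumed field name for illustration.
    """
    encoded = base64.b64encode(raw_audio).decode("utf-8")
    return json.dumps({"audio_data": encoded})


async def send(chunks, channel: asyncio.Queue):
    """Encode each audio chunk and push it onto the channel."""
    for chunk in chunks:
        await channel.put(encode_chunk(chunk))
        await asyncio.sleep(0)  # yield control so the receiver can run
    await channel.put(None)  # sentinel: no more audio


async def receive(channel: asyncio.Queue, received: list):
    """Read messages from the channel until the sentinel arrives."""
    while True:
        message = await channel.get()
        if message is None:
            break
        received.append(json.loads(message))


async def send_receive(chunks):
    """Run the sender and receiver concurrently, as the app does."""
    channel = asyncio.Queue()  # stand-in for the real websocket connection
    received = []
    await asyncio.gather(send(chunks, channel), receive(channel, received))
    return received
```

Calling `asyncio.run(send_receive([b"ab", b"cd"]))` runs both coroutines to completion. In the real app the sender would read chunks from the audio stream and the receiver would read replies from the service, but the gather structure is the same idea.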
The final transcript, the transcribed text, comes back in the form of JSON, which we read in. We selectively take out the transcript text and print it line by line in the web application. When we decide that we want to stop the transcription, we click Stop, and everything is written into a file. We're using asyncio.run to perform the concurrent processing of the audio input and output.
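Picking the transcript text out of the JSON and saving it could look something like the sketch below. The "message_type", "FinalTranscript", and "text" names are assumptions about the shape of the reply, and the helper names and file name are hypothetical.

```python
import json


def extract_final_text(message: str):
    """Return the transcript text for a final result, else None.

    Assumes replies look like {"message_type": "FinalTranscript", "text": "..."};
    partial results and other message types are skipped.
    """
    result = json.loads(message)
    if result.get("message_type") == "FinalTranscript":
        return result.get("text", "")
    return None


def append_to_file(text: str, path="transcription.txt"):
    """Append one line of transcribed text to the transcript file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
```

Inside the receive loop you would call `extract_final_text` on each incoming message and, whenever it returns text, both display it in the app and pass it to `append_to_file`.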
So, after we've clicked the Stop button, you're going to notice that the download button appears, owing to this block of code here. After we click Download, it also removes the transcribed text so that the next run of the application starts fresh. Congratulations, you have built this real-time transcription app using AssemblyAI to perform real-time transcription.
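The clean-up step is small enough to show in full. This is only a sketch: the function name and the default file name are assumptions, not the ones used in the video.

```python
from pathlib import Path


def clear_transcription(path="transcription.txt"):
    """Delete the transcript file so the next run starts from a clean slate.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    p = Path(path)
    if p.is_file():
        p.unlink()
        return True
    return False
```

Returning a boolean is just a convenience here, so the caller can tell whether there was a transcript left over from the previous run.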
Various Use Cases for the Real-Time Transcription App
There are various ways you could make use of this particular web app. You could draft an essay or an email just by speaking to the application. After you're done, click Stop, and you have access to the underlying transcribed text, which you can copy and paste into various word processing applications.
I hope that you enjoyed the video. Let me know how you're going to modify this particular web application, and whether you found it useful. Thank you for watching until the end of this video, and if you've reached this far in the video, please drop a balloon emoji so that I know that you're the real one.
If you're enjoying the video, please also give it a thumbs up, subscribe if you haven't already, and make sure to hit the notification bell.