How to build your own Speech-to-Text Transcription App in Python using AssemblyAI and Streamlit

The Transcriber App: A Streamlit Application for Automatic Speech Recognition

In this article, we will explore the development of a transcriber app using Streamlit, a Python library for building web applications. The app utilizes the Assembly AI API for automatic speech recognition and provides an efficient way to transcribe audio files.

**Development Overview**

The transcriber app consists of two primary functions: contacting the Assembly AI API to perform transcription and displaying the results in a user-friendly interface. The development process involved writing 100 lines of code, resulting in an app with a minimalistic UI that is both functional and efficient.

**Contacting the Assembly AI API**

To initiate the transcription process, the app reads the API key from the sequence and assigns it to a variable called `API_key`. The URL for the API is also stored in memory, allowing for quick access during the development phase. Upon submission of the input parameters, the app sends a request to the Assembly AI API using the `getyt` function, which performs the actual transcription.

**User Interface**

The user interface is designed to be simple and intuitive, with minimal code required to create it. The `Sidebar` component serves as the input parameter for the app, allowing users to enter the URL of the YouTube video they wish to transcribe. The `St` warning function is used to create a text box where users can input their API key.

**Processing Transcription Results**

Once the transcription process is complete, the app performs two secondary functions: sipping up the file and creating a Zip file button. The `custom_function` that we had described earlier from line 112 performs the actual sipping up of the file, while the `St` download function creates a convenient way to download the Zip file.

**Reducing Code**

By deleting unnecessary lines of code, we were able to reduce the overall length of the app from 142 lines to just 148. This highlights the importance of efficiency in coding and the ability to optimize app development without compromising functionality.

**Using the Transcriber App**

The transcriber app is now available for use through various platforms, including Google Colab, Commandline applications, and Streamlit applications. Users can access the app by entering their API key and YouTube video URL, followed by submission of the input parameters. The app will then perform automatic speech recognition and display the results in a user-friendly interface.

**Supporting the Channel**

To support the channel that developed the transcriber app, users are encouraged to like, subscribe, and hit the notification bell for future updates. By doing so, users can stay up-to-date on the latest developments and contribute to the growth of the community.

"WEBVTTKind: captionsLanguage: enDo you have a lot of videos lying around, perhaps from YouTube or fromyour Zoom call meetings, and you would like to transcribe it intowritten form. If you answered yes, then you want towatch this video till the end, because today we're going to show youhow you could develop your very own transcriber application.And we're going to do this using the API from AssemblyAI, who happens to be the sponsor of this video.In a nutshell, AssemblyAI will convert audio towritten form using automatic speech recognition,and so without further ado. We're starting right now.Assembly AI is a speech to text API servicethat is trusted by Fortune 500 startup and thousandsof other developers around the world. Not only can you transcribeyour text, which has a lot of features such as realtimetranscription, labeling, speakers in the video or audio,providing word timings or timestamps exportingsubtitles, as well as providing automatic punctuation andcasing to transcribe text. Aside from that, the AI componentas implied by the name of the company. There's also other featuressuch as topic detection, PII redaction,which will protect sensitive information, sentiment analysis,profanity filtering, content safety detection, as well aschapter detection, which will automatically split content intochapters for longer YouTube videos, so that thetranscript will be more structured and more readable. And there's also featuresfor summarization and entity detection,as well as translation to other languages, as well as emotiondetection, which will be features that is coming soonin the future. Aside from that, you can customize the API byproviding custom vocabulary such as company names,people names, product names, industry terms,and all of this with 99 point 99% uptime 24/7support advanced security and Privacy, and also data Privacy,and best of all, you could get started for free with no informationof your credit card required, and you could essentially just sign up andbe able to use the API in a matter of minutes. Okay,so the first thing that you want to do is to head over to theassembly AI Colab notebook that I'm sharingin the video description, and before we can begin, you wantto register for assembly AI so that you could get theAPI key. So head over to assembly AI.Com and then you want to sign up for it,click on the Start now for free, and then you could createyour account. And as I have already done so, I'm going to log in.And upon logging in, your API will be accessibleto the right hand side under your API key heading,and you could click on it in order to copy the API,and then you're ready to go from the interface here, you can see that youcould use the API to develop your own transcriberapplication. You could directly upload your audiofile or video file, and then in a no codeapproach, you could transcribe it. But this will have to be doneusing the assembly AI website, so you could click here,drag and drop your audio, and then you're all set. So letme show you how you could develop your own usingColab, and so click on the API key, and thenyou want to paste it in here, and then you want to run it.And upon running the code sales, you're assigning your APIkey directly to the variable called API underscorekey, and let's move on to the next part, which is to installPytube. So we're going to use Pytube in order to retrieveYouTube videos, and it will download it as a videofile. And now that we have it installed, we're going to import it.So we're going to use from Pytube import YouTube. And thenwe're going to use the YouTube function and inside it,ask the input argument. We're going to put the URL of the YouTubevideo. And so this is the video that I recently createon what is the best way to learn data science. So it is ashort video of about 15 seconds. And so Ijust copy the URL here, and then I'll pasteit here. Let's run it. And so the URL is the importargument for the YouTube function, and we're assigning it into the videovariable. And then from the video variable, we're going to saythat we want to get only the audio file, and then we're going to assignthat to the YT variable. And in order to download it,we use YT download. Essentially, it will downloada file in MP four format, which is the audio.Let's download it. If you click on theleft hand panel here, you're going to see that the file has been downloaded righthere. And because it is the audio only, the fileis not so big. It is at about 240 KB.This block of code will essentially take the name ofthe file, which is the MP four, and it's going to print outthe full path of the file. So it will print out where the fileis, which is in content folder, and then followedby the name of the actual files. And so we will need this full pathof the audio file in our next block of code. Let's runit. All right. And there you go. You can see that it is exactly thesame as above. But this is just a print out from the Pietwo library. But essentially we want to get the full path,and therefore we use Cos library. All right. And so in stepnumber three, we're going to now upload the YouTube audiofile that we have already downloaded as MP four,and we're going to upload it to assembly AI. This blockof code will essentially do that. So I've actually got this blockof code from one of the tutorial blog posts thatassembly AI has made and links to that is in the bottomof this particular Colab notebook. And so the filename will be the full path to the MP four file. And then we justinput this into this Reach file function. And then here wespecify details on the API key. And then weput in the API address for the Upload,and then it will essentially read in the fileusing this function, and it will assign it to data.And then you want to run it. And so after running this,essentially, you have already uploaded your MP Four fileto assembly AI server. And now it's the fun part.Step number four. Now that the audio file is on the serverof assembly AI, we're now going to perform the transcription.So here we are going to use the transcript URL for the API.And then we will specify the audiourl, which was givenin the prior stage right here. So we take this audiourlvariable, and then we will use it as input here becausethe output will be in JSON format, and the Upload URL isprovided here in the JSON. And now our input will be the audio URL.As always, we have to specify the API key. And now we'regoing to perform the transcription by using the endpoint, which isthe transcribe endpoint, and the JSON variable here willspecify the audio URL. Right. And the others herewill specify the API key. And now we will run it.And then we will assign the output into the transcript input response.And then we're going to print it out the JSON data.And then we will get this. We will see that the status isbeing calculated. It is in queue, and we're nowgoing to extract the transcript ID. So this is specifiedhere in ID. Let's have a look. It's right here. The ID is righthere. This string is the ID and the same thing here weexplicitly extracted the transcript ID,and this is step number five. Now that we have the transcript ID,we're going to put the transcript ID into the URLof the API, and we're going to use the F string in orderto put in the variable value of transcriptID. And now it will be appended into the URL of the API,and we assign that to the endpoint variable. And in the headerswe assign the API key. Now we're going to run it.And then the output will be assigned to the transcriptoutput response variable. Let's run it. Get the transcriptID. Not yet. This is from a prior run.Let's run step four. Okay. And so this is the transcript ID.This was from a prior run. Let's do it again. Okay. There yougo. The transcript ID is here. We're now going to perform the retrievalof the transcription results, and let's have a look at the status.All right. So the status is completed. And now we're going to print outthe results and the results is in the text righthere. Let me show you the full JSON will look like this.Okay. This is the results in Jason's format, and theoutput that we need is in the text, not these, but it willbe in the text somewhere. Right here this one. So we'regoing to explicitly select this, and therefore we'regoing to print it out. And there you go. This is the transcribe text.So let's listen to this short video and let's compare it with the transcription.Why don't I open it side by side so that you could read itand see how it is. So what's the best way to learn data science?Well, the best way to learn data science is to do data science.So if you're into this type of content, you want to consider subscribing formore tutorials and concept video. Okay. Not bad. And thepunctuation is also in there as well. Right. So it detects thatit is a question. And so imagine you have hourlong video meetings, and you wouldlike to transcribe it into written form, and that wouldvery much come in handy if you want to repurposeyour content. So if you're a content creatorand you have podcasts, then you could definitely make useof this to convert your podcast into blog posts as well.Or you could also convert your meetings from your Zoom callsinto written form. Instead of having to listen through allof the recordings, you could essentially read itand you could skim through it. Imagine having severalvideos to go through, and you could skim through tothe highlight of each of the videos by briefly skimming through thewritten transcribe text. All right. So now we're going totake this results, and then we're going to writeit out using the write function in Python intoa text file. Let's do that. And it's going to be here.So we have written it out as a text file. And let'ssay that for those of you who would like to havesubtitles for your YouTube videos, if you're a contentcreator, then you could export it out as the SRT fileformat. And then this SRT file format will be your subtitle,which you could easily upload to your YouTube channel. And sono more expensive transcription service. You could automaticallyperform this using the application that you'redeveloping in this video. Here the SRTfiles right here. Okay. So this will come in handy for those of youwho would like to add subtitles to your YouTubevideos. All right. And so for additional references, as wellas the source of some of the code that Ihave already mentioned that I borrowed from one of the tutorialblog posts is provided here. So let's move onto creating a transcriber application in thecommand line. So I'm going to close the Colab, and I'm going to copythe API key again just in case Streamlit,and it's in the transcriber folder. Okay.So here transcriber PY. Let me open upthe Vs code. All right.This is the transcriber PY. And what I've essentially doneis I copied or exported allof the code from the Google collab that we have already sawin just a moment ago. And so what I did was I just paste ithere, and then I tidied it up a bit.And as you can see, there are an additional block of code,which is the Arg Bars library, which allowsyou to parse arguments in the command line.Because here we're going to develop a commandline application,which you could run by typing in Python transcriberPY, followed by the URL of the YouTube video.Essentially, you could transcribe any YouTube videos justby doing that just by typing in Python transcriber PY,followed by the URL of your YouTube videos.Okay. So this block of code will essentially do that.So we need to add the tag here the I, which willessentially allow us to do this Python transcriberI, which is for input, and then the URLof your YouTube video, so we could do this copy andkind of like that. So let me first activate the condoenvironment, and DPis my very own condo environment. So you want to activate your ownas well, going to the transcriber folder.Okay. And we are here on the subscriber folder. Okay.So let me go through the code. So in this block of code,we're going to leave in the API key from the APItext file. So before you commit your code, you want to make surethat you want to get ignore the API text file.Otherwise, your API key will also go into your GitHub repo.So be careful for that. So here we're just reading it in andassigning it to the API key variable. And foreach of the steps, we're going to print out sothat this will act as kind of like a checkpoint to let us know thatstep one is working perfectly. And step two is workingperfectly so that we will know the relative progress of thecalculation. All right. So in step two, as already shown in the GoogleColab, we're going to retrieve the audio file from theYouTube video, and we're doing that using Pytube. Right.And once we have the YouTube audio file, we're going toprint out the full path and assign itto the MP four file variable. And then subsequently,we're going to use the MP four file variable as aninput here file name input in order to allow us to uploadthe audio file to the assembly AI server.And then we're going to print it out in step number three.Right. And in step four, we're going to extract thetranscript ID. And in step six, we're going toretrieve the transcription results. However, it should be notedthat the transcription will take time. And so if you have avery lengthy video. It will take some significant amount of time.Like for this 15 seconds video, it took about 5 secondsto complete the calculation or process theaudio file. So if you have, like, an hour long video,you can expect a couple of minutes for the processing of thevideo. And so what I've done here is that for every 5 seconds,it will check whether the transcriptionhas completed. And if it hasn't completed,it will reiterate through this loop again untilit has completed. And once the transcription has completed,it will assign the output to this particular variable inthe transcript output response. And then finally,in step seven, we're going to print out the outputof the transcribe text. And in step eight, we're going to saveit out as a text file and also as an SRTfile, both at the same time. And so let's run it. But before doingthat, let's see files in here. So let's see. I thinkI'm missing the API text file. So let mecreate that. All right. We're going to now save the APIkey into the file. So as you can see here, it's goingto be in the API text file. So let me put on command non a Mac or CTRL n on Windows and we justpaste it in APItext.Close it. And now we're going to run it. We're going to type inPython. We're going to run this file transcriber.Hi. I. And then the URL.Let me find it. So this URL is the shorts video and makenote that you need to have the quotation Mark as well. Sit on,enter something went by Python threetranscriber. Okay. I might not have installedit on this computer yet, so I'm going to install Pytube.So I need to do hit three install pytube.All right. And now I'm going to run it clear it first clear the terminal,and now it's Python three, because just a moment ago I installed Pytubeusing Pip three install, and now I'm going to run it using Pythonthree, and then transcriber Pi. I theURL. All right. Seems to work. Now you can see it's.Iterating through all of the steps, and we can see exactly where weare. And now it is retrieving the transcription results processingit. All right. And it has completed. And it hasthe output printed out here. And let's have a look filesso you can see that the files are also created andthey are saved out as YT SRT and alsoYT text. All right. Congratulations. You have now developed yourown Transcriber command line application.Now let's head over to creating the Streamlit webapplication version of the Transcriber app.So the contents is in the app PY whenadd up,let me delete these files sothat we start fresh app Pi before we begin.Let me show you that you need to create a folder called Streamliftbecause we're going to save our API key inside thesecretsTOML. And so the Secrets TOML is essentiallyjust a text file containing the APIthat is assigned to the API key. And so the content hereis that we're going to have the API key variable followedby the API, and this will be read in by the STREAMLETapplication. So it should beinside a streamlit folder. So you could dothat by typing into the commandline MkdirStreamlit and then hit on enter. Afterwards. You want tochange directory into the STREAMLET folder STREAMLET,and then after that, you want to create a file called Secrets.Coml inside that, and then you want to paste in yourAPI into the API key variable. All right.I think we're good to go now. So let's take a look here. So theentire application is 150 lines of code.And let me show you how it looks like streamlit run app.pyenter. There you go.This is the Transcriber app, and so we have the left handpanel in the Sidebar, which will accept the YouTube videoURL, and on the right hand side, which is the main panel,it will display the transcriber app title of theapp, and it's going to display a message saying thatit is awaiting URL to be inputted into the Sidebar,and after it has processed, all of the output will be displayed here alongwith the button to download the transcription. So let's do a lineby line explanation. So the first sevenlines of code here will import the necessary libraries that we'regoing to use Streamlit, of course, which is the Web application FrameworkPi tube, so that we could download the videos from YouTubeOS says Time and request so that we could performprocessing locally and Zip file, which isfor stepping up the results and making that available asa download via the download button of streamlin app. So inline number nine, we're going to use the St markdownto print out the title of the Web app, which is Transcriber app.And you can see here that we could make use of the Emojiline number ten. We'll be using the St progress, and we'regoing to assign it a value of zero because we are at the beginning ofthe workflow, and we're assigning that to the bar variable.So lines twelve until lines number112. We're going to have custom functions, which willessentially be taken from the Google Colab and alsothe command line application that we had developed earlier.So I'm going to skip that and I'm going to provide only ahigh level look. So here we can see that I've dividedinto two essential functions. The first function will bethe getyt, which will get the YouTube videoaudio file, and then it will download it, and then it will assign a progressof 10%, so that's an arbitrary number. You could change this toany other number that you like. And then we're going to take theaudio file. Then we're going to upload it onto AssemblyAIserver inside this transcribeyt function.And after doing so, we're going to perform the transcription.And then you will notice that we will assign various progressvalue here. So we use 20 here 30 here,and the transcription is about 40 here andextracting transcript ID. We give it a progress of 50. So you'regoing to see that the progress bar here will increase as the calculations beingprocessed, and then it will form the iteration until thecalculation is completed, and then it will assign a value of100. And finally we're going to print out the resultsinto here in the main panel. And then we're going to take the textfile of the transcription along with the SRT file.And then we're going to sip it up, and then we're going to make itavailable here as a download button. So that will be done in just a moment.So these two function will essentially take in the input argumentand it will do the processing. Contacting APIof assembly AI to perform the transcription.And after the transcription is completed, we will display the resultshere, and then we're going to save it out as a text file and SRTfor the subtitle. And then we're going to sip it up. And then we're goingto make the zip file available for EC download.Let's have a look at the actual app. You're going to see that most ofthe processing has taken about 100 lines of code,and now the interface itself is less than 50lines. Actually, if I just deleted these empty spaces,we're going to be under 150 lines of code at 148.Let's have a look at the actual UI, the user interfacehere. So line number 11611 seven isgoing to be reading the API key from the sequence,and we're going to use the St sequence function.And then we assign the API key here and assign it to the variableAPI key. And let me paste in the URL.Why don't I do that? Do I have the URL here in memory? Yes,I will click on submit and you see here the progress bar is moving.There you go. And we have to transcribe text here and you can click ondownload Sip and you will download the Sip file. Look right here inthe downloads folder. Transcription works perfectly.Okay, look at the triplet app again. So linenumber 120, it will say awaiting URL input in theSidebar, which is this yellow text box you see here.We're using the St warning. The Sidebar here willbe here input parameter, which is right here.Input parameter. We're using the header function and we're assigningit to the Sidebar. Otherwise, if we don't put Sidebarhere, the input parameter will go here. And now we'regoing to make use of the form, and we assign thekey equals to my form. And inside withSt sitebar form function, we're going to specify theURL variable, and then we're going to create a text input box,which will ask for the URL of the YouTube video,which is right here. And this is enter your URL, which is right here.And now we're going to add the submit button. So that is ECSSt form submit button label is submitbecause we want it to say submit. If you want to change it to somethingelse, you can say go save it. And then the button will say Go,okay, and let's see further lines 131until 148 is going to performthe actual processing. Once we hit the submit button,if the submit button is clicked, it's going to initiate the runningof the getyt function and the transcribe YT function.So both of these are performing the actual transcriptionso I could just delete these two or four lines ofcode and we will reduce the code further. So now it'sonly at 142 lines of code. And so once it has alreadyperformed the transcription using these two functions, the custom functionthat we had described above from until line 112.From the first line, it will perform the actual sippingup of the file. This block of code here willcreate the Zip file button. Let me show you the button todownload the Zip file. So we're using the St download button.And so this is actually a new function that has just been released.And prior to this we had to make use of a hack to createthe button. But now it is very convenient to use the Stdownload button right here, and if you click on it, you'll download the file.So Congratulations. You have now successfully been ableto run the Transcriber app from within Google Colabfrom within the Commandline application and also fromyour very own Streamlit application. And if you're watching thisfar into the video, please drop the Star emojiin the comments section. And please let me know how you'reintending to use this transcriber app for your study, for yourwork or for your content creation. Thank youfor watching until the end of this video. If you want to supportthe channel, please smash the like button subscribe if you haven'talready and make sure to hit on the notification Bell so that you willbe notified of the next video. And as always,the best way to learn data science is to do data science andplease enjoy the journey.\n"