Transcribing live audio streams in real time with Google Colab and Deepgram | AI Tutorial | ASR

**Introduction to the Deepgram API**

Deepgram is a speech-to-text API that can transcribe audio and video streams in real time, with optional features such as punctuation, numeral formatting, and support for multiple languages. This article walks through the variables in Deepgram's live-transcription notebook and what the API can do.

**Setting up the API Key**

Before using the Deepgram API, you need an API key, which you create from your Deepgram account. If you don't have an account yet, signing up with an email address gives you 12,000 minutes of free transcription, with no credit card required. The API key is the only variable you must fill in: once it is pasted into the notebook, you can run the main cell immediately.
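Since the key should not be shared, one option is to keep it out of the notebook text altogether. The sketch below is a minimal example of that idea, not the notebook's own code: the environment variable name `DEEPGRAM_API_KEY` and both helper functions are hypothetical. The `Token` authorization scheme is the one Deepgram documents for API keys.

```python
import os


def load_api_key(env_var: str = "DEEPGRAM_API_KEY") -> str:
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Deepgram API key.")
    return key


def auth_header(api_key: str) -> dict:
    """Build the HTTP header that carries the key, using the "Token" scheme."""
    return {"Authorization": f"Token {api_key}"}
```

In Colab you could set the variable in an earlier cell (or paste the key directly, as the notebook does) and pass `auth_header(load_api_key())` to whichever HTTP or WebSocket client you use.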

**Understanding the URL Variable**

The URL variable should be set to the URL of the audio stream you want to transcribe. By default, the notebook streams from BBC Radio, so Deepgram transcribes the BBC Radio audio in real time.

**Configuring Parameters**

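The params variable configures the Deepgram model. In the starter code, punctuation and numerals are both set to true: the transcript is capitalized and punctuated with periods, commas, and so on, and numbers are written as digits instead of words. The language is set to English because the BBC stream is in English, although other languages are supported. The starter code also selects the enhanced version of Deepgram with a general, all-purpose model; other models target specific kinds of audio, such as meetings, phone calls, voicemails, video streams, and conversational AI. These defaults do not need to change for the demo. See the Deepgram documentation for the full list of options.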
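To make the configuration concrete, here is a sketch of how such parameters could be turned into a streaming URL. The notebook's actual code is not shown in this article, so everything here is an assumption: the endpoint `wss://api.deepgram.com/v1/listen` and the parameter names (`punctuate`, `numerals`, `language`, `tier`, `model`) follow Deepgram's public documentation, and `build_listen_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint, taken from Deepgram's public docs.
BASE_URL = "wss://api.deepgram.com/v1/listen"

# Assumed parameter names mirroring the options described above.
params = {
    "punctuate": True,   # capitalize and punctuate the transcript
    "numerals": True,    # write numbers as digits
    "language": "en",    # the BBC stream is in English
    "tier": "enhanced",  # the enhanced version of Deepgram
    "model": "general",  # all-purpose model
}


def build_listen_url(params: dict, base_url: str = BASE_URL) -> str:
    """Encode the parameters as a query string, writing booleans as true/false."""
    query = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    return f"{base_url}?{urlencode(query)}"
```

Keeping the options in a dictionary means changing the language or model is a one-line edit, and the URL is rebuilt from scratch each time.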

**Time Limit and Transcription Options**

The time limit variable is an integer giving the number of seconds to transcribe for. The transcription-only variable is a boolean: set it to true to print only the transcribed words, or to false to print the full JSON responses, which include metadata, word-level timestamps, and confidence measurements.
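The two variables can be pictured as a deadline and an output switch. The following is a sketch under stated assumptions rather than the notebook's implementation: the function names are hypothetical, and the response shape (`channel` → `alternatives` → `transcript`) is assumed from Deepgram's documented streaming responses.

```python
import json
import time


def extract_transcript(message: str) -> str:
    """Pull the top transcript out of one JSON response, or return an empty string."""
    data = json.loads(message)
    channel = data.get("channel")
    if not isinstance(channel, dict):
        return ""  # e.g. a metadata message with no transcript
    alternatives = channel.get("alternatives") or []
    return alternatives[0].get("transcript", "") if alternatives else ""


def render(message: str, transcription_only: bool) -> str:
    """Return just the words, or the full response pretty-printed as JSON."""
    if transcription_only:
        return extract_transcript(message)
    return json.dumps(json.loads(message), indent=2)


def consume(messages, time_limit: int, transcription_only: bool = True, clock=time.monotonic):
    """Collect rendered messages until time_limit seconds have elapsed."""
    deadline = clock() + time_limit
    rendered = []
    for message in messages:
        if clock() >= deadline:
            break  # stop once the time limit is reached
        text = render(message, transcription_only)
        if text:  # skip responses that carry no words
            rendered.append(text)
    return rendered
```

Here `messages` stands for any iterable of raw JSON strings, such as the frames received from a WebSocket. The `clock` argument exists only so the deadline logic can be exercised without waiting.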

**Latencies**

There are two latencies to keep track of: the latency between the BBC Radio broadcast and your speakers, and the latency between the broadcast and Deepgram's AI. The two are independent of each other. At the time of the video, the radio-to-speaker latency was larger than the radio-to-AI latency, so some words are printed to the console slightly before you hear them.
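Because the two delays are independent, the offset between text and sound is simply their difference. The numbers in this small worked example are made up for illustration, since no measurements are given here, and `subtitle_lead` is a hypothetical helper.

```python
def subtitle_lead(radio_to_speaker_s: float, radio_to_ai_s: float) -> float:
    """Seconds by which text appears before the matching audio.

    A positive value means the subtitle is printed before the sound is heard;
    a negative value means the subtitle lags behind the sound.
    """
    return radio_to_speaker_s - radio_to_ai_s


# Hypothetical latencies: audio reaches the speakers after 3.0 s,
# while the transcript comes back after 1.5 s.
lead = subtitle_lead(3.0, 1.5)  # 1.5 seconds of lead
```

If the audio path were the faster one, the same subtraction would give a negative number, meaning the subtitles trail the sound.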

**Live Transcription Capabilities**

Beyond subtitles, real-time transcription of live audio can power other applications, such as holding a live conversation with ChatGPT or translating yourself in real time. Deepgram users have worn live subtitles on their chest, driven a small car with their voice, and built a Disney princess dress that lights up in different colors based on the song being sung.

**Extending the API**

Deepgram can also transcribe pre-recorded audio (a separate notebook covers this), summarize long recordings, diarize recordings with multiple speakers, filter profanity, and more. To write your own code, use one of the software development kits (SDKs), which are available for Node.js, Python, Go, and other languages.

**Conclusion**

In short, the Deepgram API transcribes live audio in real time with little setup: an API key, a stream URL, and a handful of optional parameters. Together with its documentation and SDKs, that makes it suitable for applications ranging from live subtitles to voice-driven interactive projects.

**Transcript**

Hi, my name is Jose Francisco, and today I'll be showing you how to transcribe any live stream audio that you want, as accurately and efficiently as possible, in real time. This tutorial is meant to be extremely quick, hence the background music. The fast-paced music you're currently hearing in the background is Frédéric Chopin's Fantaisie-Impromptu, and according to various YouTube videos, this piece is around five minutes long. But with this quick-start tutorial, we'll show you how to transcribe any live audio feed you want before the music ends. All you have to do is use our notebook, link in the description. Ready? Let's go.

First things first: open up the notebook. Now make a copy of the notebook, like this. This tutorial will assume that you're using Google Colab, but even if you're using Jupyter notebooks or running this notebook in VS Code, the general instructions should be about the same.

All right, now that you've made a copy of the notebook, let's run the first cell. This cell simply installs dependencies using pip. Oh yeah, we're working in Python here. Give it a few moments and you'll see some colorful text like this. Now, for some people out there, you may need to use pip3 instead of pip depending on your setup, but the output should remain the same.

And now there's only one more cell to run. We just need to fill in a few variables first. Well, in reality there's only one variable that needs to be filled in; the rest are optional. The variable you must change is the Deepgram API key. Just create one using your Deepgram account and paste it in here. For security reasons I can't show you mine, but yours is just one button click away. And if you don't have a Deepgram account yet, don't worry: all you have to do is sign up with your email and you'll receive 12,000 minutes of transcription for free. No need to put down a credit card or anything.

All right, if you plugged in the API key, you can run this cell immediately. No need to toy around with any of the other variables. But that being said, it might be important to know what these other variables are, so before I demo the live transcription, let me show you what these other variables do.

This URL variable should be set to the URL that you wish to stream from. By default, we're streaming from BBC Radio.

All right, up next, check out this params variable. This variable should be set to the parameters that you wish to configure your Deepgram model with. The ones that are written here in the starter code shouldn't have to be modified for the sake of this demo, but if you wish to modify them on your own, go for it. Check out the Deepgram documentation for more information, link in the description. And if you're curious, here is what these starter code parameters say. Punctuation is set to true, meaning we're going to punctuate our transcript: capitalized words, periods, commas, and so on and so forth. Numerals is set to true as well, meaning we're going to use digits to represent numbers instead of words. Moreover, since we're listening to the BBC, we set our language to English. But that being said, we do support multiple languages, languages that you're seeing on screen right now. Furthermore, we're using the most enhanced version of Deepgram that we have to offer. And as for the model we're using, we're going for a general, all-purpose model. However, we also have models to support different types of audio streams, such as meetings, phone calls, voicemails, video streams, and even conversational AI. As usual, refer to the Deepgram docs for more information, link in the description. But again, long story short, the parameters we've pre-written for you should be good for this demo.

The next two variables are simple. Time limit is an int that represents the number of seconds that you wish to transcribe for, and transcription only is a Boolean that should be set to true only if you want to see the transcribed words. You can set it to false if you wish to see the full JSON responses, responses that include metadata, word-level timestamps, and confidence measurements. For the sake of this demo, let's say that we just want to create subtitles for this BBC Radio show. Here's what that would look like.

Now, there's two latencies to keep track of. The first is the latency between the BBC Radio show and your speakers. The second is the latency between the BBC Radio show and Deepgram's AI. Luckily, these latencies are independent of each other, and as of today, the radio-to-speaker latency is larger than the radio-to-AI latency. The result: subtitles that look like these.

"...a very short time and pulled me up, so I didn't go very far. From that moment on, they are bound one to the other by what climbers call the brotherhood of the rope. It was a critical moment, remembered as such by both of their sons, a bond that continues through the generation."

Notice that some of the words are printed to the console before they're heard on the stream. Nevertheless, these subtitles are looking pretty good.

And beyond the world of subtitles, you can do much more with real-time live stream audio recording. Maybe you want to have a live conversation with ChatGPT. Maybe you want to translate yourself in real time. Or perhaps you want to wear live subtitles on your chest. Deepgram users have done that before. Want to drive a small car with your voice? Our users have done that too. And what about a Disney princess dress that lights up different colors based on the song that you sing? You guessed it, our users have done that as well.

Not to mention, Deepgram can also transcribe pre-recorded audio too. We've also made a notebook for that. Our language models also offer you the ability to summarize long audio, diarize audio with multiple speakers, filter profanity, and much, much more.

So that's how you use Deepgram's live transcription feature as quickly as possible. Feel free to mess around with the notebook as much as you desire, or if you want to write some code with Deepgram yourself, check out our software development kit, or SDK. We have SDKs for Node, Python, Go, and much more. But that's Deepgram in a nutshell: a quick, easy-to-use API with documentation written by humans, for humans.

All right, what's my time? Still got it.