Getting Started with Live Transcriptions in Browser Using Deepgram Speech Recognition API
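Hello there, my name is Kevin Lewis and I'm a Developer Advocate here at Deepgram. Today, I'll show you how to get started with live transcriptions directly in your browser using our speech recognition API. This project has four steps. First of all, we're going to request access and get data from the user's microphone. Second, we're going to create a persistent two-way connection with Deepgram that allows us to send and receive data in real time. Third, we're going to get that data from our mic and send it to Deepgram as soon as it's available. Finally, we're going to listen out for live transcriptions being returned from Deepgram and show those in the browser console. So let's get started.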
The first step is to request access and get data from the user's microphone. To do this, we're going to use the getUserMedia API that is built into most browsers. We're going to ask for access to a user's media device, specifically an audio device, so a microphone. This will return a promise which, in turn, will resolve to what is known as a media stream. Let's just console log that and see what a media stream looks like.
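A minimal sketch of this first step follows. The transcript does not name the API shown on screen, so this assumes it is navigator.mediaDevices.getUserMedia. The function name requestMicrophone and its mediaDevices parameter are illustrative additions, there only so the snippet can be tried with a stand-in object outside a browser.

```javascript
// Ask the browser for an audio-only media stream.
// `mediaDevices` defaults to the browser's navigator.mediaDevices; the
// parameter is a hypothetical convenience, not part of the original demo.
function requestMicrophone(mediaDevices = navigator.mediaDevices) {
  // The returned promise resolves to a MediaStream once the user allows access.
  return mediaDevices.getUserMedia({ audio: true });
}

// In a browser page, this would log the media stream:
// requestMicrophone().then((stream) => console.log(stream));
```

If the user denies the permission prompt, the promise rejects, so a real page would likely want a catch handler as well.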
So, here's the page open in a browser. I'm going to refresh, and the first thing we see is that the browser handles requesting access to the microphone for us. Once we allow that, we see a media stream logged here. Now, this is great, but in order to get raw data from the microphone, we need to plug this into what is known as a media recorder.
We'll create a media recorder here, a new MediaRecorder, and in there we're going to plug in our stream and specify the output format that we desire. So that's step one. Next, we're going to create a persistent two-way connection with Deepgram. We'll create a new WebSocket here and connect directly to Deepgram's live transcription endpoint. We're also going to want to provide our authentication details. There are a few ways of doing it, but we are going to provide our API key directly here.
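The recorder and the socket described here might look like the sketch below. Several details are assumptions, because the transcript does not spell them out: the audio/webm output format, the wss://api.deepgram.com/v1/listen endpoint URL, and passing the key as a "token" WebSocket subprotocol pair. The helper names and the injectable constructor parameters are hypothetical, and the API key is a placeholder.

```javascript
// Assumed values; none of these are stated in the transcript.
const DEEPGRAM_LIVE_URL = "wss://api.deepgram.com/v1/listen";
const API_KEY = "YOUR_DEEPGRAM_API_KEY"; // placeholder, never ship a real key

// Wrap the media stream in a recorder that emits encoded audio chunks.
// `Recorder` defaults to the browser's MediaRecorder constructor.
function createRecorder(stream, Recorder = MediaRecorder) {
  return new Recorder(stream, { mimeType: "audio/webm" });
}

// Open the two-way connection, providing the key alongside the URL.
// `Socket` defaults to the browser's WebSocket constructor.
function createSocket(apiKey = API_KEY, Socket = WebSocket) {
  return new Socket(DEEPGRAM_LIVE_URL, ["token", apiKey]);
}
```

Not every browser records audio/webm, so a real page may need to check MediaRecorder.isTypeSupported before choosing a format.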
Now, as soon as that connection is opened, we're going to start preparing and sending data from our mic. To do that, we're going to hook into the socket's onopen event, like so. In order to do this, we're going to add an event listener to the media recorder, so we're going to go mediaRecorder.addEventListener. The event we're listening for is called dataavailable, all lowercase and all one word. That will return the data from our mic, and we're going to go ahead and send that data.
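Here is one way the wiring described above could be written. The function name sendAudioWhenOpen is hypothetical, and the guard against empty chunks and closed sockets is an added safeguard rather than something the transcript mentions.

```javascript
const SOCKET_OPEN = 1; // numeric value of WebSocket.OPEN

// Once the socket opens, forward every recorded chunk to it.
function sendAudioWhenOpen(socket, mediaRecorder) {
  socket.onopen = () => {
    mediaRecorder.addEventListener("dataavailable", (event) => {
      // Added safeguard: skip empty chunks and sockets that are not open.
      if (event.data.size > 0 && socket.readyState === SOCKET_OPEN) {
        socket.send(event.data);
      }
    });
  };
}
```

Registering the listener inside onopen means no audio is sent before the connection is ready.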
So, how do we make data available? We actually have to start the media recorder. That's just one final line here: mediaRecorder.start. Here we specify a time slice, which is the increment of time in which data will be packaged up and made available via the dataavailable event. This is in milliseconds, so thousandths of a second, and I'll do this every quarter of a second, which is 250.
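As a small illustration of the time slice, the sketch below starts a recorder with 250 milliseconds and works out how many chunks per second that implies. The function name startRecording and its return value are illustrative additions.

```javascript
const TIMESLICE_MS = 250; // a quarter of a second

// Start recording so that a dataavailable event fires roughly every timeslice.
function startRecording(mediaRecorder, timesliceMs = TIMESLICE_MS) {
  mediaRecorder.start(timesliceMs);
  // Approximate number of chunks produced per second at this time slice.
  return 1000 / timesliceMs;
}
```

A smaller time slice gives lower latency at the cost of more, smaller messages over the socket.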
So, that's everything we need to send data to Deepgram. The other side of that is to listen for messages that are being sent from Deepgram to us in the other direction. To do that, we're going to listen to the onmessage event. There's loads of useful data that comes back in the returned payload, so here we parse it and, instead of logging it all, we're just extracting the transcript. Now we're going to go ahead and console log the transcript.
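The receiving side could be sketched as follows. The payload shape used here (channel.alternatives[0].transcript) is an assumption about Deepgram's response format, since the transcript only says that the transcript is extracted. The function names are hypothetical.

```javascript
// Parse one message payload and pull out the transcript text.
// Assumed payload shape: { channel: { alternatives: [{ transcript }] } }.
function extractTranscript(messageData) {
  const received = JSON.parse(messageData);
  // Fall back to an empty string for messages that carry no transcript.
  return received.channel?.alternatives?.[0]?.transcript ?? "";
}

// Log each incoming transcript to the console.
function logTranscripts(socket) {
  socket.onmessage = (message) => {
    console.log(extractTranscript(message.data));
  };
}
```

In place of console.log, a page could append the transcript to an element to show it to users.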
At this point you may show it to users or do something else with it, but that is actually all we need in order to do live transcription in the browser. So, let me refresh and give access to our microphone, and we should see any minute now that transcripts are appearing right there in our console. How cool is that? You'll see there are multiple phrases coming for everything I'm saying. There is an additional property in the returned payload that indicates when a given phrase is in its final form.
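To keep only finished phrases, a handler could check that additional property before using the transcript. The transcript does not name the property; this sketch assumes it is called is_final and that the payload has the channel.alternatives shape. The function name finalTranscript is hypothetical.

```javascript
// Return the transcript only when the phrase is marked as final,
// otherwise return null. `is_final` is an assumed property name.
function finalTranscript(messageData) {
  const received = JSON.parse(messageData);
  const transcript = received.channel?.alternatives?.[0]?.transcript ?? "";
  return received.is_final && transcript ? transcript : null;
}
```

Interim results are still useful for showing text as someone speaks; filtering like this suits cases where each phrase should appear only once.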
So, hopefully you found that interesting. That's how you do browser live transcription. Before we part ways, I just wanted to mention a blog post that we published not long before this video, which talks about best practices for handling your API key, so check the description out for that. If you are going to use this in the real world, make sure that you are doing something to protect your API key from being accessible to users and from having too wide-reaching permissions. If you have any questions at all, reach out. We love to help people, and we love to see what you're going to build with our speech recognition API. Have a wonderful day. Bye for now.