Rohit Prasad - Alexa Prize | AI Podcast Clips

The Importance of Understanding Context in Conversations with AI

Conversations with artificial intelligence systems like Alexa require more than facts and knowledge about specific topics. To hold an intelligent, coherent conversation, a system must understand the context of the dialogue and offer responses that are relevant and thoughtful. This is where research comes in: developing AI systems that can not only process vast amounts of information but also engage with users in a meaningful way.

One key aspect of this research is the need for context-aware responses. When a conversation turns to a recent sports event, for instance, it is crucial to understand the entities being mentioned and how they relate to one another. Simply recalling facts about a particular team or player is not enough; those facts must be woven into a coherent narrative that reflects the broader conversation. This requires a deeper understanding of the context in which the information is being shared.

For example, if someone says, "I learned something fun about [team/player] recently," it is not enough to recite a fact about that team or player in isolation. A more intelligent response connects the fact to the ongoing discussion, for instance by relating how the player performed in the previous night's game, rather than offering trivia with no bearing on the conversation.
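To make the distinction between bare fact lookup and context-aware response selection concrete, here is a minimal sketch in Python. All of the names (DialogueState, FACTS, respond) and the scoring rule are illustrative assumptions, not a description of how Alexa or any Alexa Prize social bot is actually built; the point is only that a response is chosen by how well it connects to entities already present in the conversation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: prefer facts that connect to entities already in the
# conversation over isolated trivia. Not the actual Alexa Prize architecture.

@dataclass
class DialogueState:
    topic: str | None = None
    entities: set[str] = field(default_factory=set)

    def observe(self, utterance: str, known_entities: set[str]) -> None:
        """Record entities the user mentioned so later turns can refer back to them."""
        mentioned = {e for e in known_entities if e.lower() in utterance.lower()}
        self.entities |= mentioned

# Toy knowledge base: each fact is tagged with the entities it involves.
FACTS = [
    {"entities": {"Team A", "Player X"}, "text": "Player X scored twice for Team A last night."},
    {"entities": {"Player X"}, "text": "Player X was born in 1995."},
]

def respond(state: DialogueState) -> str:
    # Score facts by how many already-mentioned entities they share with the dialogue.
    best = max(FACTS, key=lambda f: len(f["entities"] & state.entities), default=None)
    if best and best["entities"] & state.entities:
        return best["text"]            # context-aware: ties back to the conversation
    return "Tell me more about that."  # fall back rather than reciting an unrelated fact

state = DialogueState()
state.observe("I learned something fun about Player X recently", {"Player X", "Team A"})
print(respond(state))  # -> "Player X scored twice for Team A last night."
```

A real system would add coreference resolution, topic tracking, and far richer knowledge, but the principle is the same: candidate responses are ranked by how well they fit the dialogue so far, not merely by whether they contain a true fact.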

This emphasis on contextual understanding is reflected in how AI systems like Alexa are designed to interact with users. To engage customers and improve the accuracy of responses, developers must build conversational interfaces that account for the nuances of human communication: not just processing vast amounts of information, but recognizing and responding to emotional cues, idioms, and other subtleties of language.

The Alexa Prize: A Challenge for AI Researchers

The Alexa Prize is a competition designed to encourage researchers to develop more sophisticated conversational interfaces for AI systems like Alexa. It is framed as a grand challenge in conversational AI: build a social bot that can hold an engaging, coherent conversation with a user for twenty minutes, taking into account factors such as tone, language, and context.

To participate in the Alexa Prize, university teams must create a social bot that can engage with users in a natural and intuitive way. This requires not only knowledge of specific topics but also the ability to recognize and respond to user input in a way that feels organic and human-like. Two instances of the competition have been completed, won by the University of Washington and the University of California respectively, and a third, with a cohort of ten teams, is now underway.

The Data Set: A Key to Developing Intelligent Conversations

One of the most significant challenges in developing intelligent conversational interfaces is creating an adequate data set for training AI systems. While researchers are accustomed to working with large, annotated datasets, the Alexa Prize represents a new frontier in this area. Because the goal is not simply to process information but to engage with users in a meaningful way, developers must focus on collecting data that captures the subtleties of human communication.

In particular, the Alexa Prize seeks to answer questions about what constitutes an "engaging" or "fulfilling" conversation. How do users respond to different types of interactions? What are the key characteristics of successful conversational interfaces?

The User Experience: A Key Factor in Developing Intelligent Conversations

To develop intelligent conversational interfaces that truly engage with users, researchers must also consider the user experience. This includes not only designing systems that can process and respond to user input but also creating interfaces that are intuitive, natural, and enjoyable to use.

One key element of this is providing clear and transparent feedback mechanisms for users. In the Alexa Prize, for example, customers who finish a conversation are asked, on a scale of one to five, how likely they are to interact with that social bot again, and they can also leave more open-ended feedback. This gives developers valuable insight into user behavior and preferences, helping them refine their designs and improve their conversational interfaces.
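As a rough sketch of how such feedback might be represented and aggregated, consider the following Python fragment. The FeedbackRecord structure and summarize function are hypothetical; the conversation only establishes that users give a one-to-five rating and optional open-ended comments.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical data model for post-conversation feedback: a 1-5 rating on
# "how likely are you to interact with this social bot again?" plus optional free text.

@dataclass
class FeedbackRecord:
    bot_id: str
    rating: int              # 1 (unlikely) .. 5 (very likely)
    comment: str | None = None

    def __post_init__(self) -> None:
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")

def summarize(records: list[FeedbackRecord]) -> dict[str, float]:
    """Average rating per bot, one simple way to compare social bots during a feedback phase."""
    by_bot: dict[str, list[int]] = {}
    for r in records:
        by_bot.setdefault(r.bot_id, []).append(r.rating)
    return {bot: mean(ratings) for bot, ratings in by_bot.items()}

feedback = [
    FeedbackRecord("bot_a", 5, "Great chat about movies"),
    FeedbackRecord("bot_a", 3),
    FeedbackRecord("bot_b", 4),
]
print(summarize(feedback))  # {'bot_a': 4, 'bot_b': 4}
```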

Mental Model Shift: Moving Beyond Distant Evaluation

Another key aspect of developing intelligent conversational interfaces is shifting the mental model of how AI systems are evaluated. Traditionally, researchers have tuned their algorithms on large, fixed, annotated corpora, the model of a DARPA evaluation or an NSF-funded study. The Alexa Prize deliberately breaks from that model.

Because the goal is not simply to process information but to engage users in a meaningful way, teams must learn from live feedback gathered at real-world scale. That means tracking not only metrics such as accuracy or fluency but also user engagement, satisfaction, and emotional resonance.

Quitting the Conversation: A Signal for Improvement

Finally, an important aspect of developing intelligent conversational interfaces is recognizing when a conversation has reached its natural endpoint. In the Alexa Prize, users have the option to quit their conversations with social bots at any time – and this act can serve as a signal for improvement.

By recognizing and responding to user requests to disengage, developers can refine their designs and improve their conversational interfaces over time. This allows AI systems like Alexa to adapt and evolve in response to changing user needs and preferences, ultimately creating more intelligent and effective conversational experiences.
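The sketch below illustrates, under assumed names and a deliberately simple metric, how user-initiated quits and conversation length might be turned into signals a team could track. The podcast does not specify that the Alexa Prize computes exactly these quantities; this is just one plausible way to operationalize "quitting as a signal."

```python
from dataclasses import dataclass

# Hypothetical conversation log: how long the chat lasted and whether the user
# ended it themselves (e.g., by asking to stop) before a natural conclusion.

@dataclass
class ConversationLog:
    bot_id: str
    turns: int
    user_quit: bool   # True if the user cut the conversation short

def quit_rate(logs: list[ConversationLog], bot_id: str) -> float:
    """Fraction of a bot's conversations that users quit early; lower is better."""
    own = [log for log in logs if log.bot_id == bot_id]
    if not own:
        return 0.0
    return sum(log.user_quit for log in own) / len(own)

def average_turns(logs: list[ConversationLog], bot_id: str) -> float:
    """Average conversation length in turns, a rough proxy for engagement."""
    own = [log.turns for log in logs if log.bot_id == bot_id]
    return sum(own) / len(own) if own else 0.0

logs = [
    ConversationLog("bot_a", turns=24, user_quit=False),
    ConversationLog("bot_a", turns=3, user_quit=True),   # early quit: a negative signal
    ConversationLog("bot_b", turns=15, user_quit=False),
]
print(quit_rate(logs, "bot_a"), average_turns(logs, "bot_a"))  # 0.5 13.5
```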

"WEBVTTKind: captionsLanguage: encan you briefly speak to the Alexa prize for people who are not familiar with it and also just maybe were things stand and what have you learned and what's surprising what have you seen the surprising from this incredible competition absolutely it's a very exciting competition like surprise is essentially Grand Challenge in conversational artificial intelligence where we threw the gauntlet to the universities who do active research in the field to say can you build what we call a social board that can converse with you coherently and engagingly for 20 minutes that is an extremely hard challenge talking to someone in a who you're meeting for the first time or even if you're you've met them quite often to speak at 20 minutes on any topic an evolving nature of topics is super hard we have completed two successful years of the competition first was one with University of Washington second industry of California we are in our third instance we have an extremely strong team of 10 cohorts and the third instance of the of the lexer prizes underway now and we are seeing a constant evolution first year was definitely learning it was a lot of things to be put together we had to build a lot of infrastructure to enable these you know STIs to be able to build magical experiences and undo high-quality research just a few quick questions sorry for the interruption what is failure look like in the 20-minute session so what does it mean to fail not to reach the 20 minimal awesome question so there are one first of all I forgot to mention one more detail it's not just 20 minutes but the quality of the conversation to that matters and the beauty of this competition before I answer that question on what failure means is first that you actually converse with millions and millions of customers as these social BOTS so during the judging phases there are multiple phases before we get to the finals which is a very controlled judging in a situation where we have we bring in judges and we have interactors who interact with these social BOTS that is much more control setting but till the point we get to the finals all the judging is essentially by the customers of Alexa and there you basically rate on a simple question how good your experience was so that's where we are not testing for a 20 minute boundary being claw across because you do want it to be very much like a clear-cut winner be chosen and and it's an absolute bar so did you really break that 20 minute barrier is why we have to test it in a more control setting with actors essentially in tractors and see how the conversation goes so this is why it's a subtle difference between how it's being tested in the field with real customers versus in the lab to award the prize so on the latter one what it means is that essentially the that there are three judges and two of them have to say this conversation is stalled essentially got it and the judges the human experts judges or human experts okay great so this is in the third year so what's been the evolution how far it's in the DARPA challenge in the first year the autonomous vehicles nobody finished in the second year a few more finished in the desert so how far along within this I would say much harder challenge are we this challenge has come a long way do they extend that we've definitely not close to the 20-minute barrier being with coherence and engaging conversation I think we are still five to ten years away in that horizon to complete that but the progress is immense like 
what you're finding is the accuracy in what kind of responses these social BOTS generate is getting better and better what's even amazing to see that now there's humor coming in the bots are quite you know you're talking about ultimate science of intial and signs of intelligence I think humor is a very high bar in terms of what it takes to create humor and I don't mean just being goofy I really mean good sense of humor is also a sign of intelligence in my mind and something very hard to do so these social BOTS are now exploring not only what we think of natural language abilities but also personality attributes and aspects of when to inject an appropriate joke went to when you don't know the question the domain how you come back with something more intelligible so that you can continue the conversation if if you and I are talking about AI and we are domain experts we can speak to it but if you suddenly switch the topic to that I don't know of how do I change the conversation so you're starting to notice these elements as well and that's coming from partly by by the nature of the 20 minute challenge that people are getting quite clever on how to really converse and essentially mass of the understanding defects if they exist so some of this this is not Alexa the product this is somewhat for fun for research for innovation and so on I have a question sort of in this modern era there's a lot of you look at Twitter and Facebook and so on there's there's discourse public discourse going on and some things are a little bit too edgy people get blocked and so on I'm just out of curiosity are people in this context pushing the limits is anyone using the f-word is anyone sort of pushing back sort of you know arguing I guess I should say in as part of the dialogue to really draw people in first of all let me just back up a bit in terms of why we are doing this right so you said it's fun I think fun is more part of the engaging part for customers it is one of the most used skills as well in our skill store but up that apart the real goal was essentially what was happening is with lot of AI research moving to industry we felt that academia has the risk of not being able to have the same resources at disposal that we have which is law so beta massive computing power and clear ways to test these AI advances with real customer benefits so we brought all these three together in the like surprise that's why it's one of my favorite projects and Amazon and with that the secondary fact is yes it has become engaging for our customers as well we're not there in terms of where we want to it to be right but it's a huge progress but coming back to your question on how do the conversations evolve yes there is some natural attributes of what you said in terms of argument and some amount of swearing the way we take care of that is that there is a sensitive filter we have built that see words and so it's more than keywords a little more in terms of of course there's key word base too but there's more in terms of context these words can be very contextual as you can see and also the topic can be something that you don't want a conversation to happen because this is a criminal device as well a lot of people use these devices so we have put a lot of guardrails for the conversation to be more useful for advancing AI and not so much of these these other issues you attributed what's happening in there I feel as well right so this is actually a serious opportunity I didn't use the right word fun I think it's an open 
opportunity to do some some of the best innovation in conversational agents in the world why just universities why just you know streets because as I said I really felt young minds young minds it's also too if you think about the other aspect of where the whole industry is moving with AI there's a dearth of talent in in given the demands so you do want universities to have a clear place where they can invent and research and not fall behind with that they can motivate students imagine all grad students left to to industry like us or or faculty members which has happened to so this is in a way that if you're so passionate about the field where you feel industry and academia need to work well this is a great example and a great way for universities to participate so what do you think it takes to build a system that wins the lots of prize I think you have to start focusing on aspects of reasoning that it is there are still more lookups of what intense customers asking for and responding to those are rather than really reasoning about the elements of the of the conversation for instance if you have if you're playing if the conversation is about games and it's about a recent sports event there's so much context in war and you have to understand the entities that are being mentioned so that the conversation is coherent rather than you suddenly just switch to knowing some fact about a sports entity and you're just relying that rather than understanding the true context of the game like you if you just said I learned this fun fact about really rather than really say how he played the game the previous night then the conversation is not really that intelligent so you have to go to more reasoning elements of understanding the context of the dialogue and giving more appropriate responses which tells you that we are still quite far because a lot of times it's more facts being looked after and something that's close enough as an answer but not really the answer so that is where the research needs to go more an actual true understanding and reasoning and that's why I feel it's a great way to do it because you have an engaged set of users working to make help these AI advances happen in this case right you mentioned customers they're there quite a bit and there's a skill what is the experience for the for the user that is helping so just to clarify this isn't as far as I understand the Alexa so this skill is to stand alone for the alakh surprise that means focus on the Alexa prize it's not you ordering certain things that I was on the Cawood trait checking the weather or you're playing Spotify right separate skills exactly so you're focused on helping that I don't know how do people how do customers think of it are they having fun are they helping teach the system what's the experience like I think it's both actually and let me tell you how the how you invoke this skill so you all you have to say Alexa let's chat and then the first time you say Alexa let's chat it comes back with a clear message that you're interacting with one of those you know three social BOTS and there's a clear so you know exactly how we interact right and that is why it's very transparent you are being asked to help right and and we have lot of mechanisms where as the we are in the first phase of feedback phase then you send a lot of emails to our customers and then this they know that this the team needs a lot of interactions to improve these accuracy of the system so we know we have lot of customers who really want to help these 
you know ste baths and they're conversing with that and some are just having fun with just saying Alexa let's chat and also some adversarial behavior to see whether how much do you understand as a social bot so I think we have a good healthy mix of all three situations so what is the if we talk about solving the Alexa challenge they like surprise what's the data set of really engaging pleasant conversations look like is if we think of this as a supervised learning problem I don't know if it has to be but if it does maybe you can comment on that do you think there needs to be a data set of what it means to be an engaging successful fulfilling conversation that's part of the research question here this was I think it's we at least got the first spot right which is have a way for universities to build and test in a real-world setting now you're asking in terms of the next phase of questions which we are still we're also asking by the way what does success look like from a optimization function that's what you're asking in terms of we as researchers are used to having a great corpus of annotated data and then making Rob then you know sort of tune our algorithms on those right and fortunately and unfortunately in this world of alack surprise that is not the way we are going after it so you have to focus more on learning based on live feedback that is another element that's unique we're just not I started with giving you how you ingress and experience this capability as a customer what happens when you're done so they ask you a simple question on a scale of one to five how likely are you to interact with this social bada game that does a good feedback and customers can also leave more open-ended feedback and I think partly that to me is one part of the question you're asking which I am saying is a mental model shift that as researchers also you have to change your mindset that this is not a dart by evaluation or NSF funded study and you have a nice corpus this is where it's real world you have real data the scale is amazing is the beautiful thing then and then the customer the user can quit the conversation in any tax exactly user that is also a signal for how good you were at that point so and then on a scale one to five one two three did they say how likely are you or is it just a binary I wanted to fire one two five Wow okay that's such a beautifully constructed challenge okay youcan you briefly speak to the Alexa prize for people who are not familiar with it and also just maybe were things stand and what have you learned and what's surprising what have you seen the surprising from this incredible competition absolutely it's a very exciting competition like surprise is essentially Grand Challenge in conversational artificial intelligence where we threw the gauntlet to the universities who do active research in the field to say can you build what we call a social board that can converse with you coherently and engagingly for 20 minutes that is an extremely hard challenge talking to someone in a who you're meeting for the first time or even if you're you've met them quite often to speak at 20 minutes on any topic an evolving nature of topics is super hard we have completed two successful years of the competition first was one with University of Washington second industry of California we are in our third instance we have an extremely strong team of 10 cohorts and the third instance of the of the lexer prizes underway now and we are seeing a constant evolution first year was definitely learning 