**The Future of Music and AI: A New Form Factor**
We're here to discuss the future of music and AI, specifically our new form factor for interacting with music. Simple gestures control playback: a double tap pauses the track, a swipe down lowers the volume, and a swipe skips to the next song. But that's not all - the laser display can also be used to navigate through an album, pause or play tracks, and pull up more information about the music itself. This is just the beginning of what we have in store for music lovers.
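To make that interaction model concrete, here is a minimal sketch of how gestures might be dispatched to playback actions. The gesture names and the `MusicPlayer` interface are hypothetical placeholders for illustration, not the Pin's actual API.

```python
# Hypothetical sketch: dispatching touch gestures to playback actions.
# Gesture names and the MusicPlayer interface are illustrative only.

class MusicPlayer:
    def toggle_pause(self) -> None:
        print("play/pause toggled")

    def volume_down(self) -> None:
        print("volume lowered")

    def next_track(self) -> None:
        print("skipped to next track")

# Map each recognized gesture to the method it should trigger.
GESTURE_ACTIONS = {
    "double_tap": MusicPlayer.toggle_pause,
    "swipe_down": MusicPlayer.volume_down,
    "swipe_forward": MusicPlayer.next_track,
}

def handle_gesture(player: MusicPlayer, gesture: str) -> None:
    action = GESTURE_ACTIONS.get(gesture)
    if action is not None:
        action(player)

if __name__ == "__main__":
    player = MusicPlayer()
    handle_gesture(player, "double_tap")  # pauses or resumes playback
```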
We're aware that many AI models today, like ChatGPT, sometimes "hallucinate" answers, providing information that isn't 100% accurate. Our approach is different. We don't use ChatGPT itself; we use the OpenAI API as one of our LLMs, and our OS supports multiple LLMs. The architecture goes out and finds the right thing you're looking for. If it's an application experience we're supporting, we get you that one immediately. If it's information you need, we try to give you the most accurate answer possible.
We don't answer straight from the LLM; instead, we fetch information from a broader range of sources, including Wikipedia and specialized engines like Wolfram Alpha for mathematics. Hallucination tends to happen when you go directly to the LLM, so pulling from the right source helps us avoid it. Our processing happens partly on the device and partly in the cloud, depending on what's required.
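As an illustration of this kind of source routing, here is a hedged sketch in Python. The classification rules and the fetch functions are assumptions made up for this example; they are not Humane's actual architecture.

```python
# Illustrative sketch: route a query to a trusted source before falling
# back to an LLM. All function names here are hypothetical placeholders.

def fetch_from_wolfram_alpha(query: str) -> str:
    return f"[Wolfram Alpha result for: {query}]"

def fetch_from_wikipedia(query: str) -> str:
    return f"[Wikipedia summary for: {query}]"

def ask_llm(query: str) -> str:
    return f"[LLM-generated answer for: {query}]"

def classify_query(query: str) -> str:
    """Crude intent classification, purely for demonstration."""
    q = query.lower()
    if any(ch.isdigit() for ch in q) or "solve" in q:
        return "math"
    if q.startswith(("who", "what", "when", "where")):
        return "factual"
    return "general"

def answer(query: str) -> str:
    category = classify_query(query)
    if category == "math":
        return fetch_from_wolfram_alpha(query)  # structured math engine
    if category == "factual":
        return fetch_from_wikipedia(query)      # encyclopedic lookup
    return ask_llm(query)                       # generative fallback

print(answer("What is the capital of Catalonia?"))
```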
**Device Processing**
We do a lot of processing locally on the device itself. Speech recognition, intent processing, and gesture recognition all run directly on the device for speed. When you want something more robust, such as deeper interpretation or complex analysis, the request goes out to our cloud for further processing. As those cloud models become more refined, we expect to bring more of them onto the device.
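A rough sketch of that split, assuming a simple stage-based router; the stage names and helper functions are illustrative, not the real pipeline:

```python
# Hypothetical sketch of the on-device / cloud split described above.
# Stage names and helper functions are illustrative assumptions.

LOCAL_STAGES = {"speech_recognition", "intent_parsing", "gesture_recognition"}

def run_on_device(stage: str, payload: str) -> str:
    return f"[{stage} handled locally: {payload}]"

def send_to_cloud(stage: str, payload: str) -> str:
    return f"[{stage} sent to cloud: {payload}]"

def process(stage: str, payload: str) -> str:
    # Fast, latency-sensitive stages stay on the device; heavier
    # interpretation goes out to the cloud.
    if stage in LOCAL_STAGES:
        return run_on_device(stage, payload)
    return send_to_cloud(stage, payload)

print(process("gesture_recognition", "double_tap"))
print(process("scene_interpretation", "image_frame"))
```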
As part of our continuous improvement strategy, we're always pushing software updates. Instead of following traditional yearly OS cycles, we push upgrades directly, adding new features and capabilities as they're ready, and you'll see offline usage grow over time as more models move on-device. This approach lets us provide a more seamless and robust user experience.
**Scene Recognition**
Now, let's take a look at what happens when you ask the device to describe the scene in front of you. It captures an image, uploads it to our cloud, where we have access to a range of AI models, and the device then speaks a summary of what it sees. That capability is especially useful for people who are visually impaired or have other accessibility needs. This is just a basic example of scene recognition - in the future, we hope to let users ask more complex questions, such as "Who is wearing the red blazer?" or "How do I make a latte with this coffee machine?", with the device identifying the machine's make and model and giving instructions.
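In Python-flavored pseudocode, the described flow might look like the following sketch; every function here is a hypothetical stand-in, since the actual endpoints and models are not public.

```python
# Illustrative scene-description pipeline: capture, upload, summarize, speak.
# All functions are hypothetical placeholders for the flow described above.

def capture_image() -> bytes:
    return b"<raw camera frame>"

def cloud_vision_describe(image: bytes) -> str:
    # In practice this would upload the frame and run vision + LLM models.
    return "A person in a blue shirt holds a camera in a busy exhibition hall."

def speak(text: str) -> None:
    print(f"(spoken aloud) {text}")

def describe_scene() -> str:
    image = capture_image()                  # grab a frame from the camera
    summary = cloud_vision_describe(image)   # summarize it in the cloud
    speak(summary)                           # read the description back
    return summary

describe_scene()
```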
**Longevity and Updates**
We're committed to supporting our device for an extended period. We don't plan on doing yearly updates on the hardware; instead, we'll focus on adding new features and capabilities over time. This approach means that users can invest in their Pin without worrying about the device becoming outdated too quickly.
While it's true that the Pin is a significant investment, our goal is to make sure users get the most out of their device for as long as possible. We expect people to own their Pin for several years, and we'll keep upgrading the software along the way - the idea is that you wake up one day and your Pin does something new.
**The Future of Interaction**
Our ultimate vision is to turn the world into your operating system, where everything is interactable and easily accessible. With the Pin, users will be able to point at something and get more information about anything they encounter - from everyday objects to complex systems like medical equipment or industrial machinery. This is just the beginning of what's possible with AI-powered interaction, and we're excited to see where this technology takes us.
"WEBVTTKind: captionsLanguage: enso we're here at mwc with Imran and Bethany from Humane AI uh and we would like to ask you to introduce to us the AI pin so AIP pin is a new kind of computer it's essentially um something that uh allows you to be more present and gives you a sense of Freedom mainly because it's actually got a AI operating system that we've built from the ground up on top of an Android core that allows it to really uh do a lot of work for you and really engage AIS to um make it so that you don't have to be manually doing a lot of operations which really gives you a new way to interact that allows you to really maintain a little bit more presence and more freedom than you have today translate to French and then to French how it works is that when I put two fingers down and speak to it in English it'll speak back in French so maybe I'll start and then you can respond so we have white yes perfect so I could say it's so nice to meet you and now it'll speak back in French now when I put two fingers down you'll speak to it and so I'll come closer so you can speak to it yes it's also very interesting to meet you and see this pin awesome and it speaks and understands over 50 languages and it also knows when you're in a new city it'll default set to the language for the city you're in so when I landed in Barcelona it defaults to kadan or Spanish which is really powerful so we have a first ofit kind laser projection system that we call Laser ink and it essentially allows you to have a display there when you need it and it disappears when you don't we use our time a flight camera to understand when your palm is present and we project only on your palm when you put your hand down the display goes away and this is meant to be for really quick interactions and you navigate the display using touch uh and I can show you so this is my daughter sending me a message uh Al Oliver asks if he can call you after school let's move back now you can go here go through time temperature and the dates you can push back to get to the menu I can go here to see uh photos that I've taken in the past and scroll through previews and videos here this is to make sure that after I take a photo I make sure I got the shot so every time you ask the device a question when I hold up my hand I can interrupt the answer and read the answer on my palm this is really the heart of a multimodal system which is really important we see this as a new kind of computer something that's you know more established in terms of the uh coexistence of a lot of these services and things like the you know emails and slacks which will be coming in the future for us we think smartphones are going to be around for quite a while just like desktops and even even servers are still around in terms of computer platforms but we do see this as being something as uh a quick go-to for being able to do a lot of that for you and I think one of the things that's really powerful about it is that the moment you want to actually send a text or think about uh something that you want more information on it's right there ready for you in a way that other computers aren't so there's you know a couple ways you can play music but I prefer to just you know speak to it very naturally I'm going to play uh you know play some Taylor Swift right famous artist um what it's going to do is because of our streaming partner with tidle it's going to load music from Taylor Swift get her top songs make a playlist and look at that it's starting to play right you can see I can 
like control uh the music and Playback with simple gestures right I just paused it by double tapping on it I can lower volume just by swiping down I can skip to the next song Right these are all gestures but I can also you know control music just by using the laser display right look at that I can go next I can pause it I see more about uh you know the album I can push back you know get see what's next all sorts of you know music stuff that you might expect from a music you know experience but in a new form factor in a new UI can you tell us a little bit about the chat GPT and open AI models that you're using are you taking into consideration the fact that most AIS right now sometimes hallucinate answers and come up with stuff that isn't you know 100% accurate we don't actually use chat GPT we use the open AI API as one of our llms our OS supports multiple llms as well the way our architecture works though is that we go out and we find the right thing you're looking for and so if it's a an application experience that we're supporting we get you that one immediately if it's information that you want we try and get you the best and most accurate answer as much as we can that's you know all contingent upon what's out there so we will actually go to the right sources we'll go to wolf from alpha for example for mathematics or Wikipedia Hallucination is something that comes when you go directly from the llm we don't go directly from the L&M we go get it from the broader internet a lot of the processing seems to happen on cloud what kind of actions happen locally uh what kind of processing happens locally is like even my voice request being sent to be processed there or is the requests processed locally on the device itself so we we do a lot of things on device we'll be doing a lot more eventually so things like u a lot more of the intricate pieces in terms of in uh Speech work your intent processing all that happens on device we do a lot of our gesture recognition directly on device for Speed as well and then when you actually want something more robust um it kind of goes out to to our Cloud to do a lot of that interpretation we'll be bringing a lot of those models that we're doing on the cloud potentially onto device as they become more and more refined so you'll see a lot of that happening constantly I think one of the interesting things about our OS is that we can constantly be upgrading it so we don't go through yearly Cycles um as typical os's will be we can actually push upgrades directly and you'll see a lot of offline usage um really grow over time describe the scene in front of me and what it's going to do is it's going to scan the image it's going to take an image upload it to our clouds where we have some llms and it's going to uh come up with a summary and it's going to send it back to the device and then the device will be able to you know speak what it sees and this is a very basic example but in the future we'd like to you know have you walk up in front of you shows an individual and a blue shirt holding a professional camera with a mounted external microphone there is also a person wearing a a red blazer holding a microphone suggesting an interview or reporting scenario the background features a busy event or exhibition Hall with various people walking around in booths with company branding there is also a person wearing a blue dress and black boots walking by the environment suggests an indoor setting with artificial lighting look at that so much information big for all sorts of 
people you know people who may be visually impaired or people who have accessibility you know issues and you know again like this is a very brief example but in the future you know maybe you can walk up to a coffee machine and say how do I make a latte with this and it'll be able to identify the make and model of the coffee machine and give you instructions and we know our vision is to sort of turn the world into your operating system where everything is interactable and you just need a point and get more information what's the update longevity that you're planning for the pin because it's not a cheap device and it's also for many people a secondary device am I going to put my money down and get get one year of support two years of support what kind of longevity do you see for this yeah I think we definitely believe that people are going to own their pin for a while we're not planning on doing yearly updates on the hardware this is really about us adding more features like Imron mentioned over time you wake up and your PIN does something new and that's really what we think is the future and why Ros is so critical if you want to know more about the Humane AI pin be sure to check out our article on androidauthority.com and don't forget to follow for all the latest news from mwcso we're here at mwc with Imran and Bethany from Humane AI uh and we would like to ask you to introduce to us the AI pin so AIP pin is a new kind of computer it's essentially um something that uh allows you to be more present and gives you a sense of Freedom mainly because it's actually got a AI operating system that we've built from the ground up on top of an Android core that allows it to really uh do a lot of work for you and really engage AIS to um make it so that you don't have to be manually doing a lot of operations which really gives you a new way to interact that allows you to really maintain a little bit more presence and more freedom than you have today translate to French and then to French how it works is that when I put two fingers down and speak to it in English it'll speak back in French so maybe I'll start and then you can respond so we have white yes perfect so I could say it's so nice to meet you and now it'll speak back in French now when I put two fingers down you'll speak to it and so I'll come closer so you can speak to it yes it's also very interesting to meet you and see this pin awesome and it speaks and understands over 50 languages and it also knows when you're in a new city it'll default set to the language for the city you're in so when I landed in Barcelona it defaults to kadan or Spanish which is really powerful so we have a first ofit kind laser projection system that we call Laser ink and it essentially allows you to have a display there when you need it and it disappears when you don't we use our time a flight camera to understand when your palm is present and we project only on your palm when you put your hand down the display goes away and this is meant to be for really quick interactions and you navigate the display using touch uh and I can show you so this is my daughter sending me a message uh Al Oliver asks if he can call you after school let's move back now you can go here go through time temperature and the dates you can push back to get to the menu I can go here to see uh photos that I've taken in the past and scroll through previews and videos here this is to make sure that after I take a photo I make sure I got the shot so every time you ask the device a question when I 
hold up my hand I can interrupt the answer and read the answer on my palm this is really the heart of a multimodal system which is really important we see this as a new kind of computer something that's you know more established in terms of the uh coexistence of a lot of these services and things like the you know emails and slacks which will be coming in the future for us we think smartphones are going to be around for quite a while just like desktops and even even servers are still around in terms of computer platforms but we do see this as being something as uh a quick go-to for being able to do a lot of that for you and I think one of the things that's really powerful about it is that the moment you want to actually send a text or think about uh something that you want more information on it's right there ready for you in a way that other computers aren't so there's you know a couple ways you can play music but I prefer to just you know speak to it very naturally I'm going to play uh you know play some Taylor Swift right famous artist um what it's going to do is because of our streaming partner with tidle it's going to load music from Taylor Swift get her top songs make a playlist and look at that it's starting to play right you can see I can like control uh the music and Playback with simple gestures right I just paused it by double tapping on it I can lower volume just by swiping down I can skip to the next song Right these are all gestures but I can also you know control music just by using the laser display right look at that I can go next I can pause it I see more about uh you know the album I can push back you know get see what's next all sorts of you know music stuff that you might expect from a music you know experience but in a new form factor in a new UI can you tell us a little bit about the chat GPT and open AI models that you're using are you taking into consideration the fact that most AIS right now sometimes hallucinate answers and come up with stuff that isn't you know 100% accurate we don't actually use chat GPT we use the open AI API as one of our llms our OS supports multiple llms as well the way our architecture works though is that we go out and we find the right thing you're looking for and so if it's a an application experience that we're supporting we get you that one immediately if it's information that you want we try and get you the best and most accurate answer as much as we can that's you know all contingent upon what's out there so we will actually go to the right sources we'll go to wolf from alpha for example for mathematics or Wikipedia Hallucination is something that comes when you go directly from the llm we don't go directly from the L&M we go get it from the broader internet a lot of the processing seems to happen on cloud what kind of actions happen locally uh what kind of processing happens locally is like even my voice request being sent to be processed there or is the requests processed locally on the device itself so we we do a lot of things on device we'll be doing a lot more eventually so things like u a lot more of the intricate pieces in terms of in uh Speech work your intent processing all that happens on device we do a lot of our gesture recognition directly on device for Speed as well and then when you actually want something more robust um it kind of goes out to to our Cloud to do a lot of that interpretation we'll be bringing a lot of those models that we're doing on the cloud potentially onto device as they become more and more 
refined so you'll see a lot of that happening constantly I think one of the interesting things about our OS is that we can constantly be upgrading it so we don't go through yearly Cycles um as typical os's will be we can actually push upgrades directly and you'll see a lot of offline usage um really grow over time describe the scene in front of me and what it's going to do is it's going to scan the image it's going to take an image upload it to our clouds where we have some llms and it's going to uh come up with a summary and it's going to send it back to the device and then the device will be able to you know speak what it sees and this is a very basic example but in the future we'd like to you know have you walk up in front of you shows an individual and a blue shirt holding a professional camera with a mounted external microphone there is also a person wearing a a red blazer holding a microphone suggesting an interview or reporting scenario the background features a busy event or exhibition Hall with various people walking around in booths with company branding there is also a person wearing a blue dress and black boots walking by the environment suggests an indoor setting with artificial lighting look at that so much information big for all sorts of people you know people who may be visually impaired or people who have accessibility you know issues and you know again like this is a very brief example but in the future you know maybe you can walk up to a coffee machine and say how do I make a latte with this and it'll be able to identify the make and model of the coffee machine and give you instructions and we know our vision is to sort of turn the world into your operating system where everything is interactable and you just need a point and get more information what's the update longevity that you're planning for the pin because it's not a cheap device and it's also for many people a secondary device am I going to put my money down and get get one year of support two years of support what kind of longevity do you see for this yeah I think we definitely believe that people are going to own their pin for a while we're not planning on doing yearly updates on the hardware this is really about us adding more features like Imron mentioned over time you wake up and your PIN does something new and that's really what we think is the future and why Ros is so critical if you want to know more about the Humane AI pin be sure to check out our article on androidauthority.com and don't forget to follow for all the latest news from mwc\n"