Zen 5 Mobile SoC - A Chat About Sleep States and Voltage on the Road to Long Mobile Battery Life

AMD's Next-Generation AI Engine: A Breakthrough in Power Efficiency and Compute Capacity

AMD has been making significant strides in the development of its AI engine, dubbed "XDNA". The latest iteration, XDNA 2, boasts 32 AI engine tiles and up to 50 NPU TOPS (trillions of operations per second) on the mobile Zen 5 SoC - 10 TOPS above the minimum Microsoft has set for Windows AI experiences, five times the compute capacity of the previous generation, and twice the power efficiency. This achievement marks a significant leap forward in AMD's pursuit of high-performance AI capabilities for mobile devices.
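The arithmetic behind those headline numbers is easy to sanity-check: 50 TOPS at five times the previous generation implies roughly 10 TOPS for the prior XDNA NPU, and 50 TOPS clears Microsoft's 40-TOPS floor by 10. A quick sketch (the constants are the figures quoted in this article, nothing more):

```python
# Sanity-check of the generational claims quoted above.
XDNA2_TOPS = 50           # peak NPU throughput, trillions of ops/sec
GEN_OVER_GEN_COMPUTE = 5  # "five times the compute capacity"
COPILOT_MIN_TOPS = 40     # Microsoft's stated minimum for Windows AI experiences

# Implied previous-generation NPU throughput.
xdna1_tops = XDNA2_TOPS / GEN_OVER_GEN_COMPUTE

# Headroom above the Windows AI floor.
headroom = XDNA2_TOPS - COPILOT_MIN_TOPS

print(xdna1_tops)  # 10.0
print(headroom)    # 10
```

The implied 10 TOPS for the previous generation lines up with the transcript's framing that 50 TOPS is "10 trillion operations per second above the minimum."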

The significance of this development lies in its implications for mobile AI applications. With the increasing demand for AI-powered features in laptops and other mobile devices, efficient and powerful processing has become paramount. The XDNA 2 engine's specifications make it an attractive option for developers looking to create AI experiences on mobile platforms. AMD's emphasis on power efficiency is particularly noteworthy, as it addresses a critical concern in mobile devices - the constant struggle to balance performance with battery life.

The XDNA 2 engine's architecture has been designed to optimize AI processing, leveraging advances in data representation and dedicated hardware to achieve remarkable results. The unified ONNX EP (Execution Provider) is a key component of this technology, providing a standardized path for developers to run AI applications. With the ONNX EP, developers can download pre-trained models from Hugging Face and execute them on the XDNA 2 engine. This approach enables developers to focus on building innovative AI experiences without worrying about the underlying hardware infrastructure.
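The "advances in data representation" mentioned above reportedly center on a shared-exponent format: mantissas of roughly 8 bits that share one exponent per block, giving near-16-bit accuracy at close to 8-bit cost (the "nine-bit" format discussed in the interview below). AMD's exact format is not public in this article, so the following is a generic block floating-point quantizer sketch under that assumption, not AMD's implementation:

```python
import math

def block_quantize(values, bits=8):
    """Quantize a block of floats to a shared-exponent format: one
    power-of-two exponent for the whole block plus a signed integer
    mantissa per value (a generic block-FP scheme, not AMD's exact one)."""
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        return 0, [0] * len(values)
    exp = math.floor(math.log2(max_mag))   # shared exponent for the block
    frac_bits = bits - 2                   # sign bit + 1 integer bit + fraction
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    mants = [max(lo, min(hi, round(v * 2.0 ** (frac_bits - exp))))
             for v in values]
    return exp, mants

def block_dequantize(exp, mants, bits=8):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    frac_bits = bits - 2
    return [m * 2.0 ** (exp - frac_bits) for m in mants]

exp, mants = block_quantize([0.5, -1.25, 3.0, 0.03125])
print(exp, block_dequantize(exp, mants))
```

With one 8-bit exponent amortized across a block of eight values, storage works out to about 9 bits per value - which is why a format like this can run "like 8-bit" while tracking 16-bit accuracy closely on well-scaled data.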

While AMD's XDNA 2 engine demonstrates impressive capabilities, its impact on the market will depend on the development of killer applications that can take advantage of these features. The reality is that AI programming for mobile devices remains a challenging task, requiring significant investment in research and development to deliver compelling user experiences. As such, it is essential to identify innovative use cases that can drive adoption and growth.

The comparison with Intel's Lunar Lake is also worth noting. Intel routes NPU access through WDDM (the Windows Display Driver Model), while AMD's XDNA 2 engine and the ONNX EP represent a distinct, runtime-centric approach. The question remains which of these will prevail in the market, with both sides vying for the attention of developers, consumers, and manufacturers.

As AMD continues to refine its XDNA 2 engine and ONNX EP support, it is clear that the company is committed to delivering high-performance AI capabilities for mobile devices. The ultimate success of this technology will depend on the development of compelling applications that can showcase its potential. For now, AMD's commitment to innovation and power efficiency has set a promising course for the future of mobile AI.

AMD's Developer Experience: A New Frontier

One crucial aspect of AMD's XDNA 2 engine is the developer experience, which is expected to play a significant role in its success. AMD personnel have been open about their vision for the ONNX EP path in interviews with press and industry observers, emphasizing its potential to enable developers to build innovative AI experiences and discussing the challenges and opportunities associated with building AI applications.

Beyond those conversations, the platform itself is noteworthy. The ONNX EP provides a standardized framework for developers to build AI applications that execute on the XDNA 2 engine, enabling them to focus on creating innovative experiences rather than on the underlying hardware infrastructure.
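ONNX Runtime's execution-provider model lets an application list backends in preference order and fall back automatically when one is absent, which is what makes the "don't worry about the hardware" story work. The selection logic can be sketched in plain Python; the provider names mirror real ONNX Runtime identifiers (VitisAI is the EP associated with AMD's XDNA path, DirectML with GPUs on Windows), but the availability set below is a hypothetical machine, not a detected one:

```python
def pick_providers(preferred, available):
    """Return the preference-ordered subset of execution providers that
    are actually present, mirroring how an ONNX Runtime session falls
    back through its provider list."""
    chosen = [ep for ep in preferred if ep in available]
    # CPU is ONNX Runtime's guaranteed last-resort provider.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# Hypothetical laptop with an AMD NPU but no usable GPU provider.
available = {"VitisAIExecutionProvider", "CPUExecutionProvider"}
preferred = ["VitisAIExecutionProvider", "DmlExecutionProvider",
             "CPUExecutionProvider"]

print(pick_providers(preferred, available))
# -> ['VitisAIExecutionProvider', 'CPUExecutionProvider']
```

In real code the resulting list would be passed as the `providers` argument when constructing an inference session; the application logic stays identical whether the model lands on the NPU, a GPU, or the CPU.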

The question remains how AMD's developer experience compares with Intel's Lunar Lake and its WDDM-based approach. While both have their strengths and weaknesses, AMD's ONNX EP path represents a distinct opportunity for developers to create innovative AI applications.

As AMD continues to refine its developer experience, it will be essential to gauge the response from the development community. Will developers respond positively to the ONNX EP path, or will they require additional support and resources? The answer will have significant implications for AMD's success in the market.

The Future of Mobile AI: Killer Applications and Market Impact

The ultimate success of AMD's XDNA 2 engine and ONNX EP support will depend on killer applications that showcase their potential. AI programming for mobile devices remains a challenging task, requiring significant investment to deliver compelling user experiences.

It is likely that many developers will build GPU-first solutions that are later ported to platform-specific NPU stacks, such as Intel's WDDM-based path or AMD's ONNX EP. The question remains whether these applications will drive widespread adoption and growth.

The market impact of AMD's XDNA 2 engine will also depend on the development of compelling user experiences that demonstrate its capabilities. Will users be willing to adopt mobile devices with AI-powered features, or will they require more convincing evidence of their utility? The answer to this question will have significant implications for the success of AMD's technology.

Conclusion

AMD's XDNA 2 engine and ONNX EP support represent a significant step forward in power efficiency and compute capacity. As the company continues to refine its developer experience and AI capabilities, it is clear that AMD is committed to delivering high-performance AI solutions for mobile devices. While the market impact will depend on the development of killer applications and user experiences, the XDNA 2 engine has set a promising course for the future of mobile AI.

The competition between Intel's WDDM-based Lunar Lake approach and AMD's ONNX EP path will likely continue to shape the market, with both sides vying for attention from developers, consumers, and manufacturers. As the landscape evolves, it is essential to monitor the progress of both approaches and assess their impact on the development community.

Ultimately, the success of AMD's XDNA 2 engine and ONNX EP support will depend on their ability to deliver compelling user experiences. With its impressive specifications, power efficiency, and developer story, AMD has set a promising course for the future of mobile AI.

"WEBVTTKind: captionsLanguage: enso I'm very excited because I've got something very special we're going to talk all things 9,000 Series launch but also a bunch of the stuff that we haven't really covered but also brought this with with which to troll you so this is an x86 handheld from 20 years ago and it runs for 40 hours in two aa's this is Zen 5 Zen 6 Zen 7 power go no man that's that two doua batteries two doua batteries this is what this is where we want to get to people this is where we want to get to the problem um but no I mean maybe we should have a couple of cores that can run 8 MHz you know uh for some yeah for some background toas so it's I I use that to kind of you know introduce the the idea of I've been trying to get my um framework laptop to run for 16 hours and I've gotten close I've gotten it up to 12 with just the patches to the Linux kernel and everything else and that is with a 7740 8 core and 32 GB of memory that is a very very impressive so with the New Zen 5 stuff launching um you guys have have doubled down on power management and everything else talk to me about like for my peanut gallery Journey on power management it's not really to do with like the core but transitions like those take a long time like I would do I'll run a trace on an application and there's a lot that goes into that and I don't have any any clue nobody appreciates it so clue Us in right so so I think there is there was at one point in time there was an arms race on on um in terms of you know s so states that you want to go into and how how you would go into and and ultimately save power right arms race I like that right and and so people kick out of those States more um more often than they are like and essentially the the energy it takes to to qu everything and go down that state and back up net net you might end up you know over engineering this and end up not getting the benefits that you want but um with stricks we are increasing compute everywhere we're going 50% 
more course 30% more Graphics uh 3x the npu all still in the same form factor and we cannot let you know battery life drop I mean that's you cannot add anything without by by Taking battery life away that's that's a cardinal sin so more than just the IPS doing their bit on being efficient while they're active idle and and quing them quickly harmonizing them and making sure that we can powergate them completely at at the IP level right um becomes really important and that needs to happen automatically like you cannot have you should not have to wait for a complete straight transition to get those those States out so individual IP hierarchical power gating is something that we have invested in um and this is this is a like a a dynamic um power management algorithm that looks at um like a qos based a quality of service based you know voltage frequency uh and kind of says hey look um if someone's urgent and wants to do something I'm going to wrap them up quickly get them to where they are and then and then bring them back out we understand the characteristics of our IPS and being able to gate them and that should happen automatically you don't have to go into the next level sleep state if you will by quing the world in order to get that savings we can start um managing to to get some of those savings by just going IP by IP so that's the first part uh and then these chips are getting bigger these chips are getting more powerful um so driving it drives a larger fabric both on the data and the control side and it's sprawled all around the chip right so now we had to get more intelligent on segment based you know gating of those of those of those clocks and say hey look in this particular um application only a a sub portion of the fabric is alive the fabric runs on one clock if you will right it's uh and the clock tree managed uh like that as you had small chips but as you go larger we are like I need a subsection of this fabric only can I gate everything off can I and 
this has to happen automatically really the the previously we gated these fabric clocks only when the entire entire system can be down because the fabric goes everywhere but now we said you know pay attention to yeah one of the one of the interesting things that um that kind of came out in some of the material provided is that applications are not always the best judge of that and some of the engineering that amds had to do reaches all the way into the application layer to say okay this application will use as much resources as you give it it doesn't actually need that much we should probably keep it in a power gated State exactly right right so the moment you get a ping to say hey look this is the kind of work that you need to do um understanding what when and how and um how often should should we go into and come out of these states is is really important to to to getting battery life it's not about on paper we saw the state we we made this this this new uh uh you know engineered state which we never go into we get no residency into and to that effect we also did um what I call um you know more improved retention techniques because going into a uh into a state like you said is complicated and you go to that state and you try to save State and then you try to restore it by then you've been asked to go back into that state um and as you go to that state you try to take all of the power down there's dcdc losses on the platform net net you end up on the S so seems like you were doing something intelligent no net battery life gain so we have been a little more disciplined in saying hey look you can stay in an active state but we will go into retention mode on on a lot of things so you can do a video playback now which is what you know a lot of people commonly do in a completely low power State because we are actually gating a whole bunch of thing it looks like a lower state but it gets that automatically by just monitoring activity and saying that's not active that's 
not active that's not active get all of that out you all go into retention don't bother saving and restoring cuz that's going to take a whole bunch of time cuz we might have to jump out of this at any point in time and then if you get more and more residency and you train yourself into saying yeah I think we're no no this really is is down you take that even even lower to that side so it's really about battery life on low utilization but High residency tasks that we have to optimize because people are doing something it's bad life not in people say idle all the time but idle is when you go in and out of idle between keystrokes we do that um and and and and being able to you know gain as much power back in those periods of inactivity in a efficient manner not by going to a different state but by saying can we put all of the IPS that are burning you know both leakage and and um Dynamic uh Power into a state where we don't have to burn excessive power going in and out of those States at the same time end up with a net Improvement in battery life so you're going to see that with the with the St architecture I think running lightly threaded workloads you're going to see that Gain come out of it because we've designed it to to get you more battery life not out of you know the the regular um just keep it in idle and see how long it is or standby power and so on a lot of that really goes down to the platform as well the components you use how you build it we're not a vertically integrated uh system we're an open ecosystem we have 200 designs and we have a large AVS Like An approved vendor list so people build designs whichever way we want we are a fighter we're coming up so we cannot be making rules on these are the only choice competents you can use so those are limitations that we have and and sometimes you know um you end up with the systems like my AMD laptop and I'm like it's not just the S so it's a situation where it's like oh look at that the CPU only uses 12 Watts 
let's put in a 5 watt battery it's like oh exactly and and so we don't control all of that and but at the end of the day we take the we take the blame is a ryen laptop at the end of the day it's it's not a uh you know XYZ component in that's what AMD Advantage is for that's that's right and so we are advancing that as you can gain more traction you know we'll be able to get um a little more say in in in terms of you know how we how we make these designs and some of these high level design like this um topnotch designs we're making uh is going to start showing that having both the Zen 5 and the Zen 5c cores opens up a lot of possibilities too because a lot of what you're describing with power gating and everything else almost counterintuitively most of the stuff that you're going to do on laptop are going to live on those Zen 5c cores but then when you need the performance the the performance cores can light up and you're good to go oh yeah absolutely and and um I think it it is about it is about that once you once you understand the capabilities of the of the classic and and the compact it becomes clear what you want to use each of them for for example um if you're running um like a teams application where it is constant it's running probably stay on teams longer than I should uh but uh it's it it is uh it's going to get full full residency to your compact course but while they are running uh you want to open up a browser or or or or or an application that requires that responsiveness that that bursty uh performance it's going to get scheduled on the classic and it's going to have its own context because it's going to be on its own core complex so the cashes and not getting mudded uh they are two independent programs that are intended to run in two different enclosures if you will and once the browser is done you get that responsiveness from those sces it's burst it it goes back to sleep and your teams continues running in the background right so the goal was to 
elevate the user experience without taking away um you know anything in terms of efficiency but just adding on to say hey look um we've got the best of both worlds right yeah there's uh there's also a power aspect for um pcie as well like nvme like power gating the nvme you could save a lot of power there too we can and and again um we've there are choices that our comp competitors have made around around advancing um storage on in terms of speeds on on um on nvmes right on our PCI Lanes going forward you can can see that you know why aren't we going to PCI Gen 5 and you bring up a excellent point right it's it's an end to a means five is greater than four we get it it's higher bandwidth it's there in our desktop Parts in in fact if you want to make a a mobile um gaming system you can use one of our FL platforms the range products and you can build a PCI Gen 5 you know based laptop if you want it but for our mainstream laptop laptops where power efficiency is key are sticking with PCI Gen 4 for another generation right I mean it's we have that capability in our Arsenal you can see it in our other products we're not bringing it into Mobile because it doesn't need it I think the bandwidth to storage right now yet does not really move any meaningful user experience other than some benchmarks um if you're using nonlinear editing you know if you're that kind of a user you're going to dis and bringing gobs of data over sure Gen 5 would be great but for 90 % of the folks that use a laptop I think Gen 4 is good so we're not going after speeds and feeds we're being power conscious you know what else is faster than nvme Ram 48 96 gigs of RAM not a problem that's right so I think it's it's it starts with that Vision like what what do we think our is going to move our user we think it's better performance it's what you said battery all day and being able to do and and to to have that system be something that doesn't hold you back you want more compute plug it into the wall you 
know if you know there there is a a Delta between AC and DC performance and it's it's for a reason right I mean most people use our some people use our laptops as as replacement desktop replacement they want that computer 12 cores can still chug along right and and give you that performance if you want it but they can hum at 15 wats and give you that give you that power efficient operation giving you 16 hours of uh of battery life and an OLED screen uh it's basically all anybody wants that's right and but I think as as time progresses people would want to do more during that like that 16 hours might involve eight hours of teams with three FS running and so you have to get uh and collaborative work that you'd be doing with a lot of people I mean so the npu comes in handy that those tasks not only do they have to get done they have to be they have to be done efficiently so the days of you know idle idle battery life is is going I think like you said it's between 12 and 16 hours of battery life want but what you intend to do during those 12 hours is evolving it's changing and the accelerators we're putting in the more we're not shying away from putting more compute in I hope that comes through right we're not saying yeah it's all about battery let's try to go smaller and narrower and more efficient we're putting in more cores more Graphics more npu more than the standard that's needed for co-pilot they said 40 we're giving you 50 like we're always pushing the boundary and I think building the hardware first um I think software the the ecosystem in terms of experiences and Fe will come around it and that always been the case we we broke the mold on you know desktop going to eight cores when two or four was supposed to be enough and and now 12 with thread Ripper and like now you have 12 on your mobile platform that has a 15 wat TDP so I think we'll continue that's an our DNA we'll continue doing that so thank you thank you for joining me m this is very awesome is there 
anything else you want to shine a light on in the in thec that doesn't normally get attention because you know like I couldn't sea states are exciting that's the thesis of this part no and it is and and I think I think the attention to detail that we've given to power efficiency I just want to underline that a little more like when we added 50% more course we had to innovate to bring a compact course in to balance that it didn't come for free when when we added 33% more Graphics uh that came in with all the features that Mark just spoke about at De they where performance per bit performance um uh per what enhancements as well as memory management so we go to memory less often and burn less power and the npu right when we said accelerator 2x efficiency and 3X the the peak tops all of that again keeping efficiency in mind squeezing all that in even the npu design everybody now talks about the 40 tops npu and everybody has it each design is unique each design is different ours like like we talked about it is column based it is meant to be scalable as we get more and more concurrent applications to start running you will see our NP shine more we have the pedigree of Designing npus from ouring STS I mean we we have done this a few times and we know what we think is the most configurable uh ready for the future you know um npu design that we think is going to the software will catch up as we try to add more and more experiences in and the and the nature of our architecture will blend itself lend itself well into it so I think on the Forefront of having the best IP on CPU GPU and npu is important to us but blending it all in keeping power in mind I think that's that's been the feat that's been that's been the way of um yeah that's the miracle I I think that we've we've uh you know put together here in in the stck point as we see and we did get a quick run through with uh vomi bana about Ai and amd's AI approach in in the npu and they had some slides that they shared with 
us at Tech day as to what it means for the third generation of ryzen AI the AMD xdna AI engine like Mahesh was saying is is kind of a grid you can partition it however you want and this is maybe an Innovative feature because you can say okay this part is going to be reserved for Microsoft's AI function in Windows but this other third of it or this quarter of it could go toward uh your own runtime or doing video decode or doing something else outside of the Microsoft ecosystem and certainly partners of AMD like Lenovo and HP have their own ideas of what they're going to run on amd's xdna engine outside of the 40 tops or whatever that Microsoft reserves for the operating system xdna 2 here ends up being 32 AI engine tiles and up to 50 npu tops which is 10 trillion operations per second above the minimum that Microsoft has set for Windows AI experiences so over the previous generation that's five times the compute capacity and twice the power efficiency so not a bad achievement moving into zen 5 and the zen5 mobile s so and if you're thinking gosh I would love to have this on the IOD die on ryzen Des toop H maybe in a future generation but you can get a lot of this done with your GPU instead in a desktop context the reason this doesn't make sense for mobile is strictly down to power they convert things that could be done with a purely software experience into something that is done with Hardware transistors because overall that uses dramatically less power little bit less flexibility but if you know you're going to be running certain types of AI applications it makes sense to go this route on mobile in the demos that AMD showed were stable diffusion XL turbo which ran quite competently and they also ran a couple of demos where you run the AI locally and you feed it documents so you don't have anything going to the cloud and then you can ask the AI questions about the documents that you've you fed it and it's nice having that peace of mind that the AI is not sending 
your documents to the cloud for processing because Lord knows what happens to them once they leave your machine and to be clear with the unified Onyx EP you just download models from hugging face these are the models that you've come to know in love uh there is some special sauce some interesting things happen happening under the hood to do with how AMD is representing the the data types it's almost 16bit accuracy but it's runs like it's 8 bit and it turns out it's because it's it's nine bit it's mostly 8 bit and then there's a shared exponent for this data type which is interesting it's interesting to see how that's going to play out in the market but this is uh drop in compatibility so like Lama 2 7 billion and it's going to be you know 99.9% as accurate as standard 16bit floating point so impressive now what mahes was talking about with the S so is that there's an npu on the S so on mobile zen5 and the reason there needs to be an npu on mobile and you don't see that on desktop is because you need efficient AI you need a way to run AI efficiently that doesn't have your GPU blasting it's like playing a game you know you play a game with your GPU and your system will be dead in an hour or two you don't want that if you're running AI tasks in the background so the npu now something that's been top of mind for me is what is the npu developer experience and I'm fortunate because I was able to do an interview with AMD Personnel at computex about Ai and everything like that I'm just now getting that done partly because of the Onyx runtime and partly because I wanted to take some of those things for a spin and so I've got that coming up in a video stay tuned for that but yeah the Onyx and the npu architecture that the direction that AMD is going is not really uh is not really the same as what Intel is doing with lunar Lake and their wddm approach and so it's going to be interesting to see which one of these wins in the market the the reality with AI programming right now 
on mobile is that there needs to be a killer application and the reality is that developers at least smalltime developers or small Studios that are working on on these kinds of things if they come up with a killer application it is probably going to be GPU first and then it will be ported to a platform specific npu Intel AMD maybe for mobile phones whatever they have whatever Apple's doing whatever Samsung is doing whatever Qualcomm is doing with their their thing but the killer application has to exist first but it's a little bit of a chicken and egg problem where we're at right now it's like AI is going to be big probably for the individual it's certainly been big for business but how is that actually going to translate into usable applications for the end user and that remains to be seen But if you build it they will come I don't know we're have to stay tuned for a future video I'm wless level one I'm signing out you can find me in the level one forums look for that on patreon of float plane and pretty soon on the Channel all right I'm signing out and I'll see you in the Forum\n"
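The trade-off described in the interview above - a deep sleep state only pays off if residency lasts long enough to amortize the energy spent entering and leaving it - reduces to a simple break-even calculation. The numbers below are illustrative assumptions, not AMD's figures:

```python
def breakeven_residency_us(active_mw, sleep_mw, transition_uj):
    """Minimum time (microseconds) a block must stay in a sleep state
    for the power saved to exceed the entry/exit energy cost.
    1 mW = 1 nJ/us, so energy saved over t us is
    (active_mw - sleep_mw) * t / 1000 microjoules."""
    saved_mw = active_mw - sleep_mw
    return 1000.0 * transition_uj / saved_mw

# Illustrative: an IP block idling at 120 mW; a deep state at 5 mW with a
# 90 uJ round-trip save/restore cost, versus a retention state at 15 mW
# that costs only 6 uJ to enter and exit.
deep = breakeven_residency_us(120, 5, 90)
retention = breakeven_residency_us(120, 15, 6)
print(round(deep))       # 783 -> deep sleep needs ~783 us of residency
print(round(retention))  # 57  -> retention pays off after ~57 us
```

With these (made-up) numbers, the shallow retention state breaks even more than an order of magnitude sooner than the deep state, which is exactly the argument for gating IP by IP instead of "quiescing the world" for short idle periods between keystrokes.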