AMD's AI Chip Event - Everything Revealed in 8 Minutes

Good morning, everyone. Welcome to all of you joining us here in Silicon Valley, and to everyone joining us online from around the world. That's why I'm so excited today to launch our Instinct MI300X. It's the highest-performance accelerator in the world for generative AI. MI300X is built on our new CDNA 3 data center architecture, and it's optimized for performance and power efficiency. CDNA 3 has a lot of new features: it combines a new compute engine, it supports sparsity and the latest data formats including FP8, it has industry-leading memory capacity and bandwidth (and we're going to talk a lot about memory today), and it's built on the most advanced process technologies and 3D packaging.

Now let's talk about performance, and why it's so great for generative AI. Memory capacity and bandwidth are really important for performance. With MI300X we made a very conscious decision to add more flexibility, more memory capacity, and more bandwidth, and what that translates to is 2.4 times more memory capacity and 1.6 times more memory bandwidth than the competition. And when you run the lower-precision data types that are widely used in LLMs, the new CDNA 3 compute units and memory density enable MI300X to deliver 1.3 times more teraflops of FP8 and FP16 performance than the competition.

If you take a look at how we put it together, it's actually pretty amazing. We start with four I/O dies in the base layer, and on the I/O dies we have 256 megabytes of Infinity Cache and all of the next-gen I/O you need: 128-channel HBM3 interfaces, PCIe Gen 5 support, and our fourth-gen Infinity Fabric that connects multiple MI300Xs so that we get 896 gigabytes per second. Then we stack eight CDNA 3 accelerator chiplets, or XCDs, on top of the I/O dies, and that's where we deliver 1.3 petaflops of FP16 and 2.6 petaflops of FP8 performance. We connect these 304 compute units with dense through-silicon vias, or TSVs, which support up to 17 terabytes per second of bandwidth. And of course, to take advantage of all of this compute, we connect eight stacks of HBM3 for a total of 192 GB of memory at 5.3 terabytes per second of bandwidth. That's a lot of stuff.
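To put those capacity and bandwidth figures in perspective, here is a minimal back-of-the-envelope sketch in Python. The 192 GB and 5.3 TB/s numbers come from the keynote; the 70-billion-parameter model size is my own illustrative assumption, not something AMD quoted.

```python
# Back-of-the-envelope LLM sizing for a single 192 GB accelerator.
# Capacity/bandwidth figures are from the keynote; the 70B model
# size below is a hypothetical example, not an AMD figure.

HBM_CAPACITY_BYTES = 192e9   # 192 GB of HBM3 (8 stacks x 24 GB)
HBM_BANDWIDTH_BPS = 5.3e12   # 5.3 TB/s

N_PARAMS = 70e9              # hypothetical 70B-parameter model

for fmt, bytes_per_param in [("fp16/bf16", 2), ("fp8", 1)]:
    weights = N_PARAMS * bytes_per_param
    fits = weights <= HBM_CAPACITY_BYTES
    # Reading every weight once per token lower-bounds latency in the
    # bandwidth-limited (small-batch) decode regime.
    stream_ms = weights / HBM_BANDWIDTH_BPS * 1e3
    print(f"{fmt}: weights {weights / 1e9:.0f} GB, fits on one GPU: {fits}, "
          f"min weight-stream time {stream_ms:.1f} ms/token")
```

Halving the bytes per parameter with FP8 halves both the footprint and that bandwidth-bound floor, which is why the new data formats matter as much as the raw teraflops.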
What you see here is eight MI300X GPUs connected by our high-performance Infinity Fabric in an OCP-compliant design. Now, what makes that special? This board drops right into any OCP-compliant system, which is the majority of AI systems today, and we did that for a very deliberate reason: we want to make this as easy as possible for customers to adopt. You can take out your other board and put in the MI300X Instinct platform. And if you take a look at the specifications, we support all of the same connectivity and networking capabilities as our competition: PCIe Gen 5, support for 400-gigabit Ethernet, and 896 gigabytes per second of total system bandwidth. But all of that comes with 2.4 times more memory and 1.3 times more compute per server than the competition. That's really why we call it the most powerful generative AI system in the world.

We architected ROCm to be modular and open source, to enable very broad user accessibility and rapid contribution by the open-source and AI communities. Open source and the ecosystem are integral to our software strategy, and in fact openness is integral to our overall strategy. That contrasts with CUDA, which is proprietary and closed. The open-source community, as everybody knows, moves at the speed of light in deploying and proliferating new algorithms, models, tools, and performance enhancements, and we are definitely seeing the benefits of that in the tremendous ecosystem momentum we've established. So I'm really excited that we'll be shipping ROCm 6 later this month, and I'm really proud of what the team has done with this big release. ROCm 6 has been optimized for generative AI, particularly large language models. It has powerful new features, library optimizations, and expanded ecosystem support, and it increases performance by large factors. It really delivers for AI developers. ROCm 6 supports FP16, BF16, and the new FP8 data types for higher performance while reducing both memory and bandwidth needs. We've incorporated advanced graph and kernel optimizations and optimized libraries for improved efficiency, and we're shipping state-of-the-art attention algorithms like FlashAttention-2 and paged attention, which are critical for performant LLMs and other models.
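For a sense of what shipping these attention algorithms means to a developer, here is a small PyTorch sketch of my own (not AMD's code). torch.nn.functional.scaled_dot_product_attention dispatches to a fused, FlashAttention-style kernel when the backend provides one, and ROCm builds of PyTorch expose the GPU through the same torch.cuda device API, so the snippet is vendor-neutral:

```python
import torch
import torch.nn.functional as F

# ROCm builds of PyTorch reuse the "cuda" device name, so this check
# works on both vendors; fall back to CPU so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

batch, heads, seq, head_dim = 1, 32, 1024, 128
q, k, v = (torch.randn(batch, heads, seq, head_dim,
                       device=device, dtype=dtype) for _ in range(3))

# scaled_dot_product_attention selects a fused, memory-efficient kernel
# (FlashAttention-style) when one is available, avoiding materializing
# the full seq x seq attention matrix in HBM.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 32, 1024, 128])
```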

"WEBVTTKind: captionsLanguage: engood morning everyone welcome to all of you who are joining us here in Silicon Valley and to everyone who's joining us online from around the world so that's why I'm so excited today to launch our Instinct Mi 300X it's the highest performance accelerator in the world for generative AI Mi 300X is actually built on our new cdna 3 data center architecture and it's optimized for performance and power efficiency cdna A3 has a lot of new features it combines a new compute engine it supports sparity the latest data formats including fp8 it has industry-leading memory capacity and bandwidth and we're going to talk a lot about memory uh today uh and it's built on the most advanced process Technologies and 3D packaging now let's talk about some of the performance and why it's so so great um for generative AI memory capacity and bandwidth are really important for performance if you look at m300x we made a very conscious decision to add more flexibility more memory capacity and more bandwidth and what that translates to is 2.4 times more memory capacity and 1.6 times more memory bandwidth than the competition now when you run things like lower Precision data types that are widely used in llms the new cdn3 compute units and memory density actually enable Mi 300X to deliver 1.3 times more Tera flops of fp8 and fp16 performance than the competition and if you take a look at how we put it together it's actually pretty amazing uh we start with four IO die in the base layer and what we have on the io dies are 256 megabytes of infinity cache and all of the nextg io that you need uh things like 128 Channel hbm3 interfaces pcie Gen 5 support our fourth gen Infinity fabric that connects multiple mi30 X's so that we get 896 gigabytes per second and then we stack eight cdna 3 accelerator chiplets or xcds on top of the IOD and that's where we deliver 1.3 pedop flops of fp16 and 2.6 pedop flops of fp8 performance and then we connect these 34 compute units with dense through silicon vas or tsvs and that supports up to 17 terabytes per second of bandwidth and of course to take advantage of all of this compute we connect eight stacks of hbm3 for a total of 192 GB of memory at 5.3 terabytes per second of bandwidth that's a lot of stuff on that show what you see here is eight Mi 300X um gbus and they're connected by our high performance and infinity Fabric in an ocp compliant design now what makes that special so this board actually drops right into any ocp compliant design which is the majority of AI systems today and we did this for a very deliberate reason we want to make this as easy as possible for customers to adopt so you can take out your other board and put in the Mi 300X Instinct platform and if you take a look at the specifications um we actually support all of the same connectivity and networking capabilities of our competition so PCI Gen 5 support for 400 gig ethernet um that 896 gbt per second of total system bandwidth but all of that is with 2. 
So today I'm very happy to say that we're launching our Hawk Point Ryzen 8040 series mobile processors. Hawk Point combines all of our industry-leading performance and battery life, and it increases AI TOPS by 60% compared to the previous generation. If you take a look at the performance metrics for the Ryzen 8040 series, at the top of the stack the Ryzen 9 8945HS is significantly faster than the competition in many areas, delivering more performance in multi-threaded applications, 1.8x higher frame rates in games, and 1.4x faster performance across content creation applications.
A very, very special thank you to all of our partners who joined us today, and thank you all for joining us.