State of ROCm 5.3 in 2022 - 6x MI210, 1 petaflop, in the 2U Supermicro AS-2114GT-DNR
Tying Instinct Accelerators into OpenBB: A Game-Changer for Stock Trading and AI Analysis
One of the more exciting aspects of the Instinct accelerators is how well they tie into OpenBB, which has proven to be a game-changer for stock trading and AI analysis. Pairing the two lets users bring the accelerators' compute power to bear on data from the OpenBB (Open Bloomberg) terminal, an attractive option for traders looking to gain an edge in the market.
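To make that concrete, here is a minimal sketch of the kind of signal you might compute over price data pulled from the OpenBB terminal. The price list is a hard-coded stand-in, and `crossover_signal` is a hypothetical helper written for this article, not part of any OpenBB API:

```python
# Toy moving-average crossover signal on daily closing prices.
# The price list is a hard-coded stand-in for data you would pull
# from the OpenBB terminal; none of this is OpenBB API code.

def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window

def crossover_signal(prices, fast=3, slow=5):
    """Return 'buy' when the fast SMA is above the slow SMA, else 'sell'."""
    if len(prices) < slow:
        return "hold"  # not enough history yet
    return "buy" if sma(prices, fast) > sma(prices, slow) else "sell"

closes = [101.0, 102.5, 101.8, 103.2, 104.0, 105.1, 104.7, 106.3]
print(crossover_signal(closes))  # recent prices trending up -> "buy"
```

A real strategy would of course be far more involved; the point is that once the data is in Python, this is the shape of code the accelerators end up feeding.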
Getting ROCm up and running with TensorFlow is another exciting development, enabling users to perform AI analysis on their own data. The compute available from these Instinct accelerators is enormous, making them a valuable tool for developing new algorithms or improving existing ones. This level of horsepower comes at a relatively affordable cost, which makes it an attractive option for traders who want to take their work to the next level.
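The headline petaflop figure follows directly from the per-card specs quoted later in this piece (181 TFLOPS of FP16 and a 300 W TDP per MI210). A quick arithmetic check:

```python
# Back-of-the-envelope check of the headline numbers for the 2U box:
# six MI210 cards, each rated at 181 TFLOPS of FP16 and a 300 W TDP
# (figures from AMD's published MI210 specs).
fp16_tflops_per_card = 181
tdp_watts_per_card = 300
cards = 6

total_tflops = fp16_tflops_per_card * cards   # 1086 TFLOPS
total_pflops = total_tflops / 1000            # just over 1 PFLOPS
gpu_watts = tdp_watts_per_card * cards        # 1800 W for the GPUs alone

print(f"{total_pflops:.2f} PFLOPS FP16 from {gpu_watts} W of GPU TDP")
```

That leaves roughly 200 W of the sub-2000 W budget for the rest of the chassis, which matches the "just over a petaflop in under 2000 watts" claim.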
Getting an AI framework up and running with OpenBB turned out to be surprisingly easy, making it straightforward to run experiments and test new ideas. However, I'm not a trader, and this area calls for real expertise; I may do a separate video that brings in an expert on the trading side to provide genuinely useful insights and ideas.
AMD Instinct: The Top Performer
Notably, AMD's Instinct cards power the number-one systems on the Top500, the Green500, and the HPL-AI benchmark lists — the very same cards featured here. They are designed to deliver exceptional performance for AI and machine-learning applications, making them an attractive option for anyone looking to accelerate those workloads.
Configuring the System
The system is already configured as a cluster, so experiments can be run on the Instinct accelerators right away. If you have a specific project in mind, reach out with your credentials and what you'd like to run; I'll do my best to accommodate the request and work with you to get it going.
Running CFD (Computational Fluid Dynamics)
I was recently looking into computational fluid dynamics (CFD) and went hunting for NASA's CFD model of the Space Shuttle, which I would love to run on this system. Unfortunately, I couldn't find where it currently lives, but I'm aware that the National Institute of Standards and Technology (NIST) also has some interesting CFD models.
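Shuttle-scale CFD needs a real solver, but the core idea — stepping a discretized field forward in time with a stencil — fits in a few lines. Here's a toy 1D diffusion step (explicit finite differences, pure Python) purely as an illustration; it is not based on any NASA or NIST model:

```python
# Toy explicit finite-difference step for 1D diffusion (du/dt = k * d2u/dx2).
# A stand-in for the kind of stencil computation real CFD codes run on GPUs,
# not any actual NASA/NIST model.

def diffuse_step(u, k=0.1):
    """One explicit Euler step; endpoints held fixed (Dirichlet boundaries)."""
    return [u[0]] + [
        u[i] + k * (u[i - 1] - 2 * u[i] + u[i + 1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]

# A hot spot in the middle of a cold bar smooths out over time.
field = [0.0, 0.0, 1.0, 0.0, 0.0]
for _ in range(10):
    field = diffuse_step(field)
print(field[2] < 1.0)  # prints True: the peak has diffused outward
```

Real codes run the same pattern over millions of cells in 3D, which is exactly where 384 GB of HBM2e across six cards starts to matter.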
Learning from Experts
One piece of advice I can offer is to get familiar with tracing application performance using ROCm's profiling tools. They provide valuable insight into how an application is executing on the GPU: which operations are running, whether time is going to compute or to memory transfers, and more. The ROCm 5.3 profiling guide includes a comprehensive walkthrough of using the profiler to analyze your application's performance.
A Key Skill for Developers
Running the profiler generates a CSV file showing where your program has spent its time, letting you identify targets for optimization. This is an essential skill for developers: it lets you see how your code actually executes and make targeted improvements.
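Summarizing such a CSV takes only a few lines. The column names below ("KernelName", "DurationNs") are assumptions modeled on typical rocprof output, and the sample rows are invented; check the header of your own results file before reusing this:

```python
# Summarize where a program spent its GPU time from a profiler CSV.
# The column names ("KernelName", "DurationNs") are assumptions modeled
# on typical rocprof output, and the sample data is invented.
import csv
import io
from collections import defaultdict

sample = """KernelName,DurationNs
matmul_kernel,120000
matmul_kernel,118000
copy_h2d,450000
reduce_sum,30000
"""

totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample)):
    totals[row["KernelName"]] += int(row["DurationNs"])

# Rank kernels by total time: here the host-to-device copy dominates,
# hinting that memory transfers, not compute, are the bottleneck.
for name, ns in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {ns / 1e6:.3f} ms")
```

Even this crude ranking answers the key question — is the GPU computing, or waiting on transfers? — which is exactly the insight the ROCm guide's exercises drive at.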
The Importance of Performance Analysis Tools
Some developers struggle with parallel programming because of the complexity of memory management and safety in concurrent code. Access to performance analysis tools like ROCm's profiler helps bridge that gap: by seeing how their applications actually execute, developers can make informed decisions about optimizations.
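One optimization the profiler commonly reveals is the chance to overlap memory transfers with compute instead of running them back to back. The sketch below simulates the idea with threads and `time.sleep`; the 0.2 s durations are invented stand-ins for a copy and a kernel, not measurements from this system:

```python
# Toy illustration of why overlapping memory transfers with compute pays off.
# time.sleep stands in for a 0.2 s transfer and a 0.2 s kernel; the numbers
# are invented, but the pattern mirrors what a GPU timeline shows.
import threading
import time

def transfer():
    time.sleep(0.2)  # pretend host-to-device copy

def compute():
    time.sleep(0.2)  # pretend kernel execution

# Serial: transfer, then compute -> roughly 0.4 s total
start = time.perf_counter()
transfer()
compute()
serial = time.perf_counter() - start

# Overlapped: run the next batch's transfer while computing -> roughly 0.2 s
start = time.perf_counter()
t = threading.Thread(target=transfer)
t.start()
compute()
t.join()
overlapped = time.perf_counter() - start

print(f"serial {serial:.2f}s vs overlapped {overlapped:.2f}s")
```

On a real GPU this is done with asynchronous copies and multiple queues rather than Python threads, but the payoff the profiler surfaces is the same: hide the transfer behind the compute.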
Conclusion
The Instinct accelerators open up new possibilities for stock trading and AI analysis, delivering enormous compute power at a relatively affordable cost. By tying the accelerators into OpenBB and leaning on ROCm's profiling tools, users can understand how their applications execute and make targeted optimizations. Whether you're a seasoned developer or just starting out, learning performance analysis tools like these is essential for taking your work to the next level.
"WEBVTTKind: captionsLanguage: enthis video is brought to you by ingenious and their new ecw336 which I recently reviewed and found to be pretty impressive this is true Wi-Fi 6E solution look at that it's so awesome it comes with a ceiling grid installation kit and some other stuff it's power over ethernet but that that's a five gigabit interface so they're very very serious about this wi-fi 6E 6 gigahertz gives you 14 additional 80 megahertz channels or seven additional Ultra wide 160 megahertz channels that means you can do more than a gigabit over Wi-Fi yeah more than wired ethernet oh yeah it's possible with Wi-Fi 6E good range and the appropriate mesh I use ingenious at home also using genius Cloud products at home this is the ECW 336 if you don't want the cloud there are non-cloud options from ingenious but I really would encourage you to take a look at these because there's no license fees there's no subscription or anything like that they'll run you'll be able to use them you can set all the features you get roaming and all that you don't need any device to manage connectivity and the configuration from what I can tell for the most part is all local so like once they're set up if something weird happens to your internet connection everything will still keep working and these are very very impressive pieces of hardware and so for a small business installation or you know the Ultra Premium home installation which is what I have these are pretty solid and I really like them thanks to genius for sponsoring this video and on with the show thank you I'm back with the super micro 2114 gt- DNR the GT yeah it's a very very nice system it's a 2u2 node system I did a full tear down with Steve from Gamers Nexus we've also done some other content here but I'm here to tell you to talk about the mi210s that are in this thing six in my two tens as a matter of fact that's amd's Instinct accelerator what does that mean well I mean I can do 181 teraflops with the floating 
Point 16 which means that this box is just over a petaflop and less than 2 000 watts of power this is kind of transformational the mi-210 is the uh come as you are accelerator which is going to work in just about any server grade chassis without anything special I mean they're they're pretty awesome don't get me wrong but you know the frontier supercomputer at Oak Ridge is the Mi 250s but this this node it's about a half a node give or take maybe a little more depends on how you cut the cheese but anyway Rock M 5.3 this is something that I've been working on for the past couple of months now I just got this system but learning Rock M and the ecosystem and all of that has been uh sort of an interesting challenge for me personally because I wanted to look at it and see what all was was in there and how it worked and some fun interesting things so we're going to try to do some other interesting content around on machine learning and the fun applications that you can have with neural magic and fluid numerics and other folks like that that work in and around machine learning and machine learning adjacent systems for this though I thought I could give you a quick rundown of installing Rock M 5.3 under Ubuntu 22.04 LTS system because it really is super super easy I was able to install it myself in a matter of minutes and get up and running with Rocky M 5.3 so much power everything is all racked up as before but this is our 220 volt power cord it says 20 amps 250 volts on it and it really is this is what we mean when we say top of rack nice in order to see a little power so now we will plug this in don't stare into the laser so we're getting all this hooked up these are 100 gigabit you know color chip media transceivers and I really don't need these because it's such a short run but you'll notice that when I plug these in they don't do anything I don't get a link light or anything like that that's because unlike um your typical ethernet switch these are not pre-configured 
they're not Plug and Play well actually that's not correct when you're running this firmware on the switch it expects to be plugged into the network and the network will tell the switch what to do but I haven't done that so the switch is doing a whole lot of nothing see that's how it is in the Enterprise the the wire monkey plugs it in but they're not expected to do anything other than set it up so we're going to do it the manual way which involves connecting a console cable you know funky RJ45 that's not ethernet to USB yeah cereal I'm super cereal right now to actually do anything with the cereal pork we're gonna need minicom so now our super micro two node system is connected at a hundred gigabits to our main switch and up to 25 gigabits for all of our other machines including dual 25 gig for one of our other machines are a four node cluster from a while back so this is going to be pretty cool and it's no secret that in Nvidia and especially their kudu libraries are kind of the incumbent here I mean Cuda as an ecosystem is pretty cool for NVIDIA because it means if you misfire on one or two GPU Generations it doesn't really matter because your customers are sort of locked into the platform you have to use Cuda but AMD is sort of a Relentless execution machine they will chip away and chip away and keep working and chipping and working and chipping and so you know initially when Rock M launched we had you know Vega and Vegas sort of evolved into cdna and now we're on cdna2 we also have our DNA and you know there are some folks that want to try to run machine learning accelerators on their individual gpus and it's sort of a weird time as far as all that goes but Rock M 5.3 is a major major release how do we know because it makes porting from Cuda a lot easier so that you can move out of that ecosystem it's an uphill battle that AMD has to climb but they've been chipping away at it and now with rock M 5.3 it's actually shockingly successful at sort of you know 
giving you the tools that you need to not just analyze performance because it's actually got really good performance analysis stuff too we'll talk about in a minute but also get really good performance from stuff that is available on the internet you know git repositories it's pretty easy to download and be able to run those applications you know it's no secret that the people that are working at Oak Ridge are some of the smartest people and it's it's not a coincidence that they chose AMD to build their platform because they wanted an open research thing I mean every research scientist I've talked to wants the platform to be as open and consistent and repeatable as possible and that's kind of what AMD has been building with rock M this whole time first it's the major news that amd's become a sponsor of the pi torch Foundation so if you have something like an mi210 you will basically at this point more or less have a seamless experience deploying machine learning stuff via python for your instinct card that's huge because a lot of scientists don't really want to fool with things they just want to be up and running with their code that's understandable the second thing is meta the company that you know has the largest connected graph and probably more data on all of us than than anyone released a bunch of free tools that make it easier to switch your GPU compute platform that is insanely huge it sort of suggests that meta has been evaluating AMD Instinct products continuously for their own internal use and now they've sort of shared tools with the rest of the industry which will make it easier to adopt AMD Instinct based compute devices there's a kind of a tangent that's also related to that also related to The Meta release and it's Intel so Intel is working on their own gpus as well that's no secret why does that matter in an AMD video well Intel finds itself in the same position it's an uphill battle to provide an industry standard so that everybody is not super 
dependent on Cuda that makes AMD and Intel unlikely allies I mean this is this is uh one of the most unusual places where maybe Intel and AMD would be allies and they're not directly working together on anything as far as I know Intel's approach for this of course is one API but this is so important to Intel that with The Arc GPU release and some of the other news lately around Intel's GPU division that even their CEO Pat gelsinger has come out and said we want developers to be able to write their code once and run it anywhere does that sound familiar yeah it should that's kind of the promise here with everything that AMD is doing on the open side of things the AMD documentation is pretty good historically it's had some warts and it's still got Improvement to make but the rock M 5.3 documentation for Ubuntu 22.04 LTS installation went really super well and I was up and running immediately on this insane super micro system to be able to run whatever workload so I still got a little bit of work to do and I've got some other fun demonstrations that I want to do while I still have these mi210s I get to send this back pretty quick so I'm not going to have a lot of time to educate myself and learn everything that I need to learn in order to put this together which is why I'm going to lean on fluid numerics and maybe a little bit of neural engine maybe with an upcoming launch and some other fun stuff with that and maybe some other stuff that we can look at to sort of compare not every workload runs perfectly in a GPU scenario I mean Frontier has one really beefy CPU compute node for every four GPU compute nodes so a lot of stuff runs on the GPU but depending on what you're doing that might not necessarily make sense you got other stuff you have to worry about but you can still take advantage of the libraries that AMD offers which is pretty what exactly is the mi210 well it's a pcie add-in card just like a GPU but it's for compute acceleration it is 104 compute units with 
6656 stream processors 416 Matrix scores it can do 22.6 teraflops in floating point 64 Vector floating point 64 Matrix that can do 45.3 teraflops floating point 64 and bf16 that's 181 teraflops and it's Peak int4 and int rate is also 181 teraflops it has 64 gigabytes gigabytes of hbm2e at a 496-bit bus width it's clocked at 1.6 gigahertz so that's 1.6 terabytes per second memory bandwidth now this full chip ECC support Ras support and you can have up to three Infinity fabric links it is coherency enabled so you can do dual and quad hives it supports 64-bit Linux and is fully Rock M compatible as I said it is a full height full length car hard I did a tear down with Gamers Nexus that I mentioned before it's a pcie Gen 4 support it also supports pass-through Sr iov and it's a 300 watt TDP card it's a three year warranty from from AMD now with six cards in this system with 64 gigs of hbm2e each yes that is like 384 gigabytes of vram at my disposal in just 2u of Rackspace in just under 2 000 watts of power that is an absurdum if you I mean I can't explain how absurd it is to have a petaflop of floating Point 16 in such a small space I mean this would be like showing somebody a cell phone from the 1960s it's it's completely it's just it boggles the mind now one of the most fun things that I saw from fluid numerics is the GPU accelerated Fortran Fortran the computer language if you're not familiar has a fun and storied history that goes back almost to the beginning of computing and there is a lot of Fortran software out there in the financial services industry it's it's sort of works and no one wants to touch it no one has wanted to touch it and no one continues to want to touch it so the code continues to run to do things Fortran makes sense because AMD with their instant cards has come to support openmp open multi-processing it's a library to make shared memory uh multi-processing compute programming a lot easier it's so well supported on Instinct that it's pretty easy 
to get pretty much anything that's designed for openmp to work with GPU acceleration and that includes Fortran there's actually a whole bunch of languages you should check out the Wikipedia article on that but openmp is so well supported on this platform that if you have existing code that's designed for openmp getting it up and running on this platform with six gpus across two nodes connected by 100 gigabit Ethernet interface between the nodes it's actually pretty easy to do oh and if you're into tensorflow I was really impressed with how easily Dr schoonover from fluid numerix was able to Port native tensorflow code directly to what AMD provides in the Rock M setup so that you can get that up and running you know with pi torch or well I mean whatever that you have but tensorflow runs really really quickly on these cards AMD includes that in their white paper as well just for Giggles to give you an idea of how easy it was to get up and running fees Ryan did a video on open BB the open Bloomberg terminal you can do your own stock trading and there's a lot of you in the audience that do that that's pretty exciting it turns out you can actually tie in these Instinct accelerators with openvb and that works pretty well getting Rock M up and running with tensorflow for AI analysis of data from the open Bloomberg trade terminal that's a thing you can do and having this much compute power at your disposal if you think you've got a new algorithm that's going to figure out the market and do options trading or whatever crazy thing it is that you want to do this is a relatively insane amount of horsepower for you know if you're a Trader not a lot of money and it was surprisingly easy to get an AI framework up and running with openbb in order to be to be able to do experiments and things like that I might actually put that in a different video although probably would have to bring in somebody that's a little bit more of an expert on the trading side to uh give us some actually 
good ideas because I have no idea what I'm doing on the actual trading side of it all I can do is uh janitor the technology together with the best of them oh and in case you didn't know AMD Powers is the number one computer on the top 500. top the number one on the green 500 and the number one on the hpl AI That's these cards the Instinct cards the very same cards that we're working with here remember what I said AMD and they're just Relentless execution just like in The Shining now because I've got the system configured in the cluster already if you have something you want to run on here you want to try to run on here hit me up let me know what your credentials are what you're going to try to run and I'll try to work it out you can email me Wendell level1tax.com or hit me up in the Forum or whatever you want to do I just need to be able to download whatever it is and run a script I can keep it private I can obscure the data you can have access to the system maybe we can work that out but let's try some interesting experiments does anybody know where NASA put the copy of the Space Shuttle I can run cfd computational fluid dynamics on I was looking for that the other day and it was sort of M.I.A I know the nist has some really interesting models for cfd but uh I needed a little bit more of a step by step on how to do that I've got some learning to do bottom line I want to know what the people at Oak Ridge know I mean those are really smart people they picked this system why did they pick this system I want to know why why did meta spend all the time doing what they did in order to better support amd's ecosystem and pytorch of course goes without saying I have a lot to learn now one of the things that was really exciting to me too is a low level tool but I want to share is being able to trace application performance you see when you're doing computer janitoring stuff it's usually a pretty good idea if you have the tools to figure out how things are going wrong so you 
sort of learn to figure out how to take a program apart so that you can watch it executing so that you understand better how it works that's not always easy to do when you have massively parallel things some of the most gifted programmers on planet Earth really struggle with programming in parallel and no one can program parallel C code which is why a whole bunch of other languages were invented at least no human being can because you know unsafe and memory types and that was a joke anyway the rock M profiling guide for version 5.3 has a really interesting guide on using their profiling tool to see what your application is doing what operation is it doing on the GPU is it doing compute is it doing a memory transfer do we have something happening where the GPU spends a lot of time waiting on a memory transfer and it does a compute can we be doing a memory transfer while we're doing a compute is that an option that's one of the example exercises that we saw when we were doing some of the demo code during the flu numerix training you can run this tool and actually generate a CSV showing you where your program has been all of its time and then you can decide to make changes to your program or make optimizations and this is a skill in the nicest possible way that I can say this some of the people that I fix things for have no idea how to do that or even to really think in those terms let alone use tools to sort of Divine information about the state of things in order to figure out what's going on so if I had one piece of advice for you it is to look at these tools that are provided for tracing and performance analysis because they're very goodthis video is brought to you by ingenious and their new ecw336 which I recently reviewed and found to be pretty impressive this is true Wi-Fi 6E solution look at that it's so awesome it comes with a ceiling grid installation kit and some other stuff it's power over ethernet but that that's a five gigabit interface so they're very 
very serious about this wi-fi 6E 6 gigahertz gives you 14 additional 80 megahertz channels or seven additional Ultra wide 160 megahertz channels that means you can do more than a gigabit over Wi-Fi yeah more than wired ethernet oh yeah it's possible with Wi-Fi 6E good range and the appropriate mesh I use ingenious at home also using genius Cloud products at home this is the ECW 336 if you don't want the cloud there are non-cloud options from ingenious but I really would encourage you to take a look at these because there's no license fees there's no subscription or anything like that they'll run you'll be able to use them you can set all the features you get roaming and all that you don't need any device to manage connectivity and the configuration from what I can tell for the most part is all local so like once they're set up if something weird happens to your internet connection everything will still keep working and these are very very impressive pieces of hardware and so for a small business installation or you know the Ultra Premium home installation which is what I have these are pretty solid and I really like them thanks to genius for sponsoring this video and on with the show thank you I'm back with the super micro 2114 gt- DNR the GT yeah it's a very very nice system it's a 2u2 node system I did a full tear down with Steve from Gamers Nexus we've also done some other content here but I'm here to tell you to talk about the mi210s that are in this thing six in my two tens as a matter of fact that's amd's Instinct accelerator what does that mean well I mean I can do 181 teraflops with the floating Point 16 which means that this box is just over a petaflop and less than 2 000 watts of power this is kind of transformational the mi-210 is the uh come as you are accelerator which is going to work in just about any server grade chassis without anything special I mean they're they're pretty awesome don't get me wrong but you know the frontier supercomputer at Oak 
Ridge is the Mi 250s but this this node it's about a half a node give or take maybe a little more depends on how you cut the cheese but anyway Rock M 5.3 this is something that I've been working on for the past couple of months now I just got this system but learning Rock M and the ecosystem and all of that has been uh sort of an interesting challenge for me personally because I wanted to look at it and see what all was was in there and how it worked and some fun interesting things so we're going to try to do some other interesting content around on machine learning and the fun applications that you can have with neural magic and fluid numerics and other folks like that that work in and around machine learning and machine learning adjacent systems for this though I thought I could give you a quick rundown of installing Rock M 5.3 under Ubuntu 22.04 LTS system because it really is super super easy I was able to install it myself in a matter of minutes and get up and running with Rocky M 5.3 so much power everything is all racked up as before but this is our 220 volt power cord it says 20 amps 250 volts on it and it really is this is what we mean when we say top of rack nice in order to see a little power so now we will plug this in don't stare into the laser so we're getting all this hooked up these are 100 gigabit you know color chip media transceivers and I really don't need these because it's such a short run but you'll notice that when I plug these in they don't do anything I don't get a link light or anything like that that's because unlike um your typical ethernet switch these are not pre-configured they're not Plug and Play well actually that's not correct when you're running this firmware on the switch it expects to be plugged into the network and the network will tell the switch what to do but I haven't done that so the switch is doing a whole lot of nothing see that's how it is in the Enterprise the the wire monkey plugs it in but they're not expected to 
do anything other than set it up so we're going to do it the manual way which involves connecting a console cable you know funky RJ45 that's not ethernet to USB yeah cereal I'm super cereal right now to actually do anything with the cereal pork we're gonna need minicom so now our super micro two node system is connected at a hundred gigabits to our main switch and up to 25 gigabits for all of our other machines including dual 25 gig for one of our other machines are a four node cluster from a while back so this is going to be pretty cool and it's no secret that in Nvidia and especially their kudu libraries are kind of the incumbent here I mean Cuda as an ecosystem is pretty cool for NVIDIA because it means if you misfire on one or two GPU Generations it doesn't really matter because your customers are sort of locked into the platform you have to use Cuda but AMD is sort of a Relentless execution machine they will chip away and chip away and keep working and chipping and working and chipping and so you know initially when Rock M launched we had you know Vega and Vegas sort of evolved into cdna and now we're on cdna2 we also have our DNA and you know there are some folks that want to try to run machine learning accelerators on their individual gpus and it's sort of a weird time as far as all that goes but Rock M 5.3 is a major major release how do we know because it makes porting from Cuda a lot easier so that you can move out of that ecosystem it's an uphill battle that AMD has to climb but they've been chipping away at it and now with rock M 5.3 it's actually shockingly successful at sort of you know giving you the tools that you need to not just analyze performance because it's actually got really good performance analysis stuff too we'll talk about in a minute but also get really good performance from stuff that is available on the internet you know git repositories it's pretty easy to download and be able to run those applications you know it's no secret that 
the people that are working at Oak Ridge are some of the smartest people and it's it's not a coincidence that they chose AMD to build their platform because they wanted an open research thing I mean every research scientist I've talked to wants the platform to be as open and consistent and repeatable as possible and that's kind of what AMD has been building with rock M this whole time first it's the major news that amd's become a sponsor of the pi torch Foundation so if you have something like an mi210 you will basically at this point more or less have a seamless experience deploying machine learning stuff via python for your instinct card that's huge because a lot of scientists don't really want to fool with things they just want to be up and running with their code that's understandable the second thing is meta the company that you know has the largest connected graph and probably more data on all of us than than anyone released a bunch of free tools that make it easier to switch your GPU compute platform that is insanely huge it sort of suggests that meta has been evaluating AMD Instinct products continuously for their own internal use and now they've sort of shared tools with the rest of the industry which will make it easier to adopt AMD Instinct based compute devices there's a kind of a tangent that's also related to that also related to The Meta release and it's Intel so Intel is working on their own gpus as well that's no secret why does that matter in an AMD video well Intel finds itself in the same position it's an uphill battle to provide an industry standard so that everybody is not super dependent on Cuda that makes AMD and Intel unlikely allies I mean this is this is uh one of the most unusual places where maybe Intel and AMD would be allies and they're not directly working together on anything as far as I know Intel's approach for this of course is one API but this is so important to Intel that with The Arc GPU release and some of the other news 
lately around Intel's GPU division that even their CEO Pat gelsinger has come out and said we want developers to be able to write their code once and run it anywhere does that sound familiar yeah it should that's kind of the promise here with everything that AMD is doing on the open side of things the AMD documentation is pretty good historically it's had some warts and it's still got Improvement to make but the rock M 5.3 documentation for Ubuntu 22.04 LTS installation went really super well and I was up and running immediately on this insane super micro system to be able to run whatever workload so I still got a little bit of work to do and I've got some other fun demonstrations that I want to do while I still have these mi210s I get to send this back pretty quick so I'm not going to have a lot of time to educate myself and learn everything that I need to learn in order to put this together which is why I'm going to lean on fluid numerics and maybe a little bit of neural engine maybe with an upcoming launch and some other fun stuff with that and maybe some other stuff that we can look at to sort of compare not every workload runs perfectly in a GPU scenario I mean Frontier has one really beefy CPU compute node for every four GPU compute nodes so a lot of stuff runs on the GPU but depending on what you're doing that might not necessarily make sense you got other stuff you have to worry about but you can still take advantage of the libraries that AMD offers which is pretty what exactly is the mi210 well it's a pcie add-in card just like a GPU but it's for compute acceleration it is 104 compute units with 6656 stream processors 416 Matrix scores it can do 22.6 teraflops in floating point 64 Vector floating point 64 Matrix that can do 45.3 teraflops floating point 64 and bf16 that's 181 teraflops and it's Peak int4 and int rate is also 181 teraflops it has 64 gigabytes gigabytes of hbm2e at a 496-bit bus width it's clocked at 1.6 gigahertz so that's 1.6 terabytes per 
second of memory bandwidth. The card has full-chip ECC and RAS support, and you can have up to three Infinity Fabric links; it's coherency-enabled, so you can do dual and quad hives. It supports 64-bit Linux and is fully ROCm compatible. As I said, it's a full-height, full-length card; I did a teardown of it with Gamers Nexus, which I mentioned before. It has PCIe Gen 4 support, also supports passthrough and SR-IOV, has a 300-watt TDP, and carries a three-year warranty from AMD. Now, with six cards in this system at 64 gigs of HBM2e each, yes, that is 384 gigabytes of VRAM at my disposal in just 2U of rack space and just under 2,000 watts of power. I can't explain how absurd it is to have a petaflop of FP16 in such a small space; it would be like showing somebody a cell phone in the 1960s. It just boggles the mind.

Now, one of the most fun things I saw from Fluid Numerics is GPU-accelerated Fortran. Fortran, the computer language, if you're not familiar, has a fun and storied history that goes back almost to the beginning of computing, and there is a lot of Fortran software out there in the financial services industry. It sort of works, and no one wants to touch it; no one has wanted to touch it, and no one continues to want to touch it, so the code continues to run. For that kind of code, this makes sense, because AMD, with their Instinct cards, has come to support OpenMP (Open Multi-Processing), an API that makes shared-memory multiprocessing programming a lot easier. OpenMP is so well supported on Instinct that it's pretty easy to get pretty much anything designed for OpenMP working with GPU acceleration, and that includes Fortran. There's actually a whole list of supported languages; you should check out the Wikipedia article on that. But because OpenMP is so well supported on this platform, if you have existing code designed for OpenMP, getting it up and running on this platform, with six
GPUs across two nodes connected by a 100-gigabit Ethernet interface between the nodes, is actually pretty easy to do. Oh, and if you're into TensorFlow, I was really impressed with how easily Dr. Schoonover from Fluid Numerics was able to port native TensorFlow code directly to what AMD provides in the ROCm setup, so you can get that up and running, with PyTorch or, well, whatever it is you have. But TensorFlow runs really, really quickly on these cards; AMD includes that in their white paper as well, just for giggles, to give you an idea of how easy it is to get up and running. Ryan did a video on OpenBB, the open Bloomberg terminal; you can do your own stock trading, and there's a lot of you in the audience that do that, which is pretty exciting. It turns out you can actually tie these Instinct accelerators in with OpenBB, and that works pretty well: getting ROCm up and running with TensorFlow for AI analysis of data from the OpenBB trade terminal is a thing you can do. And having this much compute power at your disposal, if you think you've got a new algorithm that's going to figure out the market and do options trading or whatever crazy thing it is you want to do, this is a relatively insane amount of horsepower for, if you're a trader, not a lot of money. It was surprisingly easy to get an AI framework up and running with OpenBB in order to be able to do experiments and things like that. I might actually put that in a different video, although I'd probably have to bring in somebody who's a bit more of an expert on the trading side to give us some actually good ideas, because I have no idea what I'm doing on the actual trading side of it; all I can do is janitor the technology together with the best of them. Oh, and in case you didn't know, AMD powers the number one computer on the Top500.
The number one on the Green500 and the number one on HPL-AI: that's these cards, the Instinct cards, the very same cards we're working with here. Remember what I said about AMD and their just relentless execution, like in The Shining. Now, because I've got the system configured in the cluster already, if you have something you want to try to run on here, hit me up and let me know what your credentials are and what you're going to try to run, and I'll try to work it out. You can email me, wendell@level1techs.com, or hit me up in the forum, or whatever you want to do. I just need to be able to download whatever it is and run a script. I can keep it private, I can obscure the data, and you can have access to the system; maybe we can work that out. Let's try some interesting experiments. Does anybody know where NASA put the model of the Space Shuttle that I can run CFD, computational fluid dynamics, on? I was looking for that the other day and it was sort of M.I.A. I know NIST has some really interesting models for CFD, but I needed a bit more of a step-by-step on how to do that; I've got some learning to do. Bottom line: I want to know what the people at Oak Ridge know. I mean, those are really smart people, and they picked this system. Why did they pick this system? I want to know why. Why did Meta spend all the time doing what they did in order to better support AMD's ecosystem? And PyTorch, of course, goes without saying. I have a lot to learn. Now, one of the things that was really exciting to me too is a low-level tool that I want to share: being able to trace application performance. You see, when you're doing computer-janitoring stuff, it's usually a pretty good idea, if you have the tools, to figure out how things are going wrong, so you sort of learn to figure out how to take a program apart so you can watch it executing and understand better how it works. That's not always easy to do when you have massively parallel things. Some of the most gifted programmers
on planet Earth really struggle with programming in parallel, and no one can program parallel C code, which is why a whole bunch of other languages were invented. At least no human being can, because, you know, unsafe memory and types. (That was a joke.) Anyway, the ROCm profiling guide for version 5.3 has a really interesting walkthrough on using their profiling tool to see what your application is doing. What operation is it running on the GPU? Is it doing compute? Is it doing a memory transfer? Do we have something happening where the GPU spends a lot of time waiting on a memory transfer before it does a compute? Could we be doing a memory transfer while we're doing a compute? Is that an option? That's one of the example exercises we saw in some of the demo code during the Fluid Numerics training. You can run this tool and actually generate a CSV showing you where your program has spent all of its time, and then you can decide to make changes or optimizations to your program. And this is a skill. In the nicest possible way that I can say this: some of the people I fix things for have no idea how to do that, or even how to really think in those terms, let alone use tools to sort of divine information about the state of things in order to figure out what's going on. So if I had one piece of advice for you, it is to look at these tools that are provided for tracing and performance analysis, because they're very good.