**The Intel Math Kernel Library (MKL) Update on AMD CPUs: An Analysis**
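It appears that the Intel Math Kernel Library (MKL) has received a significant update: the 2020 releases look more modular and seem able to switch between CPU-specific code paths, including ones for AMD Zen. This is likely a consequence of a new internal architecture that can switch between CPUs more efficiently, although Intel's change logs for 2020 Update 1 and Update 2 do not mention AMD-specific kernels. Because none of this is officially documented, the guiding principle is "trust but verify": these claims should be checked against benchmarks and the library's actual behavior.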
The author has been analyzing the situation and raises two points. First, Intel has major microarchitecture changes coming for its own processors, and the library changes may have been made preemptively with that new silicon in mind. Second, AMD could benefit: if Intel's stated goal of building the best-performing math library is genuine, a modular architecture that allows Zen kernels could let open-source scientists, researchers, and even AMD's own engineers contribute kernels. The author acknowledges that this is a long shot, but it's an exciting possibility.
When running benchmarks with MKL out of the box, users should be aware of the limitations on Zen processors (Threadripper and Ryzen 2000, 3000, and 4000). Unless they jump through some hoops to patch the library's vendor check, they will probably not get the performance the hardware can deliver. For compiled binaries without source, a utility called `patchelf` can add a library whose override function replaces the Intel check and always returns true. This is useful for diagnostic purposes, but there could be side effects if the binary depends on, say, AVX-512, which these AMD CPUs lack.
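As a rough illustration of the `patchelf` approach, the sketch below writes a one-function C shim and prints the two commands that would build it and attach it to a binary. It does not run them. The function name `mkl_serv_intel_cpu_true` is an assumption based on the blog post the video refers to, so confirm it against the symbols in your own MKL build. The file names `fakeintel.c` and `libfakeintel.so` and the binary `./my_benchmark` are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed name of MKL's vendor-check function; verify it for your MKL version.
CHECK_FUNCTION = "mkl_serv_intel_cpu_true"

# C source for the override: report "Intel CPU" unconditionally.
SHIM_SOURCE = f"""\
/* Override for the vendor check: always answer "yes, this is Intel". */
int {CHECK_FUNCTION}(void) {{
    return 1;
}}
"""


def write_shim(directory: Path) -> Path:
    """Write the one-function C shim into `directory` and return its path."""
    path = directory / "fakeintel.c"  # hypothetical file name
    path.write_text(SHIM_SOURCE)
    return path


def build_commands(shim: Path, binary: Path) -> list[list[str]]:
    """Return the compile and patch commands without executing them."""
    library = shim.with_name("libfakeintel.so")  # hypothetical library name
    return [
        # Compile the shim into a shared library.
        ["gcc", "-shared", "-fPIC", "-o", str(library), str(shim)],
        # Record the shim as a dependency of the target binary.
        ["patchelf", "--add-needed", library.name, str(binary)],
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shim = write_shim(Path(tmp))
        for command in build_commands(shim, Path("./my_benchmark")):
            print(" ".join(command))
```

The dynamic loader still has to be able to find the shim library at run time, for example through `LD_LIBRARY_PATH`. Treat a binary patched this way as a diagnostic tool rather than a supported configuration.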
The author notes that AVX-512 turns out to be less of a problem for this workaround than expected. Because Intel has segmented AVX-512 across its product line, programs that use it cannot assume every Intel CPU has it, so they check explicitly for AVX-512 in addition to the Intel check. An AMD CPU that passes the patched Intel check therefore looks like an Intel CPU without AVX-512: it takes the AVX2 path and continues to work correctly.
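The interaction between the two checks can be modeled in a few lines. This is a simplified, hypothetical model of a two-stage dispatch (vendor gate first, then feature checks), not MKL's actual logic; the names `Cpu`, `pick_code_path`, and `force_intel` are made up for illustration, and the single "fallback" path stands in for whatever a real library does on unrecognized vendors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    vendor: str          # e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset  # e.g. frozenset({"avx2", "avx512f"})


def pick_code_path(cpu: Cpu, force_intel: bool = False) -> str:
    """Hypothetical dispatch: a vendor gate, followed by explicit feature checks."""
    is_intel = force_intel or cpu.vendor == "GenuineIntel"
    if not is_intel:
        return "fallback"  # vendor gate failed; features are never consulted
    if "avx512f" in cpu.features:
        return "avx512"    # only taken when the CPU really reports AVX-512
    if "avx2" in cpu.features:
        return "avx2"
    return "fallback"


if __name__ == "__main__":
    zen = Cpu("AuthenticAMD", frozenset({"avx2", "fma"}))
    print(pick_code_path(zen))                    # vendor gate blocks AVX2
    print(pick_code_path(zen, force_intel=True))  # patched check: AVX2 path
```

In this model, forcing the vendor check to succeed moves the Zen CPU from the fallback path to the AVX2 path, and the separate AVX-512 check keeps it off a path it cannot execute. A binary that skipped the feature check would not have that protection.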
The author concludes that while the new math library poses some challenges on AMD hardware, it is probably not deliberate sabotage and could ultimately be a positive development for the field. The opportunity for open collaboration and innovation is vast, and the author hopes that this update will lead to significant advancements.
"So what's up with the Intel MKL if you're using a non-Intel processor? Is Intel intentionally trying to cause problems for AMD? Something rotten in Denmark? Well, it's a little more complicated than that. Let's try to talk through it.
The Intel MKL is the Math Kernel Library; that's what MKL stands for. It is a set of routines for doing certain kinds of math operations really quickly, and that probably means we're going to get a lot of terrible math jokes in this video, because what other opportunity would I have for doing terrible math jokes? These routines are highly, highly optimized for running on Intel processors. Intel processors, I'll come back to that. x86 processors in general, but Intel processors are what they're optimized for, because, believe it or not, of the different ways of doing math, some are faster on some processors than others, depending on how you're actually executing it on silicon. That is very interesting and a whole other world of crazy. I'm going to introduce you to some resources for people that are into that level of crazy, in terms of "how long does this instruction take to complete?", because there are a lot more variables than you would think. In addition to the different instructions for doing different kinds of mathematical operations (you break a large, complicated mathematical operation into smaller pieces), there are other constraints. Power constraints, because processors turbo until they reach a power limit or a thermal limit, and different kinds of mathematical operations may cause the processor to throttle. So one math operation might be faster until the processor hits a certain thermal wall, and then another math operation becomes faster. If you're a researcher and you're running these things continuously, chances are turbo and stuff like that doesn't really matter much to you unless the processor can sustain those turbo frequencies for an extended period of time, because chances are whatever it is that you're doing fancy, complicated math for is going to run for hours or days or weeks or months or years.
Anyway, before I get off track: since time immemorial there has been product segmentation from Intel on Intel processors, going all the way back to before the Pentium. With the 486 there's SX and DX; 386 SX and DX; there were different versions of the 286 as well. But the SX and DX denoted different functionality that was available: basically, do you want a math coprocessor? Even all the way back to the 8086 days, the 8088 days, you could get a math coprocessor, a physical chip that you would install, and that would offer you math functions. Later they sort of cheated a little bit: the math coprocessor would actually just shut off the main processor, because it was cheaper and easier to do everything on one processor than to try to shuttle things back and forth between the SX processor and its 487 math coprocessor. They would actually just shut off the 486SX and do everything on the 486DX, because the other option was more complicated.
And now we have AVX-512. I mean, AVX-512, but there's like 74 different versions of AVX-512, which has really been a large part of its uptake problem. AVX-512 also has some of the thermal and power constraints I was talking about, at least on Skylake; that's not as true as it once was on Cascade Lake on the server. But then, do people really need that on the desktop?
There are a lot of little tendrils in this conversation that are related to the modern parts of computer science that really keep me excited, because at the end of the day, all you need to know about me is that I'm a computer janitor, but I really enjoy solving the puzzles. To me, the blue screen, the "why doesn't this work?", is a Holmesian mystery that, if I'm not careful, I'm going to get sucked into figuring out, because the answer is usually really interesting.
So the mob is maybe grabbing their torches and pitchforks a little bit because they're looking at this Intel MKL thing. The Intel Math Kernel Library's stated mission, their goal, is to be the best-performing math library, period. This is even something that they'll say in answers on the Math Kernel Library forums, because there's a large community using these Intel tools. Intel employs a huge number of very brilliant software developers. Intel is doing a lot of really cool, amazing stuff with their software and their integration, more so than just their processors, and never mind any fabrication or process problems they may be having, or any uptake problems they may have with AVX-512. From a computer science standpoint, AVX-512 is genuinely very exciting, and so are these really high-performance CPUs from literally everybody. We've got Arm entering the picture; I've always been excited about the Marvell stuff; IBM OpenPOWER is sort of lurking in the background. All of this is genuinely very exciting to me.
But with the Intel Math Kernel Library, there was a workaround in 2019 that would say: just use AVX2 no matter what. Don't check the CPU, just use AVX2. And the AVX2 implementation on AMD Zen CPUs is actually quite good. It's shockingly good. It is giving Intel a run for its money, if we're really honest about the situation. And if you're a researcher or a computer scientist (believe it or not, I've been helping some people with their PhDs and their research, just doing the computer janitorial tasks for those brilliant folks), they really don't want to play with things as much as I do. The Holmesian mystery? It's like, "No, I don't really care." It's like, "Oh, it's stuck again." "Yeah, well, can you get it unstuck?" That's where I come in. And so you dig under the covers to find out why this is.
When Ryzen 2000 launched, the workaround was basically that you could set an environment variable and just tell the Intel MKL to use AVX2. That version was not really AMD-aware, so you could look at it and say, okay, Intel wasn't really planning for this. You just tell it: no, just use AVX2, the AVX2 instruction set, get it done. And you can do it.
AVX2 is kind of like MMX, and kind of like the whole SX and DX thing. In the beginning there was SX and DX, and then there was the Pentium, and then the Pentium with MMX, the multimedia extensions. You could get it with and without, and then you could only get it with MMX. Then there was SSE and SSE2. These are all different instruction sets that have been added over the years, and there is still some product segmentation. Even AMD has done it: "Oh, I've got a Phenom II." Sorry, you're not going to get very many vector instructions there. Well, you get a little bit of stuff on the Phenom II, but not much. And Intel, with their Pentium and Celeron lines: last I checked, I don't think they even enable AVX2 on those, which is... I mean, that's five years on. I think that's a mistake, but what do I know?
So we're in a situation where those AVX2 instructions on AMD, with that workaround, work great. And no matter what you're doing with this math library (it can be vector multiplication or fast Fourier transforms or any kind of really high-end math; well, it's mostly linear algebra, but any kind of high-end math that a researcher is going to want to do), you can count on this library to look at what silicon you're running on, figure it out, and go from there. So the rotten part is that it looks like this library pretty consistently is just looking at: are you Intel, or are you not Intel? If you're not Intel, then we're not going to bother to figure it out.
But there are mechanisms in the processor to figure out what instruction sets are available. You ask the processor and it's like, "Okay, I've got AVX2, I've got AVX, I've got SSE, SSE2, SSE3, SSE4, I've got FMA, I've got FMA2 but not FMA3... FMA3, some of those instructions work, some of them don't, let's just pretend FMA3 is not there." So the processor will tell you what it can do. Other competing libraries, like OpenBLAS (BLAS being the Basic Linear Algebra Subprograms), which is a competing system for the Intel MKL, are not quite as optimized, generally not quite as fast. It depends on what you do: some things are okay, some things you can sort of hand-optimize. But again, we're back to: researchers just want to hit a button and go. Companies have definitely exploited that for commercial gain, Intel and NVIDIA, which is not necessarily a bad thing if that's your business model. They make it easy and they can profit from that; that's fine. But it is perhaps a little unsettling that instead of having stuff in the MKL that asks what instruction sets are available, it's instead asking: is this an Intel CPU?
There are some things that we can do, even though the old hack only sort of works. Last year, in 2019, you could just say "use AVX2 no matter what" with the debug symbol, and it would go on and use the AVX2 code path. That works great, because AVX2 on Zen CPUs is genuinely great. It's genuinely awesome stuff. People noticed a performance regression of about 15 to 20 percent, not across the board but with a lot of functions, in 2020 Update 1 from Intel, and we just had 2020 Update 2. I was looking through the change logs for 2020 Update 1 and 2020 Update 2, and they don't mention anything about AMD-specific kernels or AMD-specific code paths or changes or optimizations that have been made. But I'm looking at this and it really does look like the Intel team is building Zen-specific code paths that can actually take advantage of Zen CPUs. Depending on what function you're running, like SGEMM versus DGEMM, those new code paths in Update 2 are not awful; they're okay. But it's still a fairly big difference, because one of the bundled benchmarks with the MKL will do about 370 gigaflops on a 2000-series Ryzen. I was able to confirm that; I saw some people on the internet do the same sort of testing. You can squeeze a little bit more out of it with the Zen pathway, but the old debug symbol workaround no longer works, or doesn't work the same way, so we can't use an environment variable as a workaround anymore.
But Daniel, who is working as a researcher, has offered a clever workaround. There is a function in the Intel MKL, `mkl_serv_intel_cpu_true`, that returns one if it's an Intel CPU and zero if it's not. So you just patch the executable so that it always returns one, and you're good to go. Now, if you're doing something with AVX-512, obviously that's not going to work, because AMD CPUs don't have AVX-512. But if you need to do a test to force the AVX2 code path, that works perfectly fine. If you do that and you run the SGEMM benchmark again, look at that: you're getting 800 gigaflops, 851 give or take, because you're actually going through the AVX2 code path once again.
So it's getting a little harder to do the workaround with the Intel MKL with the whole environment variable thing. I also question this because it's checking to see if it's an Intel CPU, not what instruction set is supported. Again, the stated goal of the Intel MKL is to be the best math library possible. Intel is so large and has so many resources and so many brilliant PhD-level people working for them. At this point I refuse to believe that this is deliberate shenanigans, or at least deliberate shenanigans on the part of large swaths of Intel employees. This is probably bad timing or an oversight, and Intel will probably make a statement about this, because at least in research circles this probably will have legs and will be carried by a lot of outlets. That's one of the reasons I'm doing a video on it, because it does look not great. The optics here are not the best for Intel, especially given some historical considerations.
But the good news is that AMD actually does have some optimizations for OpenBLAS, or for BLAS, and they have that on their web page. So I think AMD is aware that, hey, we need to have some tools that are readily available for researchers, and it needs to be push-button and all this other kind of thing.
In the past, people would say things like, "This is a great product because it has mind share of the users already." If you're working on, say, a machine learning project or a research project that needs crunching of numbers, and you go to your research committee or your PhD committee and say, "Here's my idea, here's how I'm going to do this," and they get into the details with you, then if those people already have mind share of the kinds of things you're talking about, it's going to be easier to talk them into something. And I don't think that the Intel MKL has a particularly high mind share just in the... it is easy to use, it is very well put together, it is very well polished. Intel has a great and busy forum that goes with it, and there are a lot of people doing really cool stuff there, talking about really interesting projects. It's hard not to get lost there looking at all the interesting projects people are doing.
Now, I will say that with Intel doing all the work on the MKL, if they want it to be only for Intel processors, that's fine. Put that in the license agreement and be upfront about it. Don't do shenanigans with the checks. But I think that Intel R&D is smart enough and forward-thinking enough that they really don't need to do those kinds of tricks, and I would be surprised and disappointed if that's really what it is. So I think we will see more stuff in the change logs from Intel addressing this, if we don't actually get some sort of direct response. On the other hand, if the new workaround, where you're literally patching the binary, stops working, then I'll do another video, and I think it'll be much more interesting to talk about intent, like what was intended with those changes. The 2019-to-2020 update is kind of a big update. It's possible that the environment variable just sort of fell out accidentally because of the new architecture, because it does look like this new architecture is going to be able to switch around between CPUs a lot. We know that Intel has major microarchitecture changes coming for their own processors down the road, so these changes could be a result of Intel making changes preemptively, knowing about that new microarchitecture and new silicon, like we saw with the tiles thing with the Intel Xe stuff.
But at the same time, you really have to trust but verify, or verify the crap out of it. Even though some of this is not fully open (although most of it is open source enough), the internet is a much different place than it was even five or ten years ago. With communities like ours, with you watching this video and putting this together, we will be able to supervise the situation and lay bare exactly what the situation is. If it turns out to be something a little bit shady or a little bit sideways, it's going to be really easy to diagnose that, and the optics of that, like I say, would be much worse. So I really doubt that that is what's going to come to fruition here.
That said, it's also possible for AMD to benefit from all the work being done in this space. If Intel's stated goal here is true, that this should be the best math library that's available, then this new modular architecture that allows Zen kernels will allow open-source scientists and researchers, even people on AMD's own team, to contribute to the project with their own kernels, potentially. Maybe that's a little pie in the sky, a little wishful thinking. But certainly, if not, there's OpenBLAS, and OpenBLAS with a few million dollars of funding and some really smart PhDs at the helm... it's not going to be an issue.
For now, when you run the benchmarks just out of the box with MKL, you should be aware that if you're running on a Zen processor (Threadripper, Ryzen 2000, Ryzen 3000, Ryzen 4000), you're probably not getting the performance that you should unless you jump through some hoops to patch some stuff. If you have a compiled binary that you don't have the source to, that's okay. There's a utility called patchelf, and that's what danieldk mentions in his blog post here. You can use patchelf to literally add a library that has an override function that will override that Intel check and always just return true. That would certainly be useful for diagnostic purposes, if nothing else, but there could be side effects if your binary depends on, say, AVX-512.
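As an illustration that is not from the video: on Linux, the feature list the processor reports is exposed in `/proc/cpuinfo`, so choosing a code path from instruction sets rather than from the vendor string can be sketched as below. The helper names `parse_cpuinfo` and `best_vector_path` and the sample text are hypothetical; the flag names (`avx2`, `avx512f`, and so on) are the ones Linux uses on x86.

```python
from pathlib import Path

# Hypothetical sample in the x86 Linux /proc/cpuinfo layout.
SAMPLE_CPUINFO = """\
processor\t: 0
vendor_id\t: AuthenticAMD
model name\t: Hypothetical Zen CPU
flags\t\t: fpu sse sse2 ssse3 fma sse4_1 sse4_2 avx avx2
"""


def parse_cpuinfo(text: str) -> tuple[str, set[str]]:
    """Return (vendor, feature flags) from the first CPU entry in the text."""
    vendor = ""
    flags: set[str] = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def best_vector_path(flags: set[str]) -> str:
    """Choose a code path from the reported features alone, ignoring the vendor."""
    for name, flag in (("avx512", "avx512f"), ("avx2", "avx2"),
                       ("avx", "avx"), ("sse2", "sse2")):
        if flag in flags:
            return name
    return "scalar"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else SAMPLE_CPUINFO
    vendor, flags = parse_cpuinfo(text)
    print(vendor or "unknown", best_vector_path(flags))
```

The vendor string is parsed only so it can be printed; the decision itself never looks at it, which is the point being made about checking instruction sets.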
in general program should check instruction sets not the type of cpu because you know avx 512 it's probably going to turn out to be safe with avx 512 like i keep going back to that but for this video i actually struggled to find an example where that worked because every program that actually implements avx 512 explicitly checks for avx 512 so in addition to running this they also do the avx 512 check which is kind of hilarious because i think if intel hadn't segmented the market so badly with avx 512 that they probably wouldn't have done that it's like oh it's an intel cpu oh it's got avx 512 we can count on that but because you can't and because there's so many different versions of avx 512 at least for this math library in addition to checking to see if is it an intel cpu the programs that incorporate this library also check for avx 512 so you can have an intel cpu that doesn't have avx 512 which is not as uncommon as you would think that it is and because of that amd cpus continue to work correctly because oh you're you've got avx2 but you don't have avx 512 oh you're one of those intel cpus and the amdcp is over the corner yes that is exactly correct that everything is fine nothing is on fire and well this is level one hopefully this is not too long of a ramble i had some fun putting this together and uh you should check out the uh the stuff on the the intel website because there's a lot of good stuff there and the discussion forums and stuff all right i'm signing out and i'll see you later so you want to do really fast linear algebra what do you call a baby eigen sheep ah a lamb duh i like the pandemic version of that joke a lot better which is uh what do you get when you cross a mountain goat with a mosquito well you can't because that's crossing a scalar with a vector so yeahso what's up with the intel mkl if you're using a non-intel processor is intel intentionally trying to cause problems for amd something rotten in denmark well that's a little more 
complicated than that let's try to talk through it the intel mkl is a math kernel library that's what mkl stands for it is a set of routines for doing math certain kinds of math operations really quickly and that probably means we're going to get a lot of terrible math jokes in this video because what other opportunity would i have for doing terrible math jokes these routines are optimized highly highly optimized for running on intel processors intel processors i'll come back to that x86 processors in general but intel processors is what they're optimized for because believe it or not the different ways of doing math some are faster on processors than others depending on how you're actually executing it on silicon and that is very interesting in a whole other world of crazy i'm going to introduce you to some resources for people that are into that level of world of crazy in terms of how long does this instruction take to complete because there's a lot more variables than you would think i mean in addition to the different instructions for doing different kinds of mathematical operations and you know you break a large complicated mathematical operation into smaller pieces there are other constraints power constraints because you know processors turbo until they reach a power limit or they reach a thermal limit and different kinds of mathematical operations may cause the computer to throttle so one math operation might be faster until the processor hits a certain thermal wall and then another math operation becomes faster and so if you're a researcher and you're running these things continuously chances are turbo and stuff like that doesn't really matter much to you unless the processor can sustain those turbo frequencies for an extended period of time because chances are whatever it is that you're doing fancy complicated math four is going to run for hours or days or weeks or months or years anyway before i get off track since time immemorial uh there has been 
product segmentation from intel on intel processors uh you know going all the way back from before the pentium you know 486 there's sx and dx 386 sx and dx there were different versions of the 286 as well but the sx and dx denoted a different functionality that was available basically do you want a math coprocessor even all the way back to the 8086 days 888 days you could get a math coprocessor like a physical chip that you would install and then that would you know offer you math functions and then later uh they sort of cheated a little bit the the math coprocessor would actually just shut off the main processor because it was cheaper and easier to just do everything on one processor than to try to shuttle things back and forth between the sx processor and then it's you know 487 dx math coprocessor they would actually just shut off the 486 sx and then just do everything on the 486 dx because the other option was more complicated and we have avx 512 i mean avx 512 but there's like 74 different versions of avx 512 which has really been a large part of its uptake problem and avx 512 also uh has some of the thermal and power constraints like i was talking about at least on skylake but to a lesser extent that's not as true as it once was on cascade lake on the server but then do people really need that on the desktop there's a lot of little tendrils in this conversation that is related to the modern parts of computer science that really keep me excited because like at the end of the day all you need to know about me is that i'm a computer janitor but i really actually enjoy solving the puzzles to me you know like the blue screen why doesn't this work is a holmesian mystery that uh if i'm not careful i'm gonna get sucked into and figuring it out because the answer is usually really interesting and so we have uh you know the mob maybe is grabbing their torches and pitchforks a little bit because they're looking at this intel mkl thing so the intel math kernel library 
their stated mission is their goal is to be the best performing math library period and this is even something that they'll say and answer on the math kernel library forums because there's a large community using these intel tools intel employs a huge number of very brilliant software developers intel is doing a lot of really cool amazing stuff with their software and their integration more so than just their processors more so than any kind of like fabrication or you know process problems that they they may be having any kind of like uptake problems they may have with avx 512 but from a computer science standpoint avx 512 is genuinely very exciting and these really high performance cpus from you know literally everybody i mean we've got arm entering the picture i've always been excited about the mar-vell you know ibm open power stuff is sort of lurking in the background all of this stuff is genuinely very very exciting to me but uh with the intel math kernel library there was a workaround in 2019 that would say uh just use avx2 no matter what like don't check the cpu just use avx too and the avx2 implementation on amd zen cpus is actually quite good it's shockingly good it is uh giving intel a run for its money if we're really honest about the situation and if you're a researcher or a computer scientist believe it or not helping some people with their their phds and their their research and you know just doing the computer janitorial tasks for those brilliant folks uh they super don't really want to play with things as much as i do homezy and mystery it's like no i don't really care it's like oh it's stuck up again yeah well just uh can you get it unstuck that's where i come in and so you dig under the covers to find out why this is because in you know with uh when ryzen 2000 launched the work around was basically you could set an environment variable and just tell the intel mkl which is that version kind of was not really amd aware and so you could look at it and 
say okay the intel wasn't really planning for this you just tell it no just use avx2 the avx2 instruction set get it done and you can do it and avx2 is kind of like mmx and kind of like the whole sx and dx thing because in the beginning there was sx and dx and then there was the pentium and then the pentium with mmx multimedia extensions and you could get it you know with and without and then you can only get it with mmx and then there was sse and sse2 these are all different instruction sets that have been added over the years and there is still some product segmentation i mean even amd has done it you know it's like oh i've got a phenom too yeah it's sorry you're not going to get very many very very many uh vector instructions there that's just not it's not something that you get well you get a little bit of stuff on the phenom too but not much and intel with you know their pentium and cellar on lines they don't last i checked i don't think they even enable avx2 on those which is you know that's i mean that's five years on those i think that's a mistake but you know and it's what do i know so we're in a situation where those avx ii amd instructions with that workaround work great and so no matter what you're doing with this math library i mean it can be vector multiplication or fast fourier transform or any kind of like really pretty high-end math well i mean that's linear algebra but it's mostly linear algebra but any kind of high-end math that a researcher is going to want to do you can count on this library to look at what silicon you're running figure it out and go from there so the rotten part is that it looks like this library pretty consistently is just looking at are you intel or are you not intel if you're not intel then we're not going to bother to figure it out but there are mechanisms in the processor to figure out what instruction sets are available so you ask the processor and it's like okay i've got avx2 i've got avx i've got sse sse2 ssc 3.1 ssc4 
i've got fma i've got fma2 but not fme3 fma3 some of those instructions work some of them don't let's just pretend fma3 is not there so the processor will tell you what it can do and uh other competing libraries like open blasts open basic linear algebra system which is a competing system for the intel mkl which is not quite as optimized generally not quite as fast it depends on what you do some things are okay some things you can sort of hand optimize it but again we're sort of back to that researchers don't really they just want to hit a button and go and that's really like companies have definitely exploited that for commercial game intel and nvidia which is not necessarily a bad thing if that's your that's your business model i mean they make it easy and they can profit from that i mean that's fine but um it is perhaps a little unsettling that instead of having stuff in the mkl that says what instruction sets are available and it's instead saying is this an intel cpu and there's a there are some things that we can do even though the old hack sort of works so last year in 2019 you could just say use avx2 no matter what with the debug symbol and it would go on and it would use like say the avx2 code path and that works great because avx2 on zen cpus is genuinely great it's genuinely awesome stuff people notice the performance regression about up 15 to 20 performance regression not across the board but with a lot of functions in the 2020 update one from intel and we just had 2020 update too and so i was looking through the change log for 2020 update 1 and 2020 update 2 and it doesn't mention anything about amd specific kernels or amd specific code paths or changes or optimizations that have been made i'm looking at this and it really does look like the intel team is building zen specific code paths that can actually take advantage of zen cpus so depending on what the function that you're running is like uh s e g m versus deg mm um those new code pathways that are 
in update two are not awful like they're okay but it's still a fairly big difference because like one of the bundled benchmarks with the mkl system uh we'll do about 370 gigaflops on a 2000 series verizon i was able to confirm that i saw some people on the internet do the same sort of testing and uh you can squeeze a little bit more out of it with the the zen pathway but if you use the old debug symbol workaround that no longer works it doesn't work the same way and so we can't use an environment variable as workaround anymore but daniel who is working as a researcher uh has offered a clever workaround which is basically the name of the function in the intel mkl intel underscore serve underscore intel underscore cpu underscore true and that returns one if it's an intel cpu at zero if it's not so you just patch this executable to always return one and you're good to go now if you're doing something with avx 512 obviously that's not gonna work because avx 512 on you know an amd cpu is not going to work because ndcps don't have avx 512 but if you need to do a test to force the avx2 code pathway that works perfectly fine and don't you know if you do that and you run mt-s-g-e-m to do your benchmark again look at that you're getting 800 gigaflops per second 851 give or take because you're actually going through the avx-2 code path once again and so it's getting a little harder to do the work around with the intel mkl uh with the whole environment variable thing and i also question this because it's checking to see if it's an intel cpu not what instruction set is supported again the stated goal of intel mkl is to be the best mkl possible intel is so large and has so many resources and so many like brilliant phd level uh people working for them i at this point i refuse to believe that uh this is deliberate shenanigans or at least this is the deliberate shenanigans on the part of large swaths of intel employees this is probably a bad timing or an oversight and intel will 
probably make a statement about this, because at least in research circles this probably will have legs and will be carried by a lot of outlets. That's one of the reasons I'm doing a video on it, because it does look not great. The optics here are not the best for Intel, especially given some historical considerations.
But the good news is that AMD actually does have some optimizations for OpenBLAS, or for BLAS, the basic linear algebra system, and they have that on their web page. So I think AMD is aware that, hey, we need to have some tools that are readily available for researchers, and they need to be push-button and all this other kind of thing.
In the past, people would say things like, "Oh, this is a great product because it already has mind share with the users." So if you're working on, say, a machine learning project or a research project that needs number crunching, and you go to your research committee or your PhD committee and say, "Here's my idea, here's how I'm going to do this," they get into the details with you. If those people already have mind share of the kinds of things you're talking about, it's going to be easier to talk them into it. And I don't think that the Intel MKL has particularly high mind share just because it is easy to use. It is very well put together, it is very well polished, and Intel has a great and busy forum that goes with it. There are a lot of people there talking about really interesting projects, and it's hard not to get lost looking at everything people are doing.
Now, I will say: Intel is doing all the work on the MKL, and if they want it to be only for Intel processors, that's fine. Put that in the license agreement and be upfront about it; don't do shenanigans with the checks. But I think that Intel R&D is smart enough and forward-thinking
enough that they really don't need to do those kinds of tricks, and I would be surprised and disappointed if that's really what it is. So I think that we will see more in the changelogs from Intel addressing this, if we don't actually get some sort of direct response. On the other hand, if the new workaround, where you're literally patching the binary, stops working, then I'll do another video, and I think it'll be much more interesting to talk about intent, about what was intended with those changes.
Going from 2019 to 2020 is kind of a big update. It's possible that the environment variable just fell out accidentally because of the new architecture, because it does look like this new architecture is going to be able to switch between CPUs a lot. We know that Intel has major microarchitecture changes coming for their own processors down the road, so these changes could be a result of Intel making changes preemptively, knowing that new microarchitecture and new silicon are coming, like we saw with the tiles thing in the Intel Xe stuff.
At the same time, you really have to trust but verify, or verify the crap out of it. Even though some of this is not fully open (although most of it is open source enough), the internet is a much different place than it was even five or ten years ago. With communities like ours, with you watching this video and putting this together, we will be able to supervise the situation and lay bare exactly what is going on. If it turns out to be something a little bit shady or a little bit sideways, it's going to be really easy to diagnose that, and the optics of that, like I say, would be much worse. So I really doubt that is what's going to come to fruition here.
That said, it's also possible for AMD to benefit from all the work being done in this space. If Intel's stated goal here is true,
that this should be the best math library that's available, then this new modular architecture that allows Zen kernels will allow open source scientists and researchers, even people on AMD's own team, to contribute to the project with their own kernels, potentially. Maybe that's a little pie in the sky, maybe it's a little wishful thinking. But if not, there's OpenBLAS, and OpenBLAS, with a few million dollars of funding and some really smart PhDs at the helm, is not going to be an issue.
For now, though, when you run the benchmarks just out of the box with the MKL, you should be aware that if you're running on a Zen processor, so Threadripper or Ryzen 2000, Ryzen 3000, Ryzen 4000, you're probably not getting the performance that you should unless you jump through some hoops to patch some stuff. If you have a compiled binary that you don't have the source to, that's okay. There's a utility called PatchELF, and that's what Daniel (danieldk) mentions in his blog post here. You can use PatchELF to literally add a library that has an override function, which will override that Intel check and always just return true. That would certainly be useful for diagnostic purposes, if nothing else, but there could be side effects if your binary depends on, say, AVX-512.
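The override library described above can be sketched in a few lines of C. This is a minimal sketch under stated assumptions, not the code from the blog post: it assumes the vendor check is an exported MKL function named `mkl_serv_intel_cpu_true` that takes no arguments and returns an `int`, and the file and library names (`fakeintel.c`, `libfakeintel.so`, `your_binary`) are made up for illustration.

```c
/* fakeintel.c (hypothetical file name)
 *
 * Assumption: the MKL vendor check is an exported function with this
 * name and signature, returning 1 on Intel CPUs and 0 otherwise.
 * If this definition is found by the dynamic linker before MKL's own,
 * the check always answers "Intel".
 *
 * Build and attach (illustrative commands; adjust names and paths):
 *   gcc -shared -fPIC -o libfakeintel.so fakeintel.c
 *   patchelf --add-needed libfakeintel.so ./your_binary
 */
int mkl_serv_intel_cpu_true(void) {
    return 1; /* always claim to be an Intel CPU */
}
```

This kind of symbol override should only have a chance of working when the MKL is linked dynamically, so that the call is resolved at load time. If you would rather not modify the binary, setting `LD_PRELOAD=./libfakeintel.so` for a single run is a common way to try the same override. Either way, treat it as a diagnostic tool: anything that assumes Intel-only features after this check will now run on hardware that may not have them.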
In general, programs should check instruction sets, not the type of CPU. It's probably going to turn out to be safe with AVX-512, though. I keep going back to that, but for this video I actually struggled to find an example where that worked, because every program that actually implements AVX-512 explicitly checks for AVX-512. So in addition to running this check, they also do the AVX-512 check, which is kind of hilarious, because I think if Intel hadn't segmented the market so badly with AVX-512, they probably wouldn't have done that. It would be, "Oh, it's an Intel CPU, so it's got AVX-512, we can count on that." But because you can't, and because there are so many different versions of AVX-512, at least for this math library, the programs that incorporate it check for AVX-512 in addition to checking whether it is an Intel CPU. So you can have an Intel CPU that doesn't have AVX-512, which is not as uncommon as you would think. And because of that, AMD CPUs continue to work correctly: "Oh, you've got AVX2 but you don't have AVX-512, so you're one of those Intel CPUs." And the AMD CPU over in the corner says, "Yes, that is exactly correct." Everything is fine, nothing is on fire.
Well, this is Level1. Hopefully this was not too long of a ramble. I had some fun putting this together, and you should check out the stuff on the Intel website, because there's a lot of good stuff there, and the discussion forums too. All right, I'm signing out, and I'll see you later.
So you want to do really fast linear algebra? What do you call a baby eigensheep? A lamb, duh. I like the pandemic version of that joke a lot better, which is: what do you get when you cross a mountain goat with a mosquito? Well, you can't, because that's crossing a scalar with a vector. So, yeah.