#GTC24 - PCIe (and SXM) Supermicro HGX At The Edge, Petascale Storage In 1U, And Omniverse

The Promise of Digital Twin Technology

The concept of digital twin technology has been gaining significant attention in recent years, and it's exciting to see how it's being applied to various industries. The idea of creating a virtual replica of an object, system, or process is not new, but the advancements in AI, machine learning, and computing power have made it more feasible than ever. In the context of robotics, digital twin technology can be used to simulate and optimize the performance of robots, making them more efficient and effective.

For instance, when designing a robot, engineers first create a digital twin of the system in simulation software. That twin can then be exercised across many scenarios, letting engineers identify weaknesses and optimize the design before building physical hardware. Once the design is finalized, the best-performing controller or mechanism is deployed to the real robot and, ideally, works as it did in simulation. The promise of digital twin technology lies in taking this process to the nth degree, allowing the creation of highly accurate simulations of complex systems like buildings, data centers, and even individual robots.
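To make the simulate-then-deploy loop concrete, here is a minimal sketch in Python. It does not use Omniverse or Isaac APIs; the toy simulate_grasp environment, the success threshold, and the random-search loop are all hypothetical stand-ins, meant only to illustrate the idea of cheaply searching for a good controller in simulation and then exporting the winning parameters for the physical robot.

```python
# A minimal sketch of the simulate-then-deploy idea behind digital twins.
# The environment, reward, and search strategy here are toy placeholders
# (hypothetical), not NVIDIA Omniverse / Isaac APIs; they only illustrate
# the workflow of tuning a controller virtually before touching hardware.
import json
import random


def simulate_grasp(gain: float, trials: int = 200) -> float:
    """Toy 'digital twin' of a gripper: returns the fraction of successful
    grasps for a given controller gain. A real twin would be a physics sim."""
    successes = 0
    for _ in range(trials):
        # Pretend the ideal gain is 0.7 and sensor noise perturbs each attempt.
        error = abs(gain - 0.7) + random.gauss(0, 0.05)
        successes += error < 0.1
    return successes / trials


def search_in_simulation(candidates: int = 50) -> tuple[float, float]:
    """Run many cheap virtual experiments to find the best controller."""
    best_gain, best_score = 0.0, -1.0
    for _ in range(candidates):
        gain = random.uniform(0.0, 1.0)
        score = simulate_grasp(gain)
        if score > best_score:
            best_gain, best_score = gain, score
    return best_gain, best_score


if __name__ == "__main__":
    gain, score = search_in_simulation()
    print(f"best simulated gain={gain:.3f}, success rate={score:.2f}")
    # 'Deployment' here is just exporting the tuned parameters; on a real
    # robot the same parameters would be loaded by the onboard controller.
    with open("controller_params.json", "w") as f:
        json.dump({"gain": gain}, f)
```

In a real pipeline the simulation would be a physics-accurate twin and the search would typically be reinforcement learning rather than random search, but the shape of the workflow is the same: iterate virtually, deploy once.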

The Limitations of Traditional Rack Systems

Traditional PCIe-based GPU servers have limits when it comes to GPU-to-GPU bandwidth. PCIe cards talk to each other over the host's PCIe fabric, which works well as a crossbar for scaling out inference but cannot match a dedicated GPU interconnect. For workloads that need full-bandwidth communication between GPUs, HGX systems instead use SXM-form-factor H100 (and soon B100) modules connected with NVLink, which provides far higher bandwidth than PCIe.
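A rough back-of-the-envelope comparison shows why the interconnect matters. The bandwidth figures below are approximate public numbers (roughly 64 GB/s per direction for a PCIe Gen5 x16 link, on the order of 450 GB/s per direction for H100 NVLink), not measurements from these systems, and the payload size is just an illustrative example.

```python
# Back-of-the-envelope comparison (approximate, assumed figures) of moving a
# large tensor between two GPUs over PCIe Gen5 x16 versus NVLink on H100 SXM.
PCIE_GEN5_X16_GBPS = 64.0   # ~64 GB/s per direction for a Gen5 x16 link
NVLINK_H100_GBPS = 450.0    # ~450 GB/s per direction (~900 GB/s total) on H100 SXM


def transfer_time_s(payload_gb: float, bandwidth_gbps: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and latency."""
    return payload_gb / bandwidth_gbps


if __name__ == "__main__":
    payload_gb = 140.0  # e.g. roughly 70B parameters stored in FP16
    for name, bw in [("PCIe Gen5 x16", PCIE_GEN5_X16_GBPS),
                     ("NVLink (H100)", NVLINK_H100_GBPS)]:
        print(f"{name:14s}: {transfer_time_s(payload_gb, bw):.2f} s "
              f"to move {payload_gb:.0f} GB")
```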

The Importance of Storage

Storage is another critical component in any of these systems, and it needs to be both fast and reliable. Spinning hard drives cannot keep up with high-performance workloads, so these platforms lean on dense NVMe flash instead. NVMe over Fabrics (NVMe-oF) extends NVMe access across the network, letting compute nodes read and write remote flash at near-local speed, which makes it attractive for high-performance computing and AI clusters. At GTC, Supermicro showed a 1U storage chassis headed in this direction: E3.S drive bays (with slots that can also take CXL modules), up to a petabyte of flash per rack unit coming soon, and PCIe lanes split roughly half to the drives and half to 400/800 Gb Ethernet or InfiniBand NICs so the network does not become the bottleneck.
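That half-to-drives, half-to-NICs balance can be sanity-checked with simple arithmetic. The drive count, per-drive throughput, and NIC figures below are assumptions chosen for illustration, not the specs of the chassis shown.

```python
# A rough sanity check (assumed figures, not vendor specs) that storage and
# network bandwidth in a 1U NVMe-oF storage node are balanced, so the network
# does not bottleneck the drives or vice versa.
E3S_DRIVES = 16          # hypothetical E3.S drive count
DRIVE_READ_GBPS = 12.0   # ~12 GB/s sequential read per Gen5 x4 NVMe drive
NIC_PORTS = 4            # e.g. four 400 Gb Ethernet / InfiniBand ports
NIC_PORT_GBPS = 400 / 8  # 400 Gbit/s -> 50 GB/s per port

drive_bw = E3S_DRIVES * DRIVE_READ_GBPS
net_bw = NIC_PORTS * NIC_PORT_GBPS

print(f"aggregate drive read bandwidth : {drive_bw:.0f} GB/s")
print(f"aggregate network bandwidth    : {net_bw:.0f} GB/s")
print("network-bound" if net_bw < drive_bw else "drive-bound or balanced")
```

With these assumed numbers the two sides land within a few percent of each other, which is the design intent the Supermicro engineer described.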

The Role of Supermicro in Advancing Digital Twin Technology

Supermicro is a leading manufacturer of server systems, and it is playing a significant role in making this kind of work practical. Its HGX systems pair NVLink-connected H100 modules with Blackwell-generation parts on the way, liquid-cooled configurations are available, and the current H100 platforms are essentially drop-in ready for Blackwell. That combination of GPU density, fast interconnect, and cooling is what makes these machines well suited to AI and Omniverse-style simulation.
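Liquid cooling is what makes this density workable, and a quick worked equation shows the scale involved. The per-GPU power, the non-GPU load, and the coolant temperature rise below are assumptions chosen for illustration, not Supermicro specifications; the point is simply the heat = flow * specific heat * temperature rise relationship behind sizing a cold-plate loop.

```python
# Rough estimate (assumed figures) of the coolant flow needed to remove the
# heat from an 8-GPU HGX node, using Q = m_dot * c_p * delta_T for water.
GPU_COUNT = 8
GPU_TDP_W = 700.0     # ~700 W per H100 SXM module (nominal TDP)
OTHER_LOAD_W = 2000.0  # assumed CPUs, memory, NICs, fans, conversion losses
CP_WATER = 4186.0     # J/(kg*K), specific heat of water
DELTA_T_K = 10.0      # assumed coolant temperature rise across the node

heat_w = GPU_COUNT * GPU_TDP_W + OTHER_LOAD_W   # total heat to remove, W
m_dot = heat_w / (CP_WATER * DELTA_T_K)         # required mass flow, kg/s
lpm = m_dot * 60.0                              # ~1 kg of water per litre

print(f"heat load : {heat_w / 1000:.1f} kW")
print(f"coolant   : {m_dot:.2f} kg/s (~{lpm:.0f} L/min at a {DELTA_T_K:.0f} K rise)")
```

Even with these modest assumptions a single node rejects several kilowatts, which is why liquid cooling keeps showing up on these dense GPU platforms.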

The DevOps-in-a-Box Solution

Beyond the big HGX boxes, Supermicro also showed something closer to DevOps-in-a-box: a liquid-cooled platform with two GPUs joined by an NVLink bridge, running an AI assistant demo. The assistant is built around a large language model with voice recognition and video output, so you can simply ask which Supermicro products are a good fit for AI or Omniverse and it will show you. It is a compact example of the kind of end-to-end stack a developer could stand up on a single system.
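The assistant demo is essentially a pipeline of three stages. The sketch below shows that shape in Python; the transcribe, generate_answer, and render_video functions are hypothetical placeholders, not Supermicro's or NVIDIA's actual demo code, and real models would sit behind each one.

```python
# A minimal sketch of the demo's flow: speech in, language-model answer out.
# All three stage functions are hypothetical placeholders; real ASR, LLM,
# and rendering models would replace them.
def transcribe(audio: bytes) -> str:
    """Placeholder speech-to-text stage (an ASR model would run here)."""
    return "Which Supermicro products are good for Omniverse?"


def generate_answer(question: str) -> str:
    """Placeholder LLM stage; a locally hosted language model would run here."""
    return f"Answer to: {question}"


def render_video(answer: str) -> bytes:
    """Placeholder video/avatar stage that would visualize the spoken answer."""
    return answer.encode()


def assistant(audio: bytes) -> bytes:
    question = transcribe(audio)
    answer = generate_answer(question)
    return render_video(answer)


if __name__ == "__main__":
    print(assistant(b"\x00" * 16)[:60])
```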

Touring Supermicro's Campus

The author also had the opportunity to tour Supermicro's campus, which was a fascinating experience: a look behind the scenes at the new water-cooled systems, the Blackwell systems, and other technologies still in development. The tour gave a deeper sense of the company's direction, and the level of expertise and engineering on display was impressive.

Conclusion

In conclusion, digital twin technology has the potential to revolutionize industries from robotics to data-center design. With advances in AI, machine learning, and computing power, highly accurate simulations of complex systems are becoming practical. Supermicro is at the forefront of this shift, with Blackwell-ready HGX systems for high-performance computing and compact platforms that amount to DevOps-in-a-box for developers. As the technology matures, we can expect even more interesting developments in the years to come.

"WEBVTTKind: captionsLanguage: enI'm here at GTC taking a look at Super micro's offerings in the hgx segment hgx that's Nvidia vernacular but they're rolling out the b200 and the B100 to integrate with their hgx platform and hgx really it just means that it's more like a recognizable form factor like you would recognize this as a Mount server whereas mgx and what Nvidia calls dgx or their full data center Solutions or full rack Solutions are uh more aimed at buying lots and lots and lots of machines this is a single machine based around you know eight or so h100s or uh l40 s's in this case or whatever so maybe maybe you just need pcie connectivity maybe your organization is not ready well there's this it's five rack units which is pretty awesome but it's got tons of connectivity you can you can run 8 to 10 pcie systems in this 350 WS they're outfitted with l4s gpus right now and it's a dual socket platform so you've got a lot of PCI connectivity to begin with but normal pcie yeah perfect so this is one of the one of the very powerful system uh we call it our industry Workhorse this can support up to 10 double bit gpus and right now as you rightfully said this is you know uh 8 L4 PS gpus here um this system is really designed uh according to nvidia's guidelines and it's really really supports really well for the Omniverse and metaverse those type of applications and uh perfect fit for if you are looking for best performance per dollar and so this is for the multi-gpu interface it's really just using pcie as its crossbar to support scaling and all that kind of stuff it's not quite as good as NV link but it still gets the job done for a ton of customers absolutely because not all customers are looking for or sheer performance yeah some most of them they're looking for best performance per dollar or scalability I just need to run a million clients with this model or whatever yeah no that's awesome as a platform for Omniverse this seems like this would be a pretty good Lego brick in in that kind of a solution tell me about Omniverse absolutely so Omniverse and you know the the digital Computing that is getting lots and lots of traction these days and what we did this basically this system is built exactly according to nvidia's guidelines and on that it should be dual root complex so that is here so all whatever you know nvd has reference designs mentions you know everything is pretty much you know covered on this platform it's starting with you can put up to 8 l4s or l40 gpus and up to five Nicks so all you know whatever their um guideline supports or the I should say their reference design is that is pretty much supported here yeah so building your own private Omniverse you know just build a couple of racks of these and it's ready to plug in with whatever oh of course yes yes yes it is do nice Omniverse Nvidia Omniverse what the heck does Omniverse mean I met a robotics expert who's going to be doing fun interesting stuff well I can't tell you what Omniverse means but I can tell you what it is uh it is NVIDIA 3D modeling simulation robotics toolkits they're all built on their Hardware so it's Cuda all the way down and a bunch of other Nvidia software stuff and another like other Nvidia Hardware all the way down along with things that make deployment on Hardware easier so I'm a robotics person I care about the Jetson and if you look over there there's a Boston Dynamics robot that allows you to do reinforcement learning stuff in Omniverse and then you can deploy the same things that you train in 
simulation on hardware and yeah Omniverse is just a bunch of tools so the idea is that you download the Boston Dynamics robot in the API and simulated in the API and the digital simulation is high enough Fidelity that you can just copy your program into the real robot when you're ready so like all of the training everything that you did everything for the environment everything that was not real and that was done in software you can then copy to a real imagine playing a video game and figuring out everything in the video game and then you copy what you learned in the video game into a real robot and it works that's what Nvidia just released for the the Boston Dynamics robot yep and you know speaking of the video game thing you can actually play a few million video games at the same time if you have a big enough GPU and you can learn all the cool things all the Easter eggs all the specialized skills and once you're done with that you deploy the best one on the robot and it just works hopefully yeah that's the part of the digital twin like you can take the digital twin thing to the nth degree everything from buildings and data centers all the way down to individual robots machine learning Vision simulation the robotic hands that we've seen that's the promise of this platform standard rack systems for hgx are not just limited to pcie because well I mean if you want a full eight-way EnV link setup you're going to need uh something a little different super micros got you covered in a similar 5u chassis but with completely different GPU interconnects you can get modular h100s and soon B100 and this version of Blackwell is coming a lot sooner than the pcie version at least from what I understand not just PCI connectivity suppose you want full 8way inv vlink connectivity it's hard to do that with bridges on pcie cards but super micros got you covered in a similar 5u chassis so not just AI but also storage you know some people have pedestrian needs like storage also we're doing generative AI so we need to store things and we need to store things really quickly super micro has a system here to show you some of the new form factors and the stuff that's coming they support up to one pyte in a oneu form factor not in this particular chassis but it is something that that is coming very quickly this chassis is interesting because we've got e3s 2T it's a double hey version of e3s but in this slot in this configuration it actually supports cxl modules cxl could be really useful but customer adoption and some other form factors I don't know it remains to be seen a little bit it's really handy to have a cxl type device when you're doing transactions like in a database server where the machine could go down and you need to be able to recover the database to the moment of the of the crash you can also do cxl where you don't have to have the entire system memory backed up similarly but then you've also got four e3. 
S slots here for normal storage you know you want to run four or 800 gbit Ethernet or infiniband those are just pcie devices as well this will support all of the storage needs that you have for these uh full rack scale systems we don't want Network to be the bottleneck uh for the storage because ultimately the goal of this is to provide high performance storage so we we kind of literally split the number of PC Express lenses in half in half of it is uh supported uh for the drives and the other half in the back I don't know if you can actually get to that you have like well I think we can eventually get there we have like four of the PC Express gen 5x6 so you can actually have any kind of Ethernet or infinite band of your choice depending on what storage you get well I'm I'm really excited by the perfect balance between pcie lanes in and out because eventually if PCI Express Fabrics take over that gives you a direct pcie interface so you could do nvme over Fabric or whatever the software stack will support we do have people talking about like you know how the fabric extension Works NVM or fabric but you know as you say it all depends on the customer uh preference and how the overall ecosystem is going to develop around our Hardware looking you know this is looking very promising and exciting and you know I'm super excited about it this is Dev Ops in a box basically it's the data center it's the data center in a box super micro has a platform that has a liquid cooling and two A1 100s in it that are connected together with an nvlink bridge and that is running this AI assistant demo you can just talk to it it's a large language model but it also has voice recognition and it produces video and so you can ask it about a super micro product and it will tell you it's like hey what super micro products would be good for AI what would be good for Omniverse and it will just show you which is pretty awesome now I get to come to GTC because of Super Micro I got to tour super micro's campus so big thanks to Super Micro for bringing me out here and letting me take a look behind the scenes at their new water cooled systems and their new Blackwell systems and their new h 100 systems and everything else and yeah everything with an h100 has basically dropped in ready for Blackwell but the first form factors that are coming for Blackwell these fully Integrated Systems Nvidia definitely targeting the data center stay tuned for my other videos including my oncampus tour of Super Micro which I really enjoyed I got to see behind the curtain and oh boy a lot of interesting stuffI'm here at GTC taking a look at Super micro's offerings in the hgx segment hgx that's Nvidia vernacular but they're rolling out the b200 and the B100 to integrate with their hgx platform and hgx really it just means that it's more like a recognizable form factor like you would recognize this as a Mount server whereas mgx and what Nvidia calls dgx or their full data center Solutions or full rack Solutions are uh more aimed at buying lots and lots and lots of machines this is a single machine based around you know eight or so h100s or uh l40 s's in this case or whatever so maybe maybe you just need pcie connectivity maybe your organization is not ready well there's this it's five rack units which is pretty awesome but it's got tons of connectivity you can you can run 8 to 10 pcie systems in this 350 WS they're outfitted with l4s gpus right now and it's a dual socket platform so you've got a lot of PCI connectivity to begin with but normal 