Inside a GPU Die - Exploding 2080 Ti GPUs by Overheating, ft. TiN
The Intricacies of Overclocking and Power Management in Graphics Cards: A Deep Dive
Anyone who has tried to show off something exciting knows the moment when it is time to demonstrate it and nothing happens. In the video, a second attempt to blow up a card fails in much this way, most likely because the VRM's over-current protection (OCP) tripped and shut the card down before the GPU could be damaged.
Protection is not limited to the GPU itself. The VRM (Voltage Regulator Module) also protects itself from drawing too much current or voltage and shuts off when its limits are exceeded. On overclocking cards these limits are set much higher, because a low limit would shut the card off too early during extreme overclocking. The VRM consists of a controller and power stages: the power stages convert the 12 V input to a lower voltage suitable for the GPU die, while the controller adjusts the output voltage based on demand from the GPU.
Over-temperature protection is a simple concept. When the temperature sensor in the GPU detects that the temperature is too high, it toggles an overheat output signal. That signal is routed to the enable input of the VRM controller, and when enable is off the whole VRM shuts down, cutting power to the GPU. The VRM's own protection against excessive current or voltage is a separate mechanism from this thermal path.
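The shutdown path described above can be sketched as a small simulation. This is only an illustrative model, not EVGA's implementation: the `Vrm` class, the `step` helper, the 100 °C trip point and the 1.0 V setpoint are all assumptions made for the example, as is the choice to keep the VRM off once it has tripped.

```python
from dataclasses import dataclass

# Hypothetical trip point for illustration; the source gives no threshold.
OVERHEAT_LIMIT_C = 100.0


@dataclass
class Vrm:
    """Toy VRM whose output is live only while its enable input is asserted."""
    enabled: bool = True

    def output_voltage(self, setpoint_v: float) -> float:
        # With enable off, the whole VRM is shut down and delivers nothing.
        return setpoint_v if self.enabled else 0.0


def overheat_signal(gpu_temp_c: float, limit_c: float = OVERHEAT_LIMIT_C) -> bool:
    """Model of the GPU's overheat output: asserted at or above the limit."""
    return gpu_temp_c >= limit_c


def step(vrm: Vrm, gpu_temp_c: float, protection_enabled: bool = True) -> float:
    """Evaluate one reading and return the voltage delivered to the GPU.

    In this sketch the VRM stays off once tripped (an assumption). With
    protection disabled, the overheat signal never reaches the enable input.
    """
    if protection_enabled and overheat_signal(gpu_temp_c):
        vrm.enabled = False
    return vrm.output_voltage(1.0)  # 1.0 V is an assumed setpoint
```

With protection enabled, `step(Vrm(), 120.0)` returns 0.0 because the VRM is switched off. With `protection_enabled=False` the same reading still delivers the full setpoint, which is the situation that destroyed the card in the video.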
For voltage regulation, the VRM controller drives the power stages with a PWM (Pulse Width Modulation) signal. The controller adjusts the PWM signal so that the output voltage stays within the desired range, which allows fine-tuning of the voltage delivered to the GPU and is essential for overclocking.
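As a rough illustration of how a PWM signal sets the output voltage, the sketch below uses the ideal step-down (buck) converter relation, where the output is the input multiplied by the duty cycle. Treating each power stage as an ideal buck converter is an assumption here, since real VRMs are multiphase and have losses, and the function names are made up for the example.

```python
def buck_output_voltage(v_in: float, duty: float) -> float:
    """Ideal buck converter: V_out = duty * V_in, with duty in [0, 1]."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    return v_in * duty


def duty_for_target(v_in: float, v_target: float) -> float:
    """Duty cycle an ideal buck stage needs to reach v_target from v_in."""
    if v_in <= 0.0 or not 0.0 <= v_target <= v_in:
        raise ValueError("target must lie between 0 and the input voltage")
    return v_target / v_in
```

For a 12 V input and a target around 1 V, the duty cycle comes out below 10 percent, so small changes in pulse width translate into millivolt-level changes at the GPU.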
Correct voltage sensing (V sense) and feedback are critical. Two dedicated pins, GPU power sense and GPU return sense, carry the voltage at the die back to the controller. If the sense path is faulty or missing, the controller cannot adjust the PWM signal accurately, which can damage or destroy the card. This is why V sense and feedback are among the first things tested during power design, and why they should be verified before overclocking.
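To see why a missing sense connection is dangerous, consider the toy controller below. It assumes an ideal buck stage, a simple integrating controller, and a disconnected sense line that reads 0 V. All of these are simplifications chosen for illustration, and real controllers add over-voltage protection that this sketch omits.

```python
V_IN = 12.0  # input rail in volts


def run_controller(target_v: float, sense_connected: bool,
                   gain: float = 0.5, steps: int = 100) -> float:
    """Return the output voltage after `steps` controller updates.

    Hypothetical model: the controller nudges the duty cycle in
    proportion to (target - sensed). A disconnected sense line is
    assumed to read 0 V, so the error never closes.
    """
    duty = target_v / V_IN
    for _ in range(steps):
        v_out = V_IN * duty
        sensed = v_out if sense_connected else 0.0
        duty += gain * (target_v - sensed) / V_IN
        duty = min(max(duty, 0.0), 1.0)  # duty cycle cannot leave [0, 1]
    return V_IN * duty
```

With the sense line connected the output stays at the target. With it disconnected the controller keeps seeing a voltage that is too low, raises the duty cycle until it saturates, and in this model ends up passing the full 12 V to a die that expects about 1 V.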
The controller runs a monitoring loop that continuously compares the sensed voltage with the expected voltage and adjusts the PWM signal to correct any difference. Under a heavy load such as a 3DMark benchmark, for example, the voltage drops and the controller compensates. The loop can also be given an artificial offset: if the controller is told that the real voltage is 50 mV higher than what it sees, it applies a correction and the output voltage changes accordingly.
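The monitoring loop can be sketched as a simple closed-loop simulation. The model below is an assumption-laden toy: an ideal buck stage, a fixed resistance standing in for load-dependent voltage drop, and an integrating controller with an arbitrary gain. The `sense_offset_v` parameter mimics the artificial offset mentioned in the conversation. In this sketch a positive offset makes the controller believe the voltage is higher than it really is; the direction used on real hardware is not specified in the source.

```python
V_IN = 12.0  # input rail in volts


def simulate_loop(target_v: float, load_a: float,
                  sense_offset_v: float = 0.0,
                  r_droop_ohm: float = 0.001,
                  gain: float = 0.5, steps: int = 200) -> float:
    """Return the settled die voltage for a given load current.

    Hypothetical model: v_die = V_IN * duty - load * r_droop. Each step
    the controller compares (v_die + sense_offset_v) with the target and
    moves the duty cycle to shrink the error.
    """
    duty = target_v / V_IN
    v_die = 0.0
    for _ in range(steps):
        v_die = V_IN * duty - load_a * r_droop_ohm
        sensed = v_die + sense_offset_v
        duty += gain * (target_v - sensed) / V_IN
        duty = min(max(duty, 0.0), 1.0)
    return v_die
```

Under a 200 A load the uncorrected output in this model would sag by 0.2 V, but the loop raises the duty cycle until the die voltage is back at the target. Passing `sense_offset_v=-0.05` makes the loop settle 50 mV above the target, which is how an offset shifts the whole operating voltage.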
Switching frequency is one of the adjustment knobs the VRM controller exposes. It affects the transient behavior of the VRM, that is, how quickly it responds to changes in load. On older motherboards and graphics cards it is often necessary to increase the switching frequency so that the VRM can keep up with demand from the GPU.
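A rough way to see the effect of switching frequency is to compute the switching period and the inductor ripple current of an ideal buck stage. The formula is the standard textbook relation for an ideal buck converter. Applying it to a single VRM phase, and any component values passed in, are assumptions for illustration and do not come from the source.

```python
def switching_period_us(f_sw_khz: float) -> float:
    """Time between PWM cycles in microseconds."""
    return 1000.0 / f_sw_khz


def inductor_ripple_a(v_in: float, v_out: float,
                      inductance_uh: float, f_sw_khz: float) -> float:
    """Peak-to-peak inductor ripple current of an ideal buck stage.

    delta_I = (V_in - V_out) * D / (L * f_sw), with D = V_out / V_in.
    """
    duty = v_out / v_in
    return (v_in - v_out) * duty / (inductance_uh * 1e-6 * f_sw_khz * 1e3)
```

Doubling the frequency halves both the period and the ripple in this model, which is consistent with the idea that a faster-switching VRM can follow GPU demand more closely. The trade-off, not captured here, is higher switching loss in the power stages.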
Thermal protection exists for safety reasons and is enabled by default on all cards. It keeps the GPU from destroying itself when cooling fails, for example when a pump fails, a water loop runs dry, or liquid nitrogen is no longer being poured. When the temperature becomes too high, the VRM is shut off and power to the GPU is cut.
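For extreme overclocking, the protection can be disabled by moving the BIOS switch to the LN2 position, which is indicated by a red light on the back of the card. Thermal protection stays active in the normal and OC positions. The control is implemented purely in hardware at the PCB level and is independent of the vBIOS, so users who only want the maximum fan speed of the LN2 BIOS are better off flashing that BIOS into the normal or OC position, which keeps thermal protection in place.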
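The relationship between the BIOS switch and thermal protection, as described in the conversation, can be captured in a few lines. The enum and function names are invented for this sketch. The `flashed_vbios` argument is deliberately ignored to reflect that the control sits at the hardware level and does not depend on which vBIOS image is flashed.

```python
from enum import Enum


class BiosSwitch(Enum):
    """Physical BIOS switch positions mentioned in the video."""
    NORMAL = "normal"
    OC = "oc"
    LN2 = "ln2"


def thermal_protection_enabled(switch: BiosSwitch,
                               flashed_vbios: str = "any") -> bool:
    """Protection follows the switch position only.

    `flashed_vbios` is unused on purpose: per the source, the control is
    purely on the PCB level and independent of the vBIOS image.
    """
    return switch is not BiosSwitch.LN2
```

Flashing the LN2 image into the normal slot therefore keeps protection on in this model: `thermal_protection_enabled(BiosSwitch.NORMAL, "ln2-image")` is `True`, while the LN2 switch position disables it regardless of the image.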
Forgetting to turn over-temperature protection back on after a bench session can be expensive. The cratered GPU shown in the video overheated while power was never cut off. Similarly, walking away from an LN2 session and coming back half an hour later to an empty pot can mean a GPU or CPU sitting at around 200 °C. Protection should be re-enabled as soon as benchmarking is finished.
From the VRM controller's feedback loop to accurate V sense and hardware thermal protection, each part plays a role in keeping a modern graphics card stable. Understanding how they interact helps enthusiasts push their hardware without destroying it.
"Oh yes, very important too. Oh, so people are gonna be crying that, oh my god, you destroyed a 2080 Ti for a video. Yeah, this was actually last year's engineering sample, so this card already went through all the validation and we cannot sell this board anymore. It doesn't have a heatsink, doesn't have anything, and even the power design is a little bit different, so it doesn't have commercial value. So it allows us to use it for this experiment and indicate in a real practical scenario what can happen if you forget to turn on the over-temperature protection. Okay. Before that, this video is brought to you by us and the GN store. The best way to support our independent reporting is through store.gamersnexus.net. This is made possible with your purchases of merch like our GN medium modmat, in stock and shipping now and designed with GPU teardown diagrams and grids. Our 100% custom-made two-tone shirt is also a great way to help, and it's currently on sale. The shirt uses 95% cotton and 5% elastane for a sporty fit with vibrant colors and was designed entirely by the GN team. Learn more at the link in the description below or go to store.gamersnexus.net. So I'm with TiN from EVGA. We've done a few videos here, and this, you've presented me with, say, a GPU with a crater in it. Yeah, it's a victim of somebody forgetting to turn the over-temperature protection back on after the bench session, and the GPU overheated but the power was never cut off and the GPU exploded. Yes. So we have some demos of that. The video will start with probably one of them overheating. We didn't get a giant crater in it, but we got some cracks on the die and a big puff of smoke. Essentially, this demo is to show the importance of why the thermal protection is in place originally on pretty much every VGA card, and motherboards as well, and why it is important, because we have people who are trying different extreme overclocking experiments, like running nitrogen or even 
water cooling on the cards, but accidents always happen, and sometimes there is no water, or a pump failure, and this is what can happen if you don't turn the protection back on when you finish with your benchmarking. Right, yeah, if you tell it not to protect itself, it'll listen to you. Yes, exactly. So yeah, this one is cratered, and I guess another time this could happen would be if you forget to pour LN2 in the pot and you walk away or something. Yeah, if you get distracted, go talk to somebody, and then half an hour later you come back to the system running with an empty pot, no LN2, the temperature will be 200 C and the GPU or CPU will be dead as well. That's even sound like black. Oh yeah, look, goo came out. So I guess the top is like a diffusion barrier or something on top of the, well, silicon? No, the top is the silicon, that's it, and the goo is the underfill. That's like a glue under the chip so the humidity and air don't get under the chip. I see, okay. And actually you can see on the exploded card that there are small little balls, just like you have on the BGA package but much smaller ones, that connect the GPU die to the substrate. Yes, okay. Very important to intro people, it's just hurt, right. And actually you can see all those shiny things, that's actually the silicon level, that's where all those millions of transistors are, might be careful. So that's the actual transistors, and there you have the copper layers. Aha. Pretty much like a PCB but much, much, much smaller. Okay. Because all the components, all the transistors, are on the back side, not on the top side. The top side doesn't do anything. So is the GPU die also BGA to the substrate? Yeah, but it's not using solder, it's using copper bumps. I see, okay, so micro bumps. Record tall. And then what are the layers like, do you know, of what you cracked open just now? So the top layer, it's the biggest layer. Like they have on the PCB all the layers the same, but on the GPU, on 
the silicon, they have the biggest layer, which will handle all the power. It will connect power from the PCB, like memory power, or all the traces that don't need a lot of high-speed signals but need to carry a lot of power, so that will be the top layer, and it will connect all these blocks around the GPU. And then the next layers go finer and finer till they get to the bottom layer, which has the actual transistors. And that will be, when you hear about a 10-nanometer GPU or CPU, or 7-nanometer, that's where all those nanometers are, right on the bottom of the chip. Okay. That's why this package is called flip chip. Yeah, so you have the actual structures flipped down to the PCB, and then on the top you have just silicon material, which allows it to transfer the heat to the heatsink and provide the cooling. Nice. And actually, when you see the beautiful pictures of silicon dies, all those rainbow-colored transistors, that's essentially the bottom layer. I see, so the bottom layer is pretty close to the contact bumps. Yes, yes, and then they have just an insulation layer, and then they have contact bumps to connect to the substrate. He's grabbing another specimen. Do you have the card that blew up? Yeah, where is it? It's always this moment when you've got something exciting and you want to show someone, and then you're trying to show it and nothing happens. And there's one more thing on there too. So we did try to blow up another one, and with that one you thought you were hitting, I guess, OCP? Yes, because there are always not just GPU protection mechanisms, but the VRM itself will try to protect itself from drawing too much current or voltage, and it will also shut itself off. But it's actually a much higher limit on any overclocking cards, because if you have the limit too low then you will not be able to run extreme overclocking on the card, it will 
shut off too early. I also made a simple diagram of how the VRM works, like the overheat signal. Essentially we have the power controller, we have power stages, we have the GPU, and how the over-temperature protection works: when the temperature sensor in the GPU detects the temperature is too high, it will toggle the overheat output signal, and then the signal goes to the VRM controller and connects to the enable signal. When the enable signal is off, then the whole VRM shuts down. I see. So that's essentially a very simple concept, how it works. And then about the VRM loop: you have the controller, then you have the power stages. The 12 volt input goes to the power stages, they convert it to the lower voltage, and it's provided through the big beefy shape to the GPU die. And then there are two special pins, GPU power sense and GPU return sense, and they go back to the controller. You can think of it like there is some small Oompa Loompa who sits in the controller, looks at the voltage, and if the voltage is not what is expected, he will adjust the PWM signal higher so the voltage increases. And if the voltage is too low because, for example, you're running a heavy load like a 3DMark benchmark, then the Oompa Loompa will adjust the voltage as required to compensate for that. Okay. So that's why I drew this very professional guy who is watching this feedback, the sense voltage, and constantly adjusting everything in the loop. That's why it's very important to have the correct V sense and feedback. That's one of the first things we test during the power design on any product, like a motherboard or graphics card. And just keep your eyes and screen now he's medium presents now he knows it's gonna work, that's it. And this is happening within the controller? Yes. So essentially you tune the controller, which has different adjustment knobs, like the frequency at which this monitoring loop will be working. And then also you can artificially tell this guy, oh, actually the real voltage is 50 millivolts 
higher than you see, and then the correction will be applied and the whole voltage will change accordingly as well. Okay. Switching frequency is essentially one of the knobs that controllers have as well, and the transient response, the speed at which everything works, will be affected by the switching frequency. That's why on older motherboards and VGA cards you often need to increase the switching frequency, so everything on the VRM side can catch up with the demand from the GPU. All right, so that's the walkthrough of over-temperature protection, or lack thereof. You wanted to add something, though? Yeah, basically, over-temperature protection is important for safety reasons. And somebody put it there for a good reason, right? Yes. So thermal protection is important, and all the cards have it enabled by default. And if you want to use maximum fan speed, like we're providing in the LN2 BIOS position, the best way to do that is to take the BIOS and flash it into the normal position on the BIOS switch, or into the OC position. That will retain the ability to go to maximum fan speed but still with all the thermal protection in place. Okay, so then if you're in the regular BIOS position, the thermal protection is on? Okay, so it's only disabled when you switch into the LN2 mode, which has the red light indicator on the back side, right? And that's when all the protection is disabled. So the thermal protection is independent, then, from the vBIOS? Yes, you can use any vBIOS, and this control is purely in hardware, on the PCB level. Right, right. And I think the biggest takeaway here is, if you forget to shut down and you leave your system running, then that will be a very expensive day, effects I see BIOS on. And then you RMA your card and you put the cooler back on it, hoping that they won't notice. If you leave a hole in it like that, they might notice. So that's it for this one. Pretty cool stuff. You don't really get to see 
2080 Tis get exploded every day. So thank you for watching, that was fun for us, and check back for the other videos on voltage and LLC. Thank you, TiN, for joining me. We'll see you all next time."