Intel's Ring Bus Explained

Intel's Ring Bus: Understanding the Interconnect Technology

Think back to the pre-Sandy Bridge era, before the i7-2600, to something like this big boy right here: the i7-920. It's a Bloomfield chip, a variant of Nehalem, built on 45 nanometer lithography with four cores and eight threads. It boosts to around 3 GHz, but it has no integrated GPU and no video transcoder. By today's standards it was a pretty bare-bones chip.

But Sandy Bridge added that GPU, and it added the transcoder. It also split the L3 cache into slices shared between the cores and pulled the functions traditionally handled by the Northbridge onto the die as the system agent, which freed up space on the motherboard. When you do all of this on a die that is also smaller than a Nehalem die, you need a way for the bits and pieces to communicate: little highways for transferring data. Rather than running thousands of individual wires between each agent, Intel created the Ring bus.

Think of the Ring bus like a circular highway around a large city. Many suburbs need quick access not only to the urban center but also to the other suburbs; one neighborhood shouldn't have to drive through downtown to reach the next. There's too much traffic, too many roads, and too many stoplights along that route. In a very basic sort of way, that's what Intel's Ring bus does: it acts as the interconnect between the cores, the IGP, the L3 cache, and the system agent.

The traditional Northbridge acted as an intermediary, but by moving its functions onto the die, Intel simplified the design and reduced overall latency. A ring also beats running a dedicated connection between every pair of agents: that would overcomplicate fabrication, require a much larger chip, and likely run hotter given how much metal all those extra traces would need. There's nothing inherently special about the technology itself; what stands out is how well Intel executed it at the chip level.
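As a rough illustration of why full point-to-point wiring doesn't scale, here is a back-of-the-envelope comparison. The agent counts and the "link" abstraction are mine for illustration; this is not a model of the actual silicon.

```python
# Back-of-the-envelope comparison: dedicated point-to-point links vs. a ring.
# The "link" abstraction here is illustrative, not a model of Intel's wiring.

def point_to_point_links(agents: int) -> int:
    """Every agent wired directly to every other agent: N*(N-1)/2 links."""
    return agents * (agents - 1) // 2

def ring_segments(agents: int) -> int:
    """A ring needs only one segment per agent (each stop connects to the next)."""
    return agents

for n in (4, 8, 12, 16):
    print(f"{n:2d} agents: {point_to_point_links(n):3d} point-to-point links "
          f"vs. {ring_segments(n):2d} ring segments")
```

The gap widens quadratically: at 16 agents you'd need 120 dedicated links versus 16 ring segments, which is exactly the wiring blow-up the ring avoids.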

The Ring bus is actually divided into four rings: the data ring, which does what you'd expect and carries the data itself; the request ring, which carries the requests for that data; the acknowledge ring, which confirms those requests; and the snoop ring, which carries the cache-coherency (snoop) traffic between agents. Together, these rings work to minimize latency and maximize core throughput.
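To make that division of labor concrete, here is a toy sketch of a simplified read transaction tagged with the four ring classes. The enum values, message fields, and ordering are illustrative only; they are not Intel's actual protocol.

```python
# Toy model of the four ring classes. The Message fields and the transaction
# ordering are illustrative only -- not Intel's actual on-die protocol.
from dataclasses import dataclass
from enum import Enum, auto

class Ring(Enum):
    REQUEST = auto()      # "please send me cache line X"
    SNOOP = auto()        # coherency traffic: "does anyone else hold X?"
    ACKNOWLEDGE = auto()  # "request received / line state confirmed"
    DATA = auto()         # the cache line itself rides here

@dataclass
class Message:
    ring: Ring
    source: str
    target: str
    payload: str

# A simplified read: request -> snoop -> acknowledge -> data.
transaction = [
    Message(Ring.REQUEST, "core0", "L3 slice 2", "read line 0xBEEF"),
    Message(Ring.SNOOP, "L3 slice 2", "core1", "do you hold 0xBEEF?"),
    Message(Ring.ACKNOWLEDGE, "core1", "L3 slice 2", "no copy here"),
    Message(Ring.DATA, "L3 slice 2", "core0", "<64-byte cache line>"),
]

for msg in transaction:
    print(f"[{msg.ring.name:11s}] {msg.source} -> {msg.target}: {msg.payload}")
```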

Latency is reduced by the nature of the design: data travels around the ring via the shortest physical path between the two agents involved. Bus bandwidth also scales with core count, which makes the design effective for most consumer-grade applications. There is a soft limit on how many agents a single Ring bus can support, though: as agents are added and cores sit physically further apart on the die, latency climbs.
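A rough way to see why latency climbs with agent count is to count ring hops on a bidirectional ring, where traffic takes whichever direction is shorter. This is a simplification that ignores arbitration, L3 slice selection, and clock-domain effects.

```python
# Hop count on a bidirectional ring: traffic takes whichever direction is
# shorter. Counts ring stops only; ignores arbitration and slice hashing.

def ring_hops(src: int, dst: int, agents: int) -> int:
    """Shortest distance between two stops on a ring with `agents` stops."""
    clockwise = (dst - src) % agents
    return min(clockwise, agents - clockwise)

def average_hops(agents: int) -> float:
    pairs = [(s, d) for s in range(agents) for d in range(agents) if s != d]
    return sum(ring_hops(s, d, agents) for s, d in pairs) / len(pairs)

for n in (4, 8, 16, 32):
    print(f"{n:2d} agents: average {average_hops(n):.2f} hops per transfer")
```

The average hop count grows roughly in proportion to the number of agents, which is why very large rings eventually stop making sense.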

This issue becomes more pronounced as core counts climb, a trend AMD has helped accelerate in recent years. Chips with more than eight cores have traditionally used more than one Ring bus, and more recently Intel has moved to what it calls a mesh topology. In that design, each agent acts as its own router of sorts and sends data where it's needed via the shortest path across the mesh. Those individual routers determine the shortest route between agents (or between Ring buses, in multi-ring designs), all in an effort to keep latency down.

This approach beats having a standalone controller manage millions of connections at once, which would simply be too much for a single block to handle. So Intel instead has each agent act as its own router, reducing complexity and improving overall system performance.
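For intuition, a distributed router on a 2D mesh can pick a shortest path with something as simple as dimension-ordered (XY) routing, where each node decides the next hop locally. This is a generic textbook scheme used here for illustration, not a description of Intel's actual mesh routing logic.

```python
# Dimension-ordered (XY) routing on a 2D mesh: each node decides the next hop
# locally by first correcting the X coordinate, then the Y coordinate.
# A generic textbook scheme, not Intel's actual mesh protocol.

def next_hop(current: tuple[int, int], dest: tuple[int, int]) -> tuple[int, int]:
    x, y = current
    dx, dy = dest
    if x != dx:                       # step along X first
        return (x + (1 if dx > x else -1), y)
    if y != dy:                       # then step along Y
        return (x, y + (1 if dy > y else -1))
    return current                    # already at the destination

def route(src: tuple[int, int], dest: tuple[int, int]) -> list[tuple[int, int]]:
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest))
    return path

# Example: an agent at (0, 0) sending to an agent at (3, 2) on a 4x3 mesh.
print(route((0, 0), (3, 2)))
# [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
```

The key point is that no central controller ever sees the whole path; every hop is a local decision, which is what keeps the routing logic simple as the grid grows.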

There's an interesting article from AnandTech summarizing some of the issues traditional Ring buses ran into on HEDT chips (think X99 and X299 platforms) that's worth reading if you want to dig deeper. And while this short video may not cover everything, it should clear up a few questions you might have had about the Ring bus itself.

The Ring bus has been a staple of Intel CPUs for nearly a decade now, and its design principles remain in place to this day. For anyone interested in hardware, understanding the Ring bus is a useful step toward understanding the inner workings of modern processors.

"WEBVTTKind: captionsLanguage: enlet's get technical it's been a while since we've had one of our deep dives they and I've had many of you asked for them look to be frank they don't they don't earn as many views as I would like given our channel size so that's why I've kind of moved away from them but every now and then it's nice to pay homage to where this channel originated I know many of you subscribed for that kind of content so here we go roughly five-minute deep dive into intel's ring bust should be pretty interesting stick around so I want you to think back to pre Sandy Bridge before the i7 2600 maybe something like this big boy right here the i7 820 it's a bloomfield chip a variant of Nehalem 45 nanometer lithography 4 cores 8 threads boost to around 3 gigahertz no IGP though and no video transcoder it was pretty bare-bones by today's standards but Sandy Bridge added the GPU right it added the transcoder it actually split up l3 cache pipe lines between cores and it shoved in the system agent that's traditionally the Northbridge which freed up space on the motherboard and when you do all of this to a die and also shrink it because it's smaller than a Neelam die you need a way for the bits and pieces to communicate little highways for transferring data but rather than run thousands of individual wires between each agent Intel created the Ring bus so get this think of the Ring bus like a circular Highway around a large city many suburbs not only need quick access to the urban center but quick access to other suburbs right other neighborhoods I shouldn't need to drive through downtown to get to the next neighborhood right there's too much traffic too many roads too many stoplights and in a very basic sort of way that's what Intel's ring bus does it acts as the interconnect between cores the IGP the l3 cache and the system agent remember that's traditionally the Northbridge you don't need dedicated wires running between specifically the ia GP and save chunks of l3 cache it would over complicate the fab and you need a huge chip but also likely run fairly hot during how much metal how many traces you would need in there and so it just the way to simplify it and the way to reduce latency overall was the ring bus it beats having a connection from here to here here to here here to here and so on I think you get the point now to be clear there's nothing inherently special about Intel's ring bus apart from the fact that it's been so well executed on a chip level but it still is a bus switch in electronics carries a definition of a subsystem interconnect away for different systems to communicate now the ring bus is divided into four rings the data ring which does what you think it does when it transmits data the request ring which also tells probably what you think it does the acknowledged ring which confirms the request and then the snoop ring which actively looks for said requests from other agents together they work to minimize latency and thus maximize core throughput latency is reduced by nature of the rings designed by the way data will transfer via the shortest physical path between the agents so bus bandwidth also scales with core count and that makes it effective for most consumer grade applications there is a limit what kind of a soft limit of supported agents in a single ring bus system though of course further away from each other in the die will inevitably experience more latency we're talking about further away physically like that that's that's how much latency can be incurred 
between cores that are physically distant from each other and as core count increases over the years which it certainly has thanks a large part AMD so too has the inherent latency so there's a battle to keep up with or stay on top of keeping latency down chips with more than eight cores traditionally utilize more than one ring bus or more recently what Intel it's called a mesh topology in this case each agent acts as its own a router of sorts and would send data where it's needed via the shortest path across the mesh so now those individual routers are determining the shortest path between the ring buses or whatever the mesh topology to minimize latency and that's I obviously better than a standalone controller which would have to be in charge of millions of connections at one time it's just probably too much that's why Intel decided to have each agent act as its own router there's actually a great article from an end tech that summarizes a few issues associated with traditional ring buses and hdt chips so we're talking more recently like X 99 X 299 CPS has what he brought lovely things like that but I've linked it down below if you are interested in learning a bit more about it I do hope this short little video has helped somewhat clear up a few questions you might have had regarding the ring bus what it does in essence and if you've never heard of it before at least now you know its general functions and why it's been a staple and Intel CPUs for nearly a decade now that's all for me leave a comment consider subscribing and I'll catch you in the next one my name is Greg thanks for learning with melet's get technical it's been a while since we've had one of our deep dives they and I've had many of you asked for them look to be frank they don't they don't earn as many views as I would like given our channel size so that's why I've kind of moved away from them but every now and then it's nice to pay homage to where this channel originated I know many of you subscribed for that kind of content so here we go roughly five-minute deep dive into intel's ring bust should be pretty interesting stick around so I want you to think back to pre Sandy Bridge before the i7 2600 maybe something like this big boy right here the i7 820 it's a bloomfield chip a variant of Nehalem 45 nanometer lithography 4 cores 8 threads boost to around 3 gigahertz no IGP though and no video transcoder it was pretty bare-bones by today's standards but Sandy Bridge added the GPU right it added the transcoder it actually split up l3 cache pipe lines between cores and it shoved in the system agent that's traditionally the Northbridge which freed up space on the motherboard and when you do all of this to a die and also shrink it because it's smaller than a Neelam die you need a way for the bits and pieces to communicate little highways for transferring data but rather than run thousands of individual wires between each agent Intel created the Ring bus so get this think of the Ring bus like a circular Highway around a large city many suburbs not only need quick access to the urban center but quick access to other suburbs right other neighborhoods I shouldn't need to drive through downtown to get to the next neighborhood right there's too much traffic too many roads too many stoplights and in a very basic sort of way that's what Intel's ring bus does it acts as the interconnect between cores the IGP the l3 cache and the system agent remember that's traditionally the Northbridge you don't need dedicated wires running between 
specifically the ia GP and save chunks of l3 cache it would over complicate the fab and you need a huge chip but also likely run fairly hot during how much metal how many traces you would need in there and so it just the way to simplify it and the way to reduce latency overall was the ring bus it beats having a connection from here to here here to here here to here and so on I think you get the point now to be clear there's nothing inherently special about Intel's ring bus apart from the fact that it's been so well executed on a chip level but it still is a bus switch in electronics carries a definition of a subsystem interconnect away for different systems to communicate now the ring bus is divided into four rings the data ring which does what you think it does when it transmits data the request ring which also tells probably what you think it does the acknowledged ring which confirms the request and then the snoop ring which actively looks for said requests from other agents together they work to minimize latency and thus maximize core throughput latency is reduced by nature of the rings designed by the way data will transfer via the shortest physical path between the agents so bus bandwidth also scales with core count and that makes it effective for most consumer grade applications there is a limit what kind of a soft limit of supported agents in a single ring bus system though of course further away from each other in the die will inevitably experience more latency we're talking about further away physically like that that's that's how much latency can be incurred between cores that are physically distant from each other and as core count increases over the years which it certainly has thanks a large part AMD so too has the inherent latency so there's a battle to keep up with or stay on top of keeping latency down chips with more than eight cores traditionally utilize more than one ring bus or more recently what Intel it's called a mesh topology in this case each agent acts as its own a router of sorts and would send data where it's needed via the shortest path across the mesh so now those individual routers are determining the shortest path between the ring buses or whatever the mesh topology to minimize latency and that's I obviously better than a standalone controller which would have to be in charge of millions of connections at one time it's just probably too much that's why Intel decided to have each agent act as its own router there's actually a great article from an end tech that summarizes a few issues associated with traditional ring buses and hdt chips so we're talking more recently like X 99 X 299 CPS has what he brought lovely things like that but I've linked it down below if you are interested in learning a bit more about it I do hope this short little video has helped somewhat clear up a few questions you might have had regarding the ring bus what it does in essence and if you've never heard of it before at least now you know its general functions and why it's been a staple and Intel CPUs for nearly a decade now that's all for me leave a comment consider subscribing and I'll catch you in the next one my name is Greg thanks for learning with me\n"