PyTorch 2.0 - Dynamic Shapes Support

**Dynamic Shapes in PyTorch: A Breakthrough in Symbolic Reasoning**

Dynamic shapes have long been a challenging problem for deep learning compilers, and many questioned whether they could be supported efficiently at all. With recent work in the PyTorch community, PyTorch 2.0 now brings first-class support for compiling models over varying input shapes. In this article, we will look at what dynamic shapes are and how they can be used to improve the performance and efficiency of compiled PyTorch models.

**A System for Symbolic Reasoning**

At its core, dynamic shapes support is about enabling symbolic reasoning about tensor shapes in PyTorch. Instead of treating every size as a concrete integer fixed at compile time, the compiler treats sizes as symbolic values and reasons about them algebraically. Eager mode has always handled varying input shapes, because every operation re-runs from scratch; a compiled graph, however, bakes shape assumptions into the generated code, so without symbolic reasoning you would need to recompile for every new shape or fall back to workarounds such as manual padding.
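As a small illustration of what "reasoning algebraically about sizes" means, here is a sketch using SymPy (the same symbolic math library PyTorch's shape system builds on). The symbol name `s1` and the conv parameters are purely illustrative, not PyTorch internals:

```python
import sympy

# Illustrative sketch (not PyTorch internals): treat an input size as a
# symbol and derive the output height of a stride-2 convolution
# algebraically, the way the compiler reasons about shapes such as
# "input downscaled by 4" after two such layers.
h = sympy.Symbol("s1", positive=True, integer=True)
kernel, stride, padding = 3, 2, 1

out_h = sympy.floor((h + 2 * padding - kernel) / stride) + 1
print(out_h)               # symbolic expression in s1
print(out_h.subs(h, 224))  # 112 -- matches the concrete computation
```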

However, with recent work in the PyTorch community, we now have a system that allows TorchInductor to generate efficient code for different shapes without needing to recompile the model for each one. This is made possible by integrating symbolic shape reasoning into the core components of PyTorch.
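Concretely, the user-facing entry point is the `dynamic=True` flag on `torch.compile`. The sketch below compiles a function once and then feeds it inputs with different trailing sizes; the function and sizes are made up for illustration, and exact guard/recompile behavior depends on the PyTorch version:

```python
import torch

# A minimal sketch: compile once and feed inputs with different sizes.
# With dynamic=True, the compiler traces sizes symbolically instead of
# specializing on the first shape it sees.
@torch.compile(dynamic=True)
def scale_and_sum(x):
    return (x * 2).sum(dim=-1)

for seq_len in (7, 13, 128):      # varying "sequence length"
    x = torch.randn(4, seq_len)
    out = scale_and_sum(x)        # new lengths should not force a recompile
    print(seq_len, out.shape)
```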

**Deep Integration and Its Benefits**

One of the key properties of the new system is its deep integration with the core components of PyTorch: it is built for both the just-in-time (torch.compile) and export-style use cases and is integrated into the PyTorch 2.0 export path. Because symbolic shapes are threaded through the whole stack rather than bolted onto a single layer, the front end, the C++ core, and the code generator all reason about the same symbolic sizes, and existing optimizations continue to apply.

For example, in benchmarks on language models, torch.compile with dynamic shapes not only preserves eager mode's smooth scaling with sequence length but generally outperforms both eager mode and a static-shape compiler combined with padding. The integration also reduces compilation time, since the model only needs to be compiled once for dynamic shapes rather than once per encountered (or padded) static shape.

**A Breakthrough in Compilation**

The impact of this new system on compilation is perhaps one of its most significant benefits. With the ability to reason symbolically about tensor shapes, we can eliminate the padding and other workarounds that were previously needed to make varying input shapes work with a static-shape compiler. This leads to a significant reduction in compilation time, as the model only needs to be compiled once to cover a whole family of shapes.

To illustrate this point, let's consider a common use case for dynamic shapes: language models with varying sequence lengths, especially autoregressive generation. With a static-shape compiler, a common workaround is to pad the input sequence up to the nearest power of two so that only a handful of distinct shapes ever reach the compiler. This bounds the number of recompilations, but the padded positions still cost compute, which leaves performance on the table.
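For reference, the power-of-two bucketing workaround looks roughly like the sketch below; `pad_to_power_of_two` is an illustrative helper, not a PyTorch API:

```python
import torch
import torch.nn.functional as F

def pad_to_power_of_two(x: torch.Tensor) -> torch.Tensor:
    """Pad the last dimension up to the next power of two.

    Sketch of the bucketing trick used with static-shape compilers:
    only power-of-two lengths ever reach the compiler, which bounds the
    number of distinct shapes (and hence recompiles) at the cost of
    wasted compute on the padded positions.
    """
    length = x.size(-1)
    target = 1 << (length - 1).bit_length()  # next power of two >= length
    return F.pad(x, (0, target - length))    # pad on the right of last dim

x = torch.randn(4, 100)                      # e.g. sequence length 100
print(pad_to_power_of_two(x).shape)          # torch.Size([4, 128])
```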

With the new system, we can eliminate the padding entirely and compile code that handles the actual sequence length. In the reported benchmarks this outperforms both eager mode and the static-shapes-plus-padding approach by a significant margin, and it produces a smooth performance curve as the sequence length increases, rather than the jagged, step-like curve produced by padding to shape buckets.

**Reducing Compilation Time**

One of the most significant benefits of the new system is its ability to reduce compilation time. With the model only needing to be compiled once for dynamic shapes, we can significantly reduce the overhead associated with recompilation.

To demonstrate this point, consider the language-model benchmark above. Even with power-of-two padding, the static-shape compiler had to compile the model roughly five or six times as the sequence length varied, whereas with dynamic shapes it compiled only once, significantly reducing total compilation time.
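A rough way to observe this yourself is to time the first call at each new input size, once with `dynamic=False` and once with `dynamic=True`. The sketch below is an illustrative micro-benchmark, not the benchmark from the talk, and the exact recompile behavior depends on the PyTorch version and Dynamo configuration:

```python
import time
import torch

def first_call_latencies(dynamic: bool) -> dict:
    """Rough proxy for (re)compilation cost: time the first call at each
    new sequence length. With dynamic=False every new length typically
    triggers a recompile; with dynamic=True the function is compiled once
    with a symbolic sequence length."""
    torch._dynamo.reset()  # clear any previously compiled graphs
    fn = torch.compile(lambda x: torch.relu(x @ x.transpose(-1, -2)),
                       dynamic=dynamic)
    latencies = {}
    for seq_len in (64, 96, 160, 224):
        x = torch.randn(2, seq_len, 32)
        start = time.perf_counter()
        fn(x)
        latencies[seq_len] = time.perf_counter() - start
    return latencies

print("static :", first_call_latencies(dynamic=False))
print("dynamic:", first_call_latencies(dynamic=True))
```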

**Applications Beyond Compilation**

The benefits of the new system extend far beyond compilation, however. One potential application is symbolic shape checking: you add shape annotations to a function, and the system reasons symbolically about the shapes flowing through it, reporting which inputs would produce a shape error before the code ever runs on real data. This can meaningfully improve code quality and reliability, since shape bugs are caught ahead of runtime.
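The symbolic shape checker itself is a potential future tool rather than a shipped API, but the flavor of it can be approximated today with meta tensors, which carry sizes and dtypes without any data. The `check_shapes` helper below is hypothetical and only a sketch of the idea:

```python
import torch

def check_shapes(fn, *shapes):
    """Hypothetical helper: propagate shapes through `fn` without running
    any real computation, using meta tensors (sizes only, no data). This
    is not the symbolic checker described above, just a sketch of the
    same idea with today's public APIs."""
    metas = [torch.empty(s, device="meta") for s in shapes]
    try:
        out = fn(*metas)
        return tuple(out.shape)
    except RuntimeError as e:          # shape mismatches surface here
        return f"shape error: {e}"

def attention_scores(q, k):
    return q @ k.transpose(-1, -2)

print(check_shapes(attention_scores, (2, 128, 64), (2, 128, 64)))  # (2, 128, 128)
print(check_shapes(attention_scores, (2, 128, 64), (2, 128, 32)))  # shape error
```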

Another potential application is counting the FLOPs of a neural network symbolically, which lets us reason about the computational cost of a model without manually counting operations or running the model at many concrete sizes. For example, you could obtain a symbolic expression describing how the FLOPs of a Transformer scale with sequence length or batch size, rather than a single number for one specific shape.
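As a sketch of what symbolic FLOP counting could look like, the snippet below builds an approximate FLOP formula for a simplified self-attention block directly in SymPy; the formula is illustrative and is not produced by PyTorch's shape system:

```python
import sympy

# Symbolic FLOP count for one (simplified) self-attention block.
batch, seq, d_model = sympy.symbols("batch seq d_model", positive=True)

qkv_proj = 3 * 2 * batch * seq * d_model**2   # Q, K, V projections
attn     = 2 * 2 * batch * seq**2 * d_model   # QK^T and attn @ V
out_proj = 2 * batch * seq * d_model**2       # output projection
total    = sympy.simplify(qkv_proj + attn + out_proj)

print(total)                                            # symbolic expression
print(total.subs({batch: 8, seq: 2048, d_model: 1024})) # concrete count
print(sympy.diff(total, seq))                           # how cost scales with seq
```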

**Conclusion**

In conclusion, dynamic shapes support in PyTorch 2.0 represents a major milestone for the PyTorch compiler stack. By enabling symbolic reasoning about tensor shapes, it allows efficient code generation across varying input shapes, cuts down recompilation, and opens the door to new tools such as symbolic shape checking and symbolic FLOP counting. The system is still under active development, but because it is deeply integrated into PyTorch's core components, its benefits extend across both the torch.compile and export paths.

"WEBVTTKind: captionsLanguage: enum hey so my name is Horus and I'm here to talk about I'm on the P compiler team and I'm here to talk about Dynamic shape support in pych 2.0 so what is the problem we're trying to support here with Dynamic shapes so what we're trying to do is we have a neural network and we want to pass it inputs with different shapes in addition we want this to be fast so one natural question might be uh you know why am I up here talking about this if this already works like if you take a pyrid program and you pass in tensors with different shapes this already works fine and I'm sure many of you have already utilized this feature unfortunately with compilation as you know previous speakers have mentioned Things become quite a bit uh trickier and to understand why um we can let's take a look at like the ex ution flow uh between eager mode and py 2.0 so in eager mode when you uh what we do is when you like get in a tensor is we go through the python code then we go through the Pyro C++ code and then we go through the kernels and then when you get another input you go through the python code again and then the Pyro C++ code again and then the kernels again um on the other hand with compilation uh this workf flow uh this execution flow becomes quite different so upon the first uh input coming in um we capture all of this like logic and we create like a compiled graph out of it then the next input that we get we need to look up this uh graph uh and if it's in the cache then we can just jump directly to the compiled graph uh without needing to go through all of these steps uh again and so notably this like reuse of the graph and the static uh like representation is what enables us to do these optimizations and gain the performance however it also introduces a new challenge which is that all of these components the python code the C++ code and the kernels can all depend upon the shapes and so um in eer mode this isn't a problem because you're regoing through this entire thing each time but in compilation uh this is a problem so a compiled graph needs to be able to uh support different shapes and in the case that it like encounters a shape that it does not support appropriately we need to be able to know about this and we need to know that we need to recompile and this graph is no longer valid so uh there are two main ways or like two main uh things that we need to do in order to uh support symbolic shapes or like that uh our programs can depend upon the shape values and one of them uh is if you just directly use the shape values uh in your program in some Manner and so this is totally fine in static shape World cuz when you look at a shape you know it's just an integer uh and you know it's totally fine cuz it's the shapes are all static and you can just rely on this being a constant however in Dynamic shapes these are no longer constants and so we need to be able to treat these as uh symbolic values um and so this is actually uh quite tricky so as you can imagine because you know we have a lot of C++ code and in C++ code we have integers everywhere and now these are no longer just integers they need to be like symbolic values uh and we need to do this across our entire uh system um yeah and so this required a lot of work um but as a result of all this work uh with symbolic tensors you know we have this cool exciting news system where if you print out the shape of a symbolic tensor you can now have a symbolic representation uh of the shape so we can take a look at like uh what 
being able to propagate these symbolic shapes through our entire system gives us so uh this is a IR for Resident 18 and so if you take a look at the metadata there um we can see that uh it's a float 16 tensor with four dimensions and so the First Dimension is like s0o this like a beying batch Dimension the second dimension is 64 as for resin 18 the channels need to be static and then the last two dimensions are pretty interesting and that they are the shape of them are the input shapes uh downscaled by four and so you can see that like the last one is kind of like non-trivial representation of shapes and so this is all enabled by repres representing our shapes in in Senpai which is like a commonly uh standardly used um symbol manipulation Library so the other issue uh that we need to like the other way that you can rely on shapes other than directly is basically indirectly through control flow and so there are two kind of main observations that motivate our design so the first one is that control flow actually is like a very useful source of information about the shapes in our program so for example if you you know do an operation between two uh tensors often times the shapes need to be identical and so this provides us information that allows us to like simplify uh our shape Expressions uh the second uh useful like the second observation that we had is it is actually like fairly rare to actually like Branch extensively upon control flow in your system um especially like you know many times uh uh through the course of your execution and so this motivates our system uh which we call like specializing and guarding and so what we do in this Like You Know sample program is that when we encounter this control flow that you know shape zero is more than five uh we Peak at the underlying shape uh we see that you know we go down the first branch which that the shape is more than five uh and then we like guard on this uh the branch that we went down symbolically so you can see in the bottom we have just like a graph that just has a multiply in it without any control flow um and as well as kind of like a precondition on what uh cases need to be required uh for this graph to be correct and so note that because we've propagated shapes through our program uh these guards are all um like evaluatable without needing to rerun our graph and so when we see a new tensor uh whose shape is less than or equal to five uh we know that this graph is no longer correct and that we need to retrace and recompile so although like specializing guarding is like a very useful technique uh there are many other tools in the toolbox uh particularly for the export path that Michael mentioned earlier so for example if you really want all of this in a single graph and you don't want to specialize it uh will provide control flow Ops where you can manually um you know ensure that you have both of these components in a single graph so those are like the main components of our symbolic shape system and one of the things that I think historically made Dynamic shapes so difficult for like a compiler to get working is that you really need this to work across all areas of your stack right so if your compiler supports Dynamic shapes but then your compiler front or like your front end does not support Dynamic shapes you end up with a system that doesn't support uh Dynamic shapes and so when we were looking at supporting Dynamic shaps with pter 2.0 we knew we couldn't really do it in a Half Baked uh Manner and so implementing the system 
required a lot of investment across many layers of our stack uh in order to ensure that like symbolic shapes worked all the way through but as a result uh of you know all this work uh we now have a system that want one allows torch inductor to generate efficient code for different shapes without needing to recompile uh two it's built for both the jit and Export style use cases and is integrated into the py 2.0 export path and three this is all enabled by like deep integration into the core components of pytorch so you know talking about how color system is one thing but it's you know good to actually look at some numbers uh uh you know to demonstrate what we can do and so one disclimer add is uh Dynamic shaves are still under active development and so these results were obtained on a a feature development branch and not a master but we expect that all of these uh will be in uh Master by the release so if you take a look at so like a common very common use case for dynamic shapes is for language models uh with like varying sequence length especially for like Auto regressive generation and so in this case if you take a look at the Orange Line uh which is pyri eager you can see that uh you know it grows smoothly with a sequence length it doesn't need to recompile and so you know this is kind of the experience that you guys get today and so one way to deal with um like varying shapes uh if you only support like a static shape compiler uh a common technique is to like pad to the nearest power of two uh and then you know pass that through your program and so you can see like what the performance looks like there with a purple line where you can see that you have these like jumps uh where like you need to pad up to the nearest Power of Two And so although this improves performance in many cases you can see that we're still leaving a lot of performance on the table uh due to the extra overhead introduced by these computations on the other hand with pyro 2.0 with Dynamic shapes uh the pink line you can see that we uh not only generally outperform like both static shapes plus padding as well as eager mode by a pretty significant margin uh we're also able to you know kind of have this like Smooth performance curve uh as we increase the S length uh which is kind of what you get from a eager mode as well another nice thing about supporting Dynamic shapes um is for reducing compilation time so uh like although you know we do this like a pad to the nearest power of two to minimize our compiles we still need to M compile like about five or six times uh for static shapes in this example on the other hand Dynamic shapes only needs to compile once uh resulting in significantly reduced uh compilation time when needing to support these kinds of dynamic shapes so although I spent most of uh our time today talking about uh you know compilation with Dynamic shapes uh the system that we've built is not really only for compilation uh really what I view what we've built as is a system for symbolically reasoning about shapes uh in pytorch and so you know this enables you know potentially new future things that we can do so for example one thing that you might be that you might build with the system is you might want to symbolically shape check your function so you might want to you know add some annotations to your function and then we can symbolically reason about your shapes and tell you that oh hey you know actually certain inputs uh you know might result in a shape error uh in your program and you know another 
thing you might do is you might want to count the flops in your neural network symbolically so you might want to you know get a symbolic representation of how the flops in your like you know Transformer network uh scale with like the sequence length or the batch size so uh thank you for uh listening to my talk uh and there also like to thank many other contributors to the small shave system who kind of brought it to where it is today so yeah thank you for your time\n"