Lightning Talk - TorchFix - a Linter for PyTorch-Using Code with Autofix Support - Sergii Dymchenko
**Performance Degradation and Data Loader Issues**
One issue that can degrade performance is omitting the `num_workers` parameter of the data loader, which defaults to zero. With the default, data loading happens synchronously in the same process as the computation and can block it. For efficiency in production, users typically want to set `num_workers` to a value greater than zero; the right number depends on factors such as the number of CPUs available. The default is not necessarily an error and may be perfectly valid for some use cases, so the goal is to flag it so that the user can inspect it and decide whether it is an actual issue.
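As a rough illustration of choosing a non-zero worker count from the number of CPUs, here is a minimal sketch. The helper name `suggested_num_workers`, the `reserve` and `cap` parameters, and the heuristic itself are assumptions made for this example; they come from neither the talk nor PyTorch.

```python
import os


def suggested_num_workers(reserve: int = 1, cap: int = 8) -> int:
    """Hypothetical heuristic: keep `reserve` cores free for the main
    process and never use more than `cap` workers."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cap, cpus - reserve))


print(suggested_num_workers())

# Usage with PyTorch (kept as a comment so the helper runs without torch):
#   from torch.utils.data import DataLoader
#   loader = DataLoader(dataset, batch_size=32,
#                       num_workers=suggested_num_workers())
```

The best value depends on the workload, so a number like this is only a starting point for measurement.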
Another class of problems comes from API changes, both in core PyTorch and in popular domain libraries such as TorchVision. In TorchVision, the API for loading pre-trained models recently changed from a boolean `pretrained` parameter (`pretrained=True` or `pretrained=False`) to a `weights` parameter that specifies exactly which weights to load. The new API is more flexible, and TorchVision can drop support for the old one completely once existing code has migrated. Updating by hand is tedious because the change applies to every model, and TorchVision has many models with many weights.
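The migration described above is mechanical, which is what makes it a good target for an autofix. The sketch below rewrites the old keyword using only the standard-library `ast` module (Python 3.9+ for `ast.unparse`). It is not TorchFix's implementation: the `DEFAULT_WEIGHTS` table is a hypothetical single-entry mapping, and the code assumes the weights enum is reachable through the same prefix as the model function.

```python
import ast

# Hypothetical, single-entry mapping used only for this sketch.
DEFAULT_WEIGHTS = {"resnet50": "ResNet50_Weights.IMAGENET1K_V1"}


class PretrainedToWeights(ast.NodeTransformer):
    """Rewrite `f(pretrained=<bool>)` into `f(weights=...)` for known models."""

    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        if isinstance(func, ast.Attribute):
            name, prefix = func.attr, ast.unparse(func.value) + "."
        else:
            name, prefix = getattr(func, "id", None), ""
        if name not in DEFAULT_WEIGHTS:
            return node
        for kw in node.keywords:
            is_bool = isinstance(kw.value, ast.Constant) and isinstance(
                kw.value.value, bool
            )
            if kw.arg == "pretrained" and is_bool:
                kw.arg = "weights"
                if kw.value.value:
                    # pretrained=True -> explicit weights enum member
                    new_expr = prefix + DEFAULT_WEIGHTS[name]
                    kw.value = ast.parse(new_expr, mode="eval").body
                else:
                    # pretrained=False -> no pre-trained weights
                    kw.value = ast.Constant(value=None)
        return node


def migrate(source: str) -> str:
    tree = PretrainedToWeights().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(migrate("model = models.resnet50(pretrained=True)"))
```

Note that `ast.unparse` discards comments and original formatting, so a real autofix needs a tree that preserves them, such as the one LibCST provides.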
**Introducing TorchFix**
To address these issues, the team created TorchFix, a static analysis tool specialized for PyTorch. It is built on LibCST, a popular library for working with Python syntax trees: TorchFix loads Python code, obtains the syntax tree, updates it, and writes the modified code back. The main goal of TorchFix is to help users maintain healthy codebases and follow PyTorch best practices by identifying problems such as performance degradation, deprecated APIs, or incorrect usage of PyTorch features. It analyzes Python code against a set of predefined rules and reports the issues that may need attention.
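To make the idea of rule-based checking concrete, here is a toy checker written with the standard-library `ast` module. It is only a sketch under simplifying assumptions, not how TorchFix is implemented: the class name `ToyChecker` and the messages are invented, `num_workers` passed positionally or through `**kwargs` is not handled, and any call to a function named `DataLoader` is matched without resolving imports. It covers two problems from the talk: a `DataLoader` call without `num_workers` and the misspelled `require_grad` attribute.

```python
import ast

SOURCE = '''
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=32)
for param in model.parameters():
    param.require_grad = False
'''


class ToyChecker(ast.NodeVisitor):
    """Collect (line, message) findings for two illustrative rules."""

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        has_workers = any(kw.arg == "num_workers" for kw in node.keywords)
        if name == "DataLoader" and not has_workers:
            self.findings.append((node.lineno, "DataLoader without num_workers"))
        self.generic_visit(node)

    def visit_Assign(self, node):
        for target in node.targets:
            if isinstance(target, ast.Attribute) and target.attr == "require_grad":
                self.findings.append(
                    (node.lineno, "require_grad is probably a typo for requires_grad")
                )
        self.generic_visit(node)


checker = ToyChecker()
checker.visit(ast.parse(SOURCE))
for line, message in checker.findings:
    print(f"line {line}: {message}")
```

The source is only parsed, never executed, so the undefined `dataset` and `model` names are harmless. Static analysis can therefore catch the attribute typo even though Python raises no error for it at runtime.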
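TorchFix has two modes: a flake8 plugin mode and a standalone mode. In plugin mode, users install TorchFix and run flake8 as usual, specifying the additional warnings they want to handle. This is convenient for projects that already run flake8 in their CI pipeline, but it only reports errors and cannot autofix. In standalone mode, TorchFix runs as a script, and the `--fix` argument applies fixes for issues that are auto-fixable. Not all rules are enabled by default because some are too noisy; passing `--select=ALL` shows results from every rule.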
**Getting Started with TorchFix**
To get started, users install the latest release from PyPI with pip (`pip install torchfix`). For the latest code from GitHub, they can clone the repository and run `pip install` there. One current limitation is that TorchFix assumes users want to target the latest version of PyTorch, which is not the case for everyone.
**Future Development**
TorchFix is currently an early beta, but it is already useful: it has rules to find and fix all the issues described above, it has been used to find issues and update code in multiple projects internally at Meta and in open source, and it runs in CI on every commit for several Meta open-source projects on GitHub. The development team plans to add more rules for more classes of issues, guided by issues found in real codebases, and to expand the configuration options. They also aim to integrate with PyTorch CI and documentation generation.
**Getting Involved**
To get involved, users can start by running TorchFix on their codebase to see whether it finds any issues, or whether they find issues with TorchFix itself. They can then visit the GitHub page to report a bug, request a feature, or contribute code. All contributions are welcome, and a couple of good first issues are already open for those who do not know where to start.
**Future Directions**
Beyond new rules and configuration options, one goal of the planned integration is that when a function is deprecated in PyTorch, a check confirms that a TorchFix rule exists to flag the deprecated function and update it to the new variant. The team also hopes to see TorchFix running in the CI of more projects, ideally organically as people try it and find it useful.
**Transcript**
Hello, my name is Sergii. I work at Meta on PyTorch developer experience, and today I want to talk about TorchFix. TorchFix is a new tool we created recently to help PyTorch users maintain healthy codebases and follow best practices for PyTorch.

First, I want to show you several examples of the problems we are trying to solve. In this first example, a PyTorch API to compute the Cholesky decomposition was recently changed. The function was moved from `torch.cholesky` to `torch.linalg.cholesky`, and the parameters were changed too: in the old API you can provide the `upper=True` parameter, but in the new API you just compute the adjoint instead. So we want to update our code to use this new API, but doing this manually is extremely tedious.

Another example: sometimes you don't want to compute gradients for your parameters, usually for performance reasons. To tell PyTorch that you don't need gradients, you just set the `requires_grad` attribute to `False`. Unfortunately, people often type `require_grad = False`, and because it's Python, the attribute gets dynamically created and there is no error. Your program continues to work, but it doesn't do what it is expected to do, and this can lead to performance degradation. This is actually hard to notice, and this exact issue was eventually found in multiple popular, large open-source repos.

Another problem is about the data loader. If you don't provide the `num_workers` parameter for the data loader, the default is zero, and that means the data loading happens in the same process as the computation, synchronously. This means that data loading can potentially block computation. So for efficiency reasons, in production you want to provide the `num_workers` parameter and set it to something greater than zero; the exact number probably depends on the number of CPUs you have or something like that. But this issue is not necessarily an error. Depending on your goals and how you run your code, the default zero may be perfectly valid, but we still want to be able to flag this to the user so the user can inspect it and understand whether it's an actual issue for them.

This example is not about core PyTorch; it's about a popular domain library, TorchVision. In TorchVision, the API for loading pre-trained weights was recently changed. Previously you provided a Boolean, `pretrained=True` or `pretrained=False`, but with the new API you provide the `weights` parameter and specify exactly which weights you want to load. This new API is much more flexible, and we want to update our code to use it. Actually, we want to update all code in the [unclear] to use this new API, because after that TorchVision can completely drop support for the old API. Again, it's extremely tedious to do this by hand, especially taking into account that TorchVision doesn't have one model: TorchVision has many models with many weights, and this API change applies to all of them.

So this is a solution for all these problems: it's TorchFix. TorchFix is a static analysis tool specialized for PyTorch. TorchFix uses LibCST. LibCST is a popular library to work with Python syntax trees. LibCST allows TorchFix to load Python code, get the syntax tree, update the syntax tree, and then write back the modified code.

How do you run TorchFix? There are two modes: one mode is a flake8 plugin, and the other is standalone. In the flake8 plugin mode, you just install TorchFix and then basically use flake8 normally. This mode is very convenient if you already use flake8 in your project: if you have flake8 running in your CI, you just install TorchFix and specify the additional warnings you want to handle. But in this mode there are no autofixes, only linting, only errors. The other mode is standalone: you run TorchFix as a script, and you can provide the `--fix` argument to autofix things that are auto-fixable. The last line on this slide shows that not all the rules are enabled by default. This is because some rules are a bit too noisy to be enabled by default. To see the results from all the rules, you provide the `--select=ALL` parameter.

How do you get TorchFix? It's pretty easy. To get the latest from PyPI, you just `pip install torchfix`, and if you want the latest code that is on GitHub, you just download or clone the GitHub repo and do `pip install` there.

So what is the current stage? I'd say it's an early beta, but it's already useful. TorchFix already has the rules to find and fix all the examples I mentioned before, and much more. It was already used to find issues and update code in multiple projects internally at Meta and also in open source, and it has been running in CI on several Meta open-source projects on GitHub, on every commit.

In the future, we want to add more rules for more classes of issues, and this work will be guided by actual issues we find in real codebases. We also want to add more configuration options. For example, right now TorchFix assumes you want to use the latest version of PyTorch, and this is not necessarily the case. Another direction: we want to integrate with PyTorch CI and documentation generation for PyTorch. For example, when you deprecate a function in PyTorch, we want to be able to check that at that point there exists a rule for TorchFix to actually flag and update that deprecated function to the new variant. And of course we want to see TorchFix used in the CI of more projects. Hopefully this will happen organically as people try TorchFix and find it useful.

How do you get involved? First of all, just run it: try to run it on your codebase and see if it finds any issues, or you may find some issues with TorchFix itself. After that, you can go to the GitHub page and report a bug, request a feature, or do a code contribution. It's all very welcome. And if you want to do some coding for TorchFix and don't know where to start exactly, we already have a couple of good first issues open.

So that's it, that's all I have. Thank you.