PyTorch Mobile Runtime for Android | Brad Heintz

**Fusing Layers for Optimization**

The first optimization step is to fuse layers. Layer fusion combines multiple PyTorch modules into single operations, improving speed and memory footprint. Only certain ordered combinations of modules are allowed; for full details, see the docs at pytorch.org. For this demonstration, we'll fuse the convolution, batch norm, and ReLU, and we'll also fuse the linear layer and the ReLU that follows it. We have to specify the layers we want to fuse by name. Note how wrapping our first three layers in a sequential block affects those names: we have to prepend the name of the block, followed by a dot, to the name of each layer we want to address. This pattern would continue if the blocks were nested more deeply.

Our linear layer and the ReLU that follows it are not inside a block, so we address them directly by name. Now we run `fuse_modules`. Looking at the fused model structure, we can see that in each fused section the first layer now encompasses the functionality of the whole group, and the later layers have been converted to identity layers, which are essentially no-ops.
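To make the naming rule concrete, here is a minimal sketch of the fusion step. `TinyNet`, its layer sizes, and the attribute names `block`, `linear`, and `relu` are assumptions chosen for illustration, not the model shown in the video; `torch.quantization.fuse_modules` is the PyTorch call the walkthrough refers to.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the small model described above."""

    def __init__(self):
        super().__init__()
        # Conv -> BatchNorm -> ReLU wrapped in a Sequential named "block".
        self.block = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3),
            nn.BatchNorm2d(8),
            nn.ReLU(),
        )
        # The linear layer and its ReLU live directly on the model.
        self.linear = nn.Linear(8 * 6 * 6, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.block(x)
        x = torch.flatten(x, 1)
        return self.relu(self.linear(x))


model = TinyNet().eval()  # fuse for inference with the model in eval mode

fused = torch.quantization.fuse_modules(
    model,
    [
        ["block.0", "block.1", "block.2"],  # block name + dot + layer name
        ["linear", "relu"],                 # top-level layers, addressed directly
    ],
)
print(fused)
```

Printing `fused` should show a combined module in the first position of each fused group and `Identity` modules in the remaining positions. By default `fuse_modules` returns a fused copy and leaves `model` unchanged.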

**Quantizing Your Model**

By default, PyTorch models use 32-bit floating-point numbers for weights and computation. Quantization converts the model to a narrower bit width, usually 8-bit integers. PyTorch offers three workflows for quantization. Dynamic post-training quantization quantizes the model's weights ahead of time but quantizes activations dynamically at runtime. Static post-training quantization quantizes both the weights and the activations ahead of time. Quantization-aware training simulates the effects of quantization during training for added accuracy. The trade-offs among these three methods are discussed fully in the quantization documentation at pytorch.org.

Here we'll demonstrate static post-training quantization. First, we get a quantization config, here the one for QNNPACK. Note that this configuration is specific to ARM processors: this is what you want for mobile devices, but it means you won't be able to run the model locally on an x86 processor after conversion. Next, we prepare the model for quantization, which sets it up for calibration. The calibration step is optional but recommended. To calibrate the model, you run a representative set of data through it, much as you would in a training loop. This helps find the correct zero point and scale for the float32 to 8-bit int conversion.
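As a rough illustration of what calibration is estimating, the sketch below derives a scale and zero point from an observed value range using the standard affine mapping onto unsigned 8-bit integers. The function names and sample values are hypothetical, and PyTorch's observers may choose the range differently (for example, from a histogram), so treat this as the underlying arithmetic rather than PyTorch's implementation.

```python
def choose_qparams(x_min, x_max, qmin=0, qmax=255):
    """Derive (scale, zero_point) for an affine float -> uint8 mapping."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every observed value was zero
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to its 8-bit integer code, clamping to the valid range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an 8-bit integer code back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activations observed while running calibration data.
calibration_values = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = choose_qparams(min(calibration_values), max(calibration_values))

for value in calibration_values:
    code = quantize(value, scale, zero_point)
    print(value, code, dequantize(code, scale, zero_point))
```

Without representative data the observed range is only a default guess, so real activations are more likely to be clipped or represented coarsely.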

Finally, calling `torch.quantization.convert` performs the actual quantization of your model's weights and activations. If you skip the calibration step, you'll get a default zero point and scaling factor, as we did here (note the warning text). Our final step is to convert the model to TorchScript and optimize it for the PyTorch Mobile runtime.
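The config, prepare, calibrate, and convert steps can be sketched as follows. `TinyNet`, its layer sizes, and the random calibration tensors are assumptions for illustration; a real calibration pass would use representative inputs. The `QuantStub` and `DeQuantStub` modules mark where tensors enter and leave the quantized part of the model. The sketch stops at inspecting the converted model and does not run inference on it.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical model with quantization stubs at its boundaries."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.linear = nn.Linear(8 * 6 * 6, 10)
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        x = torch.flatten(x, 1)
        x = self.linear(x)
        return self.dequant(x)


model = TinyNet().eval()

# 1. Attach the QNNPACK quantization config.
model.qconfig = torch.quantization.get_default_qconfig("qnnpack")

# 2. Prepare: returns a copy of the model with observers inserted.
prepared = torch.quantization.prepare(model)

# 3. Calibrate: random tensors stand in for a representative dataset.
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(4, 3, 8, 8))

# 4. Convert: swap observed float modules for quantized ones.
quantized = torch.quantization.convert(prepared)
print(quantized)
```

Skipping step 3 still lets `convert` run, but the observers will not have seen any data.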

**TorchScript and Optimizations**

TorchScript is PyTorch's optimized model representation, containing both your model's computation graph and its learned weights. It allows the PyTorch just-in-time compiler in the PyTorch Mobile runtime to perform runtime optimizations during inference. The PyTorch mobile optimizer makes further adjustments to the model that are specific to PyTorch Mobile.

Once we've converted the model with `torch.jit.script` and optimized it, we can save it. The file we save here is the one we'll include in our mobile project. We'll need to add some resources to the project: my model file (you should, of course, use your own model exported to TorchScript) and an image for the model to classify. I'll put them in a new assets folder.
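The script, optimize, and save steps described above might look like the sketch below. The small `nn.Sequential` model and the temporary output path are assumptions for illustration; in practice you would pass your own fused and quantized model and save the file wherever your Android project's assets folder can pick it up.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

# Hypothetical stand-in model; substitute your own prepared model here.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
).eval()

# Convert the model to TorchScript.
scripted = torch.jit.script(model)

# Apply the mobile-specific optimization passes.
optimized = optimize_for_mobile(scripted)

# Save the file that will be copied into the mobile project.
model_path = os.path.join(tempfile.mkdtemp(), "model.pt")
optimized.save(model_path)
print(model_path)
```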

I'm also going to add one source file that contains just a string array of the human-readable labels for the classes my model was trained on. Next, we'll put together a UI: an image view to show the image we're classifying, a button to start the inference process, and a text view to show the result.

**Setting up the UI**

Watching me set up UI constraints is not very educational, so let's speed through this bit. Now let's fill in the code for our activity. We'll make a couple of private members for our image and our model. Next, I'll add a helper function that gets the absolute path for an asset, because PyTorch Mobile expects the model to be a file in the file system.

Now we'll fill in `onCreate`. It's a bunch of code, so let's take it a piece at a time. First, we get the bitmap and model objects. We wrap these in a try block because if there's an issue with either one, we can't really run the app. Next, we fill the image view with the image and set up an on-click listener for our button. Inside the on-click listener, we convert our image to a PyTorch tensor, pass that tensor to the model for classification, and receive the output.

We'll find the model's most likely class for this image and its human-readable label, and finally we'll report that label in the text view. So let's run it now and see it work.
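The activity in the video is written in Java, but the selection logic is language-neutral. The sketch below expresses it in Python: pick the index of the highest score and look up the matching label. The function name, labels, and scores are hypothetical values for illustration only.

```python
def top_class(scores, labels):
    """Return (index, label) for the highest-scoring class."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Index of the maximum score; ties resolve to the first occurrence.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, labels[best]


labels = ["dog", "cat", "bird"]  # hypothetical class labels
scores = [0.3, 2.1, -0.7]        # hypothetical model output scores

index, label = top_class(scores, labels)
print(label)
```

The label list must be in the same order as the classes the model was trained on, since the output index is the only link between the two.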

**Transcript**

In this video, I'm going to give you a walkthrough of setting up the PyTorch Mobile runtime in an Android project. To follow along, you'll need Android Studio 3.5 or higher, and you'll need to have Gradle installed. You should also be using PyTorch 1.7 or higher to take advantage of the mobile optimization processes shown in this video.

PyTorch offers native runtimes for iOS and Android, allowing you to bring your machine learning app to mobile devices. Including the library in your project is a one-liner, but there is more to do in getting your model set up for best performance on mobile. In this video, we'll demonstrate how to set up your Android Studio project to include PyTorch Mobile, how to export your model to TorchScript (PyTorch's optimized model representation), how to optimize your model for best performance on mobile, how to include your model in your project, and how to call the model for inference from your Java code.

For this demonstration, we're going to build an image classifier into an Android app. First, we'll create the project. I'm going to create an empty activity app, set the target language to Java, and set the minimum SDK version to 28. In the build.gradle for your project, make sure that JCenter is listed in your repositories. Now we'll add PyTorch to the build.gradle for the app. This will bring in versions of the PyTorch Android library for all of the Android ABIs, for ARM and x86. It will also bring in a library from torchvision that contains helper functions for converting Android in-memory image types to PyTorch tensors.

Now I'll show you how to prepare your model to run on the PyTorch Mobile runtime. You'll want PyTorch 1.7 or higher for this workflow. First, we'll need a model. For the mobile project, I have a pre-trained model ready to go, but I'll be demonstrating the optimization process on a custom model here, to better show how you can apply this to your own models.

This is a simple model containing a few common layer types. First, there's a section with a convolutional layer, a batch norm layer, and a ReLU. These are wrapped together in a `torch.nn.Sequential`. This is a common practice for organizing submodules within complex models; we're doing it here to show how organizing your layers this way affects layer fusion. After that, we have a linear layer and another ReLU. The forward function strings these layers and operations together in a pretty straightforward way. Note that I've also added quantization stubs; you'll need to include these if you want to use quantization-aware training.

Now we'll instantiate the model and put it in eval mode. This is an important step, as it turns off computationally expensive gradient tracking and disables training-only features such as dropout layers. Looking at the structure of the model, it looks like we'd expect: all the layers we created, in the order we created them, with the first section wrapped in a sequential block.

Our first optimization step is to fuse layers. Layer fusion combines multiple PyTorch modules into single operations, improving speed and memory footprint. Only certain ordered combinations of modules are allowed; for full details, see the docs at pytorch.org. For this demonstration, we'll be fusing the convolution, batch norm, and ReLU, and we'll also fuse the linear layer and its following ReLU. We have to specify the layers we want to fuse by name. Note how the inclusion of our first three layers in a block affects the names: we have to prepend the name of the sequential block, with a dot, to each of the names of the layers we want to address. This pattern would continue if the blocks were nested more deeply. Our linear layer is not included in a block, so we just address it and its following ReLU directly. Now we run `fuse_modules`. Looking at the fused model structure, we can see that in each fused section, the first layer now encompasses the functionality of the whole block, and later layers are converted to identity layers, essentially no-ops.

The second step is quantizing your model. By default, PyTorch models use 32-bit floating-point numbers for weights and computation. Quantization converts the model to some narrower bit width number, usually an 8-bit integer. PyTorch offers three workflows for quantization. Dynamic post-training quantization quantizes the model's weights ahead of time but handles quantization of activations dynamically at runtime. Static post-training quantization quantizes both the weights and activations ahead of time. Quantization-aware training simulates the effects of quantization during training for added accuracy. These three quantization methods have trade-offs associated with them, which are discussed fully in the quantization documentation at pytorch.org.

Here we'll demonstrate static post-training quantization. To quantize our model, first we'll get a quantization config, here QNNPACK. Note that this configuration is specific to ARM processors. This is what you want for mobile devices, but it means you won't be able to run the model locally on an x86 processor after conversion. Next, we prepare the model for quantization; that sets it up for calibration. The calibration step is optional but recommended. To calibrate the model, you'll need to run a representative set of data through it, much as you would for a training loop. This helps find the correct zero point and scaling for the float32 to 8-bit int conversion. Finally, calling `torch.quantization.convert` will do the actual quantization of your model's weights and activations. If you don't perform a calibration step, you'll get a default zero point and scaling factor, as we did here, if you note the warning text.

Our final step is to convert the model to TorchScript and optimize it for the PyTorch Mobile runtime. TorchScript is PyTorch's optimized model representation, containing both your model's computation graph and learning weights. It allows the PyTorch just-in-time compiler in the PyTorch Mobile runtime to perform runtime optimizations during inference. The PyTorch mobile optimizer makes further adjustments to the model that are specific to PyTorch Mobile. Once we've converted the model with `torch.jit.script` and optimized it, we can save it. The file we save here will be the one we include in our mobile project.

We'll need to add some resources. I'm going to add my model file (of course, you should use your model exported to TorchScript) and an image for the model to classify. I'll put them in a new assets folder. I'm also going to add one source file that just contains a string array of the human-readable labels for the classes my model is trained against. Next, we'll put together a UI. I'll have an image view to show the image we're classifying, a button to start the inference process, and a text view to show the result. Now, watching me set up UI constraints is not very educational, so let's speed through this bit.

Now let's fill in the code for our activity. We'll make a couple of private members for our image and our model. Next, I'll add a helper function that gets the absolute path for an asset; PyTorch Mobile expects a file in the file system for the model. Next, we'll fill in `onCreate`. It's a bunch of code, so let's take it a piece at a time. First, we get the bitmap and model objects. We wrapped these in a try block because if there's an issue with either, we can't run the app, really. Next, we'll fill the image view with the image, and we'll set up an on-click listener for our button. Inside the on-click listener, we're going to convert our image to a PyTorch tensor, pass that tensor to the model for classification, and receive the output. We'll find the model's most likely class for this image and its human-readable label, and finally we'll report that label in the text view.

So let's run it now and see it work. And there's our cat. We press the Infer button and, sure enough, our model thinks our cat is a cat. Success!