Keynote - Enabling Generative AI on the Edge - Cormac Brick, Principal Engineer, Google
# Generative AI on the Edge: A Deep Dive into Innovations and Tools
## Introduction
Good morning! My name is Cormac Brick, and I’m excited to share insights about how generative AI is becoming increasingly popular on the edge. This growth is driven by significant advancements in the PyTorch ecosystem and the broader open model community. Edge developers are leveraging these tools to create innovative applications that provide instant responses, work offline, and deliver personalized experiences while respecting privacy by keeping data local.
The idea of deploying generative AI on edge devices might have seemed futuristic a few years ago, but today it’s no longer just a vision—it’s a reality. With the right compute power, models, and tools, developers are building cutting-edge applications that run seamlessly on mobile, desktop, IoT devices, or even within browsers.
---
## Compute Power: The Backbone of Generative AI
When discussing generative AI, one cannot overlook the importance of compute power. Over the past year, there has been a lot of attention focused on AI’s massive computational demands. However, what often goes unnoticed is the quieter revolution happening in mobile NPUs (Neural Processing Units).
In 2024 alone, it’s estimated that five zettaOPS of compute power will be shipped through mobile NPUs, a figure that highlights the immense potential of mobile ecosystems. While GPUs and CPUs also play a significant role in this ecosystem, the advancements in mobile NPUs are particularly noteworthy.
For context, let’s compare the projected compute power for NVIDIA H100s and mobile NPUs:
- **NVIDIA H100s**: 4 zettaOPS of compute power.
- **Mobile NPUs**: approximately 5 zettaOPS of compute power.
These numbers underscore the growing parity between high-end data center GPUs and mobile hardware, making generative AI deployment on edge devices more feasible than ever before.
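To make the scale of these figures concrete, here is a back-of-envelope check; the per-device throughput and shipment volume below are illustrative assumptions, not numbers from the talk.

```python
# Back-of-envelope check of the "~5 zettaOPS through mobile NPUs" figure.
# Both inputs are illustrative assumptions, not figures from the keynote.
ops_per_npu = 4e12          # assume an average mobile NPU sustains ~4 TOPS
devices_shipped = 1.25e9    # assume ~1.25 billion NPU-equipped devices ship per year

total_ops = ops_per_npu * devices_shipped
print(f"Aggregate compute shipped: {total_ops / 1e21:.1f} zettaOPS")  # -> 5.0 zettaOPS
```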
---
## Model Quality: From Large to Small, Innovation is Everywhere
The past year has seen remarkable progress in both large-scale and smaller open models. While large language models (LLMs) continue to dominate the headlines, smaller models are also making significant strides.
### Key Observations:
1. **Large Open Models**: The blue bars on our chart reveal a clear upward trend in innovation, with LLMs delivering exciting new capabilities to the open ecosystem.
2. **Smaller Models (Less than 4B Parameters)**: These models are equally important due to their smaller memory footprint and ability to run efficiently on edge devices.
For instance, consider the journey of **Gemma**:
- At the start of this year, Gemma shipped as a modest 2B-parameter model.
- By the end of last month, it had evolved into the more capable Gemma 2, showcasing how quickly smaller models are closing the gap with larger ones in performance and features.
This trend is particularly promising for developers looking to deploy generative AI on devices with limited computational resources.
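A quick sketch of why the sub-4B class matters on the edge: weight memory alone shrinks dramatically with parameter count and quantization. The parameter count and byte sizes below are standard arithmetic for illustration, not measurements from the talk.

```python
# Approximate weight-only memory footprint of a ~2B-parameter model
# (activations and KV cache excluded) at common precisions.
params = 2e9
for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gigabytes = params * bytes_per_param / 1e9
    print(f"{precision}: ~{gigabytes:.1f} GB of weights")
# fp16: ~4.0 GB, int8: ~2.0 GB, int4: ~1.0 GB
```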
---
## Deploying Generative AI: A Comparison to Classic ML
Deploying generative AI models on edge devices differs significantly from traditional machine learning (ML) deployments. Let’s break down the differences:
### Classic ML Deployment:
1. **Workflow**: Define, train, and export a model using tools like PyTorch or TensorFlow.
2. **Runtime**: Deploy on edge devices with a runtime such as ONNX Runtime or TensorRT.
3. **Application Logic**: Wrap the model in simple application logic and call runtime APIs.
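For contrast with the generative AI flow described next, here is a minimal sketch of that classic export path; the MobileNet model and ONNX target are just one common example of the pattern, not the specific stack used in the talk.

```python
import torch
import torchvision

# Classic ML flow: take a trained PyTorch model and export one static
# graph artifact for an edge runtime (here, ONNX for ONNX Runtime).
model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
example_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, (example_input,), "mobilenet_v3_small.onnx")
```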
### Generative AI Deployment:
- The process is more complex due to the need for dynamic interactions, such as text generation or image synthesis.
- This complexity has led to the emergence of boutique frameworks designed specifically for generative AI on edge devices.
However, there’s good news: the PyTorch ecosystem is actively addressing these challenges. Key priorities include:
1. **Keeping It in PyTorch**: There’s no need to switch tools when deploying generative AI. You can stay within PyTorch for model development and deployment.
2. **Single Model Artifact**: Export a single model artifact that includes all weights while allowing flexibility for different deployment scenarios (e.g., prefill and decode operations).
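To make the second priority concrete, here is a minimal sketch of a multi-signature export using the open-source ai-edge-torch package, assuming a hypothetical `MyDecoderOnlyLM` module; the signature-chaining API shown reflects the package's documented pattern as I understand it, so check the current release for the exact call shape.

```python
import torch
import ai_edge_torch  # pip install ai-edge-torch

model = MyDecoderOnlyLM().eval()  # hypothetical PyTorch LLM module

# Example inputs for the two entry points: a full prompt for prefill
# and a single token for decode.
prefill_tokens = torch.zeros((1, 512), dtype=torch.int32)
decode_token = torch.zeros((1, 1), dtype=torch.int32)

# One exported artifact, two signatures sharing the same weights.
edge_model = (
    ai_edge_torch.signature("prefill", model, (prefill_tokens,))
    .signature("decode", model, (decode_token,))
    .convert()
)
edge_model.export("decoder_lm.tflite")
```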
---
## AI Edge Torch: A Tool for Modern Deployment
To meet the demands of generative AI on edge devices, our team has developed **AI Edge Torch**, a PyTorch-native library designed specifically for edge deployment. Here’s how it works:
1. **Model Development**: Build models in PyTorch, using tools like `torchtune` for fine-tuning.
2. **Optimization**: Use AI Edge Torch’s optimized layers to improve performance, and validate your models directly within the PyTorch environment.
3. **Conversion and Runtime**: Convert your model into a format suitable for edge devices and run it with **LiteRT**, the renamed and evolved version of TensorFlow Lite.
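As an illustration of steps 2 and 3, here is a hedged convert-and-validate sketch with ai-edge-torch, using a small vision model for brevity; the tolerance value is arbitrary, and the exact API surface may differ between releases.

```python
import numpy as np
import torch
import torchvision
import ai_edge_torch

model = torchvision.models.resnet18(weights="DEFAULT").eval()
sample_input = (torch.randn(1, 3, 224, 224),)

# Convert the PyTorch model into a LiteRT (formerly TensorFlow Lite) flatbuffer.
edge_model = ai_edge_torch.convert(model, sample_input)

# Validate the converted model against the original, still inside Python.
torch_output = model(*sample_input).detach().numpy()
edge_output = edge_model(*sample_input)
assert np.allclose(torch_output, edge_output, atol=1e-4), "outputs diverged"

edge_model.export("resnet18.tflite")
```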
AI Edge Torch supports a variety of smaller models, including:
- TinyLlama
- Gemma
- OpenELM
- Phi-2
- SmolLM
- T5
- Stable Diffusion
These models are optimized for edge devices, ensuring they run efficiently on mobile GPUs and other hardware.
---
## TinyLlama: A Case Study in Efficiency
Here’s a quick look at how TinyLlama, a small language model, was implemented using PyTorch. The core of the implementation fits in roughly 30 lines and demonstrates the power of staying in PyTorch for edge deployments:
```python
import torch
from torch import nn


class TinyLlama(nn.Module):
    def __init__(self, d_model=2048, n_heads=32, vocab=32000):  # illustrative sizes
        super().__init__()
        # Standard nn modules here; a full implementation would also add
        # embeddings plus an optimized RoPE / KV-cache implementation.
        self.decoder_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, x, memory):
        return self.lm_head(self.decoder_layer(x, memory))


model = TinyLlama().eval()

# Export the model; prefill and decode entry points can be captured by
# re-running torch.export with differently shaped example inputs.
example = (torch.randn(1, 16, 2048), torch.randn(1, 16, 2048))
exported = torch.export.export(model, example)
torch.export.save(exported, "tiny_llama.pt2")
```
This example highlights how developers can create highly optimized models that run efficiently on edge devices while maintaining flexibility for different deployment scenarios.
---
## Model Explorer: Visualizing and Debugging Models
To ensure optimal performance and usability, we’ve developed **Model Explorer**, a tool designed to visualize and analyze generative AI models. Here’s what it does:
1. **Handling Large Models**: The tool supports massive models, including Gemma 2 (with 2,000 nodes) and internal models with up to 50,000 nodes.
2. **Model Hierarchy**: Visualize the model hierarchy, from high-level blocks down to specific layers like attention mechanisms.
3. **Metadata and Insights**: View metadata about the model, such as node names and performance metrics.
4. **Custom Overlays**: Add custom JSON overlays to provide additional insights, such as heatmaps for runtime latency.
For example, when analyzing TinyLlama on a CPU, the tool reveals that the final fully connected layer has low runtime latency, a useful data point when deciding where to focus optimization effort.
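If you want to try this yourself, a minimal way to open a converted model in the viewer is sketched below; it assumes the ai-edge-model-explorer pip package and a hypothetical `tiny_llama.tflite` artifact name, and the per-node latency overlay itself would come from a separate JSON file whose exact schema is documented by the tool.

```python
# Launch Model Explorer on a converted model
# (assumes `pip install ai-edge-model-explorer`).
import model_explorer

# Opens the interactive graph viewer in the browser; custom per-node data,
# such as a latency heatmap, can be layered on from a JSON file in the UI.
model_explorer.visualize("tiny_llama.tflite")
```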
---
## Performance Benchmarks and Examples
AI Edge Torch has been benchmarked against hand-written models developed internally, and the results are encouraging:
- AI Edge Torch performs within 10% of those hand-written baselines on edge devices.
We’ve also built real-world examples of applications running on mobile GPUs, showcasing how generative AI can deliver fast, responsive experiences on the edge.
---
## Conclusion and Call to Action
Generative AI on the edge is at an exciting inflection point, with:
- A growing amount of compute power available for deployment.
- Rapid innovation in both large and small models.
- Tools like AI Edge Torch and Model Explorer making deployment more accessible and efficient.
If you’re interested in learning more about these tools or want to dive deeper into generative AI on the edge, we invite you to:
1. Explore the open-source AI Edge Torch library.
2. Check out our poster session for a hands-on demonstration of Model Explorer.
3. Join us at the exhibit booth to see real devices running generative AI models in action.
Thank you for your interest in this space, and we’re excited to see what you’ll build with these tools!