The Challenges and Opportunities of Automating Machine Learning at Scale
In recent years, there has been a growing need to automate machine learning (ML) workflows at scale, particularly for organizations that want to use ML to drive innovation and improve business outcomes. Netflix is one such organization: its personalization infrastructure team is building an ML platform that enables data scientists and researchers to take their experiments from research to production.
One of the biggest challenges in automating ML workflows at scale is aligning the needs of data scientists with those of engineers. Traditionally, researchers would create models and then hand them over to engineers to productize. That approach can work in small environments, but it breaks down quickly as organizations grow and the scale of their problems increases. Netflix has learned from this experience and now aligns its data scientists and engineers around a shared set of goals.
To achieve this alignment, Netflix has created a core team of hybrid ML engineers who understand both machine learning and productization. These engineers lead all product-facing experimentation, and they work closely with data scientists to give them the tools and support they need to take research from idea to reality. One key investment is a set of APIs that let data scientists benchmark their research and reproduce their findings offline. Researchers can then show metric wins that justify an A/B test in production, without having to carry the work all the way there themselves.
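To make this concrete, here is a minimal sketch of what such an offline benchmarking API might look like. Netflix has not published its interfaces, so every name below (`run_benchmark`, `BenchmarkResult`, the 2% lift threshold) is a hypothetical stand-in for illustration only.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    """Holds a candidate model's offline score next to a frozen baseline's."""
    metric: str
    candidate_score: float
    baseline_score: float

    @property
    def lift(self) -> float:
        # Relative improvement of the candidate over the baseline.
        return (self.candidate_score - self.baseline_score) / self.baseline_score

def run_benchmark(candidate_predict, baseline_predict, dataset, metric_fn,
                  metric_name="ndcg@10"):
    """Score a candidate against the production baseline on the same held-out
    dataset, so results are reproducible and directly comparable."""
    return BenchmarkResult(
        metric=metric_name,
        candidate_score=metric_fn(candidate_predict, dataset),
        baseline_score=metric_fn(baseline_predict, dataset),
    )

# A researcher can then argue for an online test with a concrete offline win:
#   result = run_benchmark(my_model.predict, prod_model.predict, eval_set, ndcg_at_10)
#   if result.lift > 0.02:   # e.g. a 2% offline lift as the bar for an A/B test
#       propose_ab_test(result)
```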
In addition to new tools, Netflix is building robust guardrails throughout its ML platform: not just in offline pipelines but also in serving systems, along with tracking of model drift and feature importance over time. These guardrails act as checks and balances, so that research which is not yet fully production-hardened can still run without hurting the user experience.
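As an illustration of one such guardrail, the sketch below computes a population stability index (PSI), a common way to quantify drift between a feature's training and serving distributions. The 0.2 threshold is a widely used rule of thumb, and the function names are assumptions for this example rather than Netflix's actual tooling.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, observed: np.ndarray,
                               bins: int = 10) -> float:
    """Compare a feature's live distribution against its training distribution."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip proportions to avoid log(0) when a bin is empty in either sample.
    e_pct = np.clip(np.histogram(expected, bins=edges)[0] / len(expected), 1e-6, None)
    o_pct = np.clip(np.histogram(observed, bins=edges)[0] / len(observed), 1e-6, None)
    return float(np.sum((o_pct - e_pct) * np.log(o_pct / e_pct)))

def drift_guardrail(training_values, serving_values, threshold: float = 0.2) -> float:
    """Alert (or block a deployment) when a serving-time feature has drifted
    too far from what the model was trained on."""
    psi = population_stability_index(np.asarray(training_values),
                                     np.asarray(serving_values))
    if psi > threshold:
        raise RuntimeError(f"Feature drift detected (PSI={psi:.3f} > {threshold})")
    return psi
```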
As the field of machine learning continues to evolve, Netflix is identifying new opportunities for growth and innovation. One area of focus is media ML, which involves personalizing and even generating creative assets such as artwork, and deriving signals and embeddings from the asset files themselves. Given Netflix's significant investment in original content, this is a unique space, and the scale of raw, uncompressed video makes accessing, discovering, and slicing these files an infrastructure challenge of its own.
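One building block of media ML is turning giant video files into embeddings that downstream models can use. The sketch below samples frames and embeds them with a pretrained image backbone; the library choices (OpenCV, torchvision) are arbitrary stand-ins and say nothing about Netflix's actual media stack.

```python
import cv2
import numpy as np
import torch
from torchvision import models, transforms

# Pretrained backbone with the classification head stripped, so it emits a
# 2048-d embedding per frame instead of class logits.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def frame_embeddings(video_path: str, every_n_frames: int = 240) -> np.ndarray:
    """Sample every Nth frame of a video and embed it; returns (frames, 2048)."""
    cap = cv2.VideoCapture(video_path)
    embeddings, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes as BGR
            with torch.no_grad():
                embeddings.append(backbone(preprocess(rgb).unsqueeze(0)).squeeze(0).numpy())
        idx += 1
    cap.release()
    return np.stack(embeddings)
```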
To support media ML, Netflix is investing in identifying the right abstractions for its infrastructure. Separately, the company is expanding automation across the ML stack, embracing AutoML for tedious parts of a researcher's workflow such as feature selection and neural architecture search. It also plans to invest more in automating the path from research to production through member-facing continuous A/B testing and explore-exploit systems, which can keep accumulating small metric wins in the background that add up to a big win.
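As a small example of automating one of those tedious steps, the sketch below runs greedy forward feature selection with scikit-learn on synthetic data. Netflix's internal AutoML tooling is not public; this only illustrates the general idea of letting a search procedure, rather than a researcher, pick the feature subset.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a training table with 30 candidate features.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)

# Greedy forward selection: repeatedly add the feature that most improves
# cross-validated performance, instead of hand-picking columns.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=8,
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))
```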
Finally, Netflix is committed to providing turnkey systems for various ML techniques. One example is multi-armed and contextual bandits, which are used to quickly learn optimal strategies and start leveraging winning solutions faster than traditional A/B tests would allow. Three teams have already built three different in-house implementations, and the company now plans to standardize on a single turnkey system.
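To illustrate the explore-exploit idea behind such a system, here is a compact epsilon-greedy multi-armed bandit run against simulated click-through rates. A production contextual bandit would condition on user and context features and manage traffic allocation far more carefully; this sketch only shows how traffic shifts toward a winning arm continuously, rather than waiting for a fixed A/B test horizon.

```python
import random

class EpsilonGreedyBandit:
    """Explore a random arm with probability epsilon; otherwise exploit the
    arm with the best observed mean reward."""

    def __init__(self, n_arms: int, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select_arm(self) -> int:
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))                    # explore
        return max(range(len(self.means)), key=lambda a: self.means[a])  # exploit

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean update for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

# Simulated per-arm click-through rates; arm 1 is the true winner.
true_ctr = [0.02, 0.05, 0.03]
bandit = EpsilonGreedyBandit(n_arms=3)
for _ in range(10_000):
    arm = bandit.select_arm()
    bandit.update(arm, 1.0 if random.random() < true_ctr[arm] else 0.0)
print("Estimated arm means:", [round(m, 3) for m in bandit.means])
```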
By investing in these areas, Netflix aims to drive innovation and improve business outcomes through machine learning. With its focus on automation, guardrails, media ML, and turnkey systems, the company is poised to become a leader in the field of ML operationalization.
"WEBVTTKind: captionsLanguage: enso let's start with having you share a little bit about the team that you lead at netflix and how that team came to be what's your story so i joined netflix about six years ago uh in early 2015. over the past six years the personalization infrastructure team has grown to touch almost all facets of machine learning and partners really closely with practitioners to enable product innovation who are the constituents that your team serves at netflix yeah so that's actually a great question because uh it's one of the things that is really important to understand your key consumers well there are essentially what i would characterize as three different sets of ml practitioners that we serve one is what we internally call as algorithm engineers and these are people who are trained in both software engineering design principles but also have a good understanding of machine learning they're capable of sort of taking a good idea exploring it but then also productizing it then there are ml researchers who are exploring new cutting edge techniques in terms of how the evolving space can be leveraged for impacting product uh experiences that we can develop and then there's data scientists who are oftentimes using some of these similar technologies but their application is in the context of decision support so for instance to make a decision of which content should netflix invest in and it's really important for an ml platform developer and teams to really understand the core strengths of those users and sort of play to their strengths so those are the sort of the three main users i'd say in our org traditionally the algorithm engineers have been the biggest user of the infrastructure that we've built but that's evolving over time and expanding you've mentioned the the platform several times here and you provided a slide for us to show a little bit about the platform the the high level point here is demonstrating the point that i was making about a loosely coupled platform of composable building blocks each of these layers is a ml workflow function with training data preparation feature engineering training model evaluation inferencing intent to treat and treatment these boxes here show these various building block components that i've talked about so there's everything from the fact stores to sort of discovery of content and access the core abstraction that enabled the definition of a feature of transformations of a model to higher level ml apis to do the data preparation and and assemble a model you know selecting the right features together there are tools for doing the ad-hoc exploration and things like polynomial which is which happens to be an open source solution you can check it out on polynomial.org and you know then it ties into some of the context that we are operating in so polynomial for instance is a scala python notebook and much of our algo engineers and researchers need to be in this hybrid world of you know jvm feature encoders but also using some python bindings for training and things like that as you go up higher in the stack here we're building more differentiated ml services so things like hyperplane are about you know enabling more automation in various elements hyperparameter tuning or automating the the sort of the deployment of models so you can run continuous background automation test scoring service that allows us to decouple the serving with the scoring so you can through heterogeneous compute resources gpus and others behind when you really 
need to go after performance uh model life cycle management systems and then sort of going into the actual services that serve the page is where serving and scoring happens so there's often a trade-off for a platform developer on where on the spectrum of standardization and flexibility you need to be we feel that we're at the right point in the spectrum where we can be opinionated but not be limiting we have a way in terms of how our platform standardizes key engineering choices around facts and feature definitions and transformations and management but at the same time gives researchers the flexibility on choosing how models are assembled and put together and which framework that they would be used for you've spoken a little bit upfront about some of the the varied use cases that your group focuses on one of the areas that you work on is the machine learning infrastructure in support of the the media recommendations can you talk a little bit about that so there's two aspects of what you're talking about right one is you have a certain set of assets and you want to figure out and essentially which is the best asset that you are going to use for personalizing in a particular context whether you're coming on a device a handheld device or it's on a tv are you watching a trailer is it part of a gallery page is it on sort of the main home page and so there's artwork personalization uh which i would say is quite similar to the the classic recommender system problem with perhaps one more dimension the thing that we're beginning to be more excited about is not just the personalization of assets which so far let's say have been you know selected by by editors manually we're getting into the space of artwork generation or asset generation where we're using machine learning to enable a more productive generation of assets and help editors to find out even more exciting assets and signals within the actual asset files and so when you're building media ml related infrastructure uh some of the things that are different is the first thing is uh the dimension of scale is different we're talking about raw video you're probably familiar with dealing with some of these uncompressed original master files and so just you know while the inferencing latency is not as important but access to what signals you can and embeddings you can derive out of these giant files and accessing these files discovering these files and you know sort of slicing these files all of those things is a whole new challenge yeah yeah i was just going to ask for a lot of platform organizations the the goal is to enable data scientists and researchers folks that are more on the experimentation side of the spectrum to get things directly into production through automation at your scale is that part of your mandate or is there this assumption that people are going to throw things over the wall an engineer is going to take things take something that's been you know a research experiment and productize it how do you think about that we used to follow that approach i'd say five or six years ago and uh thankfully we're no longer following that approach and we've learned from it uh no there's there's two ways about about doing that right you could potentially have a very good alignment between your data scientists and engineers and you know they can they can have high coupling that's doable in small environments and perhaps maybe not at large scale but it breaks very quickly as organizations grow as well as the scale of your problems grow and so 
what we have come to doing is sort of is like a core set of engineers or algo engineers who understand the dimensions uh who both understand machine learning as well as productizing them sort of hybrid ml engineers if you will and they are the ones who are primarily leading all product facing experimentation but what we're doing to enable more data scientists to to sort of add to that is doing things like where we provide apis that they can benchmark their research they can reproduce that research and they don't have to take it all the way to the production but they can show enough offline metric wins that justify is doing an a b test so that's one approach the second approach is that where the ml platform can help is investing in a lot of guard rails throughout the production deployment pipelines so if you can invest in building certain guardrails not just in the offline pipeline but also in the serving systems and then tracking future importances and model drifts over time all of those things are just good software engineering principles that enable the system to allow you know research which may even if it is let's say not fully baked in into the engineering goodness there are checks and balances to prevent that from hurting user experience what are some other opportunities that you see ahead both specific to the use cases that you serve at netflix as well as more broadly in the mlaps ml operationalization community yeah for us really there's there's several new things that that we are investing in especially when it comes to things like media ml which is a bit unique to netflix but uh a whole different dimension as i mentioned this is about the creative asset generation opportunity and it can be huge given the amount of investment and that we are putting into into original content and this is really a new space uh for us so investing in identifying the right abstractions that will be relevant for infrastructure support of media ml is a big one the other is leveraging automation more and across the ml stack we're certainly using parts of automation in in various cases but embracing automl functions uh more for picking what can be sometimes tedious parts of a researcher's timeline so for instance picking out selecting the right features or identifying the right neural architectures that's that's an area where we intend to invest more in and then also in the automation of how the these this research is actually taken to production and what i call remember facing continuous a b testing and explore exploit systems because then you can do things which are probably not justifiable for a researcher's time but if they're running in the background with the proper guard rails then you really can do a lot more and something in the background can keep on identifying small metric winds which can accumulate into a big win so so that's an opportunity and then third i would say providing more turn key systems for various ml techniques so for instance in netflix we use multi-armed bandits and contextual bandits quite a bit which is which is an approach to quickly learn an optimal state and start sort of leveraging the the winning solutions faster than our typical av tests would do and there's been two or three different ways in which they have been implemented this is an example of something i mentioned where hey we don't know whether we're going to be investing a lot into it so we said okay let's go ahead and build it in one way and now there are three teams that are doing it in three different ways 
and here's an opportunity for us to sort of say okay you know what we can standardize here and we can build uh a turnkey system so that that to me is another area uh where we we plan to invest and then you can never you can never over invest in your guard rails and how you make sure that you because it sort of enables a lot more confidence in how quickly you can take your research to productionso let's start with having you share a little bit about the team that you lead at netflix and how that team came to be what's your story so i joined netflix about six years ago uh in early 2015. over the past six years the personalization infrastructure team has grown to touch almost all facets of machine learning and partners really closely with practitioners to enable product innovation who are the constituents that your team serves at netflix yeah so that's actually a great question because uh it's one of the things that is really important to understand your key consumers well there are essentially what i would characterize as three different sets of ml practitioners that we serve one is what we internally call as algorithm engineers and these are people who are trained in both software engineering design principles but also have a good understanding of machine learning they're capable of sort of taking a good idea exploring it but then also productizing it then there are ml researchers who are exploring new cutting edge techniques in terms of how the evolving space can be leveraged for impacting product uh experiences that we can develop and then there's data scientists who are oftentimes using some of these similar technologies but their application is in the context of decision support so for instance to make a decision of which content should netflix invest in and it's really important for an ml platform developer and teams to really understand the core strengths of those users and sort of play to their strengths so those are the sort of the three main users i'd say in our org traditionally the algorithm engineers have been the biggest user of the infrastructure that we've built but that's evolving over time and expanding you've mentioned the the platform several times here and you provided a slide for us to show a little bit about the platform the the high level point here is demonstrating the point that i was making about a loosely coupled platform of composable building blocks each of these layers is a ml workflow function with training data preparation feature engineering training model evaluation inferencing intent to treat and treatment these boxes here show these various building block components that i've talked about so there's everything from the fact stores to sort of discovery of content and access the core abstraction that enabled the definition of a feature of transformations of a model to higher level ml apis to do the data preparation and and assemble a model you know selecting the right features together there are tools for doing the ad-hoc exploration and things like polynomial which is which happens to be an open source solution you can check it out on polynomial.org and you know then it ties into some of the context that we are operating in so polynomial for instance is a scala python notebook and much of our algo engineers and researchers need to be in this hybrid world of you know jvm feature encoders but also using some python bindings for training and things like that as you go up higher in the stack here we're building more differentiated ml services so things like 