The Evolution of Object Detection: A Journey from Mask R-CNN to Detectron2
When Detectron1 was released in 2018, the exact same Mask R-CNN model scored about one point more accurate than the number reported in the Mask R-CNN paper in 2017. This happened because an object detection system is complicated enough that tiny mistakes and suboptimal implementation decisions creep in at many places, and together they make a measurable difference. Small details matter: for example, subtle differences in how different libraries implement image resizing affect the result.
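To make the resize point concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not code from Detectron2 or from any particular library: `resize_linear` and its `align_corners` flag are names made up for this example, and the two coordinate mappings are simply two common conventions for linear interpolation.

```python
def resize_linear(row, out_len, align_corners):
    """Resize a 1-D row of pixel values with linear interpolation.

    Hypothetical helper for illustration only; it compares two common
    ways of mapping output pixel positions back to input positions.
    """
    in_len = len(row)
    out = []
    for i in range(out_len):
        if align_corners:
            # First and last output samples land exactly on the first and last inputs.
            src = i * (in_len - 1) / (out_len - 1) if out_len > 1 else 0.0
        else:
            # Pixels are treated as unit-width cells whose centers are aligned.
            src = (i + 0.5) * in_len / out_len - 0.5
        src = min(max(src, 0.0), in_len - 1)  # clamp to the valid input range
        lo = int(src)
        hi = min(lo + 1, in_len - 1)
        frac = src - lo
        out.append((1 - frac) * row[lo] + frac * row[hi])
    return out


row = [0.0, 10.0, 20.0, 30.0]
corner_aligned = resize_linear(row, 8, align_corners=True)
center_aligned = resize_linear(row, 8, align_corners=False)

# Largest per-pixel disagreement between the two conventions.
max_diff = max(abs(a - b) for a, b in zip(corner_aligned, center_aligned))
print(corner_aligned)
print(center_aligned)
print(max_diff)
```

Both functions are reasonable "linear resize" implementations, yet they disagree on most output pixels. A model trained with one convention and evaluated with the other sees slightly shifted inputs, which is the kind of detail that can cost accuracy.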
Issues like these get corrected over time, and rewriting Detectron from the ground up gave the team a chance to revisit them. As a result, Detectron2 shows another gain in accuracy for exactly the same model trained for the same number of iterations. The released models will also include a few large models that were used to generate the video demo, so you can reproduce it yourself.
Because of all these complicated details, reproducing the results of a detection model is not always easy. That is why the team publishes standard models with reproducible results, hoping that the implementation and its reference numbers can become a reference for the academic community. In the future, they will also add some optimized small models that run more efficiently in production. More importantly, Detectron2 has a more modular architecture.
A standard object detection model, in the typical Generalized R-CNN framework, works like this: an input image first goes through a CNN backbone that extracts image features. Those features are used to predict region proposals, which are regions likely to contain objects. The features in those regions are cropped and warped into regional features. Different types of prediction heads then use the regional features and image features to predict bounding boxes and segmentation masks for each instance, as well as keypoints and DensePose for each human found in the image.
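The "cropped and warped" step is what lets heads of a fixed input size handle regions of any size. The sketch below shows the idea in plain Python, assuming a single-channel feature map stored as a nested list and using nearest-neighbor sampling. That sampling is a deliberate simplification: Mask R-CNN uses RoIAlign with bilinear interpolation, and `crop_and_warp` is a hypothetical name, not a Detectron2 function.

```python
def crop_and_warp(features, box, out_size):
    """Crop box = (x0, y0, x1, y1) from a 2-D feature map and resample it
    to an out_size x out_size grid with nearest-neighbor sampling.

    Simplified stand-in for RoIPool/RoIAlign, for illustration only.
    """
    x0, y0, x1, y1 = box
    max_row = len(features) - 1
    max_col = len(features[0]) - 1
    out = []
    for r in range(out_size):
        # Sample at the center of each output cell, then snap to a feature cell.
        y = y0 + (r + 0.5) * (y1 - y0) / out_size
        row = []
        for c in range(out_size):
            x = x0 + (c + 0.5) * (x1 - x0) / out_size
            row.append(features[min(int(y), max_row)][min(int(x), max_col)])
        out.append(row)
    return out


# Toy 6x6 feature map where each value encodes its position as row * 10 + col.
features = [[r * 10 + c for c in range(6)] for r in range(6)]

small_region = crop_and_warp(features, (0, 0, 2, 2), out_size=2)  # 2x2 box
large_region = crop_and_warp(features, (2, 1, 6, 5), out_size=2)  # 4x4 box
print(small_region)
print(large_region)
```

Both regions come out as 2x2 grids even though the boxes differ in size, so the same prediction head can consume either one.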
The model can also predict semantic segmentation for the entire image, and these predictions can be combined to produce panoptic segmentation, a task that segments both instances and background pixels. This gives a complete understanding of every pixel in the image. Detectron2 implements this model following this abstraction.
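One simple way to combine instance and semantic predictions into a panoptic result is sketched below. This is an assumption-laden illustration rather than Detectron2's actual merging logic: `merge_panoptic` is a hypothetical helper, the instances are assumed to arrive sorted by descending confidence, and overlaps are resolved by letting the earlier instance keep the pixel.

```python
def merge_panoptic(semantic, instances):
    """Combine per-pixel semantic labels with instance masks.

    semantic:  2-D list of class ids, one per pixel (background "stuff").
    instances: list of (class_id, mask) pairs, mask being a 2-D list of 0/1,
               assumed sorted from most to least confident.
    Returns a 2-D list of (class_id, instance_id); instance_id 0 means the
    pixel belongs to the background rather than to a detected object.
    """
    height, width = len(semantic), len(semantic[0])
    # Start from the semantic prediction, then paint instances on top.
    panoptic = [[(semantic[y][x], 0) for x in range(width)] for y in range(height)]
    claimed = [[False] * width for _ in range(height)]
    for instance_id, (class_id, mask) in enumerate(instances, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x] and not claimed[y][x]:
                    panoptic[y][x] = (class_id, instance_id)
                    claimed[y][x] = True
    return panoptic


# Toy example: classes 0 and 1 are background, class 2 is an object class.
semantic = [[0, 0, 1],
            [0, 0, 1]]
instances = [
    (2, [[1, 1, 0], [0, 0, 0]]),  # more confident instance
    (2, [[0, 1, 1], [0, 0, 0]]),  # overlaps the first one at pixel (0, 1)
]
panoptic = merge_panoptic(semantic, instances)
print(panoptic)
```

Every pixel ends up with exactly one label: either a specific object instance or a background class.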
Yet research is about doing things in new ways, so it sometimes works against abstraction and often breaks existing ones. That is why, in the past, doing something new often meant modifying the source code of the underlying research platform directly. That is not too bad for research, but it does not sound like a good idea for maintenance. You can certainly still do this in Detectron2, but Detectron2 is also designed so that you don't have to if you care about the maintenance of your project.
You can create new things without modifying or forking its source code: import Detectron2 as a library and build a thin layer of your own customizations on top of it using the registration system it provides. You can replace its backbone, add a new type of head, use your own datasets, or customize other components of the system from the outside, without touching its code.
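The general pattern behind such a registration system can be sketched in a few lines of plain Python. This is a generic illustration of the idea, not Detectron2's actual API: the `Registry` class, the `BACKBONES` registry, `tiny_backbone`, and `build_model` are all hypothetical names invented for this example.

```python
class Registry:
    """Maps string names to builder functions so that a library can look up
    user-supplied components by name. Hypothetical sketch of the pattern."""

    def __init__(self, name):
        self.name = name
        self._builders = {}

    def register(self, fn):
        """Decorator that records fn under its own function name."""
        if fn.__name__ in self._builders:
            raise KeyError(f"{fn.__name__!r} is already registered in {self.name!r}")
        self._builders[fn.__name__] = fn
        return fn

    def get(self, key):
        return self._builders[key]


# --- "Library" side: owns the registry and the generic build function. ---
BACKBONES = Registry("backbone")


def build_model(config):
    # The library never imports user code; it only looks up a name.
    build_backbone = BACKBONES.get(config["backbone"])
    return {"backbone": build_backbone(config["channels"])}


# --- "User" side: adds a component from the outside. ---
@BACKBONES.register
def tiny_backbone(channels):
    # Stand-in for constructing a real network.
    return {"type": "tiny", "channels": channels}


model = build_model({"backbone": "tiny_backbone", "channels": 64})
print(model)
```

The design choice is that the library code stays untouched: a new backbone is selected purely by changing a name in the configuration, so user projects do not need to fork the library to extend it.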
This makes it much easier for Detectron2 users within Facebook to maintain their research and production projects. The modular design has also made it easier to support a variety of research built on top of Detectron2, in addition to the standard Mask R-CNN model. Take DensePose as an example: it used to be released as a fork of Detectron1, but now the model can be built on top of Detectron2 by importing it and adding a few extra layers.
Some of the latest research papers from the team, including Panoptic FPN and TensorMask, are also being released together with Detectron2. Beyond 2D understanding, the team will release within a month the source code for Mesh R-CNN, a model that can predict the 3D structures of objects. Thanks to its flexible design, Detectron2 is also used by many other teams at Facebook.
By directly sharing the same underlying object detection codebase, the latest research, sometimes even unpublished research, can be transitioned into production at the earliest moment. Detectron2 models currently run in many Facebook products, where they detect and segment different types of objects on billions of images. They run on servers, on mobile phones, and on Facebook's Portal device, where they power its smart camera.
You may have seen the Portal device: its camera can automatically follow human actions. Deploying such a model in a compact and efficient runtime is still challenging for external users at the moment; in the future, Detectron2 will integrate more closely with TorchScript to make deployment easier. You can learn more about Detectron2 on its GitHub page.
It has a Colab tutorial in which you can interactively train a model, and later this month the team will also hold a tutorial at ICCV 2019.

All right, hello everyone. I'm Yuxin. I work as a research engineer at FAIR on computer vision, and today I'm going to talk to you about Detectron2, our next-generation object detection platform.

So what do we mean by object detection? Let's look at a video first. A few years ago, object detection was only about drawing bounding boxes around each object in the scene. Today, however, with advances in computer vision algorithms, we are able to do much more than that. Our models can now localize keypoints of humans, predict their poses, and label each of their body parts; this is also called DensePose prediction. We can segment out every object in the scene and, beyond objects, label every pixel in the background as well; this is also called panoptic segmentation. This is what object detection is about today: it involves any task that localizes, recognizes, and predicts attributes for every object in the image.

Today we're open-sourcing Detectron2, the object detection platform that produced this video. You might have heard of Detectron, the object detection platform that we open-sourced almost two years ago. We've been asked a lot about when we will have a PyTorch version of it, and Detectron2 is built completely in PyTorch. It is a ground-up rewrite of Detectron with faster speed, the latest and more accurate models, and a more modular design.

Detectron2 has higher training efficiency than our existing platforms. A complicated object detection model often contains many types of small operators, or operations, that work on tensors of dynamic shapes. Those are not often used in a standard ConvNet. That's why in Detectron1 we often faced a dilemma: we could spend a lot of time writing CUDA operators to make it fast, or we could spend much less time writing the same thing in Python, but then it would run slower. With PyTorch, thanks to the vast number of operators available there, we can move the entire training onto GPUs and utilize better optimizations. As a result, training a model in Detectron2 now runs 2 to 3 times faster than training the same model in Detectron, and it also improves over maskrcnn-benchmark, which is another predecessor project of Detectron2.

Detectron2 also has a better and more accurate implementation than our previous platforms. To understand what that means, I'll share an interesting piece of history. When Detectron1 was released in 2018, the exact same Mask R-CNN model got about one point more accurate than what was reported in the Mask R-CNN paper in 2017. The reason this happened is just that an object detection system is so complicated: there are a lot of places where we can make tiny mistakes or some suboptimal implementation decisions, and a lot of tiny details that matter. For example, the subtle differences in how different libraries implement image resize actually matter. Issues like this are corrected over time, and by rewriting Detectron from the ground up we had the chance to revisit those issues. That's why in Detectron2 we now see another gain in accuracy for exactly the same model trained for the same number of iterations, by fixing some of those issues.

Due to all these complicated details, reproducing the results of a detection model is not always easy. We publish standard models with reproducible results, and hope that our implementation and our reference numbers can become a reference for the academic community. The models will also include a few large models that were used to generate the video demo, so you can make the same thing yourself. In the future, we will also add some optimized small models that can run more efficiently in production.

More importantly, Detectron2 has a more modular architecture. So let's first take a look at what a standard object detection model looks like. This is a diagram of a typical Generalized R-CNN object detection framework. An input image first goes through a CNN backbone to extract some image features. Those features are used to predict region proposals, which are regions that are likely to contain objects. The features in those regions are cropped and warped into regional features. Then different types of prediction heads use the regional features and image features to predict bounding boxes and segmentation masks for each instance, and they also predict keypoints as well as DensePose for each human found in the image. We can also predict semantic segmentation for the entire image, and all these predictions can be combined to predict panoptic segmentation, which is a task that segments both instances and background pixels. It really gives us a complete understanding of every pixel in the image.

This model is implemented in Detectron2 following such an abstraction. However, we as researchers know that research is sometimes against abstraction. Research is about doing something in new ways, and it often breaks existing abstractions. That's why we've seen in the past that when we needed to do something new, we were often modifying the source code of the underlying research platform directly. It's not too bad for research, but it doesn't sound like a good idea for maintenance. You can certainly still do this in Detectron2, but Detectron2 is also designed so that you don't have to if you care about the maintenance of your project. You can create new things without having to modify or fork its source code. You can import Detectron2 as a library and build a thin layer of your own customizations on top of it, using the registration system that we provide. You can replace its backbone, add a new type of head, use your own datasets, or customize other components in the system from the outside, without having to touch its code. This makes it much easier for Detectron2 users within Facebook to maintain their research and production projects.

Such a modular design also makes it easier to support a variety of research built on top of Detectron2, in addition to the standard Mask R-CNN model. Take DensePose as an example: it used to be released as a fork of Detectron1, but now this model can be built on top of Detectron2. It just imports it and adds a few extra layers. Some of the latest research papers from our team, including Panoptic FPN and TensorMask, are also coming together with Detectron2. Beyond 2D understanding, in a month we will also release the source code for Mesh R-CNN, which is a model that is able to predict 3D structures of objects.

Thanks to the flexible design, Detectron2 is also being used by many other teams at Facebook. By directly sharing the same underlying object detection codebase, the latest, or sometimes even unpublished, research can be transitioned into production at the earliest moment. Detectron2 models are currently running in many Facebook products that detect and segment different types of objects on billions of images. They run on servers and mobile phones, and also on Facebook's Portal device to power its smart camera. You may have seen the Portal device: it has this camera that can automatically follow human actions. Deploying such a model in a compact and efficient runtime is still challenging for external users at the moment, and in the future we will also integrate more closely with TorchScript to make deployment easier.

That's all I have to say about Detectron2 today. You can learn more about it on our GitHub. It has a Colab tutorial where you can interactively train a model, and later this month we will also hold a tutorial at ICCV 2019. Thank you.