Full Stack Deep Learning Course Study Group - Session 1 - Spring 2019

Single-Stage Object Detectors: Understanding the Basics and Applications

Single-stage object detectors have gained significant attention in recent years because they can detect objects in images and video quickly. These detectors divide the image into a grid of cells and assign a fixed number of candidate boxes to each cell. For every box, the network predicts whether an object is present and, if so, the coordinates of the object's location.
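To make the grid idea concrete, here is a minimal sketch, assuming a YOLO-style 7×7 grid and pixel coordinates, of how an object's center determines which grid cell is responsible for predicting it. The function name and sizes are illustrative, not from any particular detector's codebase:

```python
def responsible_cell(box_cx, box_cy, img_w, img_h, grid_size=7):
    """Return the (row, col) of the grid cell containing the box center.

    In YOLO-style training, only this cell's predictors are matched
    against the ground-truth box; all coordinates here are in pixels.
    """
    col = int(box_cx / img_w * grid_size)
    row = int(box_cy / img_h * grid_size)
    # Clamp in case the center lies exactly on the right/bottom edge.
    return min(row, grid_size - 1), min(col, grid_size - 1)

# Example: a box centered at (300, 150) in a 448x448 image
print(responsible_cell(300, 150, 448, 448))  # -> (2, 4)
```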

Single-stage detectors take a different approach from two-stage detectors such as Faster R-CNN. Instead of first generating regions of interest with a proposal network and then refining and classifying those proposals with a second network, a single-stage detector uses one neural network to predict class probabilities and bounding-box coordinates simultaneously. This allows for much faster processing and can lead to better performance in certain scenarios.

This grid-based formulation was popularized by YOLO (You Only Look Once), which divides the image into a grid and assigns boxes to each cell. The number of boxes per cell depends on the specific architecture, but typically ranges from about 3 to 9. For each box, the network predicts a class probability as well as the box coordinates.
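As a hedged illustration of what the raw network output looks like under this scheme, the sketch below decodes a prediction tensor of shape (S, S, B·5 + C) — S grid cells per side, B boxes per cell, C classes. The memory layout and the shared-per-cell class scores follow the original YOLO's style but are assumptions here; real architectures differ:

```python
import numpy as np

S, B, C = 7, 3, 20                       # grid size, boxes per cell, classes (assumed)
preds = np.random.rand(S, S, B * 5 + C)  # stand-in for a network's output tensor

for row in range(S):
    for col in range(S):
        cell = preds[row, col]
        class_probs = cell[B * 5:]               # per-cell class scores (shared by its boxes)
        for b in range(B):
            x, y, w, h, objectness = cell[b * 5:(b + 1) * 5]
            if objectness > 0.8:                 # confidence threshold (assumed)
                cls = int(np.argmax(class_probs))
                print(f"cell ({row},{col}) box {b}: class {cls}, "
                      f"score {objectness * class_probs[cls]:.2f}")
```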

The single-stage approach was introduced by YOLO ("You Only Look Once," Redmon et al., 2016) and SSD (Single Shot Detector, Liu et al., 2016); the follow-up paper "YOLO9000: Better, Faster, Stronger" presented YOLOv2, an improved version of YOLO. The major contribution of this line of work is that one neural network does everything, rather than the two networks (proposal network plus classification network) used in two-stage detectors.

RetinaNet, currently among the most popular single-stage object detectors, was introduced in 2017 at ICCV in the paper "Focal Loss for Dense Object Detection." It achieves state-of-the-art performance on object detection benchmarks, including COCO, while remaining fast. Its key contribution is the focal loss, which addresses the extreme foreground-background class imbalance in dense detection and outperforms the standard cross-entropy classification loss in this setting.
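A minimal sketch of the binary focal loss from the RetinaNet paper, written in plain NumPy for clarity with the paper's default settings gamma = 2 and alpha = 0.25; a production implementation would operate on logits inside a framework such as PyTorch or TensorFlow:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probabilities for the positive class, y: 0/1 labels.
    The (1 - p_t)**gamma factor down-weights easy, well-classified
    examples, so the huge number of easy background boxes no longer
    dominates the training loss.
    """
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy negative (p=0.01) contributes almost nothing; a hard one (p=0.9) dominates.
print(focal_loss(np.array([0.01, 0.9]), np.array([0, 0])))
```

With gamma = 0 this reduces to ordinary weighted cross-entropy, which is why the paper frames the modulating factor as the essential change.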

The applications of single-stage object detectors are diverse and numerous. For instance, in self-driving cars, detecting pedestrians, cars, or other obstacles is crucial for safe navigation. In biology, detecting specific cells or organisms in microscopy images can be essential for diagnosis or research purposes. The ability to quickly detect objects in images and videos has far-reaching implications across various fields.

One notable application of single-stage object detectors is medical imaging, particularly detecting diseases such as pneumonia in chest X-ray images. Researchers have used RetinaNet and related detection algorithms to develop tools for automated disease diagnosis. A closely related computer vision task is instance segmentation, which extends detection by producing a pixel mask for each detected object.

Object detection is an active research area, and new single-stage variants appear regularly, typically building on the SSD and YOLO designs while changing anchor parameters, losses, and training techniques to boost performance.

"WEBVTTKind: captionsLanguage: enour first first meetup about the full-stack deep learning course that we are we talked about last week so we are starting today with lesson one two three and three is actually at boot camp and in the second part of the meetup we're going to have the Michaels presentation about the object detection so let's start with the with a with a deep learning full-stack deep learning boot camp so how do you want to do it we have it was like three last three videos slides and and a lot code I didn't do the left code but a whole Michael is in the is in the meter maybe I can remember can show us how to do the the code so I will do it for next time so that you guys learn anything interesting I cannot hear you maybe it's my aunt No okay so you have to unmute yourself first if you want to talk you have to unmute yourself first because there was a huge background noise I've muted everyone can you can you see my screen are ours it yes vacancy explained can you see that the on the right-hand side the the page with the bootcamp or is it can you see the full page yes okay great so it's cool so yeah so it is just to repeat what we talked last time so this course I'm just going through the page on the full stack deep learning course so we have courses that teach us like theory or math or computer science behind deep learning this is like the deep learning book or the some other courses we have like trainings that teach us how to Train deep learning models and I'm going to meet everyone if you want to talk to some user self yeah so we have course like first AI which we some of us are familiar with we have libraries like Klara's or the deep learning AI courses so in this sort of courses would learn how to train deep learning models so they they train faster or better accuracy different architectures and so on so on what was not included in those courses at least faster I it's not at least a little bit was covered in lesson two of part one how to deploy a web app we're recognizing some images but this course full-stack declaring supposed to supposedly go through the full spectrum of what deep learning projects should contain so it's from the data management model training model validation and deployment so that's what we're hopefully going to learn and there's also a seventh of course we have the videos so each lecture has got a video and also some almost all of them has got slides so and there is also a love code so it's on a geek app and it's on a full stomach that's right buzzer so we have also the lot cold which I've not been able to round but hopefully Michael is with us today so can help us to show at least help me how to do it for okay saying that my voice is too old I don't know how to have to fix that I'm sorry I'll try to speak to a speaker here is it better now okay thank you yeah so so that that's the kind of like brief overview of the course and for today we hopefully watched lesson 1 lecture 1 which is introduction and in lecture 2 I like about machine learning projects and potentially we did also the lab code and this is the this topic of the today's meetup so I'm just going to open the discussion now so if anyone wanted to share any interesting things that they've learned any opinions or any issues like from my end like I didn't know how to run the lab code now that I spent too much time on it but it's not really exactly as per per videos because they add some code we don't have that code and so on so on so we need to modify that thing a little bit to make it work so 
if anyone wants to share their pins now that's the time to do it and again you youth yourself because I have muted everyone so sorry about that but there was a was a background noise that's why I did yes once you get it set up it doesn't actually seem to be that difficult you know the first tasks are pretty easy it's just built a bottle and run it so okay in the video all the solutions do so it's just kind of nice I guess I had one question that's in my work afraid to use PI George vs yeah like how transferrable are the caressed by George I guess all the theories the same rate it's just a PR yeah I guess so I didn't really use tensor all that much what I've learned was from my Jeremy in frustrated AI course what everything was impiety faster yeah I think you can use the open on X file like you can you can move models around with that kind of like from tensorflow fighters I hope and to other languages I think other framework I've never used I once I'm not so sure I'm try to yeah it's like an open format to present deep learning models so you should be able to move from one framework to another one so PI torch is there and I'm not sure if when I don't see actually tensorflow in it actually so I'm not so sure you can transfer your PI touch models to tensorflow and the other way around I'm not so sure seems like cafe to Microsoft IMAX net PI torch and many other frameworks but maybe if we read somewhere small praying then you can find an angle not so sure yep does anyone know how to how to use different frame framework with that with our code I like partridge instead of tensorflow it's probably not a big deal yeah because that's also interesting for me to know as well maybe for some other people we are in this group no slightly finished or doing some part of fast AI which is mostly in tighter these days and this law obviously tensor fall I believe so I guess you would be easier for most of us to do this in Python if that's possible how much background I don't know I'm brand new to this I just wondered I think Mike Michael would you be able to show us some how to render code in this yeah not familiar with zoom yet but I'll try to okay it's how do I share my screen now this you should see the sure it's either on the center bottom of your screen or the top once if you have it full screen or not it's like a dream share button if I cannot share it because there's another user who's sharing okay so I'll stop sharing now again should I like start from the beginning I posted this small how to because I guess they really don't have it anywhere on their stuff you know what I was going through the I would chunk now around I found it here and this is from the third let the slides okay you have the github repo and this is the code which wasn't important to get to the machine image and included there let me see the constriction of the screen and with that I was able a try two different things at once I tried it on my own personal machine okay that's too long and this was very tricky because I guess you just familiar with Conda and they use this tip and on with this I guess it's a little bit different also from the because here when you don't use this like they will copy the game put a copy in the group and sled you have to specify the path their overall showed immediately this is what I had to do to get it almost running with the first script I wasn't able to run and you cannot see your screen Michael yeah for some reason at least I cannot see yeah so you see it not running on the same environment in there 
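Coming back to the ONNX question for a moment: a minimal sketch of what the export looks like from the PyTorch side. The model choice and file name are placeholders; whether a particular target framework can load the result depends on the ONNX importers available to it:

```python
import torch
import torchvision

# Placeholder model; any torch.nn.Module with a defined forward works.
model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)  # example input the exporter traces with

# Export to the ONNX interchange format; another framework then needs
# its own ONNX importer to load the resulting file.
torch.onnx.export(model, dummy_input, "resnet18.onnx")
```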
"I could run more of it on my own machine." — "So you start the notebooks on your local computer, and then you somehow have to link the Jupyter notebook to Weights & Biases — is that how it works?" — "No, no. I first tried it on my computer, totally separate, but I was running into problems — even on their image there are problems with TensorFlow, operations that maybe don't work as they should. But on the Weights & Biases website I could get it running: you just enter the code from the slides, and then it worked there." — "Fantastic. Can you open the repo? We still see your screen with the Weights & Biases JupyterLab instance." — "How would I open it again... do you see it now?" — "We see something, but I think it's frozen... now we see the model, the summary of the model, and the training — the GPU device." — "One of the problems with Zoom is that when you share, say, a PDF file, it only shares the PDF; if you want to share the browser you have to stop sharing the PDF and pick the browser. Now you see the setup page — what you have to do — and on this page you can enter the codes from the PDF. And now you see the terminal again: lab 1, lab 1 solution, and so on."

"How were the labs — are they hard to do, compared with, say, the deeplearning.ai exercises, which are quite easy?" — "I got up to part three. The focus is more on how to handle the stuff and run it over the command line. From the repo they also have the code you can try out and play with — change the network type or the batch size — and also try the testing. This was all working on the cloud machine, and it's been quite fast, actually. One thing that's interesting: if you close the browser tab while the lab is unfinished, it can still be running underneath, so you can run out of memory — be sure to also close it here. Then you can simply start a new terminal, and you see the repo is already there. In lab 2 you can try it like this — this is just the code from before — and you see there are some of these errors, but the guy in the video had some of those too. I'm not familiar with TensorFlow at the level I am with fast.ai, so I would also really enjoy it if we could try to put this into PyTorch, but I'm not sure." — "Seems like they're not using notebooks for this at all." — "They have one notebook, but it's really just for looking at the data a little bit; it's not a big deal. You see — this works right out of the box; when I tried to put it in my conda environment I had to install a lot of additional stuff. You can have a look at it, but that's it from the notebook side; the first three lessons are based more on the command line. There is one more notebook, but it wasn't in the videos — maybe they show it in the fourth lesson or later; it's just to look at the data for the line detection. It's a somewhat different setup than the fast.ai course, and it's quite dense — I guess the bootcamp was just two days. But I haven't looked at the code in detail yet."

"It kind of makes sense: once you have your model ready, you don't really work in notebooks — you execute your model. Maybe that's why they set it up this way: with their setup you can easily change the parameters." — "Yeah. I once ran into problems in a Jupyter notebook when the epochs printed output too fast — the notebook can get stuck because too much text is streamed to it, while, interestingly, the command line has no problem with it. That would be another advantage I can think of." — "And in that lab, do they connect to Weights & Biases?" — "The JupyterLab is started from there, but I haven't seen anything yet that sends training metrics there. Maybe a bit later — when I was signing up I saw something quite similar to TensorBoard, the visualization tool from TensorFlow; these kinds of visualizations were also posted on the fast.ai forum. I've only looked up to lesson three, and they're not talking about that yet."

Okay — that's one thing for me to do next week: two labs in one week, and I still have to do the week-one coding exercises. I'm still thinking about whether we can somehow transfer it easily to PyTorch, but I guess it won't be straightforward. Most of us are more familiar with PyTorch, because most of us did the fast.ai course. And I missed the question about people's backgrounds — that's actually a very good question, thank you for asking. I'm assuming most of us did at least some fast.ai courses, but that's not necessarily the case. So if you want to share whether you use PyTorch, TensorFlow, or anything else, or whether you're just starting with deep learning, or your experience so far — this course kind of assumes you have some knowledge about deep learning and builds on that. "I started in 2017 with deeplearning.ai and then switched to fast.ai."
"That's where I am now. I set up a local learning group, and recently I've been on TWiML mostly to find like-minded people and to try to get more experience with real projects." — "I'm a data scientist, but very early, so to say — not doing anything deep-learning-related, more tabular data aggregation and things like that. But it's nice to be in this sort of group where I can ask questions." — "I'm brand new — almost zero." — Excellent, welcome! So, about this group: we started with fast.ai part one probably a year ago — it was the summer of last year when the study groups for fast.ai part one started. We finished part one, repeated part one, and then did part two; that's what we just finished. Now we're starting Full Stack, which is going to take us four weeks or so, and after that we'll think again about what's next for this group.

Going back to the course a little: I'm not going to go through all the slides, because there are a lot of them, but let me open them. This is quite recent — the bootcamp ran in March this year. It was run by three guys plus some guest speakers; Jeremy Howard from fast.ai was one of the guest speakers. There's an introduction to what deep learning is, covering image recognition and object detection very quickly. Image recognition through neural nets is now the default way of recognizing images, whereas before it was done differently — OpenCV-style methods. Deep networks have many layers; ImageNet arrived with its million images; and what they showed is how much the error rate improved when neural networks were introduced — that was the big change. There's also image captioning and a lot of other neural-network applications. I think he mentioned that there's at least one, maybe two, FDA-approved neural networks in the medical industry. That's a big thing, because the FDA approved a neural network to make decisions about patients. I don't remember which one it was — he gave a couple of different examples — but here is one: "FDA permits marketing of artificial intelligence-based device to detect certain diabetes-related problems." That's quite a big deal, because a neural network is usually considered a black box: you don't necessarily know what's going on inside, and with a hundred million parameters inside it's difficult to explain why it decides what it decides.

I don't see the chat right now — let me switch so I can see your questions; just say something when you want. I see a question about a Slack channel for this group: at the moment we're using the TWiML Slack and the fast.ai channel there. "Is there any reason this course was picked over others?" No — there wasn't any particular reason. We were thinking about which course to do next and this seemed a good one; we could have picked any other one as well.

Someone mentioned that Apple's EKG monitoring was also FDA-approved. That's a good question, Stefan — is it approved for making the medical decision? Because I believe the one they were talking about actually makes the medical decision, as far as I'm aware, whereas the Apple Watch just gives advice to the doctors, I think — maybe you can tell me if I'm wrong. — "It helps the doctor make clinical assessments. Instead of the doctor looking through all of the images — or for combining different sets of images: for brain imaging you can do CTs and MRIs, and sometimes it's difficult to get all the information across the two types — deep learning is used in some cases to mesh them together." — Thank you. Some people say they're not on our Slack: if you want to be added, go to the TWiML website and ask there; once you're on the Slack, just find the fast.ai deep learning channel. When I was talking about making the medical decision, I meant that you don't need a doctor to confirm the decision — the network decides whether you're healthy or ill. With the Apple Watch I don't think that's the case: even if it says something isn't fully right with the signals from your heart, you still go to a doctor, and the doctor makes the final call on whether you need more tests — that's my understanding. (If someone is only in #general and the meetup channel: click the plus next to the channel list and all the channels on that Slack become visible; then just join the ones you'd like to be part of.) I don't know all the details about the Apple Watch, so I won't comment further. And I didn't investigate the claim that the FDA approved these neural networks, but to me it's a big thing, because there are going to be more of them, and it will automate and speed up a lot of work. There's still this uncertainty, though: a neural network is kind of a black box, so you don't necessarily know why it makes its decisions. You can try to explain it a little, but when it gets complex — a hundred million parameters in your network — that's difficult. That's why I keep thinking FDA approval of a neural network is a big deal; but I didn't do a deep dive on it.

He also talked about autonomous vehicles — that's a big thing as well, with a lot of companies trying to get into that market — and about speech recognition with neural networks, and machine translation. It actually started a long time ago: the 1950s; backpropagation in the 1980s; convolutional neural networks, Yann LeCun, 1989. But none of it was practical then, because there wasn't enough compute power or data — now we have more and more of both. Again, this course is more an overview of deep learning projects overall. They talked to a lot of companies that do deep learning, and that's how they developed the course.
I think that's quite interesting, because you get to hear what's actually being done by people who do deep learning, and they also talk a little about aspirations — what you would do with deep learning in an ideal scenario. The bootcamp itself had four lab sessions, about nine hours of video lectures, and about four hours of guest lectures. Of course, they used a different Slack than ours — so again, if you want to be on ours, register on the TWiML site and join the fast.ai deep learning channel.

Are there any questions on the slides, or on the Apple and medical discussion? — "Someone used GANs to make a painting, and the picture sold for a lot of money; the question was who should get that money. And related to the medical space: were you saying the people who made the neural network would be liable for wrong decisions, now that it's approved?" — Well, in my understanding the FDA can approve things, but they are not liable — the FDA is never liable. They help you, they check you, they make sure you comply with all the regulations; but if something goes wrong, even if it's approved, you're still responsible for your product. That's how the FDA works, unfortunately: the manufacturer is responsible for the product, not the FDA.

Now, I'm not sure what to do next, because lecture two has a lot more slides, and I don't think we have time to go through all of them at a speed that would make any sense. So instead of going page by page: does anyone have highlights from lecture two we can talk about? By the way, you can find the course materials — slides and lectures — at fullstackdeeplearning.com/march2019; there's a page with everything, the slides and lectures just below the videos, and the GitHub link is there too. I've posted it on the TWiML Slack as well. Maybe it's a good idea to start a new channel just for this course so everything's in one place — at the moment we share the fast.ai channel, which can get confusing.

"I was wondering if anybody had gone through the first few lessons — have you been able to get through one or two of them?" — So the plan for this meetup was to go through lectures one, two, and three. Lecture one is the overview — a half-hour, fairly generic video about deep learning. Lecture two takes a bit more time; it's about an hour. And lecture three is actually the lab: there's a GitHub repo with the code, and you can run that code. That was the goal for today. I'm not sure how many people managed it — I didn't manage to run the code, so that's something I have to do for next week. — "What does the code do — the code from this lab?" — From what Michael showed (Michael, correct me if I'm wrong), the project is recognizing text from handwritten notes, line by line. You get a picture of some handwritten text — I hope you can see my screen — so that's the project: from a picture of handwritten notes, your model should first detect the lines, and then from those lines detect the text. Through the course's four lab sessions I think they gradually make this work; this first lab was the introduction — setting up JupyterLab and making the model work from the terminal.

"Yeah — the first model is just a multi-layer perceptron; I guess that's really just there to show something doing some fitting. It's just an example — you would never use it for the final setup. You start out with extended MNIST (EMNIST), which has characters as well as digits. I guess they want to set up a network to do the line detection, which then feeds the detected lines into a line-text recognizer that transcribes the text. So far the code has been fairly low-level — just something to run, to show how you would iterate and look at it. What we've seen up to the third lesson is nothing fancy at all, except how you work with their setup: some simple models, some simple tasks." — "And then they try to productionize it — is that what the course is going to do?" — "I guess so, and I guess the code will also get more complex: in the lectures they talked about the CTC loss, which is more advanced stuff, so they do increase the complexity. But I think they'll mostly cover the machine-learning-project pipeline and won't be able to go into detail on everything; they do have links on the slides. I posted one of the linked articles — very interesting, though mathematically intense. The point is really to get going with the workflow, which is quite different from the fast.ai notebook workflow."

"So far I tried to get into Weights & Biases — I got an access code, but when I tried to set up the Jupyter notebook it said 'invalid access code', so I'm stuck there." — Right: some people have been able to run the code and it worked; for others, like you, it didn't. There's also the option of starting the Jupyter notebook on your local computer, or on Colab or Kaggle in some way, and just running the code there. It's TensorFlow code — a bit different from what we've been learning in fast.ai — and I wasn't able to find the time for it myself. — "The code should work if you open it on the W&B site under your profile: an instance opens there with the lab. Maybe it takes a few minutes — when I first tried it, it took a little longer, and I guess it's faster once you already have it. I honestly didn't expect this to work, because I guess they have to pay each time they start some sort of virtual machine or Jupyter instance for you."
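Since the first lab's model is just a multi-layer perceptron, here is a rough stand-in sketch of that kind of baseline — this is not the course repo's actual code, and it uses plain MNIST digits instead of the lab's EMNIST lines:

```python
import tensorflow as tf

# MNIST as a stand-in for the lab's EMNIST character data.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A simple multi-layer perceptron: flatten the image, two dense layers.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=32)
print(model.evaluate(x_test, y_test))
```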
"But if it works, it's better for us." — "Yeah. Part of it is that they use pipenv, and I guess most people are familiar with conda, so it's not straightforward to rewrite things." — So, about the code we have: if you go to the last lecture... I couldn't find the slides for the lab. — "I'm not sure they have videos up from the lab on its own." — No, here it is — this is the lab deck, and the code is here; the code given here is for the setup. And there's the access code — in Weights & Biases they have this access pin. Is this the one that worked for you? — "Yeah, but make sure you enter it exactly as written."

Okay — so what's the plan for next week? Shall we give people some time to do the first lab, or move forward and ask people to catch up, starting the second lab session? — "Let's give people some time to set it up. We could go back and forth on the Slack channel: people who have problems setting it up can ask, people who have already done it can answer, and hopefully we can all be together by next week." — Yes. We have in Slack the schedule I put together as a proposal, so we can stick to that: today was the first three lectures, so next week is lecture four (the video), lab five (the second lab session), and lecture six. Let's try to do that. It's in Slack — pinned at the top of the fast.ai channel, in the channel description, so it's always there. And I think if we create a channel just for this course it's going to be easier to manage. Okay — next week we'll cover more lectures and the second lab session, and maybe it gets more interesting there. Any more questions before we move to Michael's presentation on object detection? I don't see any in the chat, so Michael, let's start with object detection — which we were supposed to do last week but ran out of time. I'll stop sharing my screen.

"Hi everyone. Do you see a PDF with a slide called Computer Vision Tasks? While preparing this I realized there's just so much stuff in object detection, so what I'll try to do is give you an overview of the methods that are available, and if there's more interest in the future we can drill down deeper into particular methods. So — why do we care about object detection? So far, in a lot of classes, we've done classification: is there a cat in the image, is there a car — we only care about what object is in the image. But there are a couple of related tasks. The first is semantic segmentation: for each pixel we try to assign a class — is this pixel a cat pixel, a grass pixel, a tree pixel, or a sky pixel? Giving a class label to every pixel is what's generally referred to as semantic segmentation. The next stage is object detection, where you try to put a box around each object — here you have two dogs and a cat. And the last thing is that, within each box, you also want to segment the object — so here you have a red dog, a green dog, and a blue cat, each segmented as carefully as you can.

One thing I should say: I'm using slides from CS231n, the Stanford class on deep learning for computer vision. It's actually one of my favorite classes, because they talk about almost all the methods I care about; their lectures from 2017 are available on YouTube, and these slides are from their 2019 lecture, which took place just eleven days ago, on May 14th — so we have pretty much the freshest teaching material. I'll give you some intuition into how people went about developing these methods, and toward the end, if there are questions, I can go through them.

Okay. In semantic segmentation you want to label each pixel — tree, cow, sky, or grass. One way people could have done it: take each pixel, take a small patch around it, and use traditional classification — pass that patch through a CNN and ask, is it cow or grass? But there's a big limitation: you have to extract a lot of patches, and patches of different sizes, so the computational cost is very high. This was work from around 2013 and 2014 — people tried it this way at first, but it's too intensive. It's also very inefficient because overlapping patches have similar convolution outputs, and you're not sharing features between the patches. So the next approach was to go fully convolutional: take the whole image, apply convolutions to it, and at the end make a per-pixel decision — cow, grass, or whatever the class is. That's why you'll see the term 'fully convolutional': you're making predictions for all pixels at once. But you immediately see a big disadvantage: in these convolutions you typically increase the number of filters — say from 16 to 32 to 64 — and you'd be doing that at the full image resolution, which is computationally very expensive. So the next idea — the sliding-window approach was 2013, and this one is from around 2015 — was: given this computational overload, why not downsample as I increase the number of filters? You see the boxes get thicker in the figure: you downsample first, and then you upsample again to get back to the original resolution of the image."
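Before the upsampling details on the next slides, a minimal sketch of the simplest variant, nearest-neighbor upsampling, taking a 2×2 feature map to 4×4 by repeating each value:

```python
import numpy as np

def nearest_neighbor_upsample(x, factor=2):
    """Repeat each value `factor` times along both spatial axes."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

x = np.array([[1, 2],
              [3, 4]])
print(nearest_neighbor_upsample(x))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```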
"Downsampling we're familiar with — max pooling does downsampling for you — but the upsampling part was new. The next few slides in the lecture cover different ways to do upsampling, and I don't want to get bogged down in the details, but the basic idea is: you have a 2×2 matrix and you want to get to a 4×4 matrix. There's nearest neighbor, there's a 'bed of nails' approach — people just tried out different things for upsampling, and since it's a lecture they have a lot of detail and some intuitive explanations of how to do it. So that was the idea from around 2015. But there's one limitation: with this method you can classify pixels, but there's no way to put boxes around objects.

That's where the next category of methods — object detection — comes in. Object detection has been around for twenty years; there's a recent survey paper summarizing object detection over the past twenty years that reviews around four hundred papers. Why does it matter? It's important in biology, it's important in self-driving cars — sort of everywhere. The first version of the problem was limited: say you know there's only a single object in your image — in this image, a single cat. What they did was have one arm for classifying whether it's a cat or a dog — the traditional convolutional neural network — and, since I told you beforehand that there's only one object, you just have to put one box, so there's a second arm that tries to find the coordinates of that box. You have two different losses: a softmax loss for the classification and an L2 loss for the box, and the idea was simply to combine the two losses. Then you can do classification and localization — that is, put a box around the object."
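A hedged sketch of the two-arm idea just described: one shared backbone, a classification arm trained with a softmax/cross-entropy loss, and a box arm trained with an L2 loss, summed into one training objective. The layer sizes, class count, and equal loss weighting are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class ClassifyAndLocalize(nn.Module):
    """Single-object detector: class scores plus one box (x, y, w, h)."""
    def __init__(self, num_classes=20, feat_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for a real convnet
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim), nn.ReLU(),
        )
        self.class_head = nn.Linear(feat_dim, num_classes)  # softmax arm
        self.box_head = nn.Linear(feat_dim, 4)              # (x, y, w, h) arm

    def forward(self, x):
        feats = self.backbone(x)
        return self.class_head(feats), self.box_head(feats)

model = ClassifyAndLocalize()
images = torch.randn(8, 3, 224, 224)
labels, boxes = torch.randint(0, 20, (8,)), torch.rand(8, 4)
logits, pred_boxes = model(images)
# The two losses are simply summed, as in the lecture's slide.
loss = nn.CrossEntropyLoss()(logits, labels) + nn.MSELoss()(pred_boxes, boxes)
loss.backward()
```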
"That idea fails if I have multiple objects. With one cat it was fine, but with multiple objects it doesn't work — and I also don't know beforehand how many objects there are: here there are three, and over here you have so many ducks. So now I want to detect multiple objects in the image and put a box around each. What's the simplest thing you can do? You can take crops of the image — make different crops — pass each through a convolutional network, and ask: is it a dog, a cat, or background? (Some places in the image have nothing, so you need a background class.) But there's an immediate problem: you have to take a lot of crops, and objects may be of different sizes — a big dog or a small duck — so you have to take boxes at different scales and at different locations, sliding the box all around the image. That's thousands and thousands of boxes, of different sizes, at different locations and scales. And of course I forgot aspect ratio: for a giraffe you need a very tall box; for a soccer ball, a square box. So this crop-based approach — taking crops and passing them through the convolutional network — is very computationally intensive; there are just too many boxes to go through.

One of the first ideas — and this was before deep learning — was selective search: find me the blobby regions, region proposals where there could potentially be an object. Selective search runs in a few seconds on a CPU and generates, say, 2,000 regions of interest, with different aspect ratios, different scales, and different locations. Those are the three things to remember — locations, scales, and aspect ratios; think giraffe versus soccer ball. Then there's Ross Girshick, who during his graduate work at Berkeley developed the region-based CNN, which he called R-CNN — and after this one I'll stop for questions. The idea: we don't want to try all possible boxes in the image, so use the selective search method to get 2,000 boxes, scale all of them to a fixed size of 224×224, and pass them through a convolutional neural network. From that part onwards it's what we've done before — images through a CNN — except that this was 2014, so instead of a fully connected layer he took the output features of the convnet and passed them to a different machine learning technique, support vector machines, and did two things with those features: the SVM classification and a bounding-box prediction. In the end you say what class it is, and you also predict the coordinates of the bounding box — you need four numbers: x, y, width, and height. And immediately you see the problem: it's very slow, because you get 2,000 proposals and each of them has to be warped to 224×224 and passed through the convnet."
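As a rough illustration of why R-CNN was so slow: every proposal is cropped and warped to 224×224 and then sent through the convnet independently. The boxes below are placeholders standing in for ~2,000 selective-search proposals:

```python
import torch
import torch.nn.functional as F

image = torch.randn(3, 600, 800)                         # one input image (C, H, W)
proposals = [(50, 40, 200, 180), (300, 100, 500, 400)]   # (x1, y1, x2, y2) stand-ins

crops = []
for x1, y1, x2, y2 in proposals:
    crop = image[:, y1:y2, x1:x2].unsqueeze(0)           # cut out the region
    crop = F.interpolate(crop, size=(224, 224),          # warp to the fixed input size
                         mode="bilinear", align_corners=False)
    crops.append(crop)

batch = torch.cat(crops)   # each crop then gets a full forward pass through the CNN
print(batch.shape)         # torch.Size([2, 3, 224, 224])
```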
"So that was R-CNN, and the next idea — I think from the same group — was: maybe I should switch those two operations. This was primarily because the original method was so slow; I think they said training took something like 84 hours in 2014 or 2015. So they decided to make it faster. On the right-hand side you see the slow R-CNN method, and here the modification: they run the convolutional network first, and then do the region proposals on the output of the convolutional network. From there onwards it's similar — it's a CNN — except they didn't use an SVM; I think they had a fully connected layer (look at the paper), but again the same two outputs: which class are you, and which box are you in.

I'll talk about one more thing and then stop for questions. Since this is deep learning but the region-of-interest proposals are non-deep-learning — traditional image processing, based on the selective search method — there's a crop-plus-resize step here. This step has some details, and because I'm short on time I'm rushing through it, but it's the important, slightly tricky part: the original had RoI pooling, and later they have RoI align, basically to get a better positioning of the box — I just don't have time to go through that equation. One thing to notice is the improvement within a single year: in 2014 it was about 84 hours to train R-CNN, and by 2015 they brought it down to about eight hours with Fast R-CNN — and the main architectural difference is just that swap: do the convnet first, then the region-of-interest proposals. Okay, I'll stop for questions; I have more material, but let me check the chat."

"You mentioned the support vector machine — I read that in early experiments they actually used an SVM for image classification, and my question is how that works. I recently understood how it works with a CNN, because you can use basically anything as input — how would you do that with a support vector machine?" — "The support vector machine takes a feature vector: they just take the output of the convnet and feed it to the SVM as a feature vector — say a vector of 4,096 values. But this is quite old; I just wanted to give you the history of how we got here. And it's very interesting — let me see if I still have it: this is the current state of the art. On the x-axis is GPU time (they don't give units) and on the y-axis is accuracy, with all these different methods. If you're doing object detection and accuracy is the main goal, you try one of the Fast R-CNN or Faster R-CNN methods on this side; but if you don't care about accuracy as much and you really want speed, there's another set of object detection methods — there's YOLO, there's SSD. This graph is from 2017, from a computer vision conference. So depending on your application you can pick different object detection methods, and I think I have a slide about these speed-versus-accuracy trade-offs." — "Can you maybe explain the difference between this evaluation metric and, say, the area under the curve of a receiver operating characteristic?"
"That's a great question — I was trying to prepare this before my talk. mAP is mean average precision. In ROC analysis we're used to sensitivity and specificity, but on the computer vision and computer science side they use precision and recall. I don't know if anyone else knows mAP very well, but basically: in object detection you have the actual mask (or box) of the object and your prediction. If you had perfect segmentation you'd have complete overlap, so the intersection over the union would have a value of 1 — but most of the time you won't have perfect overlap; you'll have, say, 50% or 60% overlap. So you pick a threshold: say I got 50% overlap and I'll call that a successful detection — I didn't detect the cat fully, but I got 50% overlap between the actual and mine. Then you do it at 50%, 60%, 70%, and you get a precision at each IoU threshold, and it's the mean average precision across IoUs — that's the metric they use in object detection. Sorry, I didn't explain it well, but I'll share material." — "I was wondering if precision and recall are just CS-speak for sensitivity and specificity, but I just looked it up and I think it's something else: precision is what's called positive predictive value in statistical language, and recall probably also has a more statistical name." — "Yes — someone has worked out the relationships between them, and they also have precision, recall, and the F-score, which you'll see in a lot of these methods.

Maybe one last thing, since I'm almost out of time — this is the speed-versus-accuracy point I mentioned. In Fast R-CNN and Faster R-CNN, what we saw is: get the region proposals, then the convolution, then the classification. In the single-stage object detectors they said: I don't want to do all that — I'll just divide the image into, say, a 7×7 grid, and in each grid cell have a fixed number of boxes. In this case I think they have three boxes of different shapes — a tall vertical box, a horizontal box, and a square box — and you could immediately ask: why only these three? You could have more. Then, with C classes, for each box you're trying to predict: is there a dog, a cat, or background in that box, plus five numbers. If it's background, then x, y, h, and w don't matter — those are don't-cares — but if you're, say, eighty percent sure there's a dog in that box, then you give the coordinates of the box. So that's the big picture: these single-stage object detectors run really fast. You'll hear about YOLO, SSD, and RetinaNet — and actually, right now, RetinaNet is the most popular one."
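Going back to the mAP question: a minimal sketch of the intersection-over-union computation underneath it, with boxes given as (x1, y1, x2, y2); thresholding this value (for example at 0.5) is what decides whether a detection counts as a hit:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # perfect overlap -> 1.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-shifted   -> 0.333...
```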
"That's why, I think, in the fast.ai course's latest code they implemented RetinaNet — they didn't even bother to implement R-CNN; it's so old that you'd implement it only for learning and understanding, but most people are just using RetinaNet nowadays, which came out of that conference in 2017. Okay, I'm out of time and I'll stop here, but there's a lot I didn't discuss about object detection, and there's a lot of interest in it. For example, in self-driving cars you want to detect whether a person is coming in front of your car, or whether there's another car — all of that falls under object detection. In biology there are a lot of things too — find the pneumonia in the X-ray, or segment the cells in microscopy — all of this comes under object detection. Let's stop here."

"I've read about Fei-Fei Li — I think she's one of the best computer vision people in the entire world, probably the best right now?" — "Yeah, she's really good, and it's her class — she and some of her students have been doing this class for the last three or four years, and they have really good notes. I just showed you the lecture slides, but they also have a practical session on the practical challenges when you try to implement or run existing methods, with a whole set of notes for that. Right now PyTorch has its own implementations of these methods, and the same goes for TensorFlow — the methods already exist; it's about understanding the theory and then really trying them out for your application." — "One could also immediately envision a military application — autonomous drones flying over Afghanistan and so on. Is it known whether these algorithms are implemented there?" — "I don't know about that. I know Fei-Fei Li was at Google when Google was trying to apply for a military contract, and a lot of people quit over that — they didn't want to be part of a military project — but that's all I know. If you can immediately think of military applications, you can assume they're being pursued."

Okay, great — thanks, Michael. Do you want to continue next week? — "I don't know if there's interest; there's a lot of material. Maybe we can just look at code: I've used this for lung pneumonia detection, and I've used it for segmentation of microscopy cells." — "That would be really interesting — I'd be interested in that." — "Okay. I have the code already; I can share it if there's interest. Can I share the presentation? Sure — I just used the Stanford slides, and I'll share them in the Slack channel; they're already shared in the chat." — And if you have the transcript, you can probably post it in the Slack channel too. Thank you — excellent. So we'll continue next week with Full Stack Deep Learning lectures four, five, and six, and with the object detection code. Fantastic — see you all next week!
presentation about the object detection so let's start with the with a with a deep learning full-stack deep learning boot camp so how do you want to do it we have it was like three last three videos slides and and a lot code I didn't do the left code but a whole Michael is in the is in the meter maybe I can remember can show us how to do the the code so I will do it for next time so that you guys learn anything interesting I cannot hear you maybe it's my aunt No okay so you have to unmute yourself first if you want to talk you have to unmute yourself first because there was a huge background noise I've muted everyone can you can you see my screen are ours it yes vacancy explained can you see that the on the right-hand side the the page with the bootcamp or is it can you see the full page yes okay great so it's cool so yeah so it is just to repeat what we talked last time so this course I'm just going through the page on the full stack deep learning course so we have courses that teach us like theory or math or computer science behind deep learning this is like the deep learning book or the some other courses we have like trainings that teach us how to Train deep learning models and I'm going to meet everyone if you want to talk to some user self yeah so we have course like first AI which we some of us are familiar with we have libraries like Klara's or the deep learning AI courses so in this sort of courses would learn how to train deep learning models so they they train faster or better accuracy different architectures and so on so on what was not included in those courses at least faster I it's not at least a little bit was covered in lesson two of part one how to deploy a web app we're recognizing some images but this course full-stack declaring supposed to supposedly go through the full spectrum of what deep learning projects should contain so it's from the data management model training model validation and deployment so that's what we're hopefully going to learn and there's also a seventh of course we have the videos so each lecture has got a video and also some almost all of them has got slides so and there is also a love code so it's on a geek app and it's on a full stomach that's right buzzer so we have also the lot cold which I've not been able to round but hopefully Michael is with us today so can help us to show at least help me how to do it for okay saying that my voice is too old I don't know how to have to fix that I'm sorry I'll try to speak to a speaker here is it better now okay thank you yeah so so that that's the kind of like brief overview of the course and for today we hopefully watched lesson 1 lecture 1 which is introduction and in lecture 2 I like about machine learning projects and potentially we did also the lab code and this is the this topic of the today's meetup so I'm just going to open the discussion now so if anyone wanted to share any interesting things that they've learned any opinions or any issues like from my end like I didn't know how to run the lab code now that I spent too much time on it but it's not really exactly as per per videos because they add some code we don't have that code and so on so on so we need to modify that thing a little bit to make it work so if anyone wants to share their pins now that's the time to do it and again you youth yourself because I have muted everyone so sorry about that but there was a was a background noise that's why I did yes once you get it set up it doesn't actually seem to be that difficult you know the first 
The first tasks are pretty easy: it's just build a model and run it, and in the video they show all the solutions, so it's kind of nice. I guess I had one question that's been on my mind: Keras versus PyTorch, like, how transferable are the skills? I guess all the theory is the same, right, it's just the API.

Yeah, I guess so. I didn't really use TensorFlow all that much; what I've learned was from Jeremy in the fast.ai course, where everything was in PyTorch.

I think you can use the ONNX format: you can move models around with that, from TensorFlow and PyTorch, I hope, to other frameworks. I've never used it, so I'm not so sure. It's like an open format to represent deep learning models, so you should be able to move from one framework to another. PyTorch is there, but I actually don't see TensorFlow in the list, so I'm not so sure you can transfer your PyTorch models to TensorFlow and the other way around. It seems like Caffe2, Microsoft's CNTK, MXNet, PyTorch, and many other frameworks are supported, but maybe if you read the small print somewhere you can find a way; I'm not so sure.

Does anyone know how to use a different framework with the lab code, like PyTorch instead of TensorFlow? It's probably not a big deal, but that's also interesting for me to know, and maybe for some other people in this group. Most of us finished, or are doing, some part of fast.ai, which is mostly in PyTorch these days, and this lab is obviously TensorFlow, I believe. So I guess it would be easier for most of us to do this in PyTorch, if that's possible.
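Since nobody in the group was sure what moving a model through ONNX actually looks like, here is a minimal sketch of exporting a PyTorch model to the ONNX format. The toy model and the file name are made up for illustration, and whether another framework can then import the file depends on that framework's own ONNX support.

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained model; the real one would be whatever you trained.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Export works by tracing the model with a dummy input of the right shape.
dummy_input = torch.randn(1, 784)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["logits"])
print("wrote model.onnx")
```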
How much background do you need? I don't know, I'm brand new to this, I just wondered.

Michael, would you be able to show us how to run the code?

Yeah, I'm not familiar with Zoom yet, but I'll try. How do I share my screen?

You should see a Share button, either on the center bottom of your screen or at the top, depending on whether you have it full screen or not. I cannot share it, because there's another user who's sharing; okay, I'll stop sharing now.

Should I start from the beginning? I posted this small how-to, because I guess they really don't have it anywhere in their materials. When I was going through it and poking around, I found it here; this is from the third lecture's slides. You have the GitHub repo, and this is the code you need to get to the machine image. Let me resize the screen. I tried two different things at once: I tried it on my own personal machine, and that was very tricky, because I guess most of us are familiar with conda, and they use pip. It's also a little bit different because when you don't use their setup, you have to clone the repo yourself and then specify the paths. Overall, this is what I had to do to get it almost running; the first script I wasn't able to run.

We cannot see your screen, Michael. So you start the notebooks on your local computer, right, and then in some way you have to link the Jupyter notebook to the Weights and Biases? Is that how it works?

No, no, I first tried it on my computer, totally separate, but then I was running into problems. I guess one issue, even with their image on the Weights and Biases platform, is that there are problems with TensorFlow and some operations which are maybe not working as they should. But with the Weights and Biases website I could get it running: we just entered the code from the slides and then it worked.

It worked there, okay, fantastic. But on which one, are you going into the repo, or is it on the main page? We still see your screen with the Weights and Biases, with the JupyterLab instance. Okay, how would I open it again... do you see it now? We see something, I think, but it's frozen, for me at least. Okay, now I'm on the GitHub repo, but it's not showing, or... We just see the model and the summary of the model, and where it's training, you know, which GPU device.

Part of the problem with this thing is that when I share, say, a PDF file, it only shares the PDF file; if I then want to share the browser, I have to stop sharing the PDF and pick the browser, and right now nothing is being shared yet. Now we see the setup page, which is just what you have to do, and then on this page you can enter the codes from the PDF. Not sure if you see the terminal now; we see your lab one, lab one solution, and so on.

So how were the labs? Were they hard to do? I haven't checked them yet; for example, in deeplearning.ai some of the exercises are quite easy. How did you find these labs?

I got up to part three, and yeah, I guess the focus is more on how to handle that stuff and run it over the command line. From the repo they also have the code you can try out and play with; I hope you're seeing the GitHub repo again. With that you can start to play around a little bit, change the network type or the batch size, and also try the testing.

Was this done on your machine or on the cloud machine?

The lab environment was on this Weights and Biases machine, and it's been quite fast actually. Ah, I had not stopped it; this was interesting: if you close the JupyterLab tab, it can still be running here, so you can also run out of memory; be sure to also close it here. Then you can simply start a new terminal, and you see there's already a repo there, and in lab two you can try it like this. So this is just the code from before, and you see there are some of these errors; but the guy in the video had some of those too, right? Essentially, I'm not familiar with TensorFlow at the level I am with fast.ai, so I would also really enjoy it if we could try to port this to PyTorch, but I'm not sure.

Yeah, it seems like they're not using notebooks for that at all, they're just using the command line.

I mean, they do have one notebook, but it's really just for looking at the data a little bit. I will try to open it now, it's not a big deal; you have it here. This works right out of the box; when I tried to put it in my conda environment, I had to install a lot of additional stuff.
Do you see it? Here it downloads the data, and you can have a look at it, but that's about it for the notebook; the first three lessons are more based on the command line. They don't really have notebooks... well, there is one more, but this one was also not in the video. I guess maybe they show it in the fourth lesson or later; it's just to look at the data for this line recognition: here the string, and then the corresponding picture. It's a little bit different setup than the fast.ai course, and it's quite dense; I guess it was just two days at the bootcamp, and these are the videos. But I haven't looked at that part in detail; I wasn't playing with the code in detail yet.

Yeah, it kind of makes sense: once you have your model ready, you don't really work in notebooks, you kind of just execute your model. If I think about it, maybe that's why they set it up this way; it's easier, because with their setup you can change the parameters easily. And I once ran into problems in a Jupyter notebook: when the epoch output gets printed fast, the notebook can get stuck, because it gets too much text sent to it; interestingly, when you run it in the terminal, on the command line, there's no problem. So that would be another advantage I could think of.

And in that lab, do they connect to the Weights and Biases, or not yet?

The JupyterLab is started from there, but I haven't seen anything yet where you send some training metrics there.

That comes a little bit later, I guess. When you open it there, you see... yeah, at least I saw something when I was signing up: there were some visualizations, quite similar, I guess, to TensorBoard, the visualization from TensorFlow. I just looked it up; that comes in lesson three and onwards, so we're not talking about that yet.
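For reference, when the labs do start sending training metrics to Weights and Biases, the logging side typically looks something like this minimal sketch; the project name and the loss values here are made up for illustration (and W&B first asks you to log in with an API key).

```python
import wandb

# Hypothetical project name; replace with your own.
wandb.init(project="fsdl-study-group")

for epoch in range(3):
    train_loss = 1.0 / (epoch + 1)  # stand-in for a real training loop's loss
    wandb.log({"epoch": epoch, "train_loss": train_loss})

wandb.finish()
```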
Right, that's one thing for me to do next week: two labs in one week, and I still have to do the week-one coding exercises. I'm also still thinking about whether we can somehow transfer it easily to PyTorch, but I guess it will not be so easy and straightforward. I guess most of us are more familiar with PyTorch, because most of us did the fast.ai course. And I missed that question about what people's backgrounds are; that's actually quite a good question, so thank you for asking it, because indeed I am assuming that most of us at least did some fast.ai courses, but that's not necessarily the case. So yeah, if you want to share whether you do PyTorch, TensorFlow, anything else, or whether you're just starting with deep learning: this course kind of assumes that you've got some knowledge about deep learning, and it builds on that.

I started in 2017 with deeplearning.ai and then switched to fast.ai, and that's where I am now. I actually set up a local learning group, got some people together, and recently I was talking more in the TWiML community, mostly to find like-minded people and try to get more experience with real projects. I'm a data scientist, but very early stage, so to say; I'm not doing anything deep-learning-related at work, more tabular data aggregation and stuff like this, so it's good to have this sort of group where I can ask questions.

I am brand new, like almost zero.

Excellent, welcome! So, about this group: we started with fast.ai part one, probably a year ago; it was the summer last year when the study groups for fast.ai part one started. We finished part one, then repeated part one again, and then did part two. That's what we just finished, and now we're starting the full stack course, which is going to take us four weeks or so; after that we'll think again about what's next for this group.

Going back to the course a little bit: I'm not going to go through all the slides, because there are a lot of them; I hope I can open them at least. All right, so this is quite recent; it was run as a bootcamp in March this year, by three guys plus some guest speakers. Jeremy Howard from fast.ai was also a guest speaker. There was an introduction to what deep learning is, covering image recognition and, very quickly, object detection. Image recognition through neural nets is now the default way of recognizing images, whereas before it was done differently, with OpenCV-style methods. Deep networks have many layers; of course there's the ImageNet dataset with around a million images, and what they showed is how the error rate really improved when neural networks were introduced; that was a big change. Then there's also image captioning and a lot of other stuff.

And the FDA-approved neural network: I think that's the first, if not the only one; or maybe he mentioned that there's at least one, or maybe two, FDA-approved neural networks in the medical industry. That's a big thing, because the FDA approved a neural network to make decisions about patients. I don't remember which one it was; he gave a couple of different examples, but there was one... yeah, this is the one: "FDA permits marketing artificial intelligence based device to detect certain diabetes related problems." That's quite a big thing, because usually a neural network is considered a black box: you don't necessarily know what's going on inside, and it's difficult to explain, because you have like a hundred million parameters inside a neural network.

I don't see the chat now, so let me switch so I can see your questions; just say something or tell me when. Let me see what's going on in the chat. I can see a question about a slack channel for this group: at the moment we are using the TWiML slack and the fast.ai channel there. "Is there any reason this course was picked among any others?" No, there wasn't any particular reason; we were just thinking about which course to do next, and this seemed to be a good one, so we picked it, but we could have picked any other one as well. And someone mentioned that Apple's EKG monitoring was also FDA-approved. That's a good question, Stefan: is it approved for making the medical decision? Because I believe that the other one they were talking about is actually making
the medical decision, as far as I'm aware. So the Apple Watch, is it just advice to the doctors? I don't know; maybe you can tell me if I'm wrong.

Or it helps the doctor make the assessment: instead of the doctor looking through all of the images, or for combining two different sets of images. For brain images you can do CTs and MRIs, and sometimes it's difficult to get all the information across these types of images, so we use deep learning to mash them all together, in some cases.

Thank you. Some people say they're not on our slack: if you want to be added, you need to go to the TWiML website, and there you can ask to be added to the slack. Once you're on slack, just find the fast.ai deep learning channel.

When I was talking about making a medical decision, I mean that you don't need a doctor to confirm that decision: the neural network makes the decision whether you're healthy or you're ill. Whereas with the Apple Watch I don't think that's the case: even if it says there's something not fully right with the signals from your heart, you still go to a doctor, and the doctor makes the final call whether you need more tests or something. That's my understanding.

And if someone's on the slack with just the general and meetup channels, that's fine: just click the plus next to the channels, and all the channels on that slack will be shown; then click on the ones you'd like to be part of.

Going back to Apple: I don't know all the details about the Apple Watch, so I won't comment. But it's a big thing. I didn't investigate what he was saying about the FDA approving these neural networks, but for me it's a big thing, because there are going to be more of those, and it's going to automate a lot of stuff and speed up a lot of stuff. But there's this uncertainty, because a neural network is kind of a black box: you don't necessarily know why it makes its decisions. You can try to explain a little bit, but if it gets complex, with like a hundred million parameters in your network, that's difficult. So that's why I was thinking to myself that it's a big thing that the FDA approved a neural network; but I didn't really do a deep dive on that.

I thought this was quite interesting too: he was talking about autonomous vehicles, which is a big thing as well, with a lot of companies trying to win that market, and about speech recognition with neural networks and machine translation. It actually started a long time ago: the 1950s; backpropagation in the 1980s; the convolutional neural network, Yann LeCun, '89. But all of that was not possible then, because the compute power was not enough, nor was there enough data; now we have more and more compute power and more and more data.

Again, this course is more of an overview of deep learning projects overall. They talked to a lot of companies that do deep learning, and that's how they developed this course. I think that's quite interesting, because you can hear what's actually being done by people who do deep learning, and they also talk a little bit about what you would do with deep learning in an ideal scenario. So, at the bootcamp there were four lab sessions and
about nine hours of video lectures, and about four hours of guest lectures. Of course, they used a different slack than ours, so if you want to be on our slack, register online on the TWiML site and then just join the fast.ai deep learning channel.

Are there any questions on those slides? Do we still have discussion about the Apple or the other medical things, or is there a different topic?

Someone used GANs to make a picture, right, a painting, and then the picture was sold for a lot of money; the question was who should get that money.

Is that in relation to using neural networks in the medical space?

Yes. So were you saying that the people who made the neural network would be liable for wrong decisions?

Well, in my understanding the FDA can approve things, but they are not liable; the FDA is never liable. They help you, they check you, and they make sure you comply with all the regulations, but if something goes wrong, even if it's approved, you're still responsible for your product. That's how the FDA works, unfortunately: the manufacturer is responsible for the product, not the FDA.

Okay. I'm not sure what to do next, because lecture two has a lot more slides, and I don't think we have time to go through all of them, at least not at a speed that would make any sense. So instead of going page by page: does anyone have any highlights from lecture two? Maybe we'll go that way instead.

Right, there's a question from Joseph about where to find the course material, the slides and lectures. So: fullstackdeeplearning.com/march2019 is the page for that, and you'll find everything there; the lectures and the slides are just below the videos, and the GitHub link is there as well. I've posted that on the TWiML slack. And maybe it's a good idea to start a new channel just for this course, so everything's in one place; at the moment we share with the fast.ai channel, and it may get confusing.

Yes, I was wondering if anybody had gone through the first few lessons. Have you been able to get through one or two of the lessons?

Yeah, so the plan for this meetup was to go through lectures one, two, and three. Lecture one is an overview, a half-hour video, very generic, about deep learning; lecture two takes a bit more time, it's about an hour; and lecture three is actually the lab, so there's a GitHub repo with the code, and you can run that code. That was the goal for today; I'm actually not sure how many people managed to do that. I didn't manage to run the code, so that's something to do for next week.

What does the code from this lab do?

From what I understand from what Michael showed, and Michael, correct me if I'm wrong, the project is recognizing text from handwritten notes, line by line. You get a picture of some handwritten text (I hope you can see my screen); so that's kind of the project: you get a picture of handwritten notes, you feed that to
your model, and the model should first of all detect the lines, and then from those lines it should detect the text. Through the course's lab sessions, I think they're going to make this work step by step; this lab was an introduction: setting up the JupyterLab and, I think, making the model work from the terminal.

Yeah, I mean, I guess the first thing was just a multi-layer perceptron; that's really just there to show that something is doing some fitting. It's just an example, you would never use that for the final setup. You start out with this extended MNIST (EMNIST), which also has characters next to numbers, but this is just to get started, I guess. They want to set up a neural network to do the line detection, which then feeds the detected lines to the line text recognizer, which then transcribes the text. But so far the code was, I guess, just low-level code, so that we have something to run and to show how you would iterate and look at it. What we have seen up to the third lesson is nothing fancy at all, except for how you train it with their setup.

So the idea is some simple models, some simple tasks, and then to try to productionize it: is that what the course is going to do?

I guess, but the code will also get more complex. In the lessons they also talked about this CTC loss, which is more advanced stuff, so they do increase the complexity. But I guess they'll mostly cover the machine learning project pipeline; they won't be able to go into detail with all of that, they just have some links on the slides. In the lecture slides (I posted the link just now) there's a link to an article which is very interesting, but also a very mathematically intense post. I guess it's really about getting going with the workflow, because it's quite different from the fast.ai notebook workflow.

So far I've tried to get into Weights and Biases: I got an access code, and then I tried to set up the Jupyter notebook, but it said invalid access code; I'm stuck there.

Yeah, so some people have been able to run the code and it worked, and for some people, like you, it didn't work. There's also the option of starting the Jupyter notebook on your local computer, if that's how you want to do it, or on Colab or Kaggle in some way, and just running the code. It's TensorFlow code, so a bit different from what we've been learning in fast.ai so far. I wasn't able to find the time to do it myself.

The code should work: when you open it on the Weights and Biases page, on the profile page, an instance opens with the lab, but maybe it takes time; some people already said it takes some minutes. When I first tried it, it also took a little bit longer, and I guess it's faster once you already have it. Honestly, I didn't expect this to work, because I guess they have to pay each time to start some sort of virtual machine, a Jupyter instance; but if it works, it's better for us. Part of the difficulty is that they use pip, and I guess most people are familiar with conda, and then it's not quite straightforward to rewrite stuff.
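For anyone who wants to poke at the first lab step before the full setup works, here is a minimal Keras sketch of the kind of MLP character classifier Michael described; the layer sizes and class count are illustrative rather than the lab's actual configuration (EMNIST has several splits with different numbers of classes).

```python
from tensorflow import keras

# Sizes are illustrative; EMNIST's "byclass" split has 62 classes
# (digits plus upper- and lower-case letters), other splits differ.
num_classes = 62
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28)),   # a 28x28 character image
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```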
So, about the code we have: if you go to the last lecture... I couldn't find slides for the lab; we don't have those, right? I'm not sure if they have videos up for the lab on its own... oh no, here it is. So this is the slide, and the code is here; this is the code for the lab, let's see. Yeah, there is the access code: in Weights and Biases they have this access pin. Is that the one?

This one worked for me, yeah, but make sure you use an "l".

Okay. So, the plan for next week: shall we give people some time to do the first lab again, or shall we move forward, ask people to catch up, and start with the second lab session?

Let's give people some time to set things up.

Yeah, we could talk back and forth on the slack channel: the people that have problems setting it up, and the people that have already done it can help answer their questions, and then hopefully we can all be together by next week.

Yes. So we have in slack the schedule that I put together as a proposal, so we could stick to that: today was the first three lectures, so next week is lecture four, which is a video, lecture five, which is the second lab session, and lecture six. Let's try to do that. It's in slack, pinned on top of the fast.ai channel, but I'll copy and paste it here as well. Thanks.

Okay, I think if we create a channel for this course, it's going to be easier to manage.

It is pinned in a way that it's always there: if you open the fast.ai channel, it's the first thing you see on top; it's actually in the description of the channel, so it's always there, kind of a Q&A, but it's there.

Okay. Well, next week we'll cover more lectures and the second session of the lab, and we'll see; maybe it gets more interesting in lab two. Any more questions before we move to Michael's presentation on object detection? I don't see any other questions in the chat, so, Michael, let's start with the object detection, which we were supposed to do last week, but we ran out of time. Thank you; I will stop sharing my screen.

Okay, hi everyone. Do you see a PDF with something called Computer Vision Tasks? While I was preparing for this, I realized there's just so much stuff in object detection, so what I'm going to try to do is give you an overview of what methods are available, and then, if there's more interest in the future, we can drill down deeper into particular methods.

Okay, so why do we care about object detection? So far, in a lot of classes, we have done classification: for example, is there a cat in the image, is there a car; whatever the image is, we just care about what object is in it. But there are a couple of related tasks. The first one is semantic segmentation, where for each pixel we're trying to give a class: is this pixel a cat pixel, a grass pixel, a tree pixel, or a sky pixel? So you're trying to give a label, a class, for each pixel; that's generally referred to as semantic segmentation. The next stage is object detection, where you're trying to put a box around each object; here you have two dogs and a cat. And
then the last thing is instance segmentation: within the box, you also want to segment the objects. Here you see they've tried to segment the dog as carefully as they can: a red dog, a green dog, and a blue cat. One thing I should say: I'm using slides from this class called CS231n, a Stanford class on deep learning for computer vision, and it's only computer vision. It's actually one of my favorite classes, because they talk about almost all the methods I care about. Their lectures from 2017 are available on YouTube, and these slides are from their 2019 lecture, which occurred actually eleven days ago, on May 14th, so we have sort of the freshest teaching material.

Okay, so let me give you some intuition into how people went about developing these methods, and then at the end, if you have questions, I can go through them. Like I said, in semantic segmentation you want to label each pixel: whether it's a tree or a cow, sky or grass. One way people could have done it is: take each pixel, take a small patch around that pixel, and then use traditional classification, passing that patch to a CNN and asking, is it a cow or grass? But of course there's a big limitation here: you have to extract a lot of patches, and patches of different sizes, so you have a very high computation cost. This is work from around 2013 and 2014; initially people tried it this way, but it's too intensive, very inefficient. And when you're using patches, patches that overlap will have similar convolution outputs, so you're not sharing features between the patches.

So the next approach was to go fully convolutional: just take the whole image, apply convolutions to it, and in the end make a per-pixel decision, cow or grass or whatever class it is. That's why you'll see the term "fully convolutional"; you're making predictions for all the pixels at once. But immediately you see a big disadvantage: in these convolutions you usually increase the number of filters, so you start with, say, 16 filters, then go to 32 filters, 64 filters, and you're doing it all at the full resolution of the image, which is computationally intensive.

So the next idea people came up with (the sliding-window idea was 2013, and this one is from 2015) was: you have this computational overload, so why not keep downsampling as you increase the number of filters? You see the boxes become thicker here: you do this downsampling first, and then you build it back up, doing upsampling to get back to the original resolution of the image. Downsampling we're familiar with: max pooling will do downsampling for you. But upsampling was new, and the next few slides talk about different ways to do upsampling. I don't want to get too deep into that, because it's just a bunch of details about how to get back to the original resolution; they have several approaches, like nearest neighbor and "bed of nails". The basic idea is that you have a 2 by 2 matrix and you want to go to a 4 by 4 matrix; nearest neighbor, for example, just repeats each value. People tried out different things for upsampling, and the lecture has a lot of details and some intuitive ways to think about it.
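To make the nearest-neighbor case concrete, here is a tiny sketch using Keras's UpSampling2D layer (nearest neighbor is its default interpolation): a 2 by 2 feature map becomes 4 by 4 by repeating each value into a 2 by 2 block. The numbers are arbitrary.

```python
import numpy as np
from tensorflow import keras

# A 2x2 "feature map", with batch and channel dimensions added for Keras.
x = np.array([[1.0, 2.0],
              [3.0, 4.0]], dtype="float32").reshape(1, 2, 2, 1)

# Nearest-neighbor upsampling repeats each value into a 2x2 block.
y = keras.layers.UpSampling2D(size=(2, 2), interpolation="nearest")(x)
print(y.numpy().reshape(4, 4))
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```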
Okay, so that was the idea from around 2015. But this approach has one limitation: you can classify pixels, but there's no way with this method to put boxes around objects. That's why there's the next category of methods, object detection. Object detection has been around for twenty years; there was a recent paper summarizing object detection over the past twenty years, and they reviewed around four hundred papers in that review. Why does it matter? Object detection is important in biology, it's important in self-driving cars; sort of everywhere.

The first object detection problem was a limited one: say you know beforehand that there is only a single object in your image. In this image there is a single cat, so they have one arm for classifying whether it's a cat or a dog, and that's the traditional convolutional neural network. And since I told you beforehand that there's only one object, you just have to put one box, so there's a second arm where you try to find the coordinates of the box. You have two different losses: for the classification you have a softmax loss, and for the box there's an L2 loss, and the idea was to combine the two losses. Then you can do classification and localization, that is, put a box around the object.
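Here is a minimal Keras sketch of that two-headed idea: one classification output trained with a cross-entropy (softmax) loss and one box-regression output trained with an L2 (MSE) loss, with Keras summing the two into a single objective. The backbone, layer sizes, and class count are placeholders, not the actual architecture from the slides.

```python
from tensorflow import keras

# Placeholder backbone; in practice this would be a pretrained ConvNet.
inputs = keras.Input(shape=(224, 224, 3))
x = keras.layers.Conv2D(32, 3, activation="relu")(inputs)
x = keras.layers.GlobalAveragePooling2D()(x)

# Head 1: class scores, trained with a softmax / cross-entropy loss.
class_out = keras.layers.Dense(20, activation="softmax", name="cls")(x)
# Head 2: box coordinates (x, y, w, h), trained with an L2 (MSE) loss.
box_out = keras.layers.Dense(4, name="box")(x)

model = keras.Model(inputs, [class_out, box_out])
# Keras sums the (optionally weighted) losses into one training objective.
model.compile(optimizer="adam",
              loss={"cls": "categorical_crossentropy", "box": "mse"},
              loss_weights={"cls": 1.0, "box": 1.0})
model.summary()
```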
That idea fails if you have multiple objects: with one cat it was fine, but with multiple objects it doesn't work, and you also don't know beforehand how many objects there are. Here there's one, there are three, and here you have so many ducks, right? So now I want to detect multiple objects in the image and put a box around each of them. What's the simplest thing you can do? You can take crops of the image, make different crops, pass each one through a convolutional network, and ask: is it a dog, a cat, or background? In some places in the image there's nothing, so you need a background class. But if you do it this way there's an immediate problem: you have to take a lot of crops, and objects may be of different sizes, a big dog or a small duck, so you have to take boxes at different scales and at different locations. You have to move the box all around the image, so you end up with thousands and thousands of boxes, boxes of different sizes at different locations and scales. Oh, and of course I forgot aspect ratio: say you have a giraffe, then you need a very tall box, or a soccer ball, then you need a square box. So this becomes very computationally intensive if you try it this way, taking crops of the image and passing them through the convolutional network; you have too many boxes to go through.

One of the first ideas, and this was before deep learning, was selective search: find the obvious regions, proper region proposals, where there could potentially be a box, where there could potentially be some object. Selective search runs in about a few seconds on a CPU, and it will generate, say, 2000 regions of interest, like this. You see some of them have different aspect ratios, different scales, and different locations; those are the three things to remember: locations, scales, and aspect ratios. Think of giraffes versus soccer balls.

Then there's this guy Ross Girshick: when he was doing his graduate school work at Berkeley, he developed a method he called R-CNN, the region-based CNN. (After this one I'll stop for questions.) Basically the idea is: we don't want to evaluate all possible boxes in the image, so use the selective search method, get 2000 boxes, scale all of them to a fixed size of 224 by 224, and then pass them through a convolutional neural network. From that part onwards it's what we have done before, taking images and passing them through a CNN. When he did it, this was 2014, he took the output of the convolutional network, and he didn't have a fully connected layer; he passed the features to a different machine learning technique, support vector machines, and did two things with the ConvNet features: the support vector machine classification and a bounding box. So in the end you're saying what class it is, and you also try to predict the coordinates of the bounding box; you need four numbers, x, y, width, and height.

Immediately you see the problem here: it's very slow, because you're going to get 2000 proposals, warp each of them to 224 by 224, and pass them through the ConvNet. I think they said that, around 2014 or 2015, it took them 84 hours. So you had R-CNN, and the next idea (I think from the same group) was: maybe I should swap these two operations, to make it faster. On the right-hand side you see the slow R-CNN method, and here you see the modification: they first ran the convolutional network and then did the region proposals on the output of the convolutional network, and from there onwards it was similar. They didn't have an SVM; I think they had a fully connected layer, look at the paper. But again the same two outputs: which class are you, and what box are you in.
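As a way to keep the pipeline straight, here is a schematic sketch of the R-CNN flow just described. Every function in it is a trivial stub standing in for a whole real component (selective search, crop-and-warp, a ConvNet feature extractor, per-class SVMs, a box regressor), so this illustrates the control flow only, not a usable detector.

```python
import random

# Trivial stubs for the real components of R-CNN.

def propose_regions(image, n=5):            # real R-CNN: ~2000 selective-search boxes
    return [(random.randint(0, 100), random.randint(0, 100), 50, 50)
            for _ in range(n)]

def warp(image, region, size=(224, 224)):   # crop the region, resize to a fixed input
    return image                            # stub

def cnn_features(crop):                     # real R-CNN: ~4096-d ConvNet features
    return [0.0] * 8                        # stub feature vector

def svm_classify(features):                 # per-class SVMs over the features
    return random.choice(["cat", "dog", "background"]), 0.9

def box_regress(features, region):          # refine the (x, y, w, h) box
    return region                           # stub

def rcnn_detect(image):
    detections = []
    for region in propose_regions(image):           # step 1: region proposals
        feats = cnn_features(warp(image, region))   # step 2: warp, then ConvNet
        label, score = svm_classify(feats)          # step 3: classify the features
        if label != "background":
            detections.append((label, score, box_regress(feats, region)))
    return detections

print(rcnn_detect(image=None))
```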
Okay, I'll talk about one more thing and then maybe stop for questions. Since this is deep learning: the region-of-interest proposal step, as I said, is non-deep-learning; it's traditional image processing, based on this one method, selective search. And then there's this crop-plus-resize step. This step has some details, and because I'm short on time I'm rushing through it, but it's the important step and it's a little bit tricky: the original had RoI pooling, but in this case they have RoI Align, basically to get a better positioning of the box. I just don't have time to go through that equation.

One thing to notice is the improvement in one year: in 2014 it was about 84 hours to train R-CNN, and by 2015 they brought it down to eight hours, using Fast R-CNN. In the architecture, the main difference was just this swap: you run the ConvNet first and then you do the region-of-interest proposals. Okay, I'll stop for questions; I have more material, but I haven't checked the chat.

You mentioned that there's this support vector machine applied afterwards. I just read that in early experiments they actually used a support vector machine for image classification, and my question is how that works. I've only recently understood how it works with a CNN, because you can supply basically any image as input; how would you do that with a support vector machine?

The support vector machine takes a feature vector, so they just take the output of the ConvNet and feed it as a feature vector to the support vector machine. Your feature vector might be, say, 4096 values, just a vector of 4096 numbers. But this is quite old, you know; I just wanted to give you guys the history of how we got here.

And it's very interesting, let me see if I still have it... this is the current state of the art. On the x-axis is the GPU time (shoot, they don't have units) and on the y-axis is the accuracy, and they have all these different methods. If you're doing object detection and accuracy is your main goal, then you try one of these Fast R-CNN or Faster R-CNN methods on this side. But if you don't care about accuracy as much and you really want speed, then there's another set of object detection methods: there's YOLO, there's SSD. This graph is from 2017, from a computer vision conference. So depending on your application, you can pick different object detection methods, and I think I have a slide about these speed-versus-accuracy trade-offs.
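Going back to the RoI pooling step mentioned a moment ago: torchvision ships an implementation, so here is a small sketch of what it does, with a made-up feature map and a single made-up proposal; the shapes and coordinates are arbitrary.

```python
import torch
from torchvision.ops import roi_pool

# A fake backbone feature map: batch of 1, 16 channels, 32x32 spatial.
features = torch.randn(1, 16, 32, 32)

# One proposal in (batch_index, x1, y1, x2, y2) format, in feature-map coordinates.
rois = torch.tensor([[0.0, 4.0, 4.0, 20.0, 20.0]])

# RoI pooling crops each proposal out of the feature map and max-pools it to a
# fixed 7x7 grid, so every proposal yields a same-sized output for the heads.
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([1, 16, 7, 7])
```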
Can you maybe explain the difference between this evaluation metric and, say, the area under the curve of a receiver operating characteristic?

Yeah, that's a great question; I was trying to prepare for it before my talk. mAP is mean average precision, and it comes from the following: in ROC analysis we're used to sensitivity and specificity, right, but on the computer vision and computer science side they have precision and recall. I don't know if anyone else knows mAP very well, but basically, since we're doing object detection, say you have the actual mask of the object and your prediction. If you had perfect segmentation, you'd have complete overlap, so if you compute the intersection over the union you get a value of 1; but most of the time you won't have perfect overlap, you'll have, say, 50% or 60% overlap. So you pick a threshold: okay, I got 50% overlap and I'm going to call that a successful detection; I did not detect the cat fully, but I got 50% overlap between the actual region and mine. Then you do it at 50%, 60%, 70%, and you get a precision for, I think, each IoU, and it's a mean average precision across IoUs. That's the metric they're using in object detection; sorry, I didn't explain it perfectly, but I'll share something.

I was just wondering if precision and recall are just CS-speak for sensitivity and specificity, but I just looked it up on Wikipedia and I think it's something else: precision is what's called, in statistical speak, a positive predictive value, and recall, sorry, I didn't see it, but it probably also has a more statistical terminology.

Yes, I think somebody has worked out the relationships between them, and they have precision, recall, and the F-score, so you'll see those in a lot of these methods.

Maybe one last thing, I'm almost out of time: I was telling you about speed versus accuracy. In Fast R-CNN and Faster R-CNN, what we saw is that we had region proposals: you try to get the region proposals, then have a convolution, then try to do the classification. In single-stage object detectors they said: okay, I don't want to do all that; I just want to divide the image into a seven-by-seven grid, and in each grid cell I'm going to have a fixed number of boxes. In this case I think they have three boxes of different shapes: a tall vertical box, a horizontal box, and a square box. You could immediately say: why only these three shapes? You could have more. Then basically, for each of the seven-by-seven cells, and for each box, you're trying to predict five numbers, plus the class scores: say you have C classes, so you're trying to say whether there's a dog or a cat or background in that box. If it's background, then the x, y, h, and w don't matter, those are don't-cares; but if you're, say, eighty percent sure there's a dog in that box, then you give the coordinates of the box.

So this is the big picture: you have these single-stage object detectors, and they run really fast. You'll hear about YOLO, SSD, and RetinaNet, and right now RetinaNet is the most popular one. That's why, I think, in the fast.ai course, in the latest code, they implemented RetinaNet; they didn't even bother to implement R-CNN, it's just so old. You can implement it for learning and understanding, but most people are just using RetinaNet nowadays, which came out of a conference in 2017.
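To make the grid-of-boxes output concrete, here is a tiny sketch of decoding one cell of a single-stage prediction tensor. The layout (an S by S grid, B boxes per cell, each box carrying a confidence plus x, y, w, h, and C class scores per cell) follows the description above with illustrative numbers; it is not any specific paper's exact encoding.

```python
import numpy as np

# Illustrative single-stage output layout: S x S grid, B boxes per cell,
# each box carrying (confidence, x, y, w, h), plus C class scores per cell.
S, B, C = 7, 3, 20
pred = np.random.rand(S, S, B * 5 + C)

# Decode one grid cell: split its vector into B boxes and the class scores.
cell = pred[3, 4]
boxes = cell[:B * 5].reshape(B, 5)      # each row: (confidence, x, y, w, h)
class_scores = cell[B * 5:]             # C class scores for this cell

best = boxes[boxes[:, 0].argmax()]      # keep the most confident of the B boxes
print("best box (conf, x, y, w, h):", best)
print("predicted class index:", class_scores.argmax())
```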
Okay, I'm sorry, I'm just out of time, so I'll stop here; but there are a lot of things I didn't discuss about object detection, and there's a lot of interest in this. For example, in self-driving cars you want to detect whether there's a person coming in front of your car, or whether there's another car; all of that falls under object detection. In biology we have a lot of things too: find the pneumonia in the X-ray, or segment the cells in microscopy; all of this comes under object detection. Let's stop here.

Would you say... I've read about Fei-Fei Li; I think she's one of the best computer vision people in the entire world, probably the best right now?

Yeah, she's really good, and it's her class. She and some of her students have been doing this class for the last three or four years, and they have really good notes. I just showed you the lecture slides, but they also have a practical session on the practical challenges when you try to implement or run existing methods, so they have a whole set of notes for that. Right now PyTorch has its own implementation of these methods, and the same for TensorFlow; these methods already exist. It's just a matter of understanding the theory and then really trying them out for your application.

One could also immediately envision a military application, right? If you think of autonomous drones, for example, flying over Afghanistan and so on: is it known whether these algorithms are also implemented there or not?

I don't know about that. I know that Fei-Fei Li was at Google, and Google was trying to apply for a military contract, and a lot of people quit over that; they didn't want to be part of a military project. That's all I know. But if you can immediately think of military applications, you can assume they're being pursued.

Okay, great, thanks Michael. So do you think you want to continue next week?

I don't know if there's interest; there is a lot of material. But maybe we can just look at code: I've used this for lung pneumonia detection, and I've used it for segmentation of microscopy cells.

That would be really interesting; I would be interested in that.

Okay, I have the code already; I can share it if there's interest.

Can you share your presentation?

Yeah, I just used the Stanford slides; I'll share the link in the slack channel. It's already shared in the chat.

Okay. If you have the transcript, you can probably post it in the slack channel too. Thank you. Okay, excellent: so we'll continue next week with the Full Stack Deep Learning lectures four, five, and six, and with the object detection code from Michael. Fantastic, excellent; see you all next week. Okay, see you next week.