TWiML & AI x Fast.ai Machine Learning Study Group – Session 8 – November 25, 2018

### Session Summary

---

#### Introduction to Neural Networks with PyTorch

The session revolves around building a neural network with PyTorch. The presenter explains that a single-layer neural network is equivalent to logistic regression, and demonstrates how to define a network by subclassing `nn.Module` and implementing the forward pass by hand. This hands-on approach shows how to build custom modules in PyTorch and highlights the framework's flexibility for anyone who wants finer control over their models.

The discussion emphasizes that while defining a network from scratch is more work, it is an effective way to understand the underlying mechanics. It also covers the protocol a subclass of `nn.Module` must follow: define a `forward` method, register the weights as parameters, and let PyTorch's autograd system handle the backward pass automatically.
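
As a minimal sketch of the kind of module described (assuming MNIST-shaped inputs of 28×28 = 784 features and 10 classes; the lesson's exact code may differ):

```python
import torch
import torch.nn as nn

class LogReg(nn.Module):
    """Logistic regression as a one-layer network (illustrative sketch)."""
    def __init__(self, n_in=28 * 28, n_out=10):
        super().__init__()
        # Registering tensors as nn.Parameter is what makes them visible
        # to autograd and to optimizers via model.parameters().
        self.w = nn.Parameter(torch.randn(n_in, n_out) * 0.01)
        self.b = nn.Parameter(torch.zeros(n_out))

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten each image to 784 values
        return (x @ self.w + self.b).log_softmax(dim=1)  # log-probabilities per class
```

Because the weights are registered as parameters, calling `.backward()` on a loss computed from the output populates their gradients automatically; no backward pass needs to be written.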

---

#### Understanding Logistic Regression as a Single-Layer Neural Network

The discussion delves into the equivalence between logistic regression and a single-layer neural network. In this setup, the output of the linear layer is passed through a log softmax to produce log-probabilities, one per class; taking the argmax recovers the predicted label, and exponentiating recovers the class probabilities. Even this simple model benefits from PyTorch's autograd for efficient backpropagation.

The session highlights how little code this takes in PyTorch: users can focus on the model definition without worrying about the low-level details of gradient computation. This makes PyTorch particularly appealing for experimenting with custom architectures while still benefiting from automatic differentiation.
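
In the lesson this one-layer model is built with `nn.Sequential`; a sketch of that form (the lesson additionally moves it to the GPU with `.cuda()`):

```python
import torch.nn as nn

# A single linear layer followed by log-softmax: logistic regression
# for 28x28 images and 10 digit classes.
net = nn.Sequential(
    nn.Linear(28 * 28, 10),
    nn.LogSoftmax(dim=1),   # log-probabilities; pairs with nn.NLLLoss
)
```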

---

#### Model Training and Evaluation

The presenter walks through training the network on MNIST, using the fastai library's `ImageClassifierData.from_arrays` helper to wrap the training and validation arrays for the PyTorch model. They define the model, loss function (negative log-likelihood, i.e. cross-entropy), optimizer (Adam), and metrics (accuracy), then fit the model to the training data. Even with this minimal setup, the model reaches a validation accuracy of about 91% after a single epoch (training loss 0.31, validation loss 0.28).

This section underscores how easy it is to train and evaluate a model this way. The presenter also stresses monitoring training and validation loss alongside accuracy, since those numbers are what show whether the model is actually improving.
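
The session itself uses fastai's `fit` helper; stripped of the library, a plain PyTorch loop with the same ingredients (NLL loss, the Adam optimizer, accuracy as the metric) might look like the sketch below, where `net` is the model from the earlier sketch and `train_dl` and `valid_dl` are assumed to be `DataLoader`s over the normalized MNIST tensors:

```python
import torch
import torch.nn as nn

crit = nn.NLLLoss()                          # cross-entropy on log-probabilities
opt = torch.optim.Adam(net.parameters())     # the optimizer chosen in the session

def fit_one_epoch(net, train_dl, valid_dl):
    net.train()
    for xb, yb in train_dl:                  # mini-batches of (images, labels)
        opt.zero_grad()
        loss = crit(net(xb), yb)
        loss.backward()                      # autograd fills in the gradients
        opt.step()

    net.eval()
    correct = total = 0
    with torch.no_grad():                    # no gradients needed for evaluation
        for xb, yb in valid_dl:
            correct += (net(xb).argmax(dim=1) == yb).sum().item()
            total += yb.size(0)
    print(f"train loss {loss.item():.2f}  val accuracy {correct / total:.2f}")
```

As discussed in the session, the metric is purely for reporting; only the loss participates in training.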

---

#### Exploring Standard Deviation Calculation

A group member presents a short write-up (posted to the study-group forum) on computing standard deviation in PyTorch using two equivalent formulas. The direct definition, `sqrt(mean((x - mean(x))**2))`, requires first centering the data; the identity `sqrt(mean(x**2) - mean(x)**2)` avoids that pass and vectorizes cleanly, which is why the lesson's code uses it. The write-up derives the second form from the first and also explains the angle-bracket (expectation-value) notation the lesson uses, which reduces to the ordinary mean for uniformly weighted data.

This part highlights the value of such algebraic rewrites when working with large tensors, and the step-by-step derivation serves as a practical guide for implementing efficient calculations in PyTorch.
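
The derivation is one line of algebra: expanding mean((x - m)^2) with m = mean(x) gives mean(x^2) - 2m^2 + m^2 = mean(x^2) - m^2. It is easy to verify numerically (note `unbiased=False`: the identity is the population form, while `torch.std` defaults to the n-1 estimator):

```python
import torch

x = torch.randn(100_000)

# Direct definition: root of the mean squared deviation from the mean.
std_direct = ((x - x.mean()) ** 2).mean().sqrt()

# Vectorized identity: no explicit centering pass over the data.
std_fast = ((x ** 2).mean() - x.mean() ** 2).sqrt()

assert torch.allclose(std_direct, std_fast, atol=1e-6)
assert torch.allclose(std_fast, x.std(unbiased=False), atol=1e-6)
```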

---

#### Random Forests and Image Classification Competitions

The discussion shifts to Kaggle image classification competitions. One participant describes the Quick, Draw! Doodle Recognition challenge, whose roughly 50 million sketches come with an estimated 10% noisy labels (drawings the game assigned to the wrong class). Strategies discussed include very large batch sizes to average out the label noise, converting the stroke vectors back into images for CNNs (or feeding the strokes to LSTMs), and, for imbalanced datasets generally, over-sampling underrepresented classes (see the sketch after this section).

The section also reflects on where random forests sit today: the course teaches them because they are easy to interpret and compute, but on Kaggle leaderboards gradient-boosted trees such as XGBoost and LightGBM (and, for images, neural networks) tend to win. These experiences offer practical insight into competition work and the trade-offs involved in model selection.
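
For the class-imbalance point, one concrete way to over-sample rare classes in PyTorch is a weighted sampler. A sketch, where `train_ds` and its integer `labels` tensor are assumed:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(train_ds, labels, batch_size=64):
    counts = torch.bincount(labels)           # per-class frequencies
    weights = 1.0 / counts[labels].float()    # rare classes drawn more often
    sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                    replacement=True)
    return DataLoader(train_ds, batch_size=batch_size, sampler=sampler)
```

Sampling with replacement gives each epoch roughly balanced batches without touching the reported metric, which matches the session's point that accuracy is only a report, not part of training.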

---

#### Comparing TensorFlow and PyTorch

Finally, the discussion turns to the differences between TensorFlow and PyTorch, with a particular focus on debugging and dynamic computation graphs. PyTorch's define-by-run (eager) execution makes debugging easier: tensors are concrete values that can be inspected and manipulated directly, with no computational graph to define and compile beforehand.

They also acknowledge that TensorFlow is catching up, having added eager execution (with gradients recorded via `tf.GradientTape`) in recent releases, which closes some of the gap with PyTorch; the group likewise mentions TensorBoard, and the `tensorboardX` bindings for PyTorch, for visualizing training. The discussion concludes with participants unsure whether they will switch back to TensorFlow in the future, emphasizing their current preference for PyTorch's flexibility.
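
The debugging point is concrete: in a define-by-run framework every intermediate result is an ordinary tensor you can print or step through with a debugger, rather than a graph node that only has a value when the graph is run. A trivial PyTorch illustration:

```python
import torch

x = torch.randn(4, 3, requires_grad=True)
h = x.relu()                  # computed immediately: a real tensor, not a graph node
print(h.shape, h.mean())      # inspect (or call breakpoint()) mid-computation
loss = (h ** 2).sum()
loss.backward()               # autograd replays the recorded operations in reverse
print(x.grad.shape)           # gradients are ordinary tensors too
```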

---

#### Conclusion

The session offers a comprehensive look at model building, training, evaluation, and competition work, bridging theoretical concepts with practical implementation and giving readers a solid understanding of how to work with neural networks in PyTorch.

The discussion also covers broader topics such as debugging, optimization, and the trade-offs between frameworks, making it valuable for both beginners and experienced practitioners. By exploring these themes, the session serves as an informative resource for anyone looking to deepen their knowledge of machine learning and deep learning.

"WEBVTTKind: captionsLanguage: enokay cool let's start so its last night so we have finished with the with the random forests and we're moving on to different loci wrong networks a logistic regression methods are based on the optimizers like stochastic gradient descent h right yeah so the first night we are four more to go so we're almost there I know it's a lot most of people do deep learning and machine learning at the same time so it takes a lot of time but now it should be more in line with deep learning course so that's that's good so we're going to have the basic neural network today and then the this this course cover more more in detail more in-depth talks more about like things like regularization or normalization something that the deep learning curves doesn't talk that much so I think it will complement that as well very nicely you also talk more about natural language processing and also kölner data so still a lot of stuff to do in those couple more lessons but at least for those guys for the deep learning as well it take it's it's more in line yes totally I mean I think if you I think if you do this one and nine chapter nine I think it actually helps for what we covered off yesterday and there were questions I had that were covered off today so I think it's brilliant we represent yeah thanks so how to wrong we I think we know that one so the lesson eight yeah so what's in a white we we finished on lesson 7 with Rama forests and although run the forests are very good there are a couple of things that run the forest that are great with so like if there is a time series component in a data they are not that great and also if if we want to predict them something that is outside of the training data that some day that I would did not see in training basically random forests will struggle to predict what the results should be and with this situation the neural networks or the regression whether logistically a regression can help because it builds like a function and it can extrapolate and it can help predict in on the data that did not see and then we quickly move to recognizing so the example for this lesson was the list list example which is a bunch of handwritten digits from 0 to 9 as a database called amnesty it's a so you can download the data from the from the URL like this so Jeremy showed how to get the data using a little bit of a faster library or some helper functions so the data is stored in a in a pickle Python Python pickle file which then it's quite easy to to to load as a Python objects so we get a file and we unzip it because it gzip the thing well actually I think pickle already know how to handle gzip so we get that we load that data into x and y so that's our training data and also there is already validation set and a test set as well but for this example we don't really need the test examples so we kind of loaded in to underscore it suggest we not go to do a lot with that but the trend X&Y and the validation takes valid and Wi-Fi leads that's already in the pickle file so that's quite convenient right so the image so the image is quite small this is just 28 by 28 black and white images how are they are not stored as as images as such just give you a second guys sorry what so the images are the stored as kind of like flattened so we have the the image represented as a one row of of of numbers showing whether the zero is 0yz I think this image is kind of dealing around I think the zero should be black and the one should be should be white yeah I'm not quite sure 
why is it like this my understanding was always that zero is black like especially like in the computer vision like images like JPEG or and then one and then 256 was white yeah so the so the data we get from the tickle fight least seven hundred seven hundred eighty four numbers just thoughts as one one row and would they'll have fifty thousand of training data and don't see it here the validation but I think that was a bit less than that so then the next step before we so once we have the data before we can do deep install deep than your network we need to normalize the data so normalization takes takes that data from zero to 255 and would convert all of that so the average for each pixel value will be zero around 0 and the standard deviation would be would be 1 so that's the normal distribution normal standard standardized distribution so this is this was not required for in the forests although it's needed for the neural network so also we have to do the same for the validation set and for the validations that will use the standard deviation of the training set because this is the our network learned the weights based on the normalized training set so therefore we have to use the same standard deviation and mean to normalize the data it's not going to be like exactly zero for that reason by its close to zero for the for the mean and close to 1 for standard deviation before it's fine yeah that's right so so the next step to kind of to display that as an image or and then to work with in with this as images in our neural network we need to kind of reshape that flattened image so because we know that 28 by 28 is 784 so we know that we need 28 and 28 in the last two like in the rows by columns as an image or by all by columns by rows depends how you look at it then if you specify the minus one for the reshape it's going to is it for the reshape yeah for the numpy into shape it's going to create as many images as as it's going to find by the calculation mmm so that's quite helpful we don't need to know how many images we have although we know we have 10,000 yeah validation because we're looking at validation here so we know that we have 10,000 of those so we could specify 10,000 here but we're going to let Python decide based on our shape we need and then surely it's it's calculated 10,000 it's quite convenient and then we can we can display that image together with validation label as so this in this case is 3 so with that image now ready we could start training our network and this is just some introduction of how many were literal looks like so you have the input to your neural net one color left so from X X 1 X 2 to XP that's your input nodes input layer and at the end you have your output layer which is I don't know it shows this kind of line here and then everything in between input and output is your is your layers of your network also called hidden layers and you can I have many of those hidden layers in this example we just we're just using the one layer so the and the the network normally consists of the linear function which is your input multiplied by the white so the W 1 1 or I 1 1 J 1 IJ K so on and that's that's going to give you an activation and then we're going to use nonlinear function because the combination of the linear function and non linear function when you have like more layers it's going to give you ability to approximate like any other function so that's quite useful feature of the neural networks very powerful feature actually so to do that we're going to use at 
least that's what Jeremy used in this example was the pi touch so initially importing the torch and then as an N that's kind of similar standards as pandas PD and so on so usually that's how people import pointers so just a quick question for you I'm out over for anyone in general I I've missed some parts of this so is there is there a really good reason why we're using PI torch mrs. Karis yeah so at least what Jeremy says so they used the test of Rho and Clara's for the the first version of the course and then the second one I mean deep learning talking deep learning so that's what I've used and then they switched to PI torch and the reason they told was that the pythor's provides this thing I mean graphs some features that test of all does not provide so there's some things they can do a lot easier and faster using Python I know that's a high-level overview the new faculty I've heard in one library on top of Pi torch yeah yeah exactly yeah but but but why why I thought was yeah misses any of the other ones out there I know they did care us a long time when I just didn't know why yeah so so yeah that's right so they what Jeremy said that it's it was there's some things they could not do winter so Frank Harris and so they decided to switch to PI torch and they've created the library to make it a lot easier than if it was especially like for neural networks there's this one lesson that in deep learning the version two Jeremy shows how to do stuff with faster and fighters and then he shows how to do this exactly the same side like neural networking in terms of floor and caris and it takes you much more lines of code and the result you get is not as good as what they have so that was one of the reasons they decided to switch they wanted to make it better which tensorflow or not allowing them to do and not that flexible to what I understand and the Charis was all those easy easier than the plain tensorflow still not the level of easy that they wanted to achieve so I think that's why they decided to create a change to pyrogen and faster I like as a library ok that's what I heard yeah anyone's good different beyond I remember what they were talking about debugging that with it I don't it was easier I know type I thought so I have worked with Karis I remember Jeremy saying that it was easier to debug because you don't have to make a graph and once you finish you can run it something like that it's the only thing I remember him comparing both you know nothing the big as I mentioned I think the big bonus for forp I thought was always the the the dynamic graph I think I I don't really know the details but apparently tensorflow is now also including some kind of a dynamic eater in the newer versions because they were obviously missing it but in principle it's a fine craft in tensorflow and tensorflow is catching up with pi to a human standpoint and let the the latest release has that kind of dynamic graph yeah I know what that was called but yeah that's a girl it's a eager your execution yeah and also fighter just trying to catch up with tensorflow and like in production I think it's good when they when they compete in that way all right your execution is what I was looking for yeah yeah so who knows might be the next version of the course will be something different I think that's that was another kind of Jeremy's common that shouldn't really spend that much time I think we should spend the time learning library but which also should be worried that he might change in the future in a new future 
especially with the progress will be learning machine learning that's going on now yes so in this case after important torch fight or just torch the simple Network that German defiant is using the sequential that's that's a function that's a method from from torch and you can define your network in this case it's just just two layers the linear light layer 12 of the size 28 by 28 and 10 because we need our 10 digits from 0 to 9 10 and the next layer is the nonlinear layer in this case the log softmax and then he puts all of that in on a GPU by adding the odd CUDA yeah so the first layer is 28 by 28 because this is the number of of pixels per image the output is different classes and then the soft marks so the other they're different they're different nonlinear layers of course you can define in this case the soft marks was chosen because we want just one for each image you want just one label and soft marks is going to give us a prediction where one label is going to be the biggest prediction number in some situations you want to have multiple labels per image so like if you like this picture of your room you what you you would say there's a chair there's a table there's a as a picture there so there will be like multiple labels but in this case this is just basic one label per image and the softmax is the good nonlinear layer and good the last layer as well because in this case we only have one layer as such so that that nonlinear line is also the last layer and the softmax is is designed for that so as we have our network our data then the next step is to well it's the kind of like buy it bind that data together using the that's fast a library image classification data and in this case from a rice so we we shall which Python arrays or number 9 pi erase you want to use as our training so the x and y was the training and validation is x valid and y valid so and the image classifier from data is going to to make that data available for the torch model to use that's right that's what slideshows the next couple things we need to define before we can start training is the what sort of loss function we want to use so the loss function is going to tell the model whether we imp with the changing of the way it's whether the we improving with the model or it gets worse so in this case we're going to use the the cross entropy which is called an ally loss as well and the matrix is something we're going to print with which each epoch ran so in this case we want to see accuracy how many digits would predict the correct and also the optimization we need to choose what sort of optimization like for example stochastic gradient descent or in this case the atom optimizer was chosen okay so our so once we have that we could Ram the fit method and then we define we put as arguments we put our network with the finalist net we put our data then with my variable MD we define how many epochs who want to answer how many times we want our the network is going to see each image so if we say one the network is going to see each image just once the critic criticality of thing needs the same as loss and then optimizer our optimization method and the matrix and then once we ran that we're going to get the results so zero is the first epoch and the first number is your training loss so zero point three one and the next one is the validation loss 0.28 and the accuracy 0.91 so the model with just one layer gives us with just one epoch gives us accuracy of 91% which is like I guess it's not the best things it's quite good 
ready so if we want to use that the predictions we would get the well in this case we just use the the validation data set for that is not using the test set so he gets the predictions and this gives you this gives him the array of 10,000 and per 10 so now to get because so it gives like 10 predictions for each image it gives so then we have to select be the highest number output the predictions so like in this case if we look at prediction of of the of the 1 row I guess we get different numbers for different digits so I have to find out the highest number here so for that we use the arc max and then the armored arguments give us the the location of the highest prediction and based on that we can find out whether that was 0 1 2 3 or something and then we created so with our predictions and the validation because we know the predictions for validations that we can calculate as well like manually our accuracy which which is the same as the one that was created by the ear our network and then again we can plot those images I think the interesting part of that is that these numbers these numbers the maximum of these are the long a lot of probability so that if you wanted to know like how confident you are that your digit in the 7 for example look at the probability it's gonna it's going to have it's going to compute and under the hood it's going to compute the 10 probabilities for each digit and then we take the the maximum the digit that has the maximum probability is the you you yeah right because this is like minus 10 so there'll be not so good probability right yeah make sense yeah and then there's a last thing what Jeremy showed this lesson was how you could define your own logistic because that's basically one neural network with one layer is is the same as logistic regression so it just showed how you could define logistic regression yourself using an alert module class so you're going to create like subclass of that class and this gives you the same results so there's a couple of ways you can do the same thing and I didn't play without you in encode as such but this was pretty much taste that's basically the nature of the fight watch models right so you you you basically can stack them as you like you always derive from a new module and then then if you do that you you already get all the benefits like like probability stuff included and you just have to adhere to some protocol that you have to have a four word function basically defined then you can do whatever you want and it's all compatible just used existing modules anyway right so but if you want to do something really custom you can just build it like that and just plug whatever into forward custom modules that can be chained in the sequential kind of thing and then you can build up whatever kind of model you want and in that forward function you can see where they're computing the probabilities it's the it's the third line where it says X you know e to the x over 1 plus e to the X that's the probability that gives you the 10 different yeah and so just question around this forward where because I mentioned before where where is forward actually called why do you go from I touch them like faster I in on behalf of title is basically when when you do the forward round in the model this one will be executed and therefore you have to have it otherwise it's not compatible like in training you always have a forward and a backward pass right and and in the forward you basically calculate you you you multiply your weight with your 
training data and go towards the end of your network and once you're there you're basically back propagating to update the weights and that's that should be the forward step yeah and the backdrop is is automatically done because PI dot can automatically differentiate so that's that's done for you so you don't have to because every variable or every tensor has an option to be an autocrat and therefore if it's enabled and then PI thought will keep track of it and can do the back propagation so you only have to customize the forward if you want okay there was a question which art about if we have imbalance said which my trick other than accuracy should be you should be using there even if it's unbalanced yeah if you have select more lines than three for example and a lot more I think was asked in the forum's in I think you either do it in your prick thing that you do over sampling when he kind of pants the classes by standing less frequent by having them move or you provide say I get into the model light like that that certain level or higher no I don't think it's included in faster i if i remember yet yes are you providing wait yeah you would not change the accuracy you would not change the metric he would you'd basically generate some more either images from the class or or sample more maybe from that class yeah so i don't understand where you you know the question about why why does accuracy come into play there it's not about the metrics right it's it's more about this is purely for visual inspection it doesn't affect model training or performance at all only reporting numbers it's not interacting with the performance just showing the model performance that's why you're free to change it to whatever it's just why is to choose one with informative the most yeah so basically that's and from like a quick lecture recap and and where the talked a little bit about those topics but so is there anything else that you guys found interesting or or confusing or yeah I do have a question next because I was just able to squeeze in this thing before this started and one of the things he mentioned was this timing he initialization where you divide the torch random number dimensions by dimension zero can I'm gonna look it up but can someone from a high-level view of what that's about very briefly mention that he started typing that in the browser that he stopped quickly I didn't look into that so yourself I can't explain it either and I know that it's the better initialization but I I forgot why I mean that was we could look at that quickly if anyone remembers or know how to spell that what did he say it's h ra s sk e i ming each e initialization installation of deployer okay so here is a PDF from that maybe not the PDF rolls okay well sure if that's about insulation linearization okay is something here who doesn't talk a lot here about this this is the right side I guess is there a list of topics it's normalization like I would say just now a lot of stuff here I'll tell you what I'll look it up and before next Sunday I'll I'll put a short document and what I've figured out or whatever understood okay great thank you yeah so we're decided couple of us doing couple things at the same time so that's kind of for me the stumbling blocks kind of the issue there it's a lot of material to cover especially he talks a lot of stuff and declaring the image recognition natural language processing collaborative systems like everything and so that's that's a bit of a challenge there is there anything that you've 
learned or find interesting in this lecture I think it's just that if anyone missed it at the start of the conversation this is a really good help to the deep learning course so you know to this and next next week's a little help at the Autry yeah and it should complement this very well you guys doing anything interesting for homework or any any side projects initially we're talking here about some cargo competitions we've entered like taxi competition so now we talk in image recognition so there's a lot of cargo competitions on on the image classification or our segmentation or this is quite a lot of image because doing anything follow the two courses and I keep seeing all these new capitation to pop up and think oh this might be nice to try but then I just don't have time cuz I'm just trying to follow the class right now yes I mean they only run for like two or three months right and if you jump in and and really need to learn from from ratchets it's almost too late at least for the live ones I kind of skip that for now and just look for interesting old competitions and see if the data that I like asking about the about the standard deviation formula that Germany used and so I wrote up a little a little thing on it and I posted it on the forum for this through I think for most of the seven and for most of you it's probably too much you know because you probably already know it but I think some people really wanted to understand how that formula came about so I just there's a link on the slack on our slack channel would you like to keep like 5 milli presentation of it ok um sure I mean let me let me get the slack channel link and then I'll go to it share my screen yeah thank you okay I'm going to I think I'm going to share my screen okay so I it says I'm sharing my screen but oh um someone has to give up sharing their screen before I can share mine yeah okay let me let me go stop sure okay okay okay are people seeing mist yep okay so I just so there's just there's just a little it's kind of like turning a little formula that Jeremy uses to get a faster version than just the if you computed the standard deviation from the first formula that I showed there then you're going to have to compute its it's not vectorized it's kind of you have to do a looping and get through it so it's a little slower so there's a well-known formula that I'll just skip to the bottom there's a well-known for know that people use for this we're into this formula here where the standard deviation is is the square root of the mean of the squares minus the square of the mean and that's the formula that I just went through the derivation of it and then I showed how well Jeremy implemented code you know cuz T I need to find a function over here and this is basically the implementation of this formula so I just I mean most people are going to want to go through all this but I just thought I would sort of go through and derive it for people and it can hear it if you're not familiar with these angle brackets I explain what they mean they're basically when you put angle brackets around a vector that's it's the it's the expectation called the expectation value and if all the weights are uniform then it's the same as the mean in other words if the distribution of excess is uniform then then the expectation value is the same as the mean so I just go through that notation because Jeremy used that and and without explaining it and so I thought I'd just go through it and explain it's what people can follow it so if interested you 
can delve into that otherwise it's basically just this formula down the bottom here that'd be like Jeremy do this for staff deviation nice and thank you okay linguish my screen since we're talking about what people are doing like ProjectWise is also interesting like and find the new stuff in each day I guess there's a forest classification I think it was mentioning in the forum somewhere as well like a simple classification thing that I might try to compare the random forests to the tabular one so if anyone else is interested in addition to the taxi one which I still want to write up but I I will afterwards probably have a look at this forest classification from USGS I think the data and that looks kind of needn't and small and the for the for the owning I mean I don't know about you guys but doing the deep learning one as well and I find that I hop around too much it's just so much stuff I want to do and it's just too hard to focus and at mean time to doing that because I thought ok this stuff is like super cool and but there's so much involved there if you don't really stick to the example to the point and it kind of blew up in my face these examples in class I mean it's frustrating not for me to not be able to go out and do casual things and challenges and so on but I think that by concentrating on learning these tools now I feel like if I had to do a cattle competition I have some I think it's gonna be fine I think eventually I'll when when I'm done with these classes all or maybe yeah I'm not doing any cable competition at the moment either just looking for data that kind of might be similar to a to a problem tag of the class and I trusted with it because yeah because it's more exciting than just doing the exact same thing and follow the new progress yeah once once we finish all those courses and of the air that'd be more time unless we a start another very first with learning this is already talking on a on the slack about this so whoever's interested in reinforcement learning Sammy's looking to start study group sometime next year somebody mentioned this class on Udacity which is a free course in white arch can I get a kind of a hands up who's like evaluating Arabs tensorflow and Pilate at the same time are you kind of focusing more on my couch now because for me I don't know I'm a bit drawn I mean all the all the extra drops and stuff they they they are more for really and I thought seems to be totally senior are you kind of just learning the concepts and don't care like which which which framework you learn or is it that you already know the flow this is just me I took the Andrew ings specialization where where he he uses tensor flow and Karis like yeah we don't really don't get much unless you do a lot of extra work it just goes by so fast understanding of just a basic working knowledge of it and so for now you know I'm learning by torture and then eventually I'll try to learn all three and you know find out which one is better for which projects and so on ok yeah I'm only pythor's from now I didn't really spend any time with like learning intensive oh yeah that's what I thought recently right I mean ideally I would wanna know both but seems like like facile gets it to it to a result really quick and it's probably worthwhile to really take into Python trust okay familiar with that kind of thing right instead of Paris is very similar you're you basically build the network layer by layer I think high torch you you have it all it's all under the hood you don't have to actually pay 
attentions and layer that's just done for you if you want you can look into it but in and carats you actually build the network layer by layer you you build you know did some some smoke over to the - yeah and it's it's nice because you sort of have your hands on it more better understanding of a better working knowledge of what like you can get the same thing by just building under the hood and pie tours just looking at the definition of the model you know like you can do dir when you build a model like learn or whatever and you dir learn and then it it shows you all the layers on unique yes I looked at Karis about a year ago and you know getting as far as creating models and all is really easy I just couldn't find enough resources to help me figure out how to get past you know things like image augmentation and when you trying to transfer learning and so on they just not but at the time that I looked at it which is yeah I think over a year ago I I was struggling to get beyond the simple stuff and and that's when I can I'm abandoned my probably go back to that I know what's what what's been your experience they're kind of similar for me to be honest I did some like follow follow some it for me it's always like okay if you just follow some tutorials or whatever it's all make sense and it's all laid out it no one kind of works but that's why also want to apply my own datasets or whatever because that's when it normally falls apart like for me right if you just follow notebooks in an existing one then it's all nice and dandy but then when you bring your own and it's suddenly a bit different or whatever and some be yeah this knowledge doesn't really help you that much do you really need to understand the stuff progress I don't know but ya know yeah but I also agree on the Christians point about ten so for being using in production a lot I guess if you if you apply for like jobs people would you use tensorflow more so it's it's a kind of a good question about whether you actually need to learn pencil flour or I'll be good with only Python the question I thought about actually trying to like when I have a neat small project really wrapped up just we implemented so on the topic of tensorflow is is any resources anyone can has tried that they can recommend for for that I mean you want to just jump right in I mean they have this Google has these classes and using tensorflow on on Google cloud platform there's like a specialization in that yeah and it's it's pretty much total immersion you are and I think that it's it they just put these classes out and they really haven't worked down I think they'll be good when they I mean if you want to learn the Charis kind of thing and I guess the book of what's-his-name the French guy that's supposed to be really good yeah I was watching that book on Amazon and all decide to drop down to $15 and I should have bought it but I didn't and then it went back up to 29 I read it it's nice but it's about chaos so I was going to ask if you have used tensorflow and chaos is there any reason to use tensorflow I mean we can ask you do more or less everything and tension flow is really like low level no you have to do many things just to build a neural network instead in chaos you can just write a couple of lines and you have it and if you want stimuli like like like faster I read but I guess I understand in faster you get the kind of benefit of having Jeremy's like opinionated selection of stuff so you get like not not five choices but the best choice by his definition 
for NLP right so you get a nicer cleaner interface but also probably like state-of-the-art algorithms which I didn't try to do anything fancy but once with Karis I was building a faster CNN or something so I was doing something else and I used the functional API which allows you to build a bit whatever you want is not just like a stack at the network show you I think that with Karis you can do more or less everything I never have seen the need to use tensor flow you know what I mean yeah actually I went to a couple of meetups here and there's a guy who's really like experienced in stuff and he was I don't know building his own networks before tens of everything and he's doing everything in carrots now as well like apparently it's likes to move enough for really professions as well I was just wondering like it since you seem to have done quite a bit with carrots and sense of law and what's your thinking on faster I pie touch at the moment like it apart from being allegedly more flexible like that do you like any of them better do you feel like one has a big advantage of the other for the stuff that you do or in my case I haven't just by thoughts yet I mean I have followed the course only the machine learning I try to start with the deep learning like you guys but I goes into it yeah so I want to learn BIOS and to try the things and potatoes but so far I have used strongly I'm happy with you but I really pull you because if for example this thing with debugging in at some point Jeremy mentioned that is easier to understand in fighters because it tells you better while it's failing instead being callous you get some cryptic thing that sometimes it's really hard to you know find out where the network broke thing would suit the tensorflow they got the stencil board right yeah but I think there's something coming up for the tech support is like a visual representation I don't say it now there's flow well tensor well fingers board yeah that's right I think pythons it's tender board X or something yeah basically gives you like a like a graphs and things you can just change some settings and see how that would improve or effects your model I didn't use that but that seems to be quite a nice tool and indeed I did also saw and I think I've posted that on there for gonna slack that someone created something similar for for fight verge yeah I put it in the chat I also did okay so just now it's tense about X yeah but I never used it I'm not sure if it's just hooks or bindings to actually use tender board with it looks like it like visually it looks like to be tender board yeah that could be interesting yeah so basically it gives you a nice yeah nice board so you can change things and see quickly visually how that yeah that seems like a nice tool interesting anyone use that yet no some I'm trying I'm trying some some keggle competitions like there's this quick-draw competition which got a lot of I wanna see if I can find it quickly so there's a the data set is huge so that's 50 million of drawings like that so quick-draw is like a Google site when you can you have like 20 seconds to draw something they tell you to draw and then they take that image and put it in the database and then they made those 50 millions of those drawings available and you have to predict what they are based on the training data and the issue here is that there's a lot of those images so it's quite difficult to kind of compute all that and the other issue with that is that some labels are about to be incorrect so because for 
example the application which is like quick-draw so when the application tells you to draw a wheel I can I can draw a and I can leave that and then I believe that application will take that square and assign the label of wheel to that and then and then that's a noise in the in the hall yep so the time was up you didn't guess it but to my understanding that that square would now go into database as wheel and then you get in your training data you get like about guys estimate about 10% of incorrect labels so so what the guys trying to do they trying to make a batch size as large as possible so then those noisy images they kind of like average valid so then you could kind of in reduce the impact of that noise so that's what meaning like meaning that if you then if you had a large enough back to me said wheel then hopefully that'd be a whole lot of other wheels to tell you that oh yeah yeah that that's what the kind of at least some discussion on on the conical forums that's what people trying to do to overcome that issue so they trying to use as large just possible batch size like 600 if no more they trying to use all the all the 50 million images which takes forever to train so it's kind of I guess kind of still not the real-life example but still kind of some sort of kind of like a challenge where you have to deal with that big amount of data so either you can take a subsample of that start with even train you with just one percent of the data gives you already like 90% accuracy but like to get more you have to add thing you have to really train like with all of them and with large batch size and then they the the data they give you it's not image actually it's a it's a vectors so when when people draw those things it stores that was as a vectors like from A to B and then again again so they give you vectors but some guys made some some kernels available that takes those vectors and convert that into image so actually you actually your input to your network could be an image again but some people try even like the LS TM networks for that and try to see the patterns in in the in those lines in The Strokes so there are a couple of different methods people try it nothing like doing competition you can see all that stuff going on and you can learn also what from the forums what people what people do then in this competition if in the test data it's possible that you have also something wrongly labeled or or maybe those are check it because you cannot guess know what it is if it if you never saw it before and it's wrong you cannot guess it are you guys yeah on the test oh good question because on the test set with the more we only have the images without labels so because this question Scott from thinking that maybe it will be good to do something to prefilter the damage is know so for example you take one class and you do I don't know how wide you can do some sort of clustering and you remove you know the farthest examples from there you know from the mass of you know the ones that are closer between them or something like that I don't know yeah that's a good point so if there was a like a way to do it they'll be great so to get rid of those wrong because of our Legos but for me I'm so like all those images like on this page B's every single one looks so different I mean yeah that's true it's like I'm so surprised that Mac neural networks can make sense out of it it's like unbelievable but in fact they can but it's crazy all right so we are one o'clock I guess there will be for this so it's 
around the random forest implement is thief and I'm still catching up on some of the videos is the first day I random forest implementation vastly different from the scikit-learn one and is that what what he's he's talking about and how it's different and what the advantages are he's using the psychic learn run the forest the only the only faster library uses is for like to prepare the data for for the psychic learn run the forest and stuff like around the D Hall the regression model he's just scikit-learn yeah you mentioned in one of the lectures that basically the cyclotron one is just much more optimized so even though he's coding it up now in in in in the course and you get some pretty decent performance even especially if you introduce the certain language stuff then you can get close to as a psychic learn but it's it's nothing different or whatever he's just showing that not magic basically walking program with yourself but in the end psychic learn you trust yourself oh okay excellent thank you any more questions before we are finished for the day yeah did he talk about gradient boosting and lecture eight I don't I don't know okay it's coming up either I think I kind of ran through that the coming lessons like briefly and I don't think there's anything about boosting yeah so so in let's he chose run of forests I believe do too it's easier to interpret may be easier to to to teach I guess easier to compute and it still gives you quite decent protection but he in it in one of the lecture he mentioned that this is the kind of random forest together with other the tree based methods like extra boost or the light GBM these are the methods to kind of to learn for this purpose unlike if you look at a girl now everything is like almost like Agassi boost or the light GBM no one's easier and the forests fearful that for those problems okay excellent so next week we have Lesson nine and that's going to be I forgot what I was going to be but it's going to be something interesting again I guess cool so SGD has a gradient descent applied to the same maybe no no that's this one I'm sorry that's this one yeah I'll find out it's always interesting from German sir sure it's nice that it's again if you want to do any mini presentation five minutes to talk about something just let us know and we'll do that okay okay thanks Michael thank you very much thank you youokay cool let's start so its last night so we have finished with the with the random forests and we're moving on to different loci wrong networks a logistic regression methods are based on the optimizers like stochastic gradient descent h right yeah so the first night we are four more to go so we're almost there I know it's a lot most of people do deep learning and machine learning at the same time so it takes a lot of time but now it should be more in line with deep learning course so that's that's good so we're going to have the basic neural network today and then the this this course cover more more in detail more in-depth talks more about like things like regularization or normalization something that the deep learning curves doesn't talk that much so I think it will complement that as well very nicely you also talk more about natural language processing and also kölner data so still a lot of stuff to do in those couple more lessons but at least for those guys for the deep learning as well it take it's it's more in line yes totally I mean I think if you I think if you do this one and nine chapter nine I think it actually helps for what we 
covered off yesterday and there were questions I had that were covered off today so I think it's brilliant we represent yeah thanks so how to wrong we I think we know that one so the lesson eight yeah so what's in a white we we finished on lesson 7 with Rama forests and although run the forests are very good there are a couple of things that run the forest that are great with so like if there is a time series component in a data they are not that great and also if if we want to predict them something that is outside of the training data that some day that I would did not see in training basically random forests will struggle to predict what the results should be and with this situation the neural networks or the regression whether logistically a regression can help because it builds like a function and it can extrapolate and it can help predict in on the data that did not see and then we quickly move to recognizing so the example for this lesson was the list list example which is a bunch of handwritten digits from 0 to 9 as a database called amnesty it's a so you can download the data from the from the URL like this so Jeremy showed how to get the data using a little bit of a faster library or some helper functions so the data is stored in a in a pickle Python Python pickle file which then it's quite easy to to to load as a Python objects so we get a file and we unzip it because it gzip the thing well actually I think pickle already know how to handle gzip so we get that we load that data into x and y so that's our training data and also there is already validation set and a test set as well but for this example we don't really need the test examples so we kind of loaded in to underscore it suggest we not go to do a lot with that but the trend X&Y and the validation takes valid and Wi-Fi leads that's already in the pickle file so that's quite convenient right so the image so the image is quite small this is just 28 by 28 black and white images how are they are not stored as as images as such just give you a second guys sorry what so the images are the stored as kind of like flattened so we have the the image represented as a one row of of of numbers showing whether the zero is 0yz I think this image is kind of dealing around I think the zero should be black and the one should be should be white yeah I'm not quite sure why is it like this my understanding was always that zero is black like especially like in the computer vision like images like JPEG or and then one and then 256 was white yeah so the so the data we get from the tickle fight least seven hundred seven hundred eighty four numbers just thoughts as one one row and would they'll have fifty thousand of training data and don't see it here the validation but I think that was a bit less than that so then the next step before we so once we have the data before we can do deep install deep than your network we need to normalize the data so normalization takes takes that data from zero to 255 and would convert all of that so the average for each pixel value will be zero around 0 and the standard deviation would be would be 1 so that's the normal distribution normal standard standardized distribution so this is this was not required for in the forests although it's needed for the neural network so also we have to do the same for the validation set and for the validations that will use the standard deviation of the training set because this is the our network learned the weights based on the normalized training set so therefore we have to 
use the same standard deviation and mean to normalize the data it's not going to be like exactly zero for that reason by its close to zero for the for the mean and close to 1 for standard deviation before it's fine yeah that's right so so the next step to kind of to display that as an image or and then to work with in with this as images in our neural network we need to kind of reshape that flattened image so because we know that 28 by 28 is 784 so we know that we need 28 and 28 in the last two like in the rows by columns as an image or by all by columns by rows depends how you look at it then if you specify the minus one for the reshape it's going to is it for the reshape yeah for the numpy into shape it's going to create as many images as as it's going to find by the calculation mmm so that's quite helpful we don't need to know how many images we have although we know we have 10,000 yeah validation because we're looking at validation here so we know that we have 10,000 of those so we could specify 10,000 here but we're going to let Python decide based on our shape we need and then surely it's it's calculated 10,000 it's quite convenient and then we can we can display that image together with validation label as so this in this case is 3 so with that image now ready we could start training our network and this is just some introduction of how many were literal looks like so you have the input to your neural net one color left so from X X 1 X 2 to XP that's your input nodes input layer and at the end you have your output layer which is I don't know it shows this kind of line here and then everything in between input and output is your is your layers of your network also called hidden layers and you can I have many of those hidden layers in this example we just we're just using the one layer so the and the the network normally consists of the linear function which is your input multiplied by the white so the W 1 1 or I 1 1 J 1 IJ K so on and that's that's going to give you an activation and then we're going to use nonlinear function because the combination of the linear function and non linear function when you have like more layers it's going to give you ability to approximate like any other function so that's quite useful feature of the neural networks very powerful feature actually so to do that we're going to use at least that's what Jeremy used in this example was the pi touch so initially importing the torch and then as an N that's kind of similar standards as pandas PD and so on so usually that's how people import pointers so just a quick question for you I'm out over for anyone in general I I've missed some parts of this so is there is there a really good reason why we're using PI torch mrs. 
Karis yeah so at least what Jeremy says so they used the test of Rho and Clara's for the the first version of the course and then the second one I mean deep learning talking deep learning so that's what I've used and then they switched to PI torch and the reason they told was that the pythor's provides this thing I mean graphs some features that test of all does not provide so there's some things they can do a lot easier and faster using Python I know that's a high-level overview the new faculty I've heard in one library on top of Pi torch yeah yeah exactly yeah but but but why why I thought was yeah misses any of the other ones out there I know they did care us a long time when I just didn't know why yeah so so yeah that's right so they what Jeremy said that it's it was there's some things they could not do winter so Frank Harris and so they decided to switch to PI torch and they've created the library to make it a lot easier than if it was especially like for neural networks there's this one lesson that in deep learning the version two Jeremy shows how to do stuff with faster and fighters and then he shows how to do this exactly the same side like neural networking in terms of floor and caris and it takes you much more lines of code and the result you get is not as good as what they have so that was one of the reasons they decided to switch they wanted to make it better which tensorflow or not allowing them to do and not that flexible to what I understand and the Charis was all those easy easier than the plain tensorflow still not the level of easy that they wanted to achieve so I think that's why they decided to create a change to pyrogen and faster I like as a library ok that's what I heard yeah anyone's good different beyond I remember what they were talking about debugging that with it I don't it was easier I know type I thought so I have worked with Karis I remember Jeremy saying that it was easier to debug because you don't have to make a graph and once you finish you can run it something like that it's the only thing I remember him comparing both you know nothing the big as I mentioned I think the big bonus for forp I thought was always the the the dynamic graph I think I I don't really know the details but apparently tensorflow is now also including some kind of a dynamic eater in the newer versions because they were obviously missing it but in principle it's a fine craft in tensorflow and tensorflow is catching up with pi to a human standpoint and let the the latest release has that kind of dynamic graph yeah I know what that was called but yeah that's a girl it's a eager your execution yeah and also fighter just trying to catch up with tensorflow and like in production I think it's good when they when they compete in that way all right your execution is what I was looking for yeah yeah so who knows might be the next version of the course will be something different I think that's that was another kind of Jeremy's common that shouldn't really spend that much time I think we should spend the time learning library but which also should be worried that he might change in the future in a new future especially with the progress will be learning machine learning that's going on now yes so in this case after important torch fight or just torch the simple Network that German defiant is using the sequential that's that's a function that's a method from from torch and you can define your network in this case it's just just two layers the linear light layer 12 of the size 28 by 28 and 10 
So the first layer takes 28 by 28 inputs because that's the number of pixels per image, and the output is the ten different classes; then comes the softmax. There are different nonlinear layers you can choose, of course. In this case softmax was chosen because we want just one label for each image, and softmax gives us a prediction where one label gets the biggest prediction value. In some situations you want multiple labels per image: say you take a picture of your room, you would say there's a chair, there's a table, there's a picture, so there would be multiple labels. But this is the basic one-label-per-image case, and softmax is a good nonlinearity for it, and a good last layer as well; since we only have one layer as such here, that nonlinearity is also the last layer, and softmax is designed for that.

So we have our network and our data. The next step is to bind that data together using fastai's ImageClassifierData, in this case from_arrays, where we specify which NumPy arrays we want to use: x and y for training, x_valid and y_valid for validation. ImageClassifierData is going to make that data available for the torch model to use; that's what the slide shows.

The next couple of things we need to define before we can start training: first, what sort of loss function we want to use. The loss function tells us whether, as the weights change, the model is improving or getting worse. In this case we use cross-entropy, here in the form of the negative log-likelihood (NLL) loss, which amounts to the same thing when paired with the LogSoftmax output. The metrics are what we print with each epoch that runs; in this case we want to see accuracy, that is, how many digits we predict correctly. And we need to choose an optimization method, for example stochastic gradient descent; in this case the Adam optimizer was chosen.

Once we have all that, we can run the fit method. As arguments we pass our network (the variable net), our data (the variable md), and how many epochs we want, meaning how many times the network is going to see each image: if we say one, the network sees each image just once. Then the criterion, which is the same as the loss, then the optimizer, and the metrics. Once we run that, we get the results: 0 is the first epoch, the first number is the training loss, 0.31, the next one is the validation loss, 0.28, and then the accuracy, 0.91. So the model with just one layer, after just one epoch, gives us an accuracy of 91%, which is not the best you can do, but it's quite good already.

If we then want predictions, in this case we just use the validation data set (he's not using a test set here), and that gives an array of 10,000 by 10. It gives ten predictions for each image, so we then have to select the position of the highest output as the predicted digit.
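Putting those pieces together, a hedged sketch of the training call in the old fastai (0.7-era) API used in the course; the function and argument names here are written from memory of the lesson notebook and may differ slightly, and `path`, `x`, `y`, `x_valid`, `y_valid` are assumed to already exist:

```python
import torch.nn as nn
import torch.optim as optim
from fastai.dataset import ImageClassifierData
from fastai.model import fit
from fastai.metrics import accuracy

# Wrap the NumPy arrays so the torch model can consume them
md = ImageClassifierData.from_arrays(path, (x, y), (x_valid, y_valid))

loss = nn.NLLLoss()                 # pairs with the LogSoftmax output layer
opt = optim.Adam(net.parameters())  # the optimizer chosen in the lesson

# One epoch: the network sees each image exactly once
fit(net, md, n_epochs=1, crit=loss, opt=opt, metrics=[accuracy])
# Prints roughly: epoch 0, train loss 0.31, val loss 0.28, accuracy 0.91
```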
For example, if we look at the predictions for one row, we get different numbers for the different digits, so we have to find the highest one. For that we use argmax: the argmax gives us the location of the highest prediction, and based on that we can tell whether it was a 0, 1, 2, 3, and so on. And with our predictions and the validation labels (we know the labels for the validation set), we can also calculate our accuracy manually, which comes out the same as the one reported by our network. Then, again, we can plot those images.

I think the interesting part is that these numbers, the ones we take the maximum of, are log probabilities. So if you wanted to know how confident you are that the digit is, say, a 7, look at the probability: under the hood it computes the ten probabilities for each image, and then we take the digit that has the maximum probability.

Right, because this one is around minus 10, so that would be a very poor probability.

Right, makes sense.

And then the last thing Jeremy showed in this lesson was how you could define your own logistic regression, because a neural network with one layer is basically the same as logistic regression. He showed how you can define logistic regression yourself by subclassing the nn.Module class: you create a subclass of that class, and it gives you the same results. So there are a couple of ways you can do the same thing. I didn't play around with the code as such, but this was pretty much it.

That's basically the nature of PyTorch models, right? You can stack them as you like. You always derive from nn.Module, and if you do that, you already get all the benefits (the autograd machinery and so on) included. You just have to adhere to a certain protocol: you have to have a forward function defined, and then you can do whatever you want and it's all compatible. Usually you'd just use existing modules anyway, but if you want to do something really custom, you can build it like that and plug whatever you want into forward. Custom modules can be chained in the Sequential kind of thing, and you can build up whatever kind of model you want. And in that forward function you can see where they're computing the probabilities: it's the third line, where it says e to the x over 1 plus e to the x. That's the probability that gives you the ten different outputs.

Just a question around this forward: where is forward actually called?

PyTorch, or fastai on top of PyTorch, calls it on your behalf: when the model does its forward pass, this function is executed, and that's why you have to have it; otherwise it's not compatible. In training you always have a forward and a backward pass, right? In the forward pass you multiply your weights with your training data and go through to the end of your network, and once you're there, you backpropagate to update the weights. That's the forward step, and the backprop is done automatically, because PyTorch can automatically differentiate; that's done for you. Every variable, every tensor, has an option to take part in autograd.
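A minimal sketch of the custom-module version and the manual accuracy check described above (the names are illustrative, not copied from the notebook, and the forward body uses log-softmax rather than the exact expression on the slide):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogReg(nn.Module):
    """One linear layer plus log-softmax: logistic regression as a module."""
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(28 * 28, 10)

    def forward(self, x):
        # PyTorch calls forward() for us during the forward pass; autograd
        # records these operations, so the backward pass comes for free.
        x = x.view(x.size(0), -1)  # flatten (N, 28, 28) inputs to (N, 784)
        return F.log_softmax(self.lin(x), dim=-1)

# Manual evaluation, assuming x_valid / y_valid are tensors:
# log_preds = net(x_valid)                  # (10000, 10) log-probabilities
# digits = log_preds.argmax(dim=1)          # location of the highest output
# acc = (digits == y_valid).float().mean()  # matches the accuracy from fit
```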
If autograd is enabled on a tensor, PyTorch will keep track of the operations performed on it and can do the backpropagation, so you only have to customize the forward pass if you want.

Okay, there was a question in the chat: if we have an imbalanced dataset, which metric other than accuracy should we be using?

If you have, say, a lot more nines than threes, for example (I think this was asked in the forums too), you either handle it in your preprocessing, where you do oversampling to balance the classes by sampling the less frequent ones more often, or you provide class weights to the model. I don't think that's included in fastai, if I remember correctly.

Yes, you'd provide weights. But you would not change the accuracy; you would not change the metric. You'd basically either generate some more images from that class or sample more from that class.

I don't quite understand why accuracy comes into play there; it's not about the metric, right? The metric is purely for inspection. It doesn't affect model training or performance at all; it's only reporting numbers. It isn't interacting with the performance, just showing it, and that's why you're free to change it to whatever you want. The question is just which one is the most informative.

Yeah, basically. So that was a quick lecture recap, and we talked a little bit about those topics. Is there anything else that you guys found interesting, or confusing?

I do have a question, actually, because I was just able to squeeze in part of this lesson before we started. One of the things he mentioned was this Kaiming He initialization, where you divide the torch random numbers by dimension zero. I'm going to look it up, but can someone give a high-level view of what that's about?

He started typing that in the notebook but moved on quickly, and I didn't look into it myself, so I can't explain it either. I know that it's the better initialization, but I forgot why.

We could look at it quickly, if anyone remembers. How do you spell it? K-a-i-m-i-n-g, He initialization. Okay, so here is a PDF on it... maybe not, that one looks like it's about normalization, I think; there's a lot of stuff in here. I'll tell you what: I'll look it up, and before next Sunday I'll put up a short document with whatever I've figured out.

Okay, great, thank you.
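In the meantime, a hedged sketch of the two initializations being discussed; the first function is a guess at what was typed on screen (random weights divided by the first dimension), not a verified copy, and the second shows the standard He/Kaiming helper that PyTorch ships:

```python
import torch
import torch.nn as nn

# Roughly what appeared on screen: scale random weights by the first dimension
def get_weights(*dims):
    return nn.Parameter(torch.randn(dims) / dims[0])

# The usual He/Kaiming scheme scales by sqrt(2 / fan_in) instead;
# PyTorch provides it as an init helper:
w = torch.empty(10, 28 * 28)
nn.init.kaiming_normal_(w)
```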
So we've decided, well, a couple of us are doing a couple of things at the same time, and for me that's kind of the stumbling block: it's a lot of material to cover, especially since he talks about so many things in deep learning, image recognition, natural language processing, collaborative filtering, everything. So that's a bit of a challenge. Is there anything else you've learned or found interesting in this lecture?

I think, if anyone missed it at the start of the conversation: this is a really good companion to the deep learning course, so this session and next week's should complement it very well.

Are you guys doing anything interesting for homework, or any side projects? Earlier we were talking about some Kaggle competitions we've entered, like the taxi competition, and now that we're talking about image recognition, there are a lot of Kaggle competitions on image classification or segmentation; there's quite a lot of image stuff.

I'm following the two courses, and I keep seeing all these new competitions pop up and think, oh, this might be nice to try, but then I just don't have time, because I'm trying to follow the class right now.

Yes, I mean, they only run for like two or three months, right? And if you jump in and really need to learn from scratch, it's almost too late, at least for the live ones. I kind of skip that for now and just look for interesting old competitions and see if I like the data.

I'd like to ask about the standard deviation formula that Jeremy used. I wrote up a little thing on it and posted it on the forum for this group, I think under lesson seven. For most of you it's probably too much, because you probably already know it, but I think some people really wanted to understand how that formula came about. There's a link on our Slack channel.

Would you like to give like a five-minute mini-presentation on it?

Sure, let me share my screen. Okay, are people seeing this?

Yep.

So there's just a little derivation here. It's about turning a little formula that Jeremy uses into a faster version: if you computed the standard deviation from the first formula that I show, it's not vectorized, you kind of have to do a loop to get through it, so it's a little slower. There's a well-known formula people use for this (I'll just skip to the bottom): the standard deviation is the square root of the mean of the squares minus the square of the mean. I went through the derivation of that, and then I showed how Jeremy implemented it in code: his function over here is basically the implementation of this formula. Most people aren't going to want to go through all of it, but I thought I would derive it for people who are interested. And if you're not familiar with these angle brackets, I explain what they mean: when you put angle brackets around a quantity, that's what's called the expectation value, and if all the weights are uniform, that is, if the distribution over the x's is uniform, then the expectation value is the same as the mean. I go through that notation because Jeremy used it without explaining it, so that people can follow. If you're interested you can delve into it; otherwise, it's basically just this formula down at the bottom, which is what Jeremy uses for standard deviation.

Nice, thank you.

Okay, I'll relinquish my screen.
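For reference, the identity from the write-up, in the angle-bracket (expectation) notation discussed above:

$$\sigma = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$$

and a minimal PyTorch sketch of the vectorized version (this computes the population standard deviation):

```python
import torch

x = torch.randn(1000)

# Mean of the squares minus the square of the mean, then the square root
std_fast = ((x * x).mean() - x.mean() ** 2).sqrt()

# Agrees with the built-in population std (unbiased=False)
print(torch.allclose(std_fast, x.std(unbiased=False), atol=1e-6))
```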
Since we're talking about what people are doing project-wise, it's also interesting to find new stuff each day. There's a forest classification dataset, I think it was mentioned in the forum somewhere, a simple classification problem where I might try to compare random forests to the tabular approach, if anyone else is interested; that's in addition to the taxi one, which I still want to write up, and I will afterwards. I'll probably have a look at this forest classification data, from USGS I think, and it looks kind of neat and small.

As for the deep learning course: I don't know about you guys, but I'm doing the deep learning one as well, and I find that I hop around too much. There's just so much stuff I want to do that it's too hard to focus. I tried things in the meantime because I thought, okay, this stuff is super cool, but there's so much involved that if you don't really stick to the example, to the point, it kind of blows up in your face, unlike the examples in class. I mean, it's frustrating for me not to be able to go out and do Kaggle things and challenges and so on, but I think that by concentrating on learning these tools now, when I do have to do a Kaggle competition, I think it's going to be fine, eventually, when I'm done with these classes.

Yeah, I'm not doing any Kaggle competition at the moment either, just looking for data that might be similar to a problem tackled in the class and experimenting with it, because that's more exciting than doing the exact same thing, and it lets me follow my own progress.

Once we finish all those courses at the end of the year, there will be more time, unless we start another one. On reinforcement learning, people are already talking on Slack about this: whoever's interested, Sammy's looking to start a reinforcement learning study group sometime next year. Somebody also mentioned a class on Udacity, which is a free course in PyTorch.

Can I get a kind of hands-up: who's evaluating Keras/TensorFlow and PyTorch at the same time, and who's focusing more on PyTorch now? For me, I don't know, I'm a bit torn; all the extra courses and materials out there seem to be mostly TensorFlow. Are you just learning the concepts and don't care which framework you learn, or do you already know TensorFlow?

This is just me: I took the Andrew Ng specialization, where he uses TensorFlow and Keras, but you really don't get much out of it unless you do a lot of extra work; it just goes by so fast, and you come away with just a basic working knowledge. So for now I'm learning PyTorch, and then eventually I'll try to learn all three and find out which one is better for which projects.

I'm doing only PyTorch for now; I didn't really spend any time learning TensorFlow.

That's what I've thought recently too. I mean, ideally I would want to know both, but it seems like fastai gets you to a result really quickly, and it's probably worthwhile to really dig into PyTorch.

Keras is very similar in that kind of thing, right? You basically build the network layer by layer. With PyTorch, through fastai, it's all under the hood: you don't have to actually specify each layer, that's just done for you, though you can look into it if you want. In Keras you actually build the network layer by layer yourself, and it's nice, because you sort of have your hands on it more; you get a better understanding, a better working knowledge. But you can get the same thing
in PyTorch by just looking at the definition of the model: you can call dir() when you build a model, like learn or whatever, so dir(learn), and it shows you all the layers.

Yes. I looked at Keras about a year ago, and getting as far as creating models and all is really easy. I just couldn't find enough resources to help me figure out how to get past that, things like image augmentation and transfer learning and so on; at the time I looked, which is over a year ago now, I was struggling to get beyond the simple stuff, and that's when I kind of abandoned it. I'll probably go back to it. What's been your experience there?

Kind of similar for me, to be honest. For me it's always like this: if you just follow some tutorials, it all makes sense, it's all laid out, and everything kind of works. But then I want to apply it to my own datasets, and that's when it normally falls apart. If you just follow existing notebooks it's all nice and dandy, but when you bring your own data and it's suddenly a bit different, that surface knowledge doesn't really help you that much; you really need to understand the stuff.

I also agree with Christian's point about TensorFlow being used in production a lot. I guess if you apply for jobs, people use TensorFlow more, so it's a good question whether you actually need to learn TensorFlow or whether you'll be good with only PyTorch. I've thought about actually trying it when I have a neat small project to wrap up, just re-implementing it.

So on the topic of TensorFlow: are there any resources anyone has tried that they can recommend? I mean, if you want to just jump right in, Google has these classes on using TensorFlow on Google Cloud Platform; there's a specialization on that, and it's pretty much total immersion. But I think they just put these classes out and haven't really worked them through yet; I think they'll be good eventually. And if you want to learn the Keras kind of thing, the book by what's-his-name, the French guy (François Chollet), is supposed to be really good.

I was watching that book on Amazon, and it dropped down to $15, and I should have bought it but I didn't, and then it went back up to $29.

I read it; it's nice, but it's about Keras. So I was going to ask: if you have used TensorFlow and Keras, is there any reason to use plain TensorFlow? I mean, with Keras you can do more or less everything, and TensorFlow is really low level: you have to do many things just to build a neural network, whereas in Keras you can just write a couple of lines and you have it. It's almost like fastai in that respect, though I understand that with fastai you additionally get the benefit of Jeremy's opinionated selection of stuff, so you get not five choices but the best choice by his definition, for NLP say; you get a nicer, cleaner interface but probably also state-of-the-art algorithms. I didn't try to do anything fancy, but once, with Keras, I was building a Faster R-CNN or something, so I was doing something beyond the basics, and I used the functional API, which allows you to build whatever you want, not just a stacked network. I think that with Keras you can do
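To make the "couple of lines" point concrete, here is a rough Keras equivalent of the one-layer MNIST model from earlier; this is an illustrative sketch, not code from the discussion:

```python
from tensorflow import keras

# One dense layer from 784 pixels to 10 classes, with softmax on top
model = keras.Sequential([
    keras.layers.Dense(10, activation="softmax", input_shape=(28 * 28,)),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x, y, validation_data=(x_valid, y_valid), epochs=1)
```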
more or less everything; I have never seen the need to use plain TensorFlow, you know what I mean?

Yeah, actually, I went to a couple of meetups here, and there's a guy who's really experienced in this stuff; he was building his own networks before TensorFlow and everything, and he's doing everything in Keras now as well. Apparently it's polished enough for real professionals too. I was just wondering, since you seem to have done quite a bit with Keras and TensorFlow, what's your thinking on fastai and PyTorch at the moment? Apart from PyTorch being allegedly more flexible, do you like any of them better? Do you feel like one has a big advantage over the other for the stuff that you do?

In my case, I haven't tried PyTorch yet. I have followed only the machine learning course; I tried to start the deep learning one like you guys, but haven't gotten into it. So I do want to learn PyTorch and try things with it, but so far I have used only Keras, and I'm happy with it. Although, this thing with debugging: at some point Jeremy mentioned that it's easier to understand in PyTorch, because it tells you better why it's failing, whereas in Keras you get some cryptic thing that sometimes makes it really hard to find out where the network broke.

With TensorFlow you've also got TensorBoard, right?

Yeah, and I think there's something similar coming up for PyTorch, a visual representation. What's it called... TensorBoard, yes, that's right, and for PyTorch I think it's tensorboardX or something. It basically gives you graphs and things; you can change some settings and see how that would improve or affect your model. I didn't use it, but it seems to be quite a nice tool. And indeed, I also saw (I think I posted it on the forum or on Slack) that someone created something similar for PyTorch.

I put it in the chat just now: tensorboardX. I never used it, and I'm not sure if it's just hooks or bindings to actually use TensorBoard with PyTorch, but visually it looks like TensorBoard.

Yeah, that could be interesting; it gives you a nice board so you can change things and quickly see, visually, how that works out. Seems like a nice tool. Has anyone used it yet? No?

I'm trying some Kaggle competitions, like this Quick, Draw! competition, which got a lot of attention; let me see if I can find it quickly. The dataset is huge: 50 million drawings. Quick, Draw! is a Google site where you have 20 seconds to draw something they tell you to draw, and then they take that image and put it in a database; they made those 50 million drawings available, and you have to predict what they are based on the training data. One issue is that there are a lot of those images, so it's quite difficult to compute all of that. The other issue is that some labels are bound to be incorrect, because of how the application works: when Quick, Draw! tells you to draw a wheel, I can draw a square and just leave it, and I believe the application will take that square and assign the label of wheel to it, and that's noise in the whole dataset. The time was up, the game didn't guess the drawing, but to my understanding that square still goes into the database labeled as a
wheel. And then in your training data you get, by people's estimates, about 10% incorrect labels. So what people are trying to do is make the batch size as large as possible, so that those noisy images kind of average out and you reduce the impact of the noise.

Meaning that if you have a large enough batch and it says wheel, then hopefully there will be a whole lot of other real wheels to tell you what a wheel is?

Yeah, that's at least the discussion on the Kaggle forums; that's what people are trying to do to overcome that issue. They're trying to use as large a batch size as possible, like 600 if not more, and they're trying to use all 50 million images, which takes forever to train. So it's still not quite a real-life example, but it is the kind of challenge where you have to deal with that big amount of data. You can also take a subsample: starting out, even training with just one percent of the data already gives you something like 90% accuracy, but to get more you really have to train with all of it, and with a large batch size. Also, the data they give you is not actually images; it's vectors. When people draw those things, it stores the strokes as vectors, from A to B and so on, so they give you vectors, but some people made kernels available that take those vectors and convert them into images, so your input to the network can be an image again. Some people have even tried LSTM networks on it, to find the patterns in those lines, in the strokes. So there are a couple of different methods people are trying, and that's the nice thing about doing a competition: you can see all that going on and learn from the forums what people do.

In this competition, is it possible that in the test data you also have something wrongly labeled? Because you cannot guess what it is if you've never seen it before and the label is wrong.

Good question. On the test set, to my understanding, we only have the images, without labels. This question came from thinking that maybe it would be good to do something to prefilter the images: for example, you take one class and do some sort of clustering, and you remove the farthest examples, the ones away from the mass, keeping the ones that are closer together, or something like that. I don't know.

Yeah, that's a good point; if there were a way to do that, it would be great, to get rid of those wrong labels. But for me, all those images, like on this page, every single one looks so different. I'm so surprised that neural networks can make sense out of it; it's unbelievable, but in fact they can.
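Going back to the stroke data: a rough sketch of rasterizing Quick, Draw! stroke vectors into an image, in the spirit of the public kernels mentioned above (the input format and the helper are assumptions for illustration, not taken from a specific kernel):

```python
import numpy as np
import cv2

def strokes_to_image(strokes, size=64, thickness=2):
    """Draw a list of (xs, ys) stroke sequences onto a small grayscale image."""
    img = np.zeros((256, 256), dtype=np.uint8)
    for xs, ys in strokes:
        # OpenCV expects int32 points shaped (N, 1, 2)
        pts = np.stack([xs, ys], axis=1).astype(np.int32).reshape(-1, 1, 2)
        cv2.polylines(img, [pts], isClosed=False, color=255,
                      thickness=thickness)
    return cv2.resize(img, (size, size))

# Example: a single diagonal stroke
img = strokes_to_image([([0, 100, 255], [0, 120, 255])])
```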
All right, we're at one o'clock, so I guess we'll start to wrap up. One question around the random forest implementation, since I'm still catching up on some of the videos: is the fastai random forest implementation vastly different from the scikit-learn one? Is that what he's talking about, how it's different and what the advantages are?

He's using the scikit-learn random forest. The only thing fastai is used for is to prepare the data for the scikit-learn random forest and the things around it; the regression model itself is just scikit-learn. He mentioned in one of the lectures that the scikit-learn one is just much more optimized. Even though he codes one up himself in the course, and you get some pretty decent performance, especially if you introduce something like Cython, you only get close to scikit-learn. So it's nothing different; he's just showing that it's not magic, that you can basically program it yourself, but in the end you'd trust scikit-learn.

Oh, okay, excellent, thank you. Any more questions before we finish for the day?

Yeah, did he talk about gradient boosting in lecture eight?

I don't know. I kind of ran through the coming lessons briefly, and I don't think there's anything about boosting. In this course he chose random forests, I believe because they're easier to interpret, maybe easier to teach, easier to compute, and they still give you quite decent predictions. But in one of the lectures he mentioned that random forests, together with the other tree-based methods like XGBoost or LightGBM, are the methods to learn for this kind of purpose; if you look at Kaggle now, almost everything is XGBoost or LightGBM, and hardly anyone uses plain random forests for those problems anymore.

Okay, excellent. So next week we have lesson nine; I forgot what it's going to cover, but it's going to be something interesting again, I guess.

SGD, stochastic gradient descent, maybe?

No, no, I think that was this one, sorry. I'll find out; it's always interesting with Jeremy. And again, if you want to do a mini-presentation, five minutes to talk about something, just let us know and we'll set that up.

Okay, thanks Michael, thank you very much. Bye, everyone.