Covariance of a Data Matrix | Dimensionality Reduction and Visualization | Lecture 8 | Applied AI Course

**Understanding the Covariance Matrix**

In statistics and machine learning, the covariance matrix is a fundamental concept that plays a crucial role in various applications. The covariance matrix is a square matrix that summarizes the covariance between different variables in a multivariate distribution. In this article, we will delve into the details of the covariance matrix and explore its properties, construction, and applications.

**Transposing the Data Matrix**

The first step in constructing the covariance matrix is to transpose the data matrix X. The data matrix X holds the feature values for each observation: each of its n rows is an observation and each of its D columns is a feature. Transposing swaps rows with columns, so if X has size n x D, its transpose X^T has size D x n.
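A quick sketch of these shapes (the data values here are arbitrary, just to make the dimensions concrete):

```python
import numpy as np

# Hypothetical data matrix: n = 5 observations (rows), D = 3 features (columns).
X = np.arange(15.0).reshape(5, 3)

print(X.shape)    # (5, 3)  -> n x D
print(X.T.shape)  # (3, 5)  -> D x n
```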

**Multiplying with the Transpose of the Data Matrix**

Once we have the transposed data matrix, we multiply it by the original matrix and scale by 1/n to get the covariance matrix. The formula is s = (1/n) * X^T * X, where s is the D x D covariance matrix and X^T is the transpose of the data matrix. (As discussed in the next section, this formula assumes the columns of X have been standardized.)
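The formula s = (1/n) X^T X can be sketched in a few lines of NumPy. The data here is random and purely illustrative; the result is cross-checked against NumPy's own covariance routine (with `ddof=0` so that it also divides by n):

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 100, 4
X = rng.normal(size=(n, D))

# Column-standardize first; the formula below assumes this.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# s = (1/n) * X^T X  -> a D x D matrix
S = (X.T @ X) / n

# Cross-check against NumPy's covariance (ddof=0 divides by n, matching 1/n).
assert np.allclose(S, np.cov(X, rowvar=False, ddof=0))
print(S.shape)  # (4, 4)
```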

**Assumptions**

It is essential to note that the assumption of column standardization is crucial in this construction. The formula s = (1/n) * X^T * X yields the covariance matrix only if the columns of the data matrix X have been standardized: each feature column is centered by subtracting its mean and scaled by dividing by its standard deviation. Without this assumption, the formula does not hold.
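Column standardization, as described above, can be sketched as a small helper (the function name is my own, not from the lecture):

```python
import numpy as np

def standardize_columns(X):
    """Center each column at its mean and scale it by its standard deviation."""
    mu = X.mean(axis=0)          # per-feature means
    sigma = X.std(axis=0)        # per-feature standard deviations (ddof=0)
    return (X - mu) / sigma

rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=3.0, size=(200, 3))
Z = standardize_columns(X)

# After standardization every column has mean 0 and standard deviation 1.
assert np.allclose(Z.mean(axis=0), 0.0)
assert np.allclose(Z.std(axis=0), 1.0)
```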

**Understanding the Elements of the Covariance Matrix**

The covariance matrix s summarizes the pairwise covariances between the variables (features) in the dataset: element s_ij is the covariance between the i-th variable and the j-th variable. Let's look at the formula for each element.

**Calculating the Elements of the Covariance Matrix**

Element s_ij of the covariance matrix is given by s_ij = (1/n) * F_i^T * F_j, where F_i and F_j are the i-th and j-th columns of the data matrix X, i.e., the column vectors holding all observations of the i-th and j-th features. Assuming the columns of X are standardized (so each feature has mean zero), this is exactly the covariance between the i-th variable and the j-th variable.
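The element-wise formula can be verified numerically: computing s_ij = (1/n) F_i^T F_j from two feature columns of an illustrative random matrix gives the same value as the corresponding entry of the full product (1/n) X^T X.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # column-standardize
n = X.shape[0]

i, j = 0, 2
F_i, F_j = X[:, i], X[:, j]   # i-th and j-th feature columns
s_ij = (F_i @ F_j) / n        # s_ij = (1/n) F_i^T F_j

S = (X.T @ X) / n             # full covariance matrix
assert np.isclose(s_ij, S[i, j])
```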

**Understanding Matrix Multiplication**

Matrix multiplication is the fundamental operation at work here. Multiplying the D x n matrix X^T by the n x D matrix X yields a D x D matrix. The (i, j)-th entry of the product is the dot product of the i-th row of the first matrix (X^T) and the j-th column of the second matrix (X). Since the i-th row of X^T is just the i-th column of X, this entry is precisely F_i^T * F_j.
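This row-times-column view of matrix multiplication can be checked directly on a small example matrix (values chosen arbitrarily):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])   # 3 x 2
B = A.T @ A                # 2 x 2 product

# Entry (i, j) of A^T A is the dot product of the i-th row of A^T
# (which is the i-th column of A) with the j-th column of A.
i, j = 0, 1
assert np.isclose(B[i, j], A.T[i, :] @ A[:, j])
assert np.isclose(B[i, j], A[:, i] @ A[:, j])  # same thing, via columns of A
print(B[i, j])  # 44.0, i.e. 1*2 + 3*4 + 5*6
```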

**Applying the Formula**

Putting the pieces together: the (i, j)-th entry of X^T * X is F_i^T * F_j, so after scaling by 1/n, each entry of (1/n) * X^T * X equals s_ij = (1/n) * F_i^T * F_j, the covariance between the i-th and j-th variables. The single matrix product therefore computes all D^2 covariances at once.

**Interpreting the Covariance Matrix**

Because covariance is symmetric (cov(F_i, F_j) = cov(F_j, F_i)), the covariance matrix s is a square, symmetric D x D matrix, and its diagonal entries s_ii are the variances of the individual features. Understanding this structure is crucial in many applications, including data analysis, statistical modeling, and machine learning, notably dimensionality reduction with PCA.
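The two structural properties above, symmetry and variances on the diagonal, are easy to confirm numerically (again on illustrative random data; after standardization every variance is 1, so the diagonal is all ones):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # column-standardize
S = (X.T @ X) / X.shape[0]

# S is symmetric: cov(F_i, F_j) = cov(F_j, F_i), so s_ij = s_ji.
assert np.allclose(S, S.T)

# Diagonal entries are the feature variances; after standardization they are 1.
assert np.allclose(np.diag(S), 1.0)
```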

**Conclusion**

In conclusion, the covariance matrix is a fundamental object in statistics and machine learning that summarizes the covariances between the variables of a multivariate dataset. Its construction involves standardizing the columns of the data matrix X, transposing X, and computing s = (1/n) * X^T * X. Understanding the elements of the covariance matrix is essential for interpreting the relationships between variables in the dataset.

"WEBVTTKind: captionsLanguage: enso let's assume we are given a data matrix X just like the regular feature one feature to feature a feature T right sorry sorry okay point one point two so on so forth point n this is an N cross D data matrix right with X I transpose here I'm just writing it multiple times so that you will not forget it okay so this is my data matrix right for this given my data matrix there is something called a covariance matrix often written as capitalist let me define what covariance matrixes so let me just define what covariance matrixes will in some very very interesting properties of covariance matrix so this covariance matrix is always of size D cross D so given a data matrix S Plus R X of size n cross D with B features the covariance matrix for s for X sorry the covariance matrix X again sorry extremely sorry so this is called the covariance matrix of X ok this is because X and s sound similar I was getting confused so the covariance matrix of X of my data matrix X is written as s and it has decrease d it's a square matrix so this is called a square matrix because the number of rows and number of columns are the same okay so the 8th row and the jth column this element is called s I J okay so the element corresponding so this is a matrix right this is a matrix with D cross D elements right d rows and D columns right so I'll write the so s IJ corresponds to the eighth row and I'd column element in matrix S okay this is this oval represented right so let's take any of any value here so again we will come to it anyway ok so this one I will represent it now let me define what s IJ is as part of our definition so sij okay where I can go from one to D and J can also go from one today because s is a square matrix of size D trusty s IJ is nothing but the covariance of feature I and feature J now let let me introduce you some simple notation okay so let me just erase this so that it becomes simpler okay so whenever I say F I or F J it means this 
column vector corresponding to the the feature F J so whenever I say F J okay whenever I say F J it is a column vector what I'm referring to is it's a column vector okay it's a column vector corresponding to jet feature okay so take any element for example if I want to explain you about this element this element let's say zoom corresponds to isolate a point okay this is X this corresponds to let's say X I and jet feature okay so this element I'll write it as X I J just like a savior now here X IJ let me read it for you X IJ means okay so X IJ means right it is the jet feature for the height data point okay simple notation here I am just introducing little notation so that writing the math becomes easy so X IJ basically means for the eighth point I'm looking at the jet feature and whenever I say F J it is basically the column vector corresponding to the jet feature okay so since this is a column vector and this is a column vector it's basically like your simple covariance right so we saw the formula for covariance when we learned about correlation coefficients etcetera covariance between any two random variables x and y any two random variables x and y is nothing but 1 by n summation this is this is nothing but the average it is the average value of x i minus mean of x okay why I - mean of why we saw this definition right we also saw how its interpretation geometrically right we saw besides interpretation geometrically when we learned about correlation coefficients Pearson correlation coefficient coefficient Spearman rank correlation coefficient and simple equation for covariance right will be learned all about it based on the shape we also saw its geometric interpretation so covariance again comes back to help us this is a concept that we learned in probability it has connections to linear algebra okay so let let's understand it so what I'm saying here is the IJ element of my of my covariance matrix is nothing but the covariance of Fi and FJ I also know that 
covariance of fi + FJ is nothing but covariance of fi we also learned this when we learnt the covariance right covariance of or I probably would have written it like this covariance of X comma X is nothing but variance of X of course if instead of Y if I put X here this is nothing but the formula for variance right nothing very fancy there this is this is obvious okay so let us understand that matrix again our matrix oh there is one more property for covariance which is covariance of fi and FJ is same as covariance of FJ and fi it's one in the same because if I replace excess with Weiss Weiss with excess this formula doesn't change right if I put Y I minus V of x XM minus mu of X it doesn't change right so covariance of so these are some of the same simple properties of covariance let's call this property one and property two first properties covariance of X comma X is nothing but variance of X the second property is symmetric which means covariance of FI comma F J is nothing but covariance of FJ comma F I now I having learned this so what does what does the covariance matrix will look like all the diagonal elements so it has de diagonal elements right it has it's a d cross d matrix which means it has d diagonal elements on the diagonal elements you see variances of features okay and since since since this matrix so s so let's assume this is s IJ and this is s ji right since s J I and s IJ are the same so this matrix is also called symmetric matrix so let me write a very simple symmetric matrix what is the symmetric matrix if I have a matrix like this let us assume 2 1 3 okay these are these these are called my diagonal elements I'll write a 3 cross 3 matrix here just for simplicity so as to explain it to you ok so let's assume I have 1 and 2 here and let's say you know 5 here okay so if this is 1 and this is also 1 and this is 2 and let's assume this is 5 sorry write it 1 2 1 2 2 1 2 1 1 5 1 1 5 2 5 3 to 5 years this is called a symmetric matrix because take this 
element right this is first row second row third row this is first row first column second column third column so take this element what is this element this element is so if this matrix is a this is a 2 1 second row and first column if this is equal to a 1 2 what is a 1 2 first row and second column this element there same right if a I J is equal to AJ I for all I comma J ok then then we call the symmetric matrix in this matrix since SJ I equals to s IJ right here look at like this this is 3 comma 2 third row second column right so a 3 2 is 5 what about a 2 3 a 2 second row third column is also fine so as long as it is satisfied for all as long as this condition is satisfied for all I comma J of course for diagonal elements AIA is so all your diagonal elements will look like what s 1 1 s 2 to s 3 3 s 4 4 so on so forth a deedy right so this is called a symmetric matrix the matrix where a IJ a matrix a is called a symmetric matrix if a IJ equals to AJ I for all I comma J now your covariance matrix is also a symmetric matrix because by definition because since s IJ what is the definition of sij sij is covariance of F I comma F J which is equal to Co variance of F J comma F I from this property to which is equal to SJ I since s IJ is same as s Jie your s is also a symmetric matrix it's also a square matrix why is it a square matrix because the number of rows because it has D rows and columns same number of rows and columns is called is called square matrix same number of rows and columns so this is called square symmetric matrix okay your covariance matrix is a square matrix and also a symmetric matrix okay so and we understood the definition of covariance matrix right we'll see why it is useful bear with me we are just defining terms here we are not doing anything very very fancy here we are just defining a bunch of terms okay now let's let's understand some very very interesting property so let's say you my data set X which has D features right F 1 F 2 so on so 
forth B features and I have points 1 2 so on so forth em7 cross d right let's assume let let lets suppose that X has been column standard Einstein so what does column standardization mean it means that the mean of the mean of any of my F is equals to 0 and the standard deviation of any of my features equals to 1 that's what it means right which which is not we just learned about let's assume X has been column standardized okay let let let this be then what is the covariance of feature Fi and F J let's write the formula all right okay so what is this so if I have two features Fi and F J what is what is the covariance between them okay so let's write it down so let's say June this is my if I feature and this is my F J feature right okay it has elements right it has x1 I X so what is what is this element this is X first data point and I feature what about this this is x2 second data point right this corresponds to second data point and I feature similarly this one is x1 jet feature this one is x2 jet feature we just learned about this terminology right awhile ago if you look here I just explained X IJ terminology here I'm just reusing that right now okay so what so covariance of F 5 F J is nothing but 1 by n summation over 1 to n ok so let's write it so what what is what are the terms here they are X I okay let me call it f1 f2 just for simplicity okay so let's assume I want to find the covariance between let me just erase this so as not to confuse the so so as to make it simpler covalence of F 1 F 2 is nothing but x i1 minus mu-1 I will define what anyone is just bear with me x i2 minus mu now mu 1 is nothing but the mean of F F 1 sorry this is nothing but mean of F 2 right so let me change the color so that I can explain it better okay so I have to so this I can think of as my random variable x ii think of as my random variable y what is the covariance of X comma Y take all the values of X subtract the mean of X take all the values of one subtract the mean of Y 
right and take the average value this is basically an averaging right exactly so these are all my values for my random variable my random variable here is my feature right my random variable here is my feature so having said that let's go into it so X I 1 corresponds to so okay let's let's do it suppose if this is my F 1 and this is my F 2 okay I just write it much more cleanly let's assume this is my data set X okay so what will this be this will be X 1 this is my first data point right X 1 1 so what I'm doing here is I'm taking all the values of these features each of these individual values for this feature the let's assume this is this feature is better length and let's assume this feature is petal width registration okay so I'm taking each of the petal weights in my dataset subtracting it with the mean pattern width okay that value I'm multiplying with each of the petal weights supplying with a mean petal width this is the petal length R itself did say these are petal links this is the mean petal length let's assume this is petal width this is the mean pendel width from our iris data set example just so that you connect the dots now one thing you know is since your data has been column standardized okay these means are 0 this is 0 and this is 0 right so what happens to my covariance of F 1 F 2 I can write my covariance of features F 1 and F 2 as 1 by n summation I equals to 1 to n X 1 and X I to x because this has become 0 so this goes away this has become 0 it goes away so I'm just left with this term and this term so what does this mean let's understand what it means let me draw my data matrix much more clearly because this is my data matrix let's assume this is my feature 1 and this is my feature two okay this is my feature 1 and this is my feature 2 okay of course I have lots of features like this up to FB this is my first point my second point my third point fourth point so on so forth end points okay this is my first observation corresponding to my first 
data point for feature one again for feature two okay my second observation so what this is saying here is multiply this value with this value okay again multiply this value with this value multiply this value with this value so on so forth that's what it is what is xi1 right xi1 is for the eighth point this is X I won this is your X I won this is your X I - it's a multiply X I won with xi2 which means multiply these two and sum up from I equals to 1 to n so what am I literally doing I'm basically multiplying my f1 with f2 I am doing a dot product between f1 and f2 that's what this sum will be equal to so I can write that my covariance between f1 and f2 can I write it as F 1 transpose F 2 multiplied by 1 by n because what is F 1 transpose F 2 okay what is F 1 transpose basically instead of a column vector I'll make this a row vector I'll take this vector here okay I'll convert this into a row vector and I'll multiply with this column vector right whenever I multiply we learned this in basics of linear algebra if I am doing component wise multiplication family if I am doing component wise multiplication and if I'm adding up all of them that is nothing but dot product this is nothing but the dot product between F 1 and F 2 what is not product between F inert the component wise multiplication followed by addition and what is the formula for F 1 dot F 2 it's all the but F 1 transpose F 2 right so you can say that if your features are if F 1 and F 2 have been standardized right then covariance of f1 and - is nothing but f 1 transpose F 2 by n literally that's what it is okay now let me ask you something much more interesting in a second my argument is as follows let me fill it for you let me fill the argument for you I will argue that my s by my covariance matrix my D cross D matrix is nothing but I'll take my matrix X transpose it and multiply with matrix X now what is the size of matrix X my data matrix this is my data matrix this is my data matrix this is my 
transpose of the transpose of my data matrix this is n cross D so this will be D crossing so when I multiply these two what will I get I will get a D cross D matrix which is what s is okay but let me prove that each element of s will also work out okay so here here here I'm assuming remember I'm assuming that X has been that X has been column standardized I'm assuming this this is a very very very very important assumption that we cannot rule out okay so let's see what is sij what is SH is supposed to be it is covariance of feature I and feature J sorry I need to put a 1 by M here son I forgot that ok so we learned we learned from the formula earlier let's look at the formula what is the definition not the form a sorry what is the definition of of okay here is the definition what is the definition of each element of s IJ it's nothing but covariance of feature and feature J right now I know ok coming back to our topic sorry coming back to our discussion I know from this formula from this formula I know that covariance is nothing but F 1 transpose F 2 by n right so this is nothing but if I transpose F J by n right no so this is your allegis think of this as your LHS I will prove that the ith component of your RHS is exactly this ok let me show that to you ok what does X transpose X mean this is this is interest is important to understand okay if this is X transpose and this is X what is F what is so so so for this whole matrix I want to understand what is the ie jet component of this x this right I want to understand the I jth component when I get the ith component of this product when I multiply the 8th row when I multiply the 8th row with the jth column sorry when I multiply that with the jth column right when I multiply with the 8th row and jth column and when I add so when I multiply multiply this with this and I multiply this with this and when I add it this is simple matrix multiplication right those of you who have learned basic matrices in your high school or 
in undergrad and quickly understand this this is basic matrix multiplication so your I jet column of this multiplication so on LHS we know that s IJ is nothing but F AI transpose F J by n ok let's look at our RHS ok our RHS I need to prove that this multiplied by this the ijade column of this multiplication is nothing but F AI transpose F J I'll prove that to you so I know that the Jade column in X is nothing but F J ok so this da for the I J for the I Jett value of this of this multiplication is nothing but F chain x okay since I've transpose this matrix what happens my columns my columns become my rows here that's what transpose is it this is X transpose this matrix is X transpose what is transpose mean my columns become my my let me change the color here my columns become my rows here so what is the a through here my I throw is nothing but if I transpose so the ith value for this multiplication is nothing but if I transpose so let me write it more appropriately sorry my height value of this multiplication will be fi transpose F J okay which is what we wanted to prove right so your s IJ is nothing but if I transpose F J by n this is your n HS even on your RHS this is your RHS right your RHS what do you have X transpose X for your X transpose X when I multiplied the IJ value of this product will be FA transpose F J so my sorry so yes so what I have here is on the RHS on the LHS I have fi transpose F J by n on the RHS when I multiply this whole thing by n because the formula here I said s is nothing but 1 by n into X transpose X right so on RHS also I have exactly the same thing so my rh is equals to my latches which basically says that my s is nothing but X transpose X when X is represented the wavy we showed and this is a d cross n matrix this is an N cross d matrix it's always good to write this so as to understand whether you can actually multiply these matrices or not okay so you can get your covariance matrix again if if X has been column standardized 
otherwise this formula doesn't hold we will use this formula a lot when we are learning will use this extensively when we are learning about when we are going to trying to do dimensional reduction using PCA this is extremely important so your covariance matrix is nothing but X transpose X if your matrix X has been column standardizedso let's assume we are given a data matrix X just like the regular feature one feature to feature a feature T right sorry sorry okay point one point two so on so forth point n this is an N cross D data matrix right with X I transpose here I'm just writing it multiple times so that you will not forget it okay so this is my data matrix right for this given my data matrix there is something called a covariance matrix often written as capitalist let me define what covariance matrixes so let me just define what covariance matrixes will in some very very interesting properties of covariance matrix so this covariance matrix is always of size D cross D so given a data matrix S Plus R X of size n cross D with B features the covariance matrix for s for X sorry the covariance matrix X again sorry extremely sorry so this is called the covariance matrix of X ok this is because X and s sound similar I was getting confused so the covariance matrix of X of my data matrix X is written as s and it has decrease d it's a square matrix so this is called a square matrix because the number of rows and number of columns are the same okay so the 8th row and the jth column this element is called s I J okay so the element corresponding so this is a matrix right this is a matrix with D cross D elements right d rows and D columns right so I'll write the so s IJ corresponds to the eighth row and I'd column element in matrix S okay this is this oval represented right so let's take any of any value here so again we will come to it anyway ok so this one I will represent it now let me define what s IJ is as part of our definition so sij okay where I can go from one to D 
and J can also go from one today because s is a square matrix of size D trusty s IJ is nothing but the covariance of feature I and feature J now let let me introduce you some simple notation okay so let me just erase this so that it becomes simpler okay so whenever I say F I or F J it means this column vector corresponding to the the feature F J so whenever I say F J okay whenever I say F J it is a column vector what I'm referring to is it's a column vector okay it's a column vector corresponding to jet feature okay so take any element for example if I want to explain you about this element this element let's say zoom corresponds to isolate a point okay this is X this corresponds to let's say X I and jet feature okay so this element I'll write it as X I J just like a savior now here X IJ let me read it for you X IJ means okay so X IJ means right it is the jet feature for the height data point okay simple notation here I am just introducing little notation so that writing the math becomes easy so X IJ basically means for the eighth point I'm looking at the jet feature and whenever I say F J it is basically the column vector corresponding to the jet feature okay so since this is a column vector and this is a column vector it's basically like your simple covariance right so we saw the formula for covariance when we learned about correlation coefficients etcetera covariance between any two random variables x and y any two random variables x and y is nothing but 1 by n summation this is this is nothing but the average it is the average value of x i minus mean of x okay why I - mean of why we saw this definition right we also saw how its interpretation geometrically right we saw besides interpretation geometrically when we learned about correlation coefficients Pearson correlation coefficient coefficient Spearman rank correlation coefficient and simple equation for covariance right will be learned all about it based on the shape we also saw its geometric interpretation 
so covariance again comes back to help us this is a concept that we learned in probability it has connections to linear algebra okay so let let's understand it so what I'm saying here is the IJ element of my of my covariance matrix is nothing but the covariance of Fi and FJ I also know that covariance of fi + FJ is nothing but covariance of fi we also learned this when we learnt the covariance right covariance of or I probably would have written it like this covariance of X comma X is nothing but variance of X of course if instead of Y if I put X here this is nothing but the formula for variance right nothing very fancy there this is this is obvious okay so let us understand that matrix again our matrix oh there is one more property for covariance which is covariance of fi and FJ is same as covariance of FJ and fi it's one in the same because if I replace excess with Weiss Weiss with excess this formula doesn't change right if I put Y I minus V of x XM minus mu of X it doesn't change right so covariance of so these are some of the same simple properties of covariance let's call this property one and property two first properties covariance of X comma X is nothing but variance of X the second property is symmetric which means covariance of FI comma F J is nothing but covariance of FJ comma F I now I having learned this so what does what does the covariance matrix will look like all the diagonal elements so it has de diagonal elements right it has it's a d cross d matrix which means it has d diagonal elements on the diagonal elements you see variances of features okay and since since since this matrix so s so let's assume this is s IJ and this is s ji right since s J I and s IJ are the same so this matrix is also called symmetric matrix so let me write a very simple symmetric matrix what is the symmetric matrix if I have a matrix like this let us assume 2 1 3 okay these are these these are called my diagonal elements I'll write a 3 cross 3 matrix here just for 
simplicity so as to explain it to you ok so let's assume I have 1 and 2 here and let's say you know 5 here okay so if this is 1 and this is also 1 and this is 2 and let's assume this is 5 sorry write it 1 2 1 2 2 1 2 1 1 5 1 1 5 2 5 3 to 5 years this is called a symmetric matrix because take this element right this is first row second row third row this is first row first column second column third column so take this element what is this element this element is so if this matrix is a this is a 2 1 second row and first column if this is equal to a 1 2 what is a 1 2 first row and second column this element there same right if a I J is equal to AJ I for all I comma J ok then then we call the symmetric matrix in this matrix since SJ I equals to s IJ right here look at like this this is 3 comma 2 third row second column right so a 3 2 is 5 what about a 2 3 a 2 second row third column is also fine so as long as it is satisfied for all as long as this condition is satisfied for all I comma J of course for diagonal elements AIA is so all your diagonal elements will look like what s 1 1 s 2 to s 3 3 s 4 4 so on so forth a deedy right so this is called a symmetric matrix the matrix where a IJ a matrix a is called a symmetric matrix if a IJ equals to AJ I for all I comma J now your covariance matrix is also a symmetric matrix because by definition because since s IJ what is the definition of sij sij is covariance of F I comma F J which is equal to Co variance of F J comma F I from this property to which is equal to SJ I since s IJ is same as s Jie your s is also a symmetric matrix it's also a square matrix why is it a square matrix because the number of rows because it has D rows and columns same number of rows and columns is called is called square matrix same number of rows and columns so this is called square symmetric matrix okay your covariance matrix is a square matrix and also a symmetric matrix okay so and we understood the definition of covariance matrix right we'll 
see why it is useful bear with me we are just defining terms here we are not doing anything very very fancy here we are just defining a bunch of terms okay now let's let's understand some very very interesting property so let's say you my data set X which has D features right F 1 F 2 so on so forth B features and I have points 1 2 so on so forth em7 cross d right let's assume let let lets suppose that X has been column standard Einstein so what does column standardization mean it means that the mean of the mean of any of my F is equals to 0 and the standard deviation of any of my features equals to 1 that's what it means right which which is not we just learned about let's assume X has been column standardized okay let let let this be then what is the covariance of feature Fi and F J let's write the formula all right okay so what is this so if I have two features Fi and F J what is what is the covariance between them okay so let's write it down so let's say June this is my if I feature and this is my F J feature right okay it has elements right it has x1 I X so what is what is this element this is X first data point and I feature what about this this is x2 second data point right this corresponds to second data point and I feature similarly this one is x1 jet feature this one is x2 jet feature we just learned about this terminology right awhile ago if you look here I just explained X IJ terminology here I'm just reusing that right now okay so what so covariance of F 5 F J is nothing but 1 by n summation over 1 to n ok so let's write it so what what is what are the terms here they are X I okay let me call it f1 f2 just for simplicity okay so let's assume I want to find the covariance between let me just erase this so as not to confuse the so so as to make it simpler covalence of F 1 F 2 is nothing but x i1 minus mu-1 I will define what anyone is just bear with me x i2 minus mu now mu 1 is nothing but the mean of F F 1 sorry this is nothing but mean of F 2 right so 
let me change the color so that I can explain it better okay so I have to so this I can think of as my random variable x ii think of as my random variable y what is the covariance of X comma Y take all the values of X subtract the mean of X take all the values of one subtract the mean of Y right and take the average value this is basically an averaging right exactly so these are all my values for my random variable my random variable here is my feature right my random variable here is my feature so having said that let's go into it so X I 1 corresponds to so okay let's let's do it suppose if this is my F 1 and this is my F 2 okay I just write it much more cleanly let's assume this is my data set X okay so what will this be this will be X 1 this is my first data point right X 1 1 so what I'm doing here is I'm taking all the values of these features each of these individual values for this feature the let's assume this is this feature is better length and let's assume this feature is petal width registration okay so I'm taking each of the petal weights in my dataset subtracting it with the mean pattern width okay that value I'm multiplying with each of the petal weights supplying with a mean petal width this is the petal length R itself did say these are petal links this is the mean petal length let's assume this is petal width this is the mean pendel width from our iris data set example just so that you connect the dots now one thing you know is since your data has been column standardized okay these means are 0 this is 0 and this is 0 right so what happens to my covariance of F 1 F 2 I can write my covariance of features F 1 and F 2 as 1 by n summation I equals to 1 to n X 1 and X I to x because this has become 0 so this goes away this has become 0 it goes away so I'm just left with this term and this term so what does this mean let's understand what it means let me draw my data matrix much more clearly because this is my data matrix let's assume this is my feature 
1 and this is my feature 2. Of course, I have lots of features like this, up to F_d. This is my first point, my second point, my third point, my fourth point, and so on up to n points. This is my first observation for feature 1, and again for feature 2; then my second observation, and so on. So what this formula is saying is: multiply this value with this value, then multiply this value with this value, and so on. What is x_i1? For the i-th point, this is your x_i1, and this is your x_i2. So multiply x_i1 with x_i2, which means multiply these two, and sum up from i = 1 to n. What am I literally doing? I'm basically multiplying my F1 with my F2 component-wise and adding up the products: I am doing a dot product between F1 and F2. That's what this sum is equal to. So I can write my covariance between F1 and F2 as

cov(F1, F2) = (1/n) * F1^T F2

because what is F1 transpose F2? Instead of a column vector, I'll make F1 a row vector: I'll take this vector here, convert it into a row vector, and multiply it with this column vector. We learned this in the basics of linear algebra: component-wise multiplication followed by adding up all the products is nothing but the dot product, and the formula for F1 . F2 is exactly F1^T F2. So you can say that if your features F1 and F2 have been standardized, then the covariance of F1 and F2 is nothing but F1 transpose F2 divided by n. Literally, that's what it is. Now let me ask you something much more interesting in a second. My argument is as follows; let me lay it out for you.
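The claim above is easy to check numerically. Here is a minimal NumPy sketch: the data is randomly generated as a stand-in for two features like petal length and petal width (it is not the actual iris data), and it verifies that, after column standardization, the averaging definition of covariance and the dot-product form (1/n) * F1^T F2 agree.

```python
import numpy as np

# Hypothetical 2-feature data set (random stand-in for petal length / petal width).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Column-standardize: subtract each column's mean, divide by its standard deviation.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

f1, f2 = Xs[:, 0], Xs[:, 1]
n = Xs.shape[0]

# Covariance via the dot product of the standardized columns...
cov_dot = (f1 @ f2) / n

# ...matches the averaging definition (1/n) * sum((x - mu_x) * (y - mu_y)).
cov_def = np.mean((f1 - f1.mean()) * (f2 - f2.mean()))

print(np.isclose(cov_dot, cov_def))  # True
```

The two numbers coincide precisely because standardization makes each column's mean 0, so the mean-subtraction terms in the definition vanish and only the product terms survive.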
Let me lay out the argument for you. I will argue that my covariance matrix S, a D x D matrix, is nothing but

S = (1/n) * X^T X

where X is my data matrix. What is the size of X? It is n x D, so its transpose X^T is D x n. When I multiply these two, what will I get? I will get a D x D matrix, which is exactly what S is. But let me prove that each element of S also works out. Here, remember, I'm assuming that X has been column standardized; this is a very, very important assumption that we cannot drop. So what is s_ij supposed to be? It is the covariance of feature i and feature j; that is the definition of each element of S. Now, coming back to our discussion: I know from the formula earlier that the covariance is nothing but F1^T F2 divided by n, so

s_ij = (1/n) * F_i^T F_j

Think of this as your LHS. I will prove that the (i, j)-th component of the RHS is exactly this. Let me show that to you. What does X^T X mean? This is important to understand. If this is X^T and this is X, then for this whole matrix product I want to understand the (i, j)-th component. I get the (i, j)-th component of this product when I multiply the i-th row with the
j-th column and add up. So when I multiply this with this, and this with this, and then add, that is simple matrix multiplication; those of you who learned basic matrices in high school or undergrad will recognize this immediately. So on the LHS we know that s_ij is nothing but F_i^T F_j divided by n. Now let's look at our RHS. I need to prove that the (i, j)-th entry of the product X^T X is nothing but F_i^T F_j. I know that the j-th column of X is nothing but F_j. And since I have transposed the matrix, my columns become my rows; that is what transpose means. This matrix here is X^T, so, let me change the color here, my columns become my rows. So what is the i-th row here? My i-th row is nothing but F_i^T. So the (i, j)-th value of this multiplication is F_i^T multiplied by F_j; let me write it more carefully: the (i, j)-th value of this product is F_i^T F_j, which is what we wanted to prove. So your s_ij is nothing but F_i^T F_j divided by n; this is your LHS. On your RHS you have X^T X, and when I multiply it out, the (i, j)-th value of the product is F_i^T F_j. So on the LHS I have F_i^T F_j divided by n, and on the RHS, once I multiply the whole thing by 1/n, because the formula says S = (1/n) * X^T X, I have exactly the same thing. My RHS equals my LHS, which says that my S is nothing but (1/n) * X^T X when X is
represented the way we showed. This is a D x n matrix and this is an n x D matrix; it's always good to write the sizes down, so as to check whether you can actually multiply these matrices or not. So you can get your covariance matrix this way, again only if X has been column standardized; otherwise this formula doesn't hold. We will use this formula a lot, and extensively, when we are learning about dimensionality reduction using PCA. This is extremely important: your covariance matrix is nothing but (1/n) * X^T X if your matrix X has been column standardized.
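The full-matrix identity can be sketched in NumPy as well. This example uses a random n x D matrix (a hypothetical stand-in, not real data), column-standardizes it, and checks that (1/n) * X^T X matches NumPy's own covariance computation (with bias=True, so that NumPy also normalizes by n rather than n-1).

```python
import numpy as np

# Hypothetical n x D data matrix (random stand-in for a real data set).
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))

# Column-standardize first: the S = (1/n) X^T X identity requires this.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
n = Xs.shape[0]

# Covariance matrix as derived in the lecture: (D x n) @ (n x D) -> D x D.
S = Xs.T @ Xs / n

# Cross-check against NumPy; rowvar=False treats each column as a variable,
# bias=True normalizes by n, matching the 1/n factor above.
S_np = np.cov(Xs, rowvar=False, bias=True)

print(np.allclose(S, S_np))  # True
```

Note that without the column standardization step the product X^T X / n would not equal the covariance matrix, since the mean-subtraction terms would no longer vanish.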