Represent a dataset as a Matrix - Dimensionality reduction Lecture 4@Applied AI Course

Representing Data Points as Rows and Columns

In data analysis, it is essential to choose an appropriate representation method for your data points. One common approach is to represent each data point as a row, where each column represents a feature or variable. This format allows for easy comparison and manipulation of the data.

Transposing the Default Format

-----------------------------

By convention, a single data point x_i is written as a column vector. To place it in a row of a data matrix, we transpose it: transposing swaps rows with columns, so the column vector x_i becomes the row vector x_i transpose. Stacking these rows gives a matrix in which each row corresponds to one data point and each column corresponds to one feature.
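The transpose of a single data point can be sketched in plain Python. This is only an illustration: it assumes a vector is stored as a list of lists, and the `transpose` helper and the sample values are hypothetical, not taken from the lecture.

```python
def transpose(matrix):
    """Hypothetical helper: swap rows with columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]

# A 3-dimensional data point stored as a column vector (3 rows, 1 column).
# The values are made up for illustration.
x_i = [[5.1], [3.5], [1.4]]

# Its transpose is a row vector (1 row, 3 columns).
x_i_T = transpose(x_i)
print(x_i_T)  # [[5.1, 3.5, 1.4]]
```

Transposing twice returns the original column vector, which is why the two layouts carry the same information.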

Representing Data as Row Vectors

--------------------------------

With n data points and d features, the data matrix X is n × d, and its i-th row is x_i transpose, a row vector in d-dimensional space. Each row in this representation corresponds to one data point, while each column corresponds to one feature or variable. For example, if we have a dataset with three features (F1, F2, and F3) and three data points (X1, X2, and X3), we can represent it as:

| | F1 | F2 | F3 |
| --- | --- | --- | --- |
| X1 | f1 | f2 | f3 |
| X2 | f1 | f2 | f3 |
| X3 | f1 | f2 | f3 |

In this representation, each row represents a data point, and each column represents a feature or variable.
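As a rough sketch of the row-per-data-point layout, the snippet below stacks three data points into a matrix using plain Python lists. The variable names and the measurements are assumptions chosen for illustration.

```python
# Hypothetical measurements: n = 3 data points, d = 3 features each.
x1 = [5.1, 3.5, 1.4]
x2 = [4.9, 3.0, 1.4]
x3 = [6.3, 3.3, 6.0]

# Stack the points as rows: row i is the i-th data point, column j is feature j.
X = [x1, x2, x3]

n = len(X)     # number of data points (rows)
d = len(X[0])  # number of features (columns)

second_point = X[1]                    # one row: all features of x2
first_feature = [row[0] for row in X]  # one column: feature F1 for every point

print(n, d)           # 3 3
print(second_point)   # [4.9, 3.0, 1.4]
print(first_feature)  # [5.1, 4.9, 6.3]
```

Reading a row gives everything known about one data point, while reading a column gives one feature across the whole dataset.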

Alternative Representations

---------------------------

There is another way to represent X, where the column vectors x_i are stacked side by side to form a matrix. This results in a d × n matrix, where each column represents a data point, and each row represents a feature or variable. For example:

| | X1 | X2 | X3 |
| --- | --- | --- | --- |
| F1 | f1 | f1 | f1 |
| F2 | f2 | f2 | f2 |
| F3 | f3 | f3 | f3 |

This matrix is simply the transpose of the row-per-data-point form. It is common in research papers, because each x_i is a column vector by default and the columns can be stacked directly.
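The relationship between the two layouts can be checked with a small sketch. The `shape` and `transpose` helpers are hypothetical names, and the values are made up. Two data points with four features are used so that the change in shape is visible.

```python
def shape(matrix):
    """Hypothetical helper: (number of rows, number of columns)."""
    return (len(matrix), len(matrix[0]))

def transpose(matrix):
    """Hypothetical helper: swap rows with columns."""
    return [list(col) for col in zip(*matrix)]

# n = 2 data points as rows, d = 4 features as columns (illustrative values).
X_rows = [
    [5.1, 3.5, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
]

# Transposing gives the d x n layout: each column is now a data point.
X_cols = transpose(X_rows)

print(shape(X_rows))  # (2, 4)
print(shape(X_cols))  # (4, 2)
print(X_cols[0])      # [5.1, 6.3] -> first feature for both data points
```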

Choosing a Representation Method

---------------------------------

There is no right or wrong approach to representing data points, as long as you state what each row and each column means. This course sticks to the layout where each row is a data point and each column is a feature, because it reads like an ordinary data table.

Representation with Y

--------------------

Another important aspect of data representation is the classification or target variable, often denoted as Y. It can be represented as a column vector with n rows and one column, where the i-th row holds y_i, the label for the data point in the i-th row of X. In this context, y_i is the class label (setosa, virginica, or versicolor) for that particular data point.
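A minimal sketch of keeping the labels aligned with the rows of X, again in plain Python. The variable names and the measurements are illustrative assumptions. Only the three class names come from the lecture.

```python
# n = 3 data points as rows, d = 4 features as columns (illustrative values).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

# One label per data point, in the same order as the rows of X.
y = ["setosa", "versicolor", "virginica"]

# The two structures must have the same number of rows.
assert len(X) == len(y)

# Row i of X pairs with entry i of y.
for x_i, y_i in zip(X, y):
    print(x_i, "->", y_i)

# Written explicitly as an n x 1 column vector.
Y_column = [[label] for label in y]
print(Y_column)  # [['setosa'], ['versicolor'], ['virginica']]
```

If the rows of X are ever shuffled or filtered, the same operation has to be applied to y so that the pairing is preserved.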

Using Matrices Extensively

-------------------------

In subsequent lectures, we will use matrix representations extensively. We will explore various operations on these matrices and how they can be applied to solve problems in machine learning and data analysis.

"WEBVTTKind: captionsLanguage: enone other very common way of presenting a dataset is using a matrix let's say let's say how to represent it as you might already know a matrix is basically like a table right suppose if my data set D is collection of X I Y I I going from 1 to n let's say X I belongs to our B and let's say Y I belongs to let's say setosa virginica versicolor okay let's say this this representation we saw a while ago now let's put the same data in a matrix form ok there again to 2 ways of representing it as a data matrix I'll pick one I'll also show you the other way of representing so imagine if this is my data matrix and let's assume so what does this mean this means that my X I is a D dimensional vector right which means I have D features what does this mean this means I have D features all right I can have I can write my features as columns of my matrix F 1 F 2 F 3 so on so forth FD right and I have n data points right I have first data point second data point third data point so on and so forth n data points so this matrix is typically written as capital X which is n cross d each row here each row so the I'd throw here I throw here is nothing but your X I transpose right Here I am representing each data point I am representing each data point as a row why do I write a transpose here because if I just wrote X I the default X I is always a column vector is always a column vector so excitin spose becomes a row vector right so if X I is a column vector right Excite transpose which means swapping which means basically converting your rows to columns and columns to rows is a row vector all right now given this I'm representing X a transpose is now a row vector of T dimensional space so each row here corresponds to one data point each column corresponds to one feature so if I have FJ here this is my Jade feature right this is one way of representing and each column and each column represents a feature or a variable this is one way of representing it in 
there is also exactly similar way of representing where let me show it to you where my X could be represented as this where each row represents my features f1 f2 f3 so on FJ so on so forth ft ok and each of my data points could be represented as column vectors sorry I so on so forth n so this is a B cross n matrix and each of my points my eighth point my eighth point X I is here it's a column vector this is in this case each column represents a data point and each row represents each row represents a feature or a variable a feature are available so in this case your features your f1 could have been petal length your f2 could have been petal width your f 3 could have been sepal length your f 4 could have been petal so except the width right so there are two representations and remember this X this new X that I have written is nothing but transpose of this if I just swap rows with columns columns with rows you get this matrix okay both of them are valid in lot of research papers you typically find this let me let me agree to that because this false digital red X I is a column vector by default so they just stack up all the column vectors to make to make a matrix like this when I studied and during my experience I have used this format more as long as somebody tells you what each column and which each row is it's okay so I'll stick to this representation where each row is a data point and each column is a feature right this looks more like a table so for example I prefer this because it looks more like a table there is no right or wrong approach as long as you specify what you're doing so I like that approach because I can think of it like a table I can think of my data as a table where each row so this could be my sepal length my sepal width right petal length and petal width my four features my f1 my f2 my F 3 sorry my f3 and my F 4 and each of these could be my flowers my first flower what is it settling settlement petal length better with my second flower so on so 
forth so this is this is the first format that I explained you this format right where each of my data and it looks very similar to typical tabular representation of data right this is how you tabulate data right each row is typically a a flower or a data point and each column is a feature so I prefer to use this rotation where each row is a data point but lot of research papers prefer to use this it's perfectly okay to use any of them but we will stick to this because because I'm just more used to it and it feels more natural especially when you're tabulating data right so there is one more question okay you explain how to represent your X but what about Y of course I can represent Y as a column vector here write as vector Y what is the length of the Y so this is a column vector right which means it has it has n rows and one column and the I throw so this is my first row second row so on my I throw so on so forth n throw because there for each data point I have a wire which says whether the floor is setosa virginica or versicolor so here I'll have know by a corresponding to my X I row this is important corresponding to my X 0 L happening by okay this is how we can represent data where X I can be written as matrix where it where x i's can be all x i's can be concatenated or clubbed together to form a matrix like this where each row is a data point each column is a feature and my Y could be just one column of data where y a corresponding to the corresponding to each X a there is a y which represents what is the class limit right and this is what will stick to will use this extensively you can use this matrix a we'll use this mitt this is matrix x-ray we'll use this matrix matrix extensively in the next few lecturesone other very common way of presenting a dataset is using a matrix let's say let's say how to represent it as you might already know a matrix is basically like a table right suppose if my data set D is collection of X I Y I I going from 1 to n let's say X 
I belongs to our B and let's say Y I belongs to let's say setosa virginica versicolor okay let's say this this representation we saw a while ago now let's put the same data in a matrix form ok there again to 2 ways of representing it as a data matrix I'll pick one I'll also show you the other way of representing so imagine if this is my data matrix and let's assume so what does this mean this means that my X I is a D dimensional vector right which means I have D features what does this mean this means I have D features all right I can have I can write my features as columns of my matrix F 1 F 2 F 3 so on so forth FD right and I have n data points right I have first data point second data point third data point so on and so forth n data points so this matrix is typically written as capital X which is n cross d each row here each row so the I'd throw here I throw here is nothing but your X I transpose right Here I am representing each data point I am representing each data point as a row why do I write a transpose here because if I just wrote X I the default X I is always a column vector is always a column vector so excitin spose becomes a row vector right so if X I is a column vector right Excite transpose which means swapping which means basically converting your rows to columns and columns to rows is a row vector all right now given this I'm representing X a transpose is now a row vector of T dimensional space so each row here corresponds to one data point each column corresponds to one feature so if I have FJ here this is my Jade feature right this is one way of representing and each column and each column represents a feature or a variable this is one way of representing it in there is also exactly similar way of representing where let me show it to you where my X could be represented as this where each row represents my features f1 f2 f3 so on FJ so on so forth ft ok and each of my data points could be represented as column vectors sorry I so on so forth n so 
this is a B cross n matrix and each of my points my eighth point my eighth point X I is here it's a column vector this is in this case each column represents a data point and each row represents each row represents a feature or a variable a feature are available so in this case your features your f1 could have been petal length your f2 could have been petal width your f 3 could have been sepal length your f 4 could have been petal so except the width right so there are two representations and remember this X this new X that I have written is nothing but transpose of this if I just swap rows with columns columns with rows you get this matrix okay both of them are valid in lot of research papers you typically find this let me let me agree to that because this false digital red X I is a column vector by default so they just stack up all the column vectors to make to make a matrix like this when I studied and during my experience I have used this format more as long as somebody tells you what each column and which each row is it's okay so I'll stick to this representation where each row is a data point and each column is a feature right this looks more like a table so for example I prefer this because it looks more like a table there is no right or wrong approach as long as you specify what you're doing so I like that approach because I can think of it like a table I can think of my data as a table where each row so this could be my sepal length my sepal width right petal length and petal width my four features my f1 my f2 my F 3 sorry my f3 and my F 4 and each of these could be my flowers my first flower what is it settling settlement petal length better with my second flower so on so forth so this is this is the first format that I explained you this format right where each of my data and it looks very similar to typical tabular representation of data right this is how you tabulate data right each row is typically a a flower or a data point and each column is a 
feature so I prefer to use this rotation where each row is a data point but lot of research papers prefer to use this it's perfectly okay to use any of them but we will stick to this because because I'm just more used to it and it feels more natural especially when you're tabulating data right so there is one more question okay you explain how to represent your X but what about Y of course I can represent Y as a column vector here write as vector Y what is the length of the Y so this is a column vector right which means it has it has n rows and one column and the I throw so this is my first row second row so on my I throw so on so forth n throw because there for each data point I have a wire which says whether the floor is setosa virginica or versicolor so here I'll have know by a corresponding to my X I row this is important corresponding to my X 0 L happening by okay this is how we can represent data where X I can be written as matrix where it where x i's can be all x i's can be concatenated or clubbed together to form a matrix like this where each row is a data point each column is a feature and my Y could be just one column of data where y a corresponding to the corresponding to each X a there is a y which represents what is the class limit right and this is what will stick to will use this extensively you can use this matrix a we'll use this mitt this is matrix x-ray we'll use this matrix matrix extensively in the next few lectures\n"