R tutorial - Subsetting your Vectors in R

The Importance of Vector Subsetting in R: A Comprehensive Guide

In the previous exercises, you created and worked with vectors of different types, computed their length, and learned how to operate on them at a basic level. One more element completes the picture: vector subsetting. As the name suggests, it means selecting parts of a vector to end up with a new vector that is a subset of the original. This article walks through the different ways to subset a vector in R, using the named vector `remain`, which holds the number of cards remaining in each suit.

Using Indexes for Subsetting

One way to select elements from a vector is by index. To select the first element, write the index inside square brackets: `remain[1]`. R counts from 1, so index 1 is the first element. The result is again a vector, because a single number in R is a vector of length 1; here it contains the value 11, and the element's name, `spades`, is kept. To select the third element, the remaining diamonds, write `remain[3]`.
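As a minimal sketch of index-based selection, the snippet below builds a stand-in for `remain`. Only the spades count of 11 and the positions of spades, diamonds, and clubs come from the video; the name `hearts` and the other values are assumptions made for illustration.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

first <- remain[1]   # first element; R counts from 1
third <- remain[3]   # third element (diamonds)

print(first)
length(first)        # 1: the result is itself a vector of length 1
names(first)         # "spades": the name travels with the value
```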

If the vector is named, you can select by name instead of by index. To select the first element of `remain`, put its name as a quoted string inside the square brackets: `remain["spades"]`. The result is exactly the same as `remain[1]`. Note that the dollar sign does not apply here: `$` works on lists and data frames, not on atomic vectors such as `remain`.

The same applies to any other name. For example, `remain["diamonds"]` returns the third element, just as `remain[3]` does. Selecting by name is often more readable than selecting by position.
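The sketch below compares name-based and index-based selection on a stand-in for `remain` (the `hearts` entry and all values except spades = 11 are assumed for illustration). It also checks what happens with the dollar sign, which is not valid on an atomic vector.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

by_name  <- remain["spades"]   # select by name, quoted, inside square brackets
by_index <- remain[1]          # select the same element by position
same <- identical(by_name, by_index)
print(same)                    # TRUE

diamonds <- remain["diamonds"] # same element as remain[3]

# `$` is for lists and data frames; on an atomic vector it raises an error.
dollar_works <- tryCatch({ remain$spades; TRUE }, error = function(e) FALSE)
print(dollar_works)            # FALSE
```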

Selecting Multiple Elements from Factors

To select several elements at once, put a vector of indexes inside the square brackets. For example, to get the first and third elements, use `remain[c(1, 3)]`. Note that `remain[1:3]` is different: `1:3` expands to `c(1, 2, 3)` and selects the first three elements. Since spades are at index 1 and clubs at index 4, `remain[c(1, 4)]` selects the two black suits. The order of the result follows the order of the indexes, so `remain[c(4, 1)]` returns clubs first.
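A short sketch of selecting several elements by index, again on an assumed version of `remain` (only spades = 11 and the suit positions 1, 3, and 4 come from the video). The variable name `remain_black` follows the video's narration.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

remain_black <- remain[c(1, 4)]   # spades and clubs, in that order
clubs_first  <- remain[c(4, 1)]   # same elements, clubs first
first_three  <- remain[1:3]       # 1:3 is c(1, 2, 3), not c(1, 3)

print(names(remain_black))        # "spades" "clubs"
print(names(clubs_first))         # "clubs" "spades"
print(length(first_three))        # 3
```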

Another way to select multiple elements is logical indexing. You supply a vector of `TRUE` and `FALSE` values, one per element; elements matched with `TRUE` are kept and elements matched with `FALSE` are dropped. For instance, to get every element of the four-element `remain` except the first and the third, use `remain[c(FALSE, TRUE, FALSE, TRUE)]`. Equivalently, mark the elements to drop and negate the vector with `!`: `remain[!c(TRUE, FALSE, TRUE, FALSE)]`. The selection depends on the positions of the `TRUE` values, not on the values stored in `remain`.
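The following sketch shows negation with `!` as one way to express "everything except these positions". The data is an assumed stand-in for `remain` (only spades = 11 is from the video), and `drop` is a name chosen here for illustration.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

drop <- c(TRUE, FALSE, TRUE, FALSE)  # marks the first and third elements
kept <- remain[!drop]                # ! flips every value, so elements 2 and 4 stay

print(names(kept))                   # "hearts" "clubs"
```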

Selecting Elements from Factors Using Names

In addition to indexes and logical vectors, you can select several elements of a named vector by name. To get the elements for spades and clubs, put a character vector of names inside the square brackets: `remain[c("spades", "clubs")]`. No dollar sign is involved. As with numeric indexes, the result follows the order of the names you supply.
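As a sketch, selecting several elements by name might look like this; the vector below is an assumed stand-in for `remain` in which only spades = 11 comes from the video.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

black_by_name <- remain[c("spades", "clubs")]  # character vector of names
clubs_first   <- remain[c("clubs", "spades")]  # result follows the order given

# Selecting by name matches selecting by the corresponding positions.
print(identical(black_by_name, remain[c(1, 4)]))  # TRUE
print(names(clubs_first))                         # "clubs" "spades"
```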

To select every element except one, use a negative index. For instance, `remain[-1]` returns all elements of `remain` except the first (R indexes start at 1, not 0). You can drop several elements at once with a vector of indexes, as in `remain[-c(1, 2)]`. The minus operator works with numeric indexes only: `remain[-"spades"]` throws an error.
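The sketch below exercises negative indexes on an assumed version of `remain` (only spades = 11 is from the video). It captures the error from negating a name with `tryCatch()` so the script keeps running, and it shows one possible name-based alternative using a comparison on `names()`.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

without_spades    <- remain[-1]         # everything except the first element
without_first_two <- remain[-c(1, 2)]   # drop several elements at once

# Negating a character string is an error; capture the message instead of stopping.
name_error <- tryCatch(remain[-"spades"], error = function(e) conditionMessage(e))
print(name_error)

# One way to exclude by name: build a logical vector from the names.
not_spades <- remain[names(remain) != "spades"]
print(identical(not_spades, without_spades))  # TRUE
```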

Using Logical Vectors for Subsetting

Logical vectors deserve a closer look. A logical vector used for subsetting normally has the same length as the vector being subset. Elements whose corresponding value is `TRUE` are kept; elements whose corresponding value is `FALSE` are dropped. For example, to get the second and fourth elements of `remain`, use `remain[c(FALSE, TRUE, FALSE, TRUE)]`. You can also store the logical vector in a variable first and then put that variable inside the square brackets.
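A minimal sketch of the two styles, inline and via a stored selection vector. The data is an assumed stand-in for `remain` (only spades = 11 is from the video), and `selection_vector` is a name chosen here for illustration.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

# Inline: keep the second and fourth elements.
inline <- remain[c(FALSE, TRUE, FALSE, TRUE)]

# Stored: create the logical vector first, then use it to subset.
selection_vector <- c(FALSE, TRUE, FALSE, TRUE)
stored <- remain[selection_vector]

print(identical(inline, stored))           # TRUE
print(identical(stored, remain[c(2, 4)]))  # TRUE: same as numeric indexes 2 and 4
```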

Keep in mind that positions in R start at 1, not 0: the first value of the logical vector lines up with the first element of `remain`. Also note that a logical vector shorter than `remain` may not select what it appears to. `remain[c(FALSE, TRUE)]` does not return everything except the first element; the two values are reused, so it returns the second and fourth elements. To drop only the first element, use `remain[c(FALSE, TRUE, TRUE, TRUE)]` or `remain[-1]`.

Logical vectors can also be computed rather than typed out. For example, to get the element for spades but not the one for diamonds, compare the names: `remain[names(remain) == "spades"]`, which is the same as `remain[c(TRUE, FALSE, FALSE, FALSE)]`. A plain `c(TRUE, FALSE)` would not work here, because it would be reused to select the first and third elements, spades and diamonds. Comparisons on the values work the same way, so logical subsetting lets you select elements by name or by value.
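As a sketch of computing the logical vector instead of typing it, the snippet below derives selections from names and from values. The data is an assumed stand-in for `remain` (only spades = 11 is from the video), and the threshold of 11 is arbitrary.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

is_spades <- names(remain) == "spades"   # TRUE FALSE FALSE FALSE
only_spades <- remain[is_spades]

# %in% tests membership, which handles several names at once.
black <- remain[names(remain) %in% c("spades", "clubs")]

# A comparison on the values gives a logical vector too.
more_than_11 <- remain[remain > 11]

print(names(only_spades))    # "spades"
print(names(black))          # "spades" "clubs"
print(names(more_than_11))   # "hearts" "clubs"
```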

Recycling: Logical Vectors Shorter Than the Vector

When the logical vector is shorter than the vector being subset, R does not throw an error. Instead it performs recycling: it repeats the contents of the logical vector until it has the same length as the vector. With the four-element `remain`, `remain[c(FALSE, TRUE)]` is executed as `remain[c(FALSE, TRUE, FALSE, TRUE)]`. A logical vector of length three is recycled too, with its first value appended again, so `remain[c(TRUE, FALSE, TRUE)]` behaves like `remain[c(TRUE, FALSE, TRUE, TRUE)]`. The `c()` function does not do this for you, and `c(FALSE, TRUE)[length(remain)]` returns `NA` rather than a longer vector; to build the full-length vector explicitly, use `rep(c(FALSE, TRUE), length.out = length(remain))`.
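The sketch below makes recycling visible by building the full-length logical vector with `rep()` and comparing the results. The data is an assumed stand-in for `remain` (only spades = 11 is from the video), and the particular length-two and length-three vectors are illustrative choices.

```r
# Hypothetical data: only spades = 11 is stated in the source; the rest is illustrative.
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

short <- c(FALSE, TRUE)
# What R effectively uses behind the scenes: FALSE TRUE FALSE TRUE
full <- rep(short, length.out = length(remain))

print(identical(remain[short], remain[full]))   # TRUE

# Length three: the first value is appended again to reach length four.
three <- c(TRUE, FALSE, TRUE)
print(identical(remain[three], remain[c(TRUE, FALSE, TRUE, TRUE)]))  # TRUE

# Indexing the short vector by the target length does not recycle it.
out_of_range <- c(FALSE, TRUE)[length(remain)]
print(out_of_range)                             # NA
```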

Conclusion

In this article, we explored different ways to subset vectors in R: by index, by name, with negative indexes, and with logical vectors. Each method selects elements by position, by name, or by a condition on the values. By mastering these techniques, you'll become more comfortable working with vectors in R and can extract specific data points more efficiently. Whether your vectors are named or unnamed, subsetting is an essential skill for any data analyst or researcher.

Transcript

In the previous exercises you have created and worked with vectors of different types. You have even computed their length and seen how to operate with them on a basic level. A last important element to make your experience with vectors complete is vector subsetting. As the name reveals, it basically comes down to selecting parts of your vector to end up with a new vector, which is a subset of your original vector.

Remember the remain vector that we built in one of the previous videos? Here it is again, as a named vector. Suppose you now want to select the first element from this vector, corresponding to the number of spades that are left. You can use square brackets for this: we write remain, open brackets, 1, closed brackets. The number 1 inside the square brackets indicates that you want to get the first element from the remain vector. The result is again a vector, because a single number is actually a vector of length 1. This new vector contains the number 11; the name of the only element, spades, is conveniently kept. If you instead wanted to select the third element, corresponding to the remaining diamonds, you could write remain followed by 3 in square brackets.

This was subsetting using an index, but if you're dealing with named vectors, you can also use the names to perform the selection. Instead of using the index 1 to select the first element, you can use the name spades: you type remain followed by spades as a quoted string inside square brackets. The result is exactly the same as using the numeric index 1. Can you tell how you can refer to the third element in the vector with a name? You've probably figured out that you simply have to type diamonds inside square brackets this time.

Suppose now you want to select the elements in the vector that give the remaining spades and clubs in a one-liner and store them in a new variable, remain_black. Instead of using a single number inside the square brackets, you can use a vector to specify which indices you want to select. Because spades are at index 1 and clubs at index 4, you use a vector containing 1 and 4 inside square brackets. How the resulting vector is ordered depends on the order of the indices inside the selection vector: if you change c(1, 4) to c(4, 1), you will get a vector where the clubs come first. Of course, subsetting multiple elements can also be done by using names, at least if you're dealing with a named vector. To get the same result as the command above, we write remain, open brackets, then a vector containing the character strings clubs and spades, closed brackets.

There's yet another way to subset vectors that's specifically useful if you want to select all the elements from a vector except one. Suppose you want to create a vector that contains all the information that's in the remain vector except for the spades count. You can write remain, open brackets, minus 1, closed brackets. This command removes the first index from the remain vector. Of course, you can also remove multiple elements like this, for example. The minus operator does not work with names, though; this command, for example, throws an error. These subsetting techniques will enable you to select only the elements of interest from your vector and continue your analysis with these.

Before I unleash you on the last set of exercises of this chapter, I want to talk about one last way to perform vector subsetting: using a logical vector. To do this, you typically use a logical vector that has the same length as the vector you try to subset. The elements for which the corresponding value in the selecting vector is TRUE will be kept in the subset; the vector elements that correspond to FALSE will not be kept. Let's try to select the second and fourth elements from the remain vector using a logical vector. To this end, we construct a vector containing FALSE, TRUE, FALSE and TRUE and put it inside the square brackets. Of course, you could also have created a new vector first and then used it to perform the selection, like this.

Now, you might expect that R throws an error if you try to use a logical vector that is shorter than the vector on which you want to perform the subsetting. Trying this out shows a different reality. Suppose you use a vector containing only two logicals instead of four: no error whatsoever. That's because R performs something called recycling. R is smart enough to see that the vector of logicals you passed it is shorter than the remain vector, so it repeats the contents of the vector until it has the same length as remain. This means that behind the scenes this line of code is executed, giving the result that we've observed before. Even if you use a vector of length three to do the selection, the vector is recycled to end up with a vector of length four, thus appending the first element again. So this statement gets converted to this statement behind the scenes; that's why their outputs are the same.

Up to you now to solve the challenges and use your newfound knowledge to beat the casino. In the next chapter, I'll be talking about matrices. See you there.