R Tutorial - Working with 'Out-of-Core' Objects using the Bigmemory Project

Using the bigmemory Package in R: An Introduction

In this article, we look at big matrices and how to work with them using the bigmemory package in R. The package is designed to store, manipulate, and process dense matrices that may be larger than a computer's RAM. We cover why keeping data on disk helps, how to create big matrices, how to retrieve subsets of them, and how to summarize them.

Why Use the bigmemory Package

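The bigmemory package is designed to handle datasets that are too large to fit comfortably in memory. Ordinary R objects are kept in RAM, which is much faster than disk, but a machine has far less RAM than disk. When you run out of RAM, your machine may start moving things to disk to make space; your programs may keep running, but they become slow. The bigmemory package instead uses a strategy called out-of-core computing: data stays on disk and is moved into RAM only when it is needed for processing. This avoids the slowdown of running out of RAM while still giving you access to the full dataset.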
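As a minimal sketch of the out-of-core idea, assuming the bigmemory package is installed, the code below computes column sums of a file-backed big matrix by extracting one block of rows at a time. `chunked_col_sums()` is a hypothetical helper written for this illustration, not a bigmemory function, and the file names are placeholders.

```r
library(bigmemory)

# Placeholder data and file names; tempdir() keeps the sketch self-contained.
m <- cbind(rep(1, 1000), as.numeric(1:1000), rep(2, 1000))
x <- filebacked.big.matrix(1000, 3, type = "double", init = 0,
                           backingfile = "chunk-demo.bin",
                           descriptorfile = "chunk-demo.desc",
                           backingpath = tempdir())
x[, ] <- m   # copy the values into the file-backed matrix

# Hypothetical helper: sum each column by visiting blocks of rows, so only
# one block at a time is copied out of the big.matrix into an R matrix.
chunked_col_sums <- function(bm, chunk_size = 250) {
  totals <- numeric(ncol(bm))
  for (start in seq(1, nrow(bm), by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1, nrow(bm))
    # bm[rows, ] returns an ordinary matrix (or a vector for a single row);
    # matrix() normalizes both cases to a block with ncol(bm) columns.
    block <- matrix(bm[rows, ], ncol = ncol(bm))
    totals <- totals + colSums(block)
  }
  totals
}

chunked_col_sums(x)   # 1000 500500 2000
```

Only `chunk_size` rows exist as an ordinary R matrix at any moment, so the working copy in R is bounded by the block size rather than by the size of the full matrix.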

When working with big matrices, it's essential to consider the size of your dataset relative to the amount of RAM on your machine. If your dataset is at least 20% of the size of RAM and is represented as a dense matrix (one in which most values are not zero), you should consider using a big matrix object. Because a big matrix keeps data on disk and moves it to RAM only when needed, it won't bog down your machine when you run out of RAM.
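The 20% rule of thumb comes down to simple arithmetic: R stores each double in 8 bytes (and each integer in 4), so a dense matrix needs roughly rows × columns × bytes per element. The sketch below uses two hypothetical helpers and an assumed machine with 8 GiB of RAM; the RAM figure is an input you supply, not something the code detects.

```r
# Hypothetical helper: approximate size in bytes of a dense matrix.
# 8 bytes per element for doubles; pass 4 for integers.
dense_matrix_bytes <- function(nrow, ncol, bytes_per_element = 8) {
  nrow * ncol * bytes_per_element
}

# Hypothetical helper: the rule of thumb from the text. Consider a big
# matrix when the data is at least `threshold` (20%) of available RAM.
should_use_big_matrix <- function(data_bytes, ram_bytes, threshold = 0.2) {
  data_bytes >= threshold * ram_bytes
}

size <- dense_matrix_bytes(1e6, 250)   # 2e9 bytes (about 2 GB) of doubles
ram  <- 8 * 1024^3                     # assumed: a machine with 8 GiB of RAM
should_use_big_matrix(size, ram)       # TRUE, since 2e9 >= 0.2 * 8 GiB
```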

The movement of data from disk to RAM is implicit: the package detects when needed data resides on disk and moves it for you, so you don't have to make explicit function calls to transfer it. Another advantage of a big matrix object is that, because it is stored on disk, it only needs to be imported once. You read in a big matrix much as you would read a data frame, but doing so creates two files: a backing file that holds the data in binary format, and a descriptor file that tells R how to load it in a subsequent session. In later sessions, you simply point R at these two files and the data is instantly available, without going through the import process again.
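A minimal sketch of the import-once workflow, assuming bigmemory is installed: `read.big.matrix()` imports a CSV file and writes the backing and descriptor files, and `attach.big.matrix()` reopens them later from the descriptor. The CSV contents and file names are placeholders, and `tempdir()` is used only to keep the example self-contained.

```r
library(bigmemory)

# Placeholder CSV standing in for a real dataset.
csv_path <- file.path(tempdir(), "demo.csv")
writeLines(c("a,b", "1,10", "2,20", "3,30"), csv_path)

# First session: import once. This writes demo.bin (binary data) and
# demo.desc (descriptor) into the backing path.
x <- read.big.matrix(csv_path, header = TRUE, type = "double",
                     backingfile = "demo.bin",
                     descriptorfile = "demo.desc",
                     backingpath = tempdir())

# Later session: point R at the descriptor instead of re-importing the CSV.
y <- attach.big.matrix("demo.desc", path = tempdir())
y[, ]   # explicitly extract the elements as an ordinary R matrix
```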

Creating a Big Matrix Object

To create a big matrix object, first load the bigmemory package with the library function, `library(bigmemory)`. Then create the object with `big.matrix()`, specifying six parameters:

1. The number of rows (`nrow`)

2. The number of columns (`ncol`)

3. The type of elements the big matrix will hold (`type`, e.g., "double", "integer", "short", or "char")

4. The initial value for all elements of the matrix (`init`)

5. The name of the backing file (`backingfile`)

6. The name of the descriptor file (`descriptorfile`)

The backing file holds the binary representation of the matrix on disk, while the descriptor file holds other information about the big matrix, such as its number of rows and columns, its element type, and its row and column names (if any).
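To see that the descriptor file is enough to reconstruct a matrix's shape, type, and names, the sketch below (assuming bigmemory is installed; file and column names are placeholders) creates a file-backed big matrix with column names and then re-attaches it from its descriptor file alone.

```r
library(bigmemory)

# Placeholder file and column names; tempdir() keeps the sketch self-contained.
x <- filebacked.big.matrix(4, 2, type = "integer", init = 0L,
                           dimnames = list(NULL, c("id", "count")),
                           backingfile = "names-demo.bin",
                           descriptorfile = "names-demo.desc",
                           backingpath = tempdir())

# Re-attach using only the descriptor: dimensions, element type, and
# column names come from the .desc file; the values live in the .bin file.
y <- attach.big.matrix("names-demo.desc", path = tempdir())
dim(y)        # 4 2
typeof(y)     # "integer"
colnames(y)   # "id" "count"
```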

Once you have created a big matrix object, note that typing its name (for example, `x`) does not print its elements. Instead, you'll see other information, including its type and the handle it holds to its underlying C++ data structure. To see the elements, you must ask for them explicitly, for example with `x[, ]`. Otherwise, big matrices behave like regular R matrices: you can change values using standard R syntax, such as `x[1, 1] <- 3` to assign 3 to the element in the first row and first column.

Example Use Case

Let's create a big matrix object and demonstrate how it can be used in practice.

```R
# Load the bigmemory package
library(bigmemory)

# Create a file-backed big matrix with 10 rows and 5 columns of type
# "double", initialized to zero, with backing file "bigmatrix.bin"
# and descriptor file "bigmatrix.desc"
bigmat <- big.matrix(10, 5, type = "double", init = 0,
                     backingfile = "bigmatrix.bin",
                     descriptorfile = "bigmatrix.desc")

# Typing the name shows the object and its pointer, not its values
bigmat

# Assign 3 to the element in the first row and first column
bigmat[1, 1] <- 3

# Verify the change by explicitly extracting all elements
bigmat[, ]
```

In this example, we create a big matrix object with 10 rows and 5 columns of type double, initialized to zero. Typing `bigmat` on its own shows the object and the handle to its underlying C++ data structure rather than its values. We then assign 3 to the element in the first row and first column using standard R syntax, and verify that the change has taken place by explicitly extracting all elements with `bigmat[, ]`.
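Retrieving subsets and summarizing both work through ordinary indexing: extracting rows or columns from a big matrix returns regular R objects that base R functions can summarize. A minimal sketch, assuming bigmemory is installed; the values are made up for illustration, and an in-memory big matrix keeps the example short (the same indexing applies to file-backed ones).

```r
library(bigmemory)

bm <- big.matrix(10, 5, type = "double", init = 0)
bm[, 1] <- as.numeric(1:10)   # column 1 holds 1, 2, ..., 10
bm[, 2] <- rep(2, 10)         # column 2 holds 2 in every row

# Subsets come back as ordinary R objects.
top  <- bm[1:3, ]             # 3 x 5 matrix
col1 <- bm[, 1]               # numeric vector of length 10

# Summaries use base R functions on the extracted pieces.
mean(col1)                    # 5.5
colMeans(bm[, 1:2])           # 5.5 2.0

# Select rows by a condition on one column, then pull only those rows.
big_rows <- which(bm[, 1] > 8)
bm[big_rows, 1:2]             # rows 9 and 10 of the first two columns
```

For condition-based row selection on very large matrices, bigmemory also provides `mwhich()`, which finds matching rows without first extracting the whole column; check its documentation for the exact arguments before relying on it.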

Conclusion

The bigmemory package lets you work in R with dense matrices that are too large to hold comfortably in RAM. Knowing how to create big matrices, retrieve subsets of them, and summarize them covers most everyday use of the package. Remember to check the size of your dataset against the RAM available on your machine, and rely on the package's out-of-core approach to move data from disk to RAM only when it is needed.

Video Transcript

In this section we're going to cover the use of the bigmemory package. This is a package I wrote to store, manipulate, and process dense matrices, called big matrices, that may be larger than a computer's RAM. We're going to cover how to create big matrices, retrieve subsets of them, and summarize them.

As mentioned before, R objects are kept in RAM. This is much faster than using the disk, but there is less RAM than disk. When you run out of RAM, your machine may start moving things to disk to make space. Your programs may keep running, but they will become slow. In most cases, you are better off moving data to RAM only when it is necessary for processing. This is sometimes called out-of-core computing, and it's the strategy we're going to use to process data.

For datasets that are at least 20% of the size of RAM and are also represented as dense matrices (matrices where most of the values are not 0), you should consider using a big matrix, which is implemented in the bigmemory package. By default, a big matrix keeps data on the disk and only moves it to RAM when it is needed. As a result, it won't bog down your machine when you run out of RAM. The movement of data from the disk to RAM is implicit, meaning that users don't have to make function calls to move the data; the package detects when needed data resides on disk and moves it for them.

Another advantage of using a big matrix object is that, since it is stored on disk, it only needs to be imported once. You read in a big matrix object similar to reading a data frame. However, doing this creates a backing file that holds the data in binary format, along with a descriptor file that tells R how to load it in a subsequent session. You simply point R at these two files and they are instantly available without having to go through the import process again.

Here's a first example of creating a big matrix object. First, we load the bigmemory package using the library function. Then we create a big matrix object. The six parameters specify the number of rows, the number of columns, the type of elements the big matrix will hold, the initial value for all elements of the matrix, the name of the backing file, and the name of the descriptor file. The backing file holds the binary representation of the matrix on the disk. The descriptor file holds other information about the big matrix, like the number of rows, number of columns, type, and column and row names if there are any.

To print the elements of the big matrix object, you need to explicitly state that you want to see the elements. If you simply type x, then you'll see other information, including its type and the handle it holds to its underlying C++ data structure. A big matrix behaves like a regular R matrix: to change the value in the first row and the first column to 3, assign 3 to x[1, 1]. We can verify that this change has taken place by once again looking at all of its elements. Time to put this into practice!