R Tutorial - Exploring Your Data Set

Exploring Real-World Network Analysis: A Case Study using Amazon Co-Purchases Data

In this case study, we will be putting into practice all the concepts learned in an introduction to network analysis course. We will use a real-world dataset that has several daily snapshots of items purchased together from Amazon over the course of 2003. These are known as co-purchases.

The first step when working with a new dataset is to explore the raw data. Although we will go on to create an igraph object from this raw data, it's useful to first understand the data associated with each vertex. The columns that define the graph itself are the "from" and "to" columns, each holding a vertex ID. There is also associated metadata, such as the name of the product, the type of product, and its sales rank. We will use this metadata later in the lesson.
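As a rough illustration of this kind of first look, the sketch below builds a tiny stand-in data frame and inspects it with base R. The rows are invented, and every column name other than "from" and "to" (here date, title, group, and salesrank) is an assumption rather than the real dataset's schema.

```r
# Hypothetical rows standing in for the co-purchase data.
# Only `from` and `to` are named in the lesson; the other column
# names are assumptions made for illustration.
amzn <- data.frame(
  from      = c(44, 179, 410),
  to        = c(42, 71, 730),
  date      = as.Date(c("2003-03-02", "2003-03-02", "2003-05-05")),
  title     = c("Item A", "Item B", "Item C"),
  group     = c("Book", "Book", "DVD"),
  salesrank = c(1200, 530, 88),
  stringsAsFactors = FALSE
)

str(amzn)          # column types at a glance
head(amzn)         # first few rows
table(amzn$date)   # how many edge rows each daily snapshot contributes
```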

We will use dplyr to help create a graph directly from our data frame. The first step in our pipeline is to filter down to a single date. It's important to look at just a single day so that we don't conflate co-purchases made on different days. The next step is to select just the "from" and "to" columns, and finally we specify that the graph is directed. Checking the size of the graph shows that it has around 10,000 vertices. That is quite large, so we'll look at a small subgraph first.
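The pipeline described above might be sketched as follows. This is a minimal example on an invented six-row data frame, not the lesson's actual code; the name of the date column and the specific dates are assumptions.

```r
library(dplyr)
library(igraph)

# Hypothetical stand-in for the co-purchase data; the `date` column
# name and its values are assumptions.
amzn <- data.frame(
  from = c(1, 2, 2, 3, 4, 1),
  to   = c(2, 1, 3, 4, 5, 3),
  date = as.Date(c("2003-03-02", "2003-03-02", "2003-03-02",
                   "2003-03-02", "2003-05-05", "2003-05-05"))
)

amzn_g <- amzn %>%
  dplyr::filter(date == as.Date("2003-03-02")) %>%  # keep one daily snapshot
  dplyr::select(from, to) %>%                       # edge-list columns only
  graph_from_data_frame(directed = TRUE)            # build a directed graph

gorder(amzn_g)  # number of vertices
gsize(amzn_g)   # number of edges
```

The explicit dplyr:: prefixes are a defensive choice, since both packages export commonly used names and the load order would otherwise decide which one wins.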

We use the function induced_subgraph to make a new graph containing just the first 500 vertices. Next, we delete any vertices with a degree of zero. Finally, we make a plot. The result might look a bit different from your Platonic ideal of a graph: because only a few things tend to be purchased in a single Amazon order, we see many little clusters of connected vertices. The igraph package provides a way to count these small sub-patterns, which are called dyads for pairs of vertices and triads for groups of three vertices.
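Here is a small sketch of the subgraph steps, assuming a toy 12-vertex directed graph in place of the real co-purchase graph and keeping the first 8 vertices instead of the first 500. The plot settings are illustrative choices, not the lesson's.

```r
library(igraph)

# Toy directed graph (an assumption): a few small clusters plus
# isolated vertices, 12 vertices in total.
g <- make_graph(c(1, 2,  2, 1,  3, 4,  4, 5,  7, 8), n = 12, directed = TRUE)

# Keep only the first 8 vertices (the lesson keeps the first 500)
sg <- induced_subgraph(g, 1:8)

# Drop vertices that have no edges left inside the subgraph
sg <- delete_vertices(sg, which(degree(sg) == 0))

gorder(sg)  # vertices remaining after the cleanup

# Small, unlabeled vertices make the little clusters easier to see
plot(sg, vertex.label = NA, vertex.size = 5, edge.arrow.size = 0.3)
```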

When we run a dyad census on our graph using dyad_census, we get three outputs from igraph: counts of null, asymmetric, and mutual dyads. These correspond to the following subgraph types: null is when there is no connection, asymmetric is when there is a single directed edge, and mutual is when there are two directed edges, one in each direction. This underlying pattern has implications for graph-level metrics like reciprocity. Triads are more complicated because, in a directed graph, there are 16 possible triad types. To make sense of all of these, a common three-digit code is used.
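To make the three dyad types concrete, the sketch below runs dyad_census on a hand-built four-vertex graph (an invented example) where each count can be checked by eye.

```r
library(igraph)

# Toy directed graph (an assumption) on 4 vertices:
#   1 <-> 2  is a mutual dyad
#   2  -> 3  is an asymmetric dyad
#   every other pair of vertices is a null dyad
g <- make_graph(c(1, 2,  2, 1,  2, 3), n = 4, directed = TRUE)

dc <- dyad_census(g)
dc$mut   # 1
dc$asym  # 1
dc$null  # 4

# The three counts cover every pair of vertices
dc$mut + dc$asym + dc$null == choose(gorder(g), 2)
```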

The first digit is the number of vertex pairs connected by a bidirectional (mutual) edge, the second digit is the number of pairs connected by an asymmetric edge, and the third digit is the number of unconnected pairs. The letter codes C, D, U, and T denote whether a triad is cyclic (like pattern 10), down, with single edges going down from the top (like patterns 7 or 12), up, with edges coming up from the bottom (like pattern 8), or transitive (like pattern 9); the pattern numbers refer to the lesson's diagram of the 16 triad types. Transitivity is the idea that if two pairs of vertices in a triad are connected, then a connection should also exist across the third pair.
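The three-digit code can be unpacked mechanically. The helper below, decode_triad, is a hypothetical function written for this illustration and is not part of igraph; the label vector lists the 16 codes in the order they are conventionally numbered.

```r
# The 16 triad labels in their conventional order (pattern 1 to 16)
triad_labels <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
                  "030T", "030C", "201", "120D", "120U", "120C", "210", "300")

# Hypothetical helper: split a label into its mutual / asymmetric / null
# pair counts plus the optional orientation letter.
decode_triad <- function(label) {
  digits <- as.integer(strsplit(substr(label, 1, 3), "")[[1]])
  list(mutual     = digits[1],
       asymmetric = digits[2],
       null       = digits[3],
       letter     = substr(label, 4, 4))  # "" when there is no letter
}

decode_triad("120D")  # 1 mutual pair, 2 asymmetric pairs, 0 null pairs, "D"

# A triad has exactly three pairs, so the digits always sum to 3
sapply(triad_labels, function(l) sum(unlist(decode_triad(l)[1:3])))
```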

It should be clear that patterns one, two, and three are essentially the dyad patterns. When we run an igraph triad census using the function triad_census, we get a count of each of these sixteen possibilities. Now it's time to count the dyads and triads in the co-purchase graph and see how those numbers relate to the graph-level metrics of transitivity and reciprocity.
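As a small sketch of how the census counts connect to the graph-level metrics, the example below uses an invented five-vertex graph. Attaching labels to the triad_census output is a convenience added here; the relationship shown for reciprocity holds for igraph's default reciprocity mode on a graph without self-loops.

```r
library(igraph)

# Toy directed graph (an assumption):
#   1 <-> 2, 2 -> 3, 3 -> 1, and a separate edge 4 -> 5
g <- make_graph(c(1, 2,  2, 1,  2, 3,  3, 1,  4, 5), n = 5, directed = TRUE)

# One count per triad type, in the conventional order
tc <- triad_census(g)
names(tc) <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
               "030T", "030C", "201", "120D", "120U", "120C", "210", "300")
tc

# Every triple of vertices falls into exactly one type
sum(tc) == choose(gorder(g), 3)

# Reciprocity is the share of edges that are reciprocated, which can be
# rebuilt from the dyad census: each mutual dyad contributes two edges.
dc <- dyad_census(g)
reciprocity(g)                          # 0.4
2 * dc$mut / (2 * dc$mut + dc$asym)     # 0.4

# Transitivity measures how often connected triples are closed
transitivity(g)
```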