R Tutorial - Why do we need special methods for time-to-event data

The Need for Special Methods in Survival Analysis

In this lesson, we will discuss why survival analysis needs special methods: why can't we just fit a linear model? The first important point is that duration times are always positive, so we need to work with distributions that can handle positive outcomes. The linear model, for example, assumes a normal distribution, which is not very appropriate for positive outcomes. A common distribution for modeling duration times is the Weibull distribution, and the corresponding model is called the Weibull model, which we will discuss later in this course.
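To make the positivity point concrete, here is a minimal base-R sketch. It simulates durations from a Weibull distribution with shape 1.5 and scale 5; these values are arbitrary and illustrative, not from the course. It then fits a normal distribution by matching the sample mean and standard deviation. The normal model assigns positive probability to negative durations. The Weibull distribution puts no probability mass below zero.

```r
# Illustrative only: shape = 1.5 and scale = 5 are arbitrary choices.
set.seed(42)
durations <- rweibull(200, shape = 1.5, scale = 5)

# All simulated durations are strictly positive.
all_positive <- all(durations > 0)

# Normal distribution matched to the sample mean and standard deviation.
mu <- mean(durations)
sigma <- sd(durations)

# Probability of a negative duration under each model.
p_negative_normal  <- pnorm(0, mean = mu, sd = sigma)
p_negative_weibull <- pweibull(0, shape = 1.5, scale = 5)

cat("P(T < 0) under normal: ", round(p_negative_normal, 3), "\n")
cat("P(T < 0) under Weibull:", p_negative_weibull, "\n")
```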

Historically, the survival function has been a key measure of interest in survival analysis; we will learn about it in the next lesson. Other measures, such as the hazard function, are also of more interest in survival analysis than in other areas. The last reason we need special methods for survival analysis is probably the most important: censoring.
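For reference, here are the standard textbook definitions of the two measures. These are general definitions, not taken from the course material. Let T be the non-negative, continuous event time with density f(t):

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}, \qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```

The survival function gives the probability of still being event-free at time t. The hazard function gives the instantaneous event rate at time t among those still at risk.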

Censoring occurs when we only know that an event did not happen until a certain time point, but we have no knowledge about what happened after that. For example, let's think about the cab example again. Each day you call a cab and want to analyze how long it takes to arrive at your house. On day 1, the cab arrives after 5 minutes. On day 2, the cab still hasn't arrived after 6 minutes; you get annoyed and decide to walk instead, so you never observe when the cab actually arrives. On day 3, the cab does not arrive in the first 2 minutes, but then you fall asleep and never observe what happened. On days 4 and 5, the cab arrives after 4 minutes. The type of censoring on days 2 and 3 is called right censoring, and it is the most common type of censoring in survival analysis.
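A minimal base-R sketch of the cab example shows why simply ignoring censoring is misleading. It uses the recorded times from the story: 5, 6, 2, 4, and 4 minutes, with days 2 and 3 censored.

Averaging all recorded times treats the censored values as arrival times. In fact, the true arrival times on those days are only known to be later. Dropping the censored days instead throws away the information that those cabs took at least 6 and 2 minutes.

```r
# Cab example: recorded minutes per day and whether the arrival was observed.
cabs <- data.frame(
  day     = 1:5,
  time    = c(5, 6, 2, 4, 4),
  arrived = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Naive approach 1: treat every recorded time as an arrival time.
naive_all <- mean(cabs$time)

# Naive approach 2: drop the censored days entirely.
naive_observed <- mean(cabs$time[cabs$arrived])

# For censored days the true arrival time exceeds the recorded time,
# so naive_all underestimates the mean of the five true arrival times.
cat("Mean of recorded times:        ", naive_all, "\n")
cat("Mean of observed arrivals only:", round(naive_observed, 2), "\n")
```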

There are two other types of censoring: left censoring and interval censoring. Left censoring occurs when we only know that an event occurred before a certain time point, but not exactly when. Interval censoring occurs when we know that an event occurred within a certain time interval, but we don't know the exact time of occurrence. We will not cover these types of censoring in this course.
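One way to compare the three types is to view each observation as an interval that must contain the unknown event time. The base-R sketch below encodes that idea with hypothetical values:

- An exact observation has equal bounds.
- A right-censored observation has an infinite upper bound.
- A left-censored observation has a lower bound of zero.
- An interval-censored observation has two different finite bounds.

The classify_censoring() helper is hypothetical and not part of any package.

```r
# Hypothetical observations, each as [lower, upper] bounds on the event time.
obs <- data.frame(
  id    = 1:4,
  lower = c(5, 6, 0, 3),
  upper = c(5, Inf, 2, 7)
)

# Hypothetical helper: label each row by the kind of information it carries.
classify_censoring <- function(lower, upper) {
  ifelse(lower == upper, "exact",
    ifelse(is.infinite(upper), "right",
      ifelse(lower == 0, "left", "interval")))
}

obs$type <- classify_censoring(obs$lower, obs$upper)
print(obs)
```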

When working with right-censored time-to-event data, we need to specify the censoring appropriately in R. In our example, the times are 5, 6, 2, 4, and 4 minutes. The event indicator is 1 if the event happened and 0 otherwise, so the two censored observations (days 2 and 3) have a value of 0. With the R package survival, we can specify that the variables time and event belong together by creating a Surv object with the Surv() function.
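A minimal sketch of the step just described, using the cab data and the Surv() function from the survival package. Only the variable names are illustrative. In the printed result, censored times carry a trailing plus sign, so days 2 and 3 show up as 6+ and 2+.

```r
library(survival)

# Cab example: recorded minutes and event indicator (1 = cab arrived, 0 = censored).
time  <- c(5, 6, 2, 4, 4)
event <- c(1, 0, 0, 1, 1)

# Combine time and event indicator into one right-censored survival object.
cab_surv <- Surv(time, event)

# Censored times are printed with a trailing "+".
print(cab_surv)
```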

The Surv object is also called a survival object. In one of the upcoming exercises, we will take a closer look at the Surv object and see what it looks like in the GBSG2 dataset. Speaking of packages, we haven't yet told you about the R packages we will use in this course, aside from the packages that store the datasets. We will focus on two packages. Most importantly, we will use the survival package: it provides all functionality for basic survival analysis and is a very widely used R package.

The survival package allows the user not only to do survival analysis but also to visualize the results. In addition to the plotting features in the survival package, we will use the survminer package for more advanced visualizations. We will focus on interpreting visualizations in this course, since we will skip the mathematically more advanced interpretation of the model effect estimates.
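As a preview of the built-in plotting features, the sketch below estimates a survival curve for the cab data with survfit() and draws it with base graphics. For right-censored data without covariates, survfit() uses the Kaplan-Meier estimate. If the survminer package is installed, `survminer::ggsurvplot(cab_fit)` should draw a ggplot2-based version of the same curve. That call is left out of the code so the block runs with the survival package alone.

```r
library(survival)

# Cab example: recorded minutes and event indicator (1 = cab arrived, 0 = censored).
time  <- c(5, 6, 2, 4, 4)
event <- c(1, 0, 0, 1, 1)

# Kaplan-Meier estimate of the survival curve (no covariates: "~ 1").
cab_fit <- survfit(Surv(time, event) ~ 1)

# Base-graphics plot from the survival package; censored times are marked.
plot(cab_fit, xlab = "Minutes", ylab = "Probability cab has not arrived yet",
     mark.time = TRUE)

# Table of times, number at risk, events, and survival estimates.
print(summary(cab_fit))
```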


To analyze time-to-event data properly, we need specialized methods that take into account its unique characteristics. One of these characteristics is that duration times are always positive, so we need distributions that can handle positive outcomes. The linear model, which is commonly used in many fields of study, assumes normally distributed errors. This assumption is often inappropriate in survival analysis, where duration times are always positive.

Another important consideration in survival analysis is censoring. Censoring occurs when we only know that an event did not happen until a certain time point, but we have no knowledge about what happened after that. This happens, for example, when the event has not been observed by the end of the study period. Consider a study where participants are followed over time to see whether they experience a certain outcome, such as a heart attack or death. Participants who have not experienced the outcome by the end of the study are right-censored: we know they were event-free up to that point, but not whether or when the outcome occurs afterwards.
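A minimal simulation shows how right-censored data arise when a study ends. It assumes exponential event times with a rate of 0.1 per year and a study that ends after 5 years for everyone; both values are hypothetical. Each participant's observed time is the smaller of the event time and the end of follow-up. The event indicator records which of the two came first.

```r
set.seed(1)
n <- 100
true_time <- rexp(n, rate = 0.1)   # hypothetical true event times (years)
study_end <- 5                     # hypothetical end of follow-up (years)

# Observed data: the event is only seen if it happens before the study ends.
obs_time <- pmin(true_time, study_end)
event    <- as.integer(true_time <= study_end)

cat("Censored participants:", sum(event == 0), "of", n, "\n")
cat("Mean true time:       ", round(mean(true_time), 2), "\n")
cat("Mean observed time:   ", round(mean(obs_time), 2), "\n")
```

Because censored times are cut off at the study end, the mean of the observed times is smaller than the mean of the true times. This is why censored observations need special handling.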

The two censoring types considered here are right censoring and left censoring; interval censoring is a third type. Right censoring occurs when we only know that an event had not occurred by a certain time point, so if it occurs at all, it occurs after that point. Left censoring occurs when we only know that an event occurred before a certain time point, but not exactly when. In both cases, survival methods use this partial information when estimating the probability of experiencing the event within a given time frame.

When working with right-censored data, it is essential to specify the censoring appropriately. For each individual, this means recording the observed time and whether the event was observed (event indicator 1) or the observation was censored (0). The survival package in R provides a convenient way to combine these two pieces of information into a Surv object with the Surv() function.
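To see that the Surv object really keeps both pieces of information together, the sketch below looks inside it. It assumes the survival package's usual storage for right-censored data: a two-column matrix with columns named "time" and "status", plus a "type" attribute. The cab data are reused.

```r
library(survival)

time  <- c(5, 6, 2, 4, 4)
event <- c(1, 0, 0, 1, 1)
s <- Surv(time, event)

# Strip the "Surv" class to see the underlying two-column matrix.
components <- unclass(s)
print(colnames(components))

# The censoring type recorded by Surv().
print(attr(s, "type"))

# The stored columns match the time and event indicator we supplied.
stored_time   <- as.numeric(components[, "time"])
stored_status <- as.numeric(components[, "status"])
```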

The Use of Survival Analysis

Survival analysis is a set of statistical methods for analyzing time-to-event data, where the outcome of interest is the time until an event occurs. It is commonly used in fields such as medicine, engineering, and the social sciences. A central goal is to estimate the probability of experiencing the event within a given time frame.

Survival analysis uses specialized methods that account for the unique characteristics of duration times and censoring. One of these is the use of survival distributions, mathematical models that describe the probability of experiencing an event over time. Commonly used parametric choices include the Weibull distribution, the exponential distribution (the special case of the Weibull with a constant hazard), and the lognormal distribution.
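A hedged sketch of fitting these distributions with survreg() from the survival package, which fits parametric survival models. First, it checks numerically that the exponential distribution equals the Weibull with shape 1. Then it fits exponential, Weibull, and lognormal models to simulated right-censored data and compares them by AIC. The simulation settings are hypothetical: Weibull event times with shape 1.5 and scale 5, censored at time 8.

```r
library(survival)

# The exponential distribution is the Weibull distribution with shape 1.
t_grid <- c(0.5, 1, 2, 5)
rate <- 0.3
same_cdf <- isTRUE(all.equal(
  pexp(t_grid, rate = rate),
  pweibull(t_grid, shape = 1, scale = 1 / rate)
))

# Hypothetical simulated data: Weibull event times, censored at time 8.
set.seed(123)
true_time <- rweibull(300, shape = 1.5, scale = 5)
obs_time  <- pmin(true_time, 8)
event     <- as.integer(true_time <= 8)

# Fit three parametric survival models to the same right-censored data.
fits <- list(
  exponential = survreg(Surv(obs_time, event) ~ 1, dist = "exponential"),
  weibull     = survreg(Surv(obs_time, event) ~ 1, dist = "weibull"),
  lognormal   = survreg(Surv(obs_time, event) ~ 1, dist = "lognormal")
)

# Compare fits by AIC (lower is better).
aics <- sapply(fits, AIC)
print(round(aics, 1))

# survreg uses a location-scale parameterization: Weibull shape = 1 / scale.
weibull_shape <- 1 / fits$weibull$scale
```

Since the data were generated from a Weibull distribution, the Weibull fit would usually be expected to do well. With a finite sample, however, the AIC ranking is not guaranteed.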

In addition to estimating the probability of experiencing an event, survival analysis can be used to estimate the median survival time and the hazard rate. The median survival time is the time at which the survival probability drops to 50%, that is, by which half of the individuals are expected to have experienced the event. The hazard rate is the instantaneous rate of experiencing the event at a given time, among individuals who have not yet experienced it.
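For a Weibull distribution, both quantities have closed forms, so a short base-R sketch can check them. The shape 1.5 and scale 5 are hypothetical values.

- The median is the time t at which the survival probability S(t) equals 0.5.
- The hazard at time t is the density divided by the survival probability.

With a shape above 1, the hazard increases over time.

```r
shape <- 1.5   # hypothetical Weibull shape
scale <- 5     # hypothetical Weibull scale

# Median survival time: the time t at which S(t) = 0.5.
median_q      <- qweibull(0.5, shape = shape, scale = scale)
median_closed <- scale * log(2)^(1 / shape)

# Hazard h(t) = f(t) / S(t): density divided by survival probability.
hazard <- function(t) {
  dweibull(t, shape = shape, scale = scale) /
    pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}

# Closed form for the Weibull hazard: (shape / scale) * (t / scale)^(shape - 1).
hazard_closed <- function(t) (shape / scale) * (t / scale)^(shape - 1)

t_grid <- c(1, 2, 5, 10)
print(data.frame(t = t_grid, hazard = hazard(t_grid)))
```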

Survival analysis has many practical applications. In medicine, for example, it can be used to estimate the probability of heart attack or death among patients with certain medical conditions. In engineering, it can be used to predict the failure rate of equipment or machines.
