The Need for Special Methods in Survival Analysis
In this lesson, we will discuss why special methods are necessary for survival analysis. The first important point is that duration times are always positive, so we need to work with distributions that handle positive outcomes. The linear model, for example, assumes a normal distribution, which also allows negative values and is therefore not very appropriate for positive outcomes. A common distribution used to model duration times is the Weibull distribution, and the corresponding model is called the Weibull model.
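To make the point about positive outcomes concrete, the short Python sketch below compares how much probability a normal distribution and a Weibull distribution put on durations of zero or less. The parameter values (mean 3 and standard deviation 2 for the normal; shape 1.5 and scale 3 for the Weibull) are arbitrary assumptions for illustration, not estimates from any data.

```python
import math
from statistics import NormalDist


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """P(T <= t) for a Weibull(shape, scale) variable; it is 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


# Assumed, illustrative parameters for a waiting time in minutes.
normal = NormalDist(mu=3.0, sigma=2.0)
shape, scale = 1.5, 3.0

# Probability each model assigns to impossible durations (T <= 0).
p_nonpositive_normal = normal.cdf(0.0)
p_nonpositive_weibull = weibull_cdf(0.0, shape, scale)

print(f"normal:  P(T <= 0) = {p_nonpositive_normal:.4f}")   # 0.0668
print(f"Weibull: P(T <= 0) = {p_nonpositive_weibull:.4f}")  # 0.0000
```

The normal model wastes about 6.7% of its probability on durations that cannot occur, while the Weibull model is defined only for positive times.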
Historically, the survival function has been a central measure of interest in survival analysis. We will learn about the survival function in the next lesson. Other measures, such as the hazard function, also play a much bigger role in survival analysis than in other areas. The last reason why we need special methods for survival analysis is probably the most important one: censoring.
Censoring occurs when we only know that an event did not happen until a certain time point, but we have no knowledge about what happened after that. For example, let's think about the cab example again. Each day you call a cab and want to analyze how long it takes to arrive at your house. On day 1, the cab arrives at your house after 5 minutes. On day 2, the cab has not arrived after 6 minutes, so you get annoyed and decide to walk instead. As a result, you never observe when the cab actually arrives. On day 3, the cab does not arrive in the first 2 minutes, but then you fall asleep and never observe what happened. On day 4, the cab arrives after 4 minutes. This type of censoring, where we only know that the event had not happened by a certain time, is called right censoring, and it is the most common type of censoring in survival analysis.
There are two other types of censoring: left censoring and interval censoring. Left censoring occurs when we only know that an event occurred before a certain time point, but we do not know exactly when. Interval censoring occurs when we know that an event occurred within a certain time interval, but we do not know the exact time of occurrence. We will not cover these types of censoring in this course.
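One way to see how the censoring types differ is to write each observation as a pair of bounds known to contain the true event time. The Python sketch below illustrates that idea only. The `Observation` class and its field names are hypothetical and not part of any survival-analysis library. The first two values come from the cab example (days 1 and 2), and the last two are made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical helper: bounds known to contain the true event time T."""
    lower: float  # T is at least this large
    upper: float  # T is at most this large (math.inf if unbounded)

    def kind(self) -> str:
        """Classify the observation by which bounds are known."""
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper):
            return "right-censored"
        if self.lower == 0:
            return "left-censored"
        return "interval-censored"


observations = [
    Observation(5, 5),         # cab day 1: arrival seen at exactly 5 minutes
    Observation(6, math.inf),  # cab day 2: no arrival by minute 6, nothing known after
    Observation(0, 3),         # made up: event happened some time before minute 3
    Observation(2, 4),         # made up: event happened between minutes 2 and 4
]

for obs in observations:
    print(obs, "->", obs.kind())
```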
When working with right-censored time-to-event data, we need to record the censoring appropriately. In our example, the times are 5, 6, 2, and 4, and the event indicator is 1 if the event was observed and 0 otherwise. This means that the two censored observations, days 2 and 3, have an event value of 0. With the R package survival, we can specify that the variables time and event belong together by creating a Surv object with the Surv() function.
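In R, the call for this example would be `Surv(time, event)` with `time <- c(5, 6, 2, 4)` and `event <- c(1, 0, 0, 1)`, and printing the result marks censored times with a trailing `+`. The Python sketch below only mimics that idea so the pairing of each time with its event indicator is explicit. The `format_surv` function is a hypothetical helper, not part of the survival package or any Python library.

```python
def format_surv(times, events):
    """Hypothetical helper imitating how R prints a Surv object.

    Each time is paired with its event indicator; censored times
    (indicator 0) get a trailing '+'.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    labels = []
    for t, e in zip(times, events):
        if e not in (0, 1):
            raise ValueError("event indicator must be 0 (censored) or 1 (event)")
        labels.append(f"{t}" if e == 1 else f"{t}+")
    return " ".join(labels)


# Cab example: days 1 and 4 observed, days 2 and 3 right-censored.
time = [5, 6, 2, 4]
event = [1, 0, 0, 1]

print(format_surv(time, event))  # 5 6+ 2+ 4
```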
The Surv object is also known as a survival object. In one of the upcoming exercises, we will take a deeper look at it and see what it looks like for the GBSG2 dataset. Speaking of packages, we haven't told you about the R packages used in this course yet, aside from the packages that store the datasets. We will focus on two packages. Most importantly, we will use the survival package. It provides all the functionality for basic survival analysis and is a very widely used R package.
The survival package allows the user not only to do survival analysis but also to visualize the results. In addition to the plotting features of the survival package, we will use the survminer package for more advanced visualizations. We will focus on interpreting visualizations in this course, since we will skip the mathematically more advanced interpretation of the model effect estimates.
The Need for Special Methods in Survival Analysis
To properly analyze and understand such data, we need specialized methods that take into account the unique characteristics of survival data. One of these is working with distributions that handle only positive outcomes, since duration times are always positive. The linear model, which is common in many fields of study, assumes that the data follow a normal distribution. This assumption is a poor fit for survival analysis: a normal distribution also assigns probability to negative values, while duration times are always positive.
Another important consideration in survival analysis is censoring. Censoring occurs when we only know that an event did not happen until a certain time point, but we have no knowledge about what happened after that. This typically happens when the event has not been observed by the end of the study period. For example, consider a study in which participants are followed over time to see whether they experience a certain outcome, such as a heart attack or death. For a participant who has not experienced the outcome by the end of the study, we only know that their event time is longer than their follow-up time.
Two common types of censoring are right censoring and left censoring. Right censoring occurs when we only know that the event had not occurred by a certain time point, so the true event time lies somewhere after it. Left censoring occurs when we only know that the event occurred before a certain time point, but not exactly when. In both cases, the censored observations still carry partial information, which the analysis must use when estimating, for example, the probability of experiencing the event within a given time frame.
When working with right-censored data, it is essential to record the censoring appropriately. This means storing, for each individual, the observed time together with an indicator of whether the event was observed or the observation was censored, so that the analysis can treat the two cases differently. The survival package in R provides a convenient way to do this by creating a Surv object.
The Use of Survival Analysis
Survival analysis is a statistical technique for analyzing data in which the outcome of interest is the time until an event occurs. It is commonly used in fields such as medicine, engineering, and the social sciences. A typical goal of survival analysis is to estimate the probability of experiencing an event within a given time frame.
Survival analysis uses specialized methods that take into account the unique characteristics of duration times and censoring. One of these is the use of survival distributions, which are probability models that describe how the chance of experiencing an event evolves over time. A simple choice is the exponential distribution; the Weibull and lognormal distributions are also widely used, and the Weibull distribution includes the exponential as a special case.
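As a small illustration of survival distributions, the Python sketch below evaluates the survival function S(t) = P(T > t) for the exponential model, S(t) = exp(-rate * t), and for the Weibull model, S(t) = exp(-(t / scale) ** shape). The rate and shape values are assumptions chosen for illustration. The sketch shows that a Weibull model with shape 1 and scale 1 / rate reproduces the exponential model.

```python
import math


def exponential_survival(t: float, rate: float) -> float:
    """S(t) = P(T > t) = exp(-rate * t) for t >= 0."""
    return math.exp(-rate * t)


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """S(t) = P(T > t) = exp(-(t / scale) ** shape) for t >= 0."""
    return math.exp(-((t / scale) ** shape))


# Assumed parameters, chosen only for illustration.
rate = 0.25
for t in [0, 2, 4, 8]:
    s_exp = exponential_survival(t, rate)
    s_wb1 = weibull_survival(t, shape=1.0, scale=1 / rate)  # same as the exponential
    s_wb2 = weibull_survival(t, shape=2.0, scale=1 / rate)
    print(f"t={t}: exp={s_exp:.3f}  weibull(k=1)={s_wb1:.3f}  weibull(k=2)={s_wb2:.3f}")
```

All survival curves start at S(0) = 1 and decrease toward 0. The shape parameter controls how quickly the Weibull curve drops relative to the exponential one.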
In addition to estimating the probability of experiencing an event, survival analysis can be used to estimate the median survival time and the hazard rate. The median survival time is the time by which half of the individuals are expected to have experienced the event, that is, the time t at which the survival function equals 0.5. The hazard rate represents the instantaneous rate of experiencing the event at time t, given that the event has not occurred before t.
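For a Weibull model, both quantities have closed forms. Solving S(t) = 0.5 gives the median scale * (ln 2) ** (1 / shape), and the hazard is h(t) = (shape / scale) * (t / scale) ** (shape - 1). The Python sketch below checks these formulas numerically under assumed parameter values (shape 2, scale 4). Note that the hazard is a rate, not a probability.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1): instantaneous event rate at t."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_median(shape, scale):
    """Solve S(t) = 0.5, which gives t = scale * (ln 2) ** (1 / shape)."""
    return scale * math.log(2) ** (1 / shape)


# Assumed parameters for illustration only.
shape, scale = 2.0, 4.0
median = weibull_median(shape, scale)
print(f"median survival time: {median:.3f}")
print(f"S(median) = {weibull_survival(median, shape, scale):.3f}")  # 0.500

# With shape > 1 the hazard increases over time; shape = 1 would give the
# constant hazard 1 / scale of the exponential model.
for t in [1, 2, 4]:
    print(f"h({t}) = {weibull_hazard(t, shape, scale):.3f}")
```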
Survival analysis has many practical applications. In medicine, for example, it can be used to estimate the probability of heart attack or death among patients with certain medical conditions. In engineering, it can be used to predict the failure rate of equipment or machines.