

STATISTICS 

Year : 2011  Volume : 2  Issue : 4  Page : 145-148

Survival analysis in clinical trials: Basics and must-know areas
Ritesh Singh^{1}, Keshab Mukhopadhyay^{2}
^{1} Department of Community Medicine, College of Medicine and JNM Hospital, Kalyani, West Bengal, India ^{2} Department of Pharmacology, College of Medicine and JNM Hospital, Kalyani, West Bengal, India
Date of Web Publication: 31-Oct-2011
Correspondence Address: Ritesh Singh, Department of Community Medicine, College of Medicine and JNM Hospital, Kalyani, West Bengal - 741 235, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/2229-3485.86872
Abstract   
Many clinical trials involve following patients for a long time. The primary event of interest in these studies is death, relapse, an adverse drug reaction, or development of a new disease. The follow-up time may range from a few weeks to many years. Such data are analyzed with a distinct set of statistical procedures known as time-to-event (survival) analysis. It is a very useful tool in clinical research and provides invaluable information about an intervention. This article introduces the researcher to the different tools of survival analysis.
Keywords: Cox proportional hazard model, hazard ratio, survival analysis
How to cite this article: Singh R, Mukhopadhyay K. Survival analysis in clinical trials: Basics and must-know areas. Perspect Clin Res 2011;2:145-8.
Introduction   
Clinical trials are conducted to assess the efficacy of new treatment regimens. The major events that trial subjects experience are death, development of an adverse reaction, relapse from remission, and development of a new disease entity. ^{[1]} Medical articles dealing with survival analysis often use Cox's proportional hazards regression model. Such models take into account the time until an event of interest occurs and compare the cumulative probability of events over time for two or more cohorts, while adjusting for other influential covariates. This article outlines the must-know areas of survival analysis and introduces the reader to often-used terms in the field.
History of Survival Analysis   
Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs. It is the study of the time between entry into observation and a subsequent event. The term 'survival analysis' arose from early studies in which the event of interest was death. Its scope has since become much wider: today scientists use it to study the time until onset of disease, until a stock market crash, until equipment failure, until an earthquake, and so on. ^{[2]} Common events studied are death, disease, relapse, and recovery. A few examples of studies that use the tools of survival analysis are: leukemia patients and time in remission, healthy individuals and time to development of heart disease, the elderly population and time until death, and heart transplant recipients and time until death. ^{[3]}
The exact origin of this statistical procedure is uncertain. It probably originated centuries ago, but a new era of survival analysis emerged only after World War II, stimulated by an interest in the reliability of military equipment. At the end of the war, these newly developed statistical methods, which had moved from strict mortality research to failure-time research, quickly spread through private industry as customers demanded safer, more reliable products.
Life Table Analysis   
In longitudinal studies it is often of interest to estimate a 'survival' curve for the population: what proportion of the population survives beyond a specified time without a particular event happening? ^{[4]} The most straightforward way to describe survival in a sample is to compute a life table, one of the oldest methods for analyzing survival data. The range of survival times is divided into a number of intervals. For each interval we can then compute the number and proportion of cases or objects that entered the interval 'alive,' the number and proportion of cases that failed in the interval (the number of terminal events, or number of cases that died), and the number of cases that were lost or censored (described below) in the interval. From these numbers and proportions, several additional statistics can be computed, such as the number of cases at risk, the proportion failing, the proportion surviving, the survival function, the hazard rate, and the median survival time. This procedure is used for larger samples, in which survival times can usefully be grouped into intervals. ^{[5]} For example, life table analysis can estimate the probability that a woman who has retained an IUD for the first six months will still have it at the end of the twentieth month. Similarly, it can estimate the probability that Mrs. A, who has retained her IUD until now (the beginning of the eleventh month), and Mrs. B, who has also retained her IUD until now (the beginning of the thirteenth month), will both lose their IUDs within the next six months.
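To make the interval bookkeeping concrete, here is a minimal Python sketch of an actuarial life table. It assumes equal-width intervals and the conventional actuarial adjustment that a censored case counts as at risk for half of the interval in which it is withdrawn. The function `life_table` and the `LifeTableRow` fields are illustrative names, not part of any particular package.

```python
from dataclasses import dataclass


@dataclass
class LifeTableRow:
    start: float      # lower bound of the interval
    end: float        # upper bound (exclusive)
    entered: int      # cases entering the interval "alive"
    events: int       # terminal events (e.g. deaths) in the interval
    censored: int     # cases lost or withdrawn in the interval
    at_risk: float    # effective number at risk
    q: float          # proportion failing
    p: float          # proportion surviving
    survival: float   # cumulative proportion surviving to the end of the interval


def life_table(times, events, width):
    """Actuarial life table with equal-width intervals [k*width, (k+1)*width).

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if the subject was censored
    """
    n_intervals = int(max(times) // width) + 1
    rows, entered, survival = [], len(times), 1.0
    for k in range(n_intervals):
        lo, hi = k * width, (k + 1) * width
        in_interval = [e for t, e in zip(times, events) if lo <= t < hi]
        d = sum(in_interval)
        c = len(in_interval) - d
        # Actuarial convention: censored cases count as at risk for half the interval.
        at_risk = entered - c / 2
        q = d / at_risk if at_risk > 0 else 0.0
        survival *= 1 - q
        rows.append(LifeTableRow(lo, hi, entered, d, c, at_risk, q, 1 - q, survival))
        # Everyone who failed or was censored leaves before the next interval.
        entered -= len(in_interval)
    return rows
```

The cumulative survival in each row is the product of the interval survival proportions up to that point, which is how the IUD-retention questions above would be answered from a real table.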
Some Common Terms Used in Survival Analysis   
Censoring
Most survival analyses must deal with a key analytical problem called censoring. Censoring occurs when we have some information about an individual's survival time but do not know it exactly. Three common reasons for censoring are: the person does not experience the event before the study ends; the person is lost to follow-up during the study period; or the person withdraws from the study because of death (if death is not the event of interest) or some other reason, such as an adverse drug reaction. The two main types of censoring are right and left censoring. ^{[6]} We generally encounter right-censored data. Left censoring occurs when a person's survival time is incomplete at the left side of his or her follow-up period. For example, we may follow a patient with an infectious disorder from the time he or she tests positive for the infection, but we may never know the exact time of exposure to the infectious agent.
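In practice, right-censored observations are usually stored as a pair: a follow-up time and an event indicator (1 = event observed, 0 = censored). The sketch below shows one way to derive that pair from calendar dates for the right-censoring situations just listed; the function name and its arguments are hypothetical, and left censoring is not handled.

```python
from datetime import date


def to_survival_record(entry, study_end, event_date=None, last_seen=None):
    """Turn calendar dates into a (time_in_days, event_indicator) pair.

    entry      : date the subject entered the study
    study_end  : administrative end of follow-up for the whole study
    event_date : date the event occurred, or None if no event was observed
    last_seen  : last contact date for a subject lost to follow-up or withdrawn, or None
    """
    if event_date is not None and event_date <= study_end:
        return (event_date - entry).days, 1  # event observed: exact time known
    # Right-censored: we only know the subject was event-free up to this date.
    censor_date = study_end if last_seen is None else min(last_seen, study_end)
    return (censor_date - entry).days, 0


# A subject who entered on 1 Jan 2020 and was lost to follow-up on 1 Feb 2020.
print(to_survival_record(date(2020, 1, 1), date(2020, 12, 31), last_seen=date(2020, 2, 1)))
```

An event that happens after the study has closed is treated as censored at the study end, because it was not observed during follow-up.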
Survival function
The survival function, S(t), gives the probability that a person survives longer than some specified time t; that is, the probability that the random variable T (survival time) exceeds t. The survival function is fundamental to survival analysis. It is usually estimated and displayed as a Kaplan-Meier curve. The word 'curve' is something of a misnomer, because with actual data the estimate is a step function rather than a smooth curve. Each vertical drop in a Kaplan-Meier curve indicates an event.
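The step function is produced by the Kaplan-Meier (product-limit) estimator: at each distinct event time t, the running estimate is multiplied by 1 - d/n, where d is the number of events at t and n the number still at risk just before t. A minimal, unoptimized Python sketch follows; it uses the usual convention that subjects censored at an event time are still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of S(t).

    times  : follow-up time for each subject
    events : 1 if the event occurred, 0 if censored
    Returns a list of (t, S(t)) at each distinct event time; the estimate
    stays flat between these times, which is why the plot is a step function.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject whose time equals t (tied times).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Conditional probability of surviving past t, given at risk at t.
            surv *= 1 - deaths / n_at_risk
            steps.append((t, surv))
        # Both failures and censored cases leave the risk set after t.
        n_at_risk -= removed
    return steps
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])` drops at t = 1, 2, 3 and 5 (to about 0.833, 0.667, 0.444 and 0); the censored times at 2 and 4 shrink the risk set without producing a drop.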
Hazard function
The hazard function, h(t), gives the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time t. Formally, it is the probability of failure in a very small interval between t and t + Δt, given that the subject has survived up to time t, divided by Δt, in the limit as Δt approaches zero; it is therefore a rate rather than a probability. In this sense the hazard is a measure of risk: the greater the hazard between times t1 and t2, the greater the risk of failure in that interval. The hazard function is important in its own right: it provides insight into conditional failure rates, it may be used to identify a specific model form, and it is the vehicle through which survival data are modeled mathematically.
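In symbols, the same definition reads as follows. The links to the cumulative hazard H(t) and to the survival function S(t) are standard results for a continuous survival time T, added here for reference:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad H(t) = \int_0^t h(u)\,du, \qquad S(t) = \exp\{-H(t)\}$$

For example, a constant hazard h(t) = λ gives S(t) = exp(-λt), the exponential survival model.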
Hazard ratio
The hazard ratio (HR) is akin to relative risk. It has been used to describe the outcome of therapeutic trials in which the question is to what extent treatment can shorten the duration of an illness. ^{[7]} The hazard ratio is an estimate of the ratio of the hazard rate in the treated group to that in the control group. For example, with two groups, group 1 and group 2, an HR of 4.5 for group 2 relative to group 1 means that the hazard (instantaneous risk) of relapse in group 2 is 4.5 times that in group 1. An HR of 1 means that the two groups have equal hazards: h(t) for group 1 equals h(t) for group 2.
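As a rough illustration of the arithmetic, the sketch below computes a crude HR under a strong simplifying assumption that is not part of the Cox model: each group has a constant hazard (exponential survival). Under that assumption the hazard estimate in a group is its number of events divided by its total person-time, and the standard error of log(HR) is sqrt(1/d1 + 1/d2). The function name is hypothetical.

```python
import math


def exponential_hazard_ratio(events_1, person_time_1, events_2, person_time_2, z=1.96):
    """Hazard ratio of group 2 relative to group 1, ASSUMING each group has a
    constant hazard (exponential survival).

    Under that assumption the maximum-likelihood hazard in each group is
    events / total person-time, and the standard error of log(HR) is
    sqrt(1/events_1 + 1/events_2).
    Returns (hr, lower, upper) with an approximate 95% confidence interval.
    """
    hazard_1 = events_1 / person_time_1
    hazard_2 = events_2 / person_time_2
    hr = hazard_2 / hazard_1
    se_log_hr = math.sqrt(1 / events_1 + 1 / events_2)
    lower = math.exp(math.log(hr) - z * se_log_hr)
    upper = math.exp(math.log(hr) + z * se_log_hr)
    return hr, lower, upper
```

With 2 relapses over 100 person-years in group 1 and 9 relapses over 100 person-years in group 2, this gives HR = 4.5, as in the example above; with so few events the confidence interval is wide.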
Cox Proportional Hazards Model   
Clinical trials commonly record the length of time from study entry to a disease endpoint for a treatment group and a control group. These data are commonly depicted with a Kaplan-Meier curve, from which the median (the time by which an event of interest has occurred in 50% of cases) and the mean (average time to the event) can be derived. Several methods are available to analyze time-to-event curves, such as Cox proportional hazards regression, the log-rank test, and the Wilcoxon two-sample test. The Cox proportional hazards model has been the most widely used because of its applicability to a wide variety of clinical studies. ^{[8]} It was introduced by Cox in 1972 for the analysis of survival data, with or without censoring, to identify differences in survival due to treatment and prognostic factors (covariates, predictors, or independent variables) in clinical trials. The Cox model is a regression method for survival data. It provides an estimate of the hazard ratio and its confidence interval. Cox regression is considered 'semi-parametric' because the baseline hazard function, h₀(t), does not have to be specified. The basic Cox model makes two assumptions: the hazard ratio between any two individuals does not depend on time, and the covariates themselves do not change over time. This means that the hazard functions of any two individuals are proportional at every point in time. In other words, if one individual's risk of death at some initial time is twice that of another individual, then the risk of death remains twice as high at all later times.
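To show why h₀(t) never needs to be specified, here is a minimal Python sketch that estimates the coefficient of a single covariate by maximizing Cox's partial likelihood with Newton-Raphson. Ties are handled with the Breslow approximation. It is an O(n²) teaching sketch under those assumptions, not a substitute for an established package such as R's survival package, and the function name is hypothetical.

```python
import math


def fit_cox_one_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Estimate beta in h(t | x) = h0(t) * exp(beta * x) for ONE covariate by
    maximising the Cox partial likelihood with Newton-Raphson.

    Ties use the Breslow approximation (every subject with time >= t is in
    the risk set at event time t). h0(t) cancels out of the partial
    likelihood, which is what makes the model semi-parametric.
    Returns beta; exp(beta) is the estimated hazard ratio per unit of x.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for ti, di, xi in zip(times, events, x):
            if not di:
                continue  # censored subjects contribute only through risk sets
            risk_x = [xj for tj, xj in zip(times, x) if tj >= ti]
            weights = [math.exp(beta * xj) for xj in risk_x]
            s0 = sum(weights)
            mean_x = sum(w * xj for w, xj in zip(weights, risk_x)) / s0
            mean_x2 = sum(w * xj * xj for w, xj in zip(weights, risk_x)) / s0
            score += xi - mean_x                  # first derivative of log partial likelihood
            information += mean_x2 - mean_x ** 2  # minus the second derivative
        if information <= 0:
            raise ValueError("x does not vary within any risk set; beta is not identifiable")
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta
```

As a check, with times [1, 2, 3], all three events observed and x = [1, 0, 1], the score equation can be solved by hand and gives exp(beta) = 1/√2; the sketch converges to that value.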
In a survival study, one should ensure that censoring is unrelated to prognosis; for example, patients should not be removed from the study just before they die. Because survival studies often recruit patients over a long period, it is also important to verify that other factors, such as the way patients are recruited into the study and the way the disease is diagnosed, remain constant over that period. The Cox model is popular because it is robust, its estimated hazards are always non-negative, and it yields the hazard ratio directly.
Logistic regression is often applied when investigators examine the relationship between risk factors and various disease events. Because proportional hazards models can take into account when events occur, logistic regression has played a less important role in the analysis of survival data. ^{[9]} The Cox model is preferred over the logistic model, which ignores survival time and censoring information. ^{[10]} Once a Cox model has been fitted and its coefficients estimated, the baseline hazard function and the survival curves can also be estimated.
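One common way to recover the baseline hazard and survival curves from a fitted model is the Breslow estimator, sketched below in Python for a single covariate. The coefficient beta is assumed to have been estimated already; the function name and arguments are hypothetical.

```python
import math


def breslow_survival(times, events, x, beta, x_new):
    """Survival curve S(t | x_new) from a single-covariate Cox model whose
    coefficient beta has already been estimated.

    Breslow estimator of the baseline cumulative hazard:
        H0(t) = sum over event times s <= t of d(s) / sum_{j at risk at s} exp(beta * x_j)
    Survival for a subject with covariate value x_new:
        S(t | x_new) = exp(-H0(t) * exp(beta * x_new))
    Returns a list of (t, S(t | x_new)) at each distinct event time.
    """
    cum_h0 = 0.0
    curve = []
    for t in sorted({ti for ti, di in zip(times, events) if di}):
        d_t = sum(1 for ti, di in zip(times, events) if ti == t and di)
        risk_sum = sum(math.exp(beta * xj) for tj, xj in zip(times, x) if tj >= t)
        cum_h0 += d_t / risk_sum  # Breslow increment at event time t
        curve.append((t, math.exp(-cum_h0 * math.exp(beta * x_new))))
    return curve
```

With beta = 0 the increments reduce to d/n at each event time, so the curve becomes exp(-H) for the Nelson-Aalen cumulative hazard of the pooled sample.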
Log Rank Test   
The log-rank test (also known as the Mantel log-rank test, the Cox-Mantel log-rank test, or the Mantel-Haenszel test) is a form of chi-square test. ^{[11]} It calculates a test statistic for the null hypothesis that the survival curves are the same for all groups; in other words, that there is no difference between the populations in the probability of an event at any time point. At each event time, the observed number of deaths in each group and the number expected if there were no difference are calculated. The expected number for a group is the proportion of at-risk subjects who belong to that group at that time point multiplied by the total number of events at that point. The log-rank test assumes that survival probabilities are the same for subjects recruited early and late in the study, and that the events happened at the times recorded. The test is more likely to detect a difference between groups when the risk of an event is consistently greater in one group than in another. It is unlikely to detect a difference when the survival curves cross, so it is useful to plot the survival curves when analyzing survival data. Under the null hypothesis, the log-rank statistic for a comparison of two groups is approximately chi-square distributed with one degree of freedom, so a P value for the log-rank test can be obtained from tables of the chi-square distribution.
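The observed-versus-expected bookkeeping described above can be sketched in Python as follows. This version uses the simple form of the statistic, the sum over groups of (O - E)²/E; statistical software more often divides by a variance term instead, which gives a somewhat different value. The function name is hypothetical, and the P value uses the identity P(χ²₁ > x) = erfc(√(x/2)) from Python's `math` module.

```python
import math


def logrank_simple(times, events, groups):
    """Log-rank test for two groups (coded 0 and 1) using the
    observed-versus-expected form sum((O - E)^2 / E)."""
    event_times = sorted({t for t, d in zip(times, events) if d})
    observed = [0.0, 0.0]
    expected = [0.0, 0.0]
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        deaths = [g for ti, d, g in zip(times, events, groups) if ti == t and d]
        n, d_total = len(at_risk), len(deaths)
        for g in (0, 1):
            observed[g] += deaths.count(g)
            # Expected = group's share of the risk set x total events at t.
            expected[g] += at_risk.count(g) / n * d_total
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed, expected, chi2, p_value
```

Summing the expected counts over both groups always reproduces the total number of events, which is a quick sanity check on the risk-set bookkeeping.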
There are other tests for survival data. An important one is the Peto test, an alternative to the log-rank test. In contrast to the log-rank test, the Peto test uses a weighted sum of the observed-minus-expected scores. It places more emphasis on information at the beginning of the survival curve, where the number at risk is large.
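The difference between the two tests lies only in the weight attached to each event time, as the Python sketch below illustrates. With weight 1 it is the variance-based log-rank test; with the "peto" option each time is weighted by the pooled Kaplan-Meier survival just before it, which is largest early in follow-up. This is one common form of Peto-Peto weighting; textbooks and packages differ in details (for example, Prentice's modification uses n + 1 in the denominator), and the function name is hypothetical.

```python
import math


def weighted_logrank(times, events, groups, weighting="peto"):
    """Two-group (0/1) weighted log-rank test.

    weighting="logrank": weight 1 at every event time (ordinary log-rank test).
    weighting="peto":    weight = pooled Kaplan-Meier survival just before each
                         event time, so early times (many at risk) count most.
    Returns (weighted O - E for group 1, chi_square, p_value).
    """
    if weighting not in ("logrank", "peto"):
        raise ValueError("weighting must be 'logrank' or 'peto'")
    s_pooled = 1.0  # pooled KM estimate just before the current event time
    o_minus_e = variance = 0.0
    for t in sorted({ti for ti, di in zip(times, events) if di}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, di in zip(times, events) if ti == t and di)
        d1 = sum(1 for ti, di, g in zip(times, events, groups) if ti == t and di and g == 1)
        w = 1.0 if weighting == "logrank" else s_pooled
        o_minus_e += w * (d1 - n1 * d / n)
        if n > 1:  # hypergeometric variance of d1, allowing for tied events
            variance += w * w * n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
        s_pooled *= 1 - d / n
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 degree of freedom
    return o_minus_e, chi2, p_value
```

Because the Peto weights shrink as the pooled survival falls, late differences between the curves, based on few subjects at risk, contribute less than they do in the unweighted log-rank test.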
Discussion   
Survival analysis is a very good tool when a researcher needs to take into account both the time until an event occurs and censored data. Researchers make some common mistakes when applying the tools of survival analysis in their research. ^{[12]} The first is reporting only whether an event of interest occurred, without the time of the event or how long patients were observed without an event. Events will naturally be observed more frequently in patients with longer follow-up than in patients with short follow-up, so evaluating raw event frequencies without reference to time produces biased results. The second error, which also biases the results, is failing to distinguish whether a patient suffered an event or was censored. The third error is excluding censored data from the analysis. If only the proportions of events in the groups are to be compared, without taking censoring into account, then a different statistical method, not a survival analysis technique, should be employed.
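The follow-up bias in the first error can be seen with a few invented numbers. In the Python sketch below, two hypothetical groups have the same event rate per person-year, but one was followed five times longer, so its raw proportion of events is five times higher. The person-year rate used for comparison itself assumes a constant hazard; the survival methods described in this article avoid that assumption and handle censoring properly.

```python
def crude_summaries(followup_years, events):
    """Return (raw proportion with an event, events per person-year)."""
    raw_proportion = sum(events) / len(events)
    rate_per_person_year = sum(events) / sum(followup_years)
    return raw_proportion, rate_per_person_year


# Invented illustrative data: the same underlying event rate in both groups,
# but group B was followed five times longer than group A.
group_a = crude_summaries([1.0] * 10, [1] + [0] * 9)
group_b = crude_summaries([5.0] * 10, [1] * 5 + [0] * 5)
print(group_a, group_b)  # prints (0.1, 0.1) (0.5, 0.1)
```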
Conclusion   
Survival analysis has three primary goals: to estimate and interpret survival and/or hazard functions from survival data; to compare survival and/or hazard functions; and to assess the relationship of explanatory variables to survival time. Survival analysis is a powerful tool for analyzing time-to-event data, which are very common in clinical trials. Researchers do not use it frequently because they are not confident about the theory behind its application and its interpretation. Books are available that provide basic knowledge of survival analysis, and researchers should take care to avoid the common mistakes described above when applying these tools to their data.
References   
1. Lee HP. On clinical trials and survival analysis. Singapore Med J 1982;23:164-7.
2. Smith T, Smith B. Survival analysis and the application of Cox's proportional hazards modeling using SAS. Statistics, Data Analysis, and Data Mining.
3. Kleinbaum DG, editor. Survival analysis: A self-learning text. USA: Springer; 2005.
4. Booth JG, Hirschl TA. Life table analysis using weighted survey data. June 2005. Available from: http://bscb.cornell.edu/~booth/papers/lifetable.pdf. [Last accessed on 2011 Sep 06].
5. Ives M, Funk R, Dennis M. Survival analysis/life tables. Available from: http://www.chestnut.org/li/downloads/training_memos/survival_analysis.pdf. [Last accessed on 2011 Sep 06].
6. Prinja S, Gupta N, Varma R. Censoring in clinical trials: Review of survival analysis techniques. Indian J Community Med 2010;35:217-21.
7. Spruance SL, Reid JE, Grace M, Samore M. Hazard ratio in clinical trials. Antimicrob Agents Chemother 2004;48:2787-92.
8. Cox DR, Oakes D. Analysis of survival data. London, England: Chapman and Hall; 2001.
9. Abbott RD. Logistic regression in survival analysis. Am J Epidemiol 1985;121:465-71.
10. Fabsic P, Evgeny V, Zemmer K, editors. Seminar in Statistics: Survival Analysis. Presentation 3: The Cox proportional hazard model and its characteristics. Zurich; 2011.
11. Stevenson M. An introduction to survival analysis. EpiCentre, IVABS, Massey University; June 4, 2009.
12. Zwiener I, Blettner M, Hommel G. Survival analysis. Dtsch Arztebl Int 2011;108:163-9.