Survival analysis, also called duration modelling, is a widely applied family of statistical methods for estimating the expected time until a specific event of interest occurs. In recent decades it has attracted significant attention in the social sciences, particularly in economics-based disciplines, in addition to its widespread application in clinical trials and engineering research. Survival analysis techniques yield estimates of failure (or success) probabilities over a given period of time, thereby giving a comprehensive representation of time-to-event data. They also make it possible to examine the interplay between various factors and so to identify the pivotal determinants of events. This knowledge can then inform decision-making, support forecasting and help in designing effective policies and strategies.
Inference in survival analysis is grounded in three main theoretical contributions. First, Kaplan and Meier (1958)[1] developed a non-parametric estimator of the survival function as a modern version of the classical life tables used in actuarial science (Halley, 1693). The second work introduced a semi-parametric approach that models survival as a function of covariates, namely the Cox proportional hazards model (Cox, 1972).[2] The third is the development of the theory of counting processes, particularly the introduction of the non-parametric Nelson-Aalen estimator of the cumulative hazard function (Nelson, 1972;[3] Aalen, 1978).[4] Additional information on the methodologies used in survival analysis can be found in Andersen et al. (1993),[5] Kalbfleisch & Prentice (2002)[6] and Lawless (2011).[7]
Methodology
The choice of methodology in survival analysis is of utmost importance, as it depends on the assumptions one is willing to make about the distribution of event times. The options for modelling the survival function are non-parametric (such as the Kaplan-Meier and Aalen-Johansen estimators), semi-parametric (the Cox proportional hazards regression model), and parametric (Exponential, Log-Normal, Log-Logistic or Weibull models).[8] Moreover, in survival analysis the event is often not observed for every individual or unit in the sample, so the data are censored. Survival models are designed to handle censored data and therefore represent time-to-event data better than traditional methods that either discard censored observations or treat them as events. A common technique for addressing this challenge is the Kaplan-Meier estimator. In the case of bankruptcy, some firms may still be operating at the time of the study, while others may no longer exist. For the firms still in business, the time of bankruptcy is unknown; it is only known to be later than the end of the observation period.
A central concept in survival analysis is the hazard function (also known as the hazard rate or failure rate), which measures the instantaneous risk of an event occurring at a specific time, given that the event has not yet occurred. It is defined as the limit, as the interval length shrinks to zero, of the probability that the event occurs in a small time interval starting at a given time, conditional on survival up to that time, divided by the length of the interval. Equivalently, it is the ratio of the probability density of the event time to the survival function. The hazard function can take different forms depending on the underlying distribution of the event time. In parametric survival analysis, the hazard function is fully specified as a function of time and, where relevant, a set of covariates. For example, if the event time follows an exponential distribution, the hazard function is constant over time, while if the event time follows a Weibull distribution, the hazard function can be increasing, decreasing or constant.[9] In non-parametric survival analysis, no assumptions are made regarding the functional form of the hazard function. Semi-parametric survival analysis lies in between: in the Cox model, the baseline hazard is left unspecified, while covariates are assumed to shift the hazard proportionally through a parametric term.
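The way the Kaplan-Meier estimator uses censored observations can be illustrated with a minimal sketch in plain Python. The function name `kaplan_meier` and the six-firm data set are hypothetical and chosen only for illustration. Censored firms stay in the risk set until they leave the sample, but they never count as bankruptcies.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until the event or until censoring
    observed:  True if the event was seen, False if censored
    """
    events = Counter(t for t, d in zip(durations, observed) if d)
    exits = Counter(durations)      # every unit leaving the risk set at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if events[t]:
            # Multiply by the share of at-risk units that survive time t.
            survival *= 1.0 - events[t] / at_risk
            curve.append((t, survival))
        # Events and censored units both leave the risk set after t.
        at_risk -= exits[t]
    return curve

# Hypothetical data: years until bankruptcy for six firms.
# False marks firms still operating when the study ended (censored).
years = [2, 3, 3, 5, 6, 6]
bankrupt = [True, True, False, True, False, False]
curve = kaplan_meier(years, bankrupt)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

In this sketch the firm censored in year 3 is still counted as at risk for the bankruptcy in year 3, which is the usual convention when an event and a censoring share the same recorded time.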
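The effect of the Weibull shape parameter on the hazard can be checked numerically. The sketch below assumes the common shape/scale parameterisation, h(t) = (k/λ)(t/λ)^(k-1); the function name `weibull_hazard` and the evaluation times are illustrative only.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

times = [0.5, 1.0, 2.0]
decreasing = [weibull_hazard(t, shape=0.5) for t in times]  # shape < 1
constant = [weibull_hazard(t, shape=1.0) for t in times]    # shape = 1
increasing = [weibull_hazard(t, shape=2.0) for t in times]  # shape > 1

print(decreasing, constant, increasing)
```

With a shape of 1 the expression reduces to 1/scale at every time, which is the constant hazard of the exponential distribution.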
Hazard Function
The cumulative hazard function, also referred to as the integrated hazard function, is a crucial concept in survival analysis. It measures the risk of failure accumulated up to a given time and is defined as the integral of the hazard function with respect to time.
The cumulative hazard function is closely related to the survival function, which is the complement of the cumulative distribution function of the event time. The survival function represents the probability that an individual or unit will survive beyond a given time, and it equals the exponential of the negative cumulative hazard function.
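For a continuous event time T, these relationships can be written out as follows. The notation (f for the density, F for the cumulative distribution function, S for the survival function, h for the hazard and H for the cumulative hazard) is introduced here for illustration and is not taken from the cited sources.

```latex
\begin{align}
  h(t) &= \lim_{\Delta t \to 0}
          \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
        = \frac{f(t)}{S(t)}, \\
  H(t) &= \int_0^t h(u)\, du, \\
  S(t) &= P(T > t) = 1 - F(t) = \exp\{-H(t)\}.
\end{align}
```

The last identity follows because f(t) = -dS(t)/dt, so h(t) = -d log S(t)/dt; integrating from 0 to t with S(0) = 1 gives H(t) = -log S(t).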
Economic Objectives
A wide range of economic objectives can be achieved through the application of survival analysis. To begin with, the probability of success or failure of individuals and entities, or even of products and services, can be explored, together with how the risk of failure or the likelihood of success evolves over time. Survival and hazard curves also offer a clear visual depiction of the model, from the cumulative distribution of event times to the hazard rate. Survival analysis can be used, for example, to predict economic events, such as the time it takes for 50% of manufacturing firms to reach milestones such as breaking even, beginning to export, or exiting foreign markets. Alternatively, it could suggest that within 3 years only 20% of consumers in a given market would exhibit a significant increase in brand loyalty.
Moreover, survival analysis provides a means to explore the relationship between one or more covariates and the time to an event.[11] As an illustration, the market entry probability of a food retailer can be influenced by several factors, including its relative size, the intensity of competition, and constraints in the supply chain. The relationship between a firm’s investment in innovation and its propensity to export may also be examined. Survival analysis may also yield valuable insights into how economic events or government policies and regulations, such as financial incentives, trade agreements and marketing promotions, are associated with the behaviour of various entities. Consequently, it could be used to predict future events or to prioritize policy interventions even if the underlying causes are unknown.
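A statement such as "the time for 50% of firms to reach a milestone" corresponds to reading a quantile off an estimated survival curve. The sketch below assumes the curve is available as a step function of (time, survival probability) pairs; the helper `time_to_fraction` and the curve values are hypothetical.

```python
def time_to_fraction(curve, fraction=0.5):
    """Earliest time by which at least `fraction` of units have had the event.

    curve: [(time, survival probability)] sorted by time (a step function).
    Returns None when the curve never drops that far within the data.
    """
    threshold = 1.0 - fraction
    for t, s in curve:
        if s <= threshold:
            return t
    return None

# Hypothetical curve: share of firms that have NOT yet started exporting,
# by years since founding.
curve = [(1, 0.90), (2, 0.72), (3, 0.55), (4, 0.48), (5, 0.40)]

median_years = time_to_fraction(curve)         # first year with S(t) <= 0.5
never = time_to_fraction(curve, fraction=0.8)  # curve never reaches 0.2
print(median_years, never)
```

Returning `None` rather than extrapolating reflects the fact that, with heavy censoring, a survival curve may not fall far enough for a given quantile to be estimated from the data.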
Survival Analysis Data
Survival analysis can be applied to data at a range of levels of aggregation, from the micro level, such as individuals, firms, and products, to the macro level, including sectors and countries. At the micro level, it can be used to investigate the duration until the occurrence of events, such as the time until dissatisfied clients are detected, the expansion of firms into foreign markets, or the acceptance of a new product in the market. At the macro level, it can be employed to examine the time required for countries to reach a certain level of economic development, or to estimate the probability that sectors become or remain competitive in response to policy interventions, such as the implementation of tax incentives.
A typical case study is the analysis of unemployment spell durations, which explores how long individuals remain unemployed before returning to the workforce. Data are frequently collected on the start and end dates of unemployment spells, along with relevant covariates such as age, educational attainment, and prior work experience. Using survival analysis, researchers can estimate the hazard function, which quantifies the instantaneous rate of securing employment at any given point during the unemployment spell, and the survival function, which gives the probability of remaining unemployed beyond a specified duration. Additionally, the factors that are correlated with extended unemployment spells, including economic circumstances, demographic characteristics, and labour market policies, may be identified. This type of research can prove useful to policy-makers in developing and implementing strategies to reduce unemployment and enhance labour market efficiency.[12]
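As a minimal sketch of such an analysis, the simplest parametric choice is a constant (exponential) hazard, which is an assumption made here purely for illustration and not the specification used in the cited study. Under that assumption the maximum likelihood estimate of the hazard is the number of observed exits from unemployment divided by the total time at risk, with censored spells contributing exposure but no exit. The data and function names below are hypothetical.

```python
import math

def exponential_fit(durations, observed):
    """MLE of a constant hazard: observed exits / total time at risk."""
    exits = sum(1 for d in observed if d)
    exposure = sum(durations)  # censored spells still add time at risk
    return exits / exposure

def survival_prob(rate, t):
    """P(still unemployed after t) under a constant hazard: exp(-rate * t)."""
    return math.exp(-rate * t)

# Hypothetical spells in weeks; False marks spells still ongoing
# at the end of a 26-week observation window (right-censored).
weeks = [4, 10, 12, 20, 26, 26, 8, 14]
found_job = [True, True, True, True, False, False, True, True]

rate = exponential_fit(weeks, found_job)
print(f"weekly exit hazard: {rate:.3f}")
print(f"P(unemployed beyond 26 weeks): {survival_prob(rate, 26):.3f}")
```

Dropping the two censored spells instead of keeping their exposure would overstate the exit hazard, which is the main reason censoring has to be modelled explicitly.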
Conclusion
References
[1] Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457-481.
[2] Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-220.
[3] Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics, 14(4), 945-966.
[4] Aalen, O. (1978). Nonparametric inference for a family of counting processes. The Annals of Statistics, 6(4), 701-726.
[5] Andersen, P. K., Borgan, Ø., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. Springer Science & Business Media.
[6] Kalbfleisch, J. D., & Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data (Vol. 360). John Wiley & Sons.
[7] Lawless, J. F. (2011). Statistical models and methods for lifetime data. John Wiley & Sons.
[8] Emmert-Streib, F., & Dehmer, M. (2019). Introduction to survival analysis in practice. Machine Learning and Knowledge Extraction, 1(3), 1013-1038.
[9] Jiang, R., & Murthy, D. N. P. (2011). A study of Weibull shape parameter: Properties and significance. Reliability Engineering & System Safety, 96(12), 1619-1626.
[10] Leung, K. M., Elashoff, R. M., & Afifi, A. A. (1997). Censoring issues in survival analysis. Annual Review of Public Health, 18(1), 83-104.
[11] Hosmer, D. W., & Lemeshow, S. (1999). Applied survival analysis: Time-to-event (Vol. 317). Wiley-Interscience.
[12] See more in Meyer, B. D. (1990). Unemployment Insurance and Unemployment Spells. Econometrica, 58(4), 757-782.