Survival Analysis - EconPedia

Survival analysis, or duration modelling, is a widely applied statistical method for estimating the expected time until a specific event of interest occurs. In recent decades, the method has attracted significant attention in the social sciences, particularly in economics-based disciplines, in addition to its widespread application in clinical trials and engineering research. Survival analysis techniques allow failure (or success) probabilities to be estimated over a given period of time, thereby providing a comprehensive representation of time-to-event data. They also make it possible to examine the interplay between various factors and so help to identify the pivotal determinants of events. This knowledge can then inform decision-making, support forecasting and promote effective policies and strategies.

The inferential techniques of survival analysis rest on three foundational contributions. First, Kaplan and Meier (1958)[1] developed a non-parametric estimator of the survival function, a modern version of the classical life tables used in actuarial science (Halley, 1693). Second, Cox (1972)[2] introduced a semi-parametric approach that models survival as a function of covariates, namely the Cox proportional hazards model. Third, the theory of counting processes was developed, particularly through the introduction of the non-parametric Nelson-Aalen estimator of the cumulative hazard function (Nelson, 1972;[3] Aalen, 1978).[4] Additional information on the methodologies used in survival analysis can be found in Andersen et al. (1993),[5] Kalbfleisch & Prentice (2002)[6] and Lawless (2011).[7]

Methodology

The choice of methodology in survival analysis is of great importance, because it depends on the assumptions made about the underlying distribution. The options for modelling the survival function are non-parametric (such as the Kaplan-Meier and Aalen-Johansen estimators), semi-parametric (the Cox proportional hazards regression model), and parametric (Exponential, Log-Normal, Log-Logistic or Weibull models).[8] Moreover, in survival analysis the event is often not observed for every individual or unit in the sample, so the data are censored. In the case of bankruptcy, for example, some firms may still be operating at the time of the study while others no longer exist; the time of bankruptcy is unknown for the firms still in business. Several models can handle censored data and represent time-to-event data better than traditional methods. A common technique for addressing this challenge is the Kaplan-Meier estimator.
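To make the handling of censored observations concrete, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The function name `kaplan_meier`, its argument names and the firm data are illustrative assumptions, not part of any cited source; a production analysis would normally rely on an established statistics package.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until the event or until censoring, one value per unit
    observed:  True if the event was seen, False if the unit was right-censored
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)      # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the share of at-risk units surviving t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # units censored at t were still at risk at t
    return curve

# Hypothetical data: years until bankruptcy, or years observed if still operating
years = [2, 3, 3, 5, 6, 8]
bankrupt = [True, True, False, True, False, False]
for t, s in kaplan_meier(years, bankrupt):
    print(f"S({t}) = {s:.3f}")
```

With these made-up data, the firms still operating at years 3, 6 and 8 contribute to the risk set until they are censored, so the estimated survival probabilities are 0.833 after year 2, 0.667 after year 3 and 0.444 after year 5, rather than the lower values that would result from treating censored firms as failures.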

A central concept in survival analysis is the hazard function (also known as the hazard rate or failure rate), which measures the instantaneous risk of an event occurring at a specific time, given that the event has not yet occurred. It is defined as the probability of failure in a small time interval beginning at a given time, conditional on survival up to that time, divided by the length of the interval as that length shrinks to zero; equivalently, it is the ratio of the density of the event time to the probability of survival up to that time. The hazard function can take different forms depending on the underlying distribution of the event time. In parametric survival analysis, the hazard function is fully specified as a function of time and, where relevant, a set of covariates. For example, if the event time follows an exponential distribution, the hazard function is constant over time, whereas if it follows a Weibull distribution, the hazard function can be increasing, decreasing or constant, depending on the shape parameter.[9] In non-parametric survival analysis, no assumptions are made regarding the functional form of the hazard function. Semi-parametric survival analysis lies in between: the Cox model, for instance, leaves the baseline hazard unspecified but assumes that covariates shift it proportionally.
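The dependence of the Weibull hazard on its shape parameter can be illustrated with a short sketch. It assumes the common shape/scale parameterisation, in which the hazard is (shape/scale) * (t/scale)^(shape - 1); the function name `weibull_hazard` and the evaluation times are illustrative choices.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1) for t > 0."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)

# shape < 1: decreasing hazard; shape = 1: constant (exponential case); shape > 1: increasing
for shape in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, shape) for t in (0.5, 1.0, 2.0)]
    print(shape, [round(r, 3) for r in rates])
```

Setting the shape to 1 recovers the exponential distribution, whose hazard equals 1/scale at every point in time.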

Hazard Function

The cumulative hazard function, also referred to as the integrated hazard function, is another key concept in survival analysis. It measures the risk of failure accumulated up to a given time and is defined as the integral of the hazard function from time zero to that time.

The cumulative hazard function is closely related to the survival function, which is the complement of the cumulative distribution function of the event time. The survival function represents the probability that an individual or unit will survive beyond a given time and equals the exponential of the negative cumulative hazard function.
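In symbols, the quantities described above can be written as follows. The notation is an assumption introduced here for illustration: T is a continuous event time with density f and distribution function F, h is the hazard function, H the cumulative hazard function and S the survival function.

```latex
\begin{align}
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}, \\
H(t) &= \int_0^t h(u)\,du, \\
S(t) &= P(T > t) = 1 - F(t) = \exp\{-H(t)\}.
\end{align}
```

As a worked case, a constant hazard h(t) = \lambda gives H(t) = \lambda t and S(t) = e^{-\lambda t}, which is the survival function of the exponential distribution.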

Economic Objectives

While survival analysis provides many benefits, it also has certain limitations. First, it is univariate in its response: there is a single response variable (the failure time) even when there are multiple explanatory variables, and reducing that response to a binary outcome is useful only if individual survival can be classified as very short or very long. A second limitation is censoring bias[10]: for some units the exact event time is not observed, for example because they come under observation only after the event has occurred, which makes it difficult to predict future outcomes accurately. Third, some models assume that the underlying hazard rate remains constant over time or has a particular functional shape, such as that implied by the exponential or Weibull distribution. In some circumstances these assumptions do not hold, resulting in incorrect forecasts. Fourth, survival analysis is typically performed on one variable at a time, which may only partially capture the complex interactions between the multiple variables that influence the event of interest. Furthermore, some covariates may change over time, making them challenging to incorporate into a survival analysis; if they are not adequately accounted for, the estimates can be biased.

A wide range of economic objectives can be achieved through the application of survival analysis. To begin with, the probability of success or failure of individuals and entities, or even products or services, can be explored, as well as how the risk of failure or likelihood of success evolves over time. Survival analysis can also offer a clear visual depiction of the estimated model, ranging from the cumulative distribution function to the hazard rate. One illustration is the prediction of economic events, such as the time it takes for 50% of manufacturing firms to reach milestones such as breaking even, beginning to export, or exiting foreign markets. Alternatively, survival analysis could suggest that within 3 years, only 20% of consumers in a given market would exhibit a significant increase in brand loyalty.
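A question such as "how long until 50% of firms have reached a milestone" corresponds to the median survival time, which can be read off an estimated survival curve. The sketch below assumes the curve is supplied as time-ordered (time, survival probability) pairs; the function name `median_survival_time` and the numbers are hypothetical.

```python
def median_survival_time(curve):
    """Return the earliest time at which estimated survival is 0.5 or lower.

    curve: list of (time, survival_probability) pairs sorted by time,
           for example the steps of a Kaplan-Meier estimate.
    Returns None if survival never falls to 0.5 within the observed window.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical curve: share of firms that have NOT yet started exporting, by year
not_yet_exporting = [(1, 0.90), (2, 0.72), (3, 0.55), (4, 0.41), (5, 0.33)]
print(median_survival_time(not_yet_exporting))
```

In this made-up example the survival probability first drops to 0.5 or below in year 4, so half of the firms are estimated to have started exporting by then. When heavy censoring keeps the curve above 0.5, the median is not identified from the data and the function returns None.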

Next, the analysis of survival functions over time for various individuals, groups, or industries, differentiated by their characteristics, can yield valuable insights. It is particularly noteworthy because it enables the likelihood of a specific event to be compared across populations. Illustrative findings might be that individuals who have received vocational training are 75% more likely to re-enter the workforce after 18 months than those who have not, or that small and medium-sized enterprises (SMEs) take longer than larger firms to attain a comparative advantage in the public procurement market. Additionally, examining the performance of businesses with gender-balanced executive boards may help to advance gender equality.

Moreover, survival analysis provides a means to explore the relationship between one or more covariates and the time to an event.[11] As an illustration, the market entry probability of a food retailer can be influenced by several factors, including its relative size, the intensity of competition, and constraints in the supply chain. The relationship between a firm’s investment in innovation and its propensity to export may also be examined. Survival analysis may also yield valuable insights into the association between various entities and certain economic events or government policies and regulations, such as financial incentives, trade agreements and marketing promotions. Consequently, it could be used to predict future events or to prioritize policy interventions even if the underlying causes are unknown.

Survival Analysis Data

Survival analysis data can be employed at a range of levels of aggregation, extending from the micro level, such as individuals, firms, and products, to the macro level, including sectors and countries. At the micro level, survival analysis can be utilized to investigate the duration until the occurrence of events, such as the time to detect dissatisfied clients, the expansion of firms into foreign markets, or the acceptance of a new product in the market. At the macro level, survival analysis can be employed to examine the time required for countries to reach a certain level of economic development or to calculate the probabilities of competitiveness for various sectors in response to policy interventions, such as the implementation of tax incentives.

A typical case study for survival analysis is the examination of unemployment spells, that is, of how long individuals remain unemployed before re-entering the workforce. Data are frequently collected on the start and end dates of unemployment spells, along with relevant covariates such as age, educational attainment, and prior work experience. Using survival analysis, researchers can estimate the hazard function, which quantifies the instantaneous risk of securing employment at any given point during the unemployment spell, and the survival function, which gives the probability of remaining unemployed beyond a specified length of time. Additionally, the factors that are correlated with extended unemployment spells, including economic circumstances, demographic characteristics, and labour market policies, may be identified. This type of research can prove useful to policy-makers in developing and implementing strategies to reduce unemployment and enhance labour market efficiency.[12]
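As a minimal sketch of such an analysis, the simplest parametric case can be worked through by hand. Assuming, purely for illustration, that the exit rate from unemployment is constant (an exponential model), the maximum likelihood estimate of the hazard under right-censoring is the number of observed exits divided by the total time at risk. The function names and the spell data below are hypothetical, and the constant-hazard assumption is a strong one that real spell data may not satisfy.

```python
import math

def exponential_hazard_mle(durations, observed):
    """MLE of a constant hazard with right-censoring: exits / total time at risk."""
    exits = sum(1 for seen in observed if seen)
    exposure = sum(durations)       # censored spells still contribute time at risk
    if exits == 0 or exposure <= 0:
        raise ValueError("need at least one observed exit and positive exposure")
    return exits / exposure

def exponential_survival(rate, t):
    """Probability of still being unemployed after t periods under a constant hazard."""
    return math.exp(-rate * t)

# Hypothetical spells in months; False marks spells still ongoing when the data end
months = [3, 6, 6, 9, 12, 12]
found_job = [True, True, True, True, False, False]

rate = exponential_hazard_mle(months, found_job)
print(f"monthly exit hazard: {rate:.4f}")
print(f"P(unemployed beyond 12 months): {exponential_survival(rate, 12):.3f}")
print(f"median spell length: {math.log(2) / rate:.2f} months")
```

Here four exits over 48 person-months give a monthly hazard of about 0.083. Dropping the two censored spells instead would overstate the exit rate, which is why the censored durations are kept in the exposure total.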

Conclusion

In conclusion, survival analysis enables the examination of the outcomes of individuals or entities over an extended period of time. This approach provides insights into the evolution of individuals’ behavior, market dynamics, the decision-making processes of governments and businesses, and more. As a result, the use of survival analysis techniques on time-to-event data has the potential to inform and augment economic research by providing a more comprehensive understanding of complex systems and behaviors. Models should be selected judiciously, taking into account the assumptions regarding the distribution of parameters. Survival analysis can serve as a potent tool in the economist’s arsenal, offering accurate depictions of time-to-event data, the calculation of survival probabilities over time, the examination of the interplay between variables, and forecasting information for decision-makers regarding the likely outcomes of various policies.

References

[1] Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457-481.

[2] Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-220.

[3] Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics, 14(4), 945-966.

[4] Aalen, O. (1978). Nonparametric inference for a family of counting processes. The Annals of Statistics, 701-726.

[5] Andersen, P. K., Borgan, Ø., Gill, R. D., & Keiding, N. (1993). Statistical models based on counting processes. Springer Science & Business Media.

[6] Kalbfleisch, J. D., & Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data (Vol. 360). John Wiley & Sons.

[7] Lawless, J. F. (2011). Statistical models and methods for lifetime data. John Wiley & Sons.

[8] Emmert-Streib, F., & Dehmer, M. (2019). Introduction to survival analysis in practice. Machine Learning and Knowledge Extraction, 1(3), 1013-1038.

[9] Jiang, R., & Murthy, D. N. P. (2011). A study of Weibull shape parameter: Properties and significance. Reliability Engineering & System Safety, 96(12), 1619-1626.

[10] Leung, K. M., Elashoff, R. M., & Afifi, A. A. (1997). Censoring issues in survival analysis. Annual Review of Public Health, 18(1), 83-104.

[11] Hosmer, D. W., & Lemeshow, S. (1999). Applied survival analysis: Time-to-event (Vol. 317). Wiley-Interscience.

[12] See more in Meyer, B. D. (1990). Unemployment Insurance and Unemployment Spells. Econometrica, 58(4), 757-782.