Survival Analysis of Infiltrating Ductal Breast Cancer Patients with Cure Rate Regression Model- A Retrospective Cohort Study from a Tertiary Care Hospital in Central Kerala
Correspondence Address :
Rejani Parassery Parameswaran,
Associate Professor (Statistics), Department of Community Medicine Government Medical College, Thrissur, Kerala, India.
E-mail: rejstat@gmail.com
Introduction: Breast cancer is a leading cause of morbidity among women worldwide. Monitoring survival patterns and identifying the prognostic factors for survival are major concerns in cancer research. The cure rate regression model is a useful statistical tool to predict the cure rate of cancer and to determine the factors associated with patient survival.
Aim: To estimate the cured proportion and to identify the factors associated with the survival time of infiltrating ductal breast carcinoma patients in a tertiary care hospital of central Kerala.
Materials and Methods: The retrospective cohort study was conducted in the Department of Radiotherapy of a major tertiary care hospital in the central part of Kerala, India. A total of 313 female patients diagnosed with infiltrating ductal breast cancer between January 2012 and December 2015 were considered for the study. The impact of the covariates age at diagnosis of disease, grade of cancer, stage of disease, tumour stage, status of regional lymph node, distant metastasis, and triple-negative status on the survival of patients was studied. The parametric mixture cure rate regression model was used for estimation and inferential procedures. The cure rate with respect to each study variable and the role of each variable in the long term survival of patients were investigated.
Results: The mean age of patients was 51.95±10.91 years. The minimum cure rate (29.8%) was found among patients who presented with distant metastasis. Bivariate analysis showed that stage of cancer, tumour stage (T-stage), status of regional lymph node and distant metastasis status significantly influence the incidence of death due to breast cancer over the long run, whereas grade of cancer determines the survival of patients over the shorter term. In multivariate analysis, the status of regional lymph node and distant metastasis status were found to be the two important indicators that determine the cure rate of infiltrating ductal breast cancer patients. The hazard for patients with higher grade cancer was 1.48 times that of others (Hazard Ratio=1.48, p-value <0.001).
Conclusion: The estimated cure rate of patients ranged between 29.8% and 69.6% across the factors under study. The status of regional lymph node and distant metastasis status were found to be associated with the cure rate of patients. The study results showed that the grade of cancer is a significant factor that determines the survival of patients. The study recommends the cure rate regression model as a useful tool to analyse breast cancer survival data in the presence of a cured proportion.
Keywords: Distant metastasis status, Hazard ratio, Stage of cancer, Status of lymph node, Tumour stage, Weibull distribution
Cancer is a major health problem across the globe. According to worldwide statistics, 19.3 million people are affected by various types of cancers (1). Breast cancer is the most prevalent of all cancers. About 2.26 million new cases of female breast cancer are reported around the world, and its incidence has now surpassed that of lung cancer (2). The incidence of breast cancer is reported to be increasing in India. The projected number of patients with cancer in India is 1,392,179 for the year 2020. According to data from population based cancer registries, breast cancer is the most common cancer affecting females in Kerala (3).
Mortality is the worst outcome of cancer, and its incidence varies across cancers. Estimating the number of persons expected to survive after a cancer diagnosis is of interest and relevance to clinicians and researchers in this field. Survival analysis is the statistical technique used to quantify the survival experience of study subjects, and median survival time is one key indicator of their survivorship (4).
Due to early detection and recent advances in treatment, the number of breast cancer survivors has increased substantially, and a significant proportion of study subjects never experience the event of interest, such as death or recurrence of disease, even after a long follow-up period. Such subjects are said to be ‘cured’, ‘immune’ or ‘insusceptible’ in survival analysis. The disappearance of signs and symptoms after prolonged medication keeps these patients away from hospital visits. Hence, a fraction of the censored individuals among study subjects contributes to the cured proportion in a survival data set (5).
Regression techniques in survival analysis help to identify the factors associated with the survival time of patients under study. Even though parametric models yield more precise estimates, the Cox proportional hazards regression model is the mainstay of cancer research. This model is popular because it makes no assumption about the distribution of survival time. Unfortunately, it does not provide any information regarding the cured proportion and fails to separate the factors that influence the short term and long term survival of patients (6).
In standard survival analysis, it is assumed that all study subjects will eventually experience the event of interest. This assumption is violated if a cured proportion exists in a survival data set. Standard models are usually inadequate for the analysis of such data since they do not account for the possibility of cure. Cure rate models (6) are useful for the analysis of survival data with a cured proportion. They separate the long term and short term survival of patients under study. Even though mixture and non-mixture cure rate models are available in the literature, mixture cure models have attracted more attention from researchers. The survival function of the standard mixture cure rate model is of the form
S(t) = (1-p) + p S0(t)    (1)
such that S(t) tends to (1-p) as time t tends to infinity, where (1-p) is the proportion of cured subjects and S0(t) is the proper survival function of the lifetime t. The probability density function of t with respect to (1) is f(t) = p f0(t), where f0(t) is the baseline probability density function of time t.
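The plateau behaviour of the mixture survival function (1) can be illustrated with a minimal Python sketch. A Weibull form is assumed for S0(t), and the values of p, scale and shape below are hypothetical, chosen only for illustration and not taken from the study.

```python
import math

def weibull_survival(t, scale, shape):
    """Proper baseline survival S0(t) = exp(-(t/scale)**shape); tends to 0 as t grows."""
    return math.exp(-((t / scale) ** shape))

def mixture_cure_survival(t, p, scale, shape):
    """Population survival S(t) = (1 - p) + p * S0(t), where p is the uncured proportion."""
    return (1.0 - p) + p * weibull_survival(t, scale, shape)

# Hypothetical parameter values (assumptions for illustration only).
p, scale, shape = 0.4, 1000.0, 1.2
for t in (0, 500, 1000, 3000, 10000):
    # S(t) starts at 1 and levels off near the cured proportion 1 - p
    print(t, round(mixture_cure_survival(t, p, scale, shape), 4))
```

With these inputs the curve starts at 1 and flattens near 1-p, which is the cured proportion in the notation of (1).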
The factors associated with the survival of breast cancer patients among residents of Kerala state, a southern region of India, have been discussed by many researchers (7),(8). However, to date, no study has been reported that uses parametric cure rate regression models to estimate the cured proportion and to identify the factors associated with the survival time of breast cancer patients in the central part of Kerala. Infiltrating ductal carcinoma is the most common sub-type among the various types of breast cancer (9).
The present study aimed to estimate the cured proportion and to identify the factors associated with the survival of infiltrating ductal breast carcinoma patients in central Kerala through a parametric mixture cure rate regression model based on the Weibull distribution.
A retrospective cohort study was conducted in the Department of Radiotherapy of a major tertiary care hospital in the central part of Kerala, India. A total of 313 female patients diagnosed with infiltrating ductal breast cancer between January 2012 and December 2015 were considered for the study. All data were accessed with the approval of the Institutional Ethical Committee (As per Order No: B6-8772/2016/MCTCR(18) dated 23/06/2018).
Inclusion criteria: All female patients who were diagnosed (primary) with breast cancer from 1st January 2012 to 31st December 2015 were included in the study.
Exclusion criteria: Patients for whom information on the covariates under study was unavailable or only partially available were excluded from the study.
Sample size calculation: The sample size was calculated at the 5% level of significance and for 80% power using the formula
n = Number of events / Probability of event
where the number of events = 4(Zα+Zβ)²/[log(HR)]², based on a study (10) with a pre-assumed minimal survival probability of 0.6 and Hazard Ratio=2. The minimum sample size obtained was 163, and all cases (n=313) during the study period that met the inclusion criteria were taken for the study in view of the heavy censoring of breast cancer data.
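The sample size arithmetic can be checked with a short Python sketch. Two assumptions are made here that the text does not state explicitly: Zα is taken as the two-sided 5% critical value, and the probability of event is taken as one minus the pre-assumed survival probability (1-0.6=0.4). The function names are illustrative.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Number of events = 4 * (Z_alpha + Z_beta)**2 / [log(HR)]**2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # assumed two-sided test
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

def required_sample_size(hazard_ratio, survival_prob, alpha=0.05, power=0.80):
    """n = number of events / probability of event (assumed to be 1 - survival_prob)."""
    return required_events(hazard_ratio, alpha, power) / (1 - survival_prob)

n = required_sample_size(hazard_ratio=2, survival_prob=0.6)
print(round(n))  # 163
```

Under these assumptions about 65 events are needed, and dividing by the event probability of 0.4 reproduces the reported minimum sample size of 163.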
Procedure
Each patient’s details on age at disease diagnosis, clinical staging, grade of cancer, tumour stage, status of regional lymph node metastases, status of distant metastasis and triple-negative condition were collected from hospital records with a self-designed proforma.
In the present study, death of a patient due to breast cancer is the event of interest. The survival time is defined as the time between the date of diagnosis of the disease and death, the end date of the study period or the last date of the patient’s follow-up, whichever happens first for each study subject. Patients who were lost to follow-up or who were alive at the end of the study were considered censored. Each patient’s survival experience was monitored till 31st March 2021. This ensured a minimum follow-up time of five years for all patients.
The impact of all selected variables on the cured proportion and the survival time of patients was evaluated through regression analysis with the parametric cure rate model. A logistic model was assumed for estimating the cured proportion, and the Weibull distribution (11) was used for modelling the short term survival of study subjects. The model characteristics and the methods used for estimation are outlined as follows:
Cure Rate Regression Model
Let T be a non-negative random variable representing the time to occurrence of the event.
Define the indicator variable Y=1 if the individual eventually experiences the event of interest and Y=0 otherwise.
For Y=1, the time T has the probability density function f(t|Y=1) and survival function S(t|Y=1). Let p=Pr(Y=1), the probability of incidence. Let Z be a (p+1)×1 vector of covariates. Assume that b=(b0, b1, ..., bp) and β=(β0, β1, ..., βp) are the vectors of regression coefficients corresponding to the factors associated with the incidence of death and with the survival time of uncured subjects, respectively. The survival function of the Weibull cure rate regression model based on the standard model (1) is defined as:
S(t|Z) = 1 - p(Z) + p(Z) S0(t|Y=1,Z)    (2)
where p(Z) follows the logistic model and S0(t|Y=1,Z) is the Weibull survival function of the uncured subjects.
The probability function p (incidence) models the long term effect of covariates on the cure status of study subjects, and the conditional survival function (latency) captures the short term effect of covariates on uncured study subjects.
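The incidence and latency components can be sketched in Python as follows. The explicit parametrisation is an assumption: a logistic link for the incidence and a Weibull proportional hazards form exp(-t^shape · exp(β'Z)) for the latency are one common choice consistent with the description above, not necessarily the exact form used in the study. The coefficient values are hypothetical.

```python
import math

def incidence_probability(z, b):
    """Logistic incidence p(Z) = exp(b'Z) / (1 + exp(b'Z)); z[0] = 1 carries the intercept."""
    eta = sum(bj * zj for bj, zj in zip(b, z))
    return 1.0 / (1.0 + math.exp(-eta))

def latency_survival(t, z, beta, shape):
    """Assumed Weibull latency S0(t|Y=1,Z) = exp(-t**shape * exp(beta'Z))."""
    eta = sum(bj * zj for bj, zj in zip(beta, z))
    return math.exp(-(t ** shape) * math.exp(eta))

def population_survival(t, z, b, beta, shape):
    """Mixture form: S(t|Z) = 1 - p(Z) + p(Z) * S0(t|Y=1,Z)."""
    p = incidence_probability(z, b)
    return (1.0 - p) + p * latency_survival(t, z, beta, shape)

# Hypothetical coefficients for one binary covariate (assumptions, not study estimates).
z = [1, 1]            # intercept and covariate present
b = [-0.5, 1.2]       # incidence (long term) coefficients
beta = [-7.0, 0.4]    # latency (short term) coefficients
print(1 - incidence_probability(z, b))              # cured proportion for this profile
print(population_survival(1000.0, z, b, beta, 1.1))  # survival at t = 1000 days
```

A covariate can therefore enter through b, shifting the cured proportion, through β, shifting the survival of uncured subjects, or through both.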
Computational Method
Let the observation for the ith individual be (ti, δi, Zi), i=1, ..., n, where ti is the observed (survival) time or the censoring time and δi is the indicator function given by δi=1 if ti is uncensored and δi=0 otherwise. Assume that t1, ..., tm are the survival times, tm+1, ..., tn are the censored times, and censoring is statistically independent of Y. The random variable Y=1 for the first m individuals and is unknown for the remaining n-m individuals. Then the likelihood function for the cure rate model corresponding to the observations (ti, δi, Zi), i=1, 2, ..., n, is
L=L1×L2 (3) where,
STATISTICAL ANALYSIS
Regression analysis was done by maximising the likelihood function formulated from the above equations using the Expectation-Maximisation (EM) algorithm, to handle the information missing due to heavy censoring in the data set. The factors influencing the short term and long term survival of patients were identified together with the cured proportion. The data analysis was performed using the optimisation function NArgMax in Wolfram Mathematica software version 10.0. The Likelihood-Ratio test was performed to assess the influence of covariates on the survival of patients, and a p-value <0.05 was considered statistically significant.
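The role of the EM algorithm can be made concrete with a minimal Python sketch of the standard E-step for a mixture cure model, together with each subject's contribution to the observed log-likelihood. This is not the authors' Mathematica code. The function names are illustrative, the Weibull scale/shape form is an assumption, and the numerical inputs are hypothetical.

```python
import math

def weibull_s0(t, scale, shape):
    """Assumed Weibull survival of uncured subjects, S0(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))

def e_step_weight(delta, p, s0_t):
    """Posterior probability that a subject is uncured (Y=1) given the data.

    A subject who died (delta=1) is uncured with certainty; for a censored
    subject (delta=0) the weight is p*S0(t) / (1 - p + p*S0(t)).
    """
    if delta == 1:
        return 1.0
    return p * s0_t / (1.0 - p + p * s0_t)

def observed_loglik_term(delta, p, f0_t, s0_t):
    """log[p*f0(t)] for a death, log[1 - p + p*S0(t)] for a censored time."""
    if delta == 1:
        return math.log(p * f0_t)
    return math.log(1.0 - p + p * s0_t)

# Hypothetical censored subject followed for 2000 days (values are assumptions).
p = 0.4
s0_t = weibull_s0(2000.0, scale=1000.0, shape=1.2)
print(e_step_weight(0, p, s0_t))
```

The longer a subject stays event-free, the smaller S0(t) becomes and the lower the weight, so long censored follow-up is treated as evidence of cure. The M-step then updates the incidence and latency parameters using these weights in place of the unknown Y.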
Out of the total 313 patients studied, death occurred in 71 (22.68%) patients, and 242 (77.32%) cases were censored. The mean age of patients was 51.95±10.91 years. More than half of the total patients were in advanced stages (stage 3 and stage 4). Grade 3 and 4 cancer cases were classified as higher grade. Each patient’s estrogen and progesterone receptor status and human epidermal growth factor receptor 2 (HER2) positivity were documented, and the data were classified into two groups to study the role of triple-negative status in the survival of patients rather than each receptor’s individual effect. A detailed description of the data is given in (Table/Fig 1).
A Kaplan-Meier plot with its 95% confidence interval was drawn to bring out the basic survival pattern of patients under study and is displayed in (Table/Fig 2). The curve does not taper to zero and there is a long plateau after 2600 days. Hence, it is evident that long term survivors (a cured proportion) are present in the data. This also shows the suitability of the cure rate regression model for the analysis of the data.
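The plateau argument can be illustrated with a minimal Python sketch of the Kaplan-Meier estimator. The toy data below are hypothetical and are not the study data. The height at which the curve levels off gives a rough, non-parametric impression of the cured proportion.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Hypothetical follow-up times in days with event indicators (assumptions).
times = [300, 800, 1200, 2700, 3000]
events = [1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
print(curve)
print("plateau height:", curve[-1][1])
```

In this toy example the last death occurs at 1200 days and the remaining subjects are censored later, so the curve stays flat after that point instead of dropping to zero.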
The results of bivariate analysis are given in (Table/Fig 3). The minimum cure rate (29.8%) was found among the patients who presented with distant metastasis, compared to all other groups. The factors stage of cancer, tumour stage (T-stage), status of lymph node and distant metastasis status significantly influence the incidence of death due to breast cancer over the longer term, whereas grade of cancer determines the survival of patients under study. All significant (p-value <0.05) factors except stage of cancer were taken for multivariate analysis of the data. The variable stage was removed from the final analysis since T-stage, lymph node status and distant metastasis status together reflect the effect of cancer staging. The multivariate analysis showed that the status of regional lymph node and distant metastasis status are the two important indicators that determine the cure rate of infiltrating ductal breast cancer patients. The cured proportion among patients who presented with distant metastasis and regional lymph node involvement is 35% (CI=0.30, 0.40). The grade of cancer again showed a highly significant relation with the survival of patients. The hazard for patients with higher grade cancer is 1.48 times that of others (Hazard Ratio=1.48, p<0.001), and their median survival time is two years and nine months. Patients with low grade cancer survive about four years and two months. (Table/Fig 4) describes the results of the final model.
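The link between a hazard ratio and median survival under a Weibull proportional hazards latency model can be sketched in Python as follows. The rate and shape values are hypothetical and are not estimates from the study. Only the hazard ratio of 1.48 is taken from the text, and the parametrisation S(t) = exp(-rate · HR · t^shape) is an assumption.

```python
import math

def weibull_median(rate, shape, hazard_ratio=1.0):
    """Median of S(t) = exp(-rate * hazard_ratio * t**shape), i.e. the t where S(t) = 0.5."""
    return (math.log(2) / (rate * hazard_ratio)) ** (1.0 / shape)

# Hypothetical baseline parameters (assumptions for illustration only).
rate, shape = 1e-4, 1.2
median_low = weibull_median(rate, shape)                       # reference group
median_high = weibull_median(rate, shape, hazard_ratio=1.48)   # higher hazard group
print(median_low, median_high, median_high / median_low)
```

Under this form the ratio of medians equals HR^(-1/shape), so a hazard ratio above 1 shortens the median survival by an amount that depends on the Weibull shape parameter.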
Survival time and its related factors guide the formulation of new treatment methods and patient care proposals in the health field. Age at diagnosis was not found to be a predictor of survival, a result consistent with a study from Karnataka (12). Clinicians need to be aware of the clinical stage of disease when fixing the treatment plan, since there is an inverse relation between stage of cancer and patient survival. In India, early and advanced stage breast cancers are seen in equal proportions (13). In the present study, advanced stage cancers accounted for 38.66% of cases, i.e., less than 50%, but a significant inverse relation between stage and survival was still found through the incidence of death due to breast cancer (HR=2.63, p-value <0.001).
Lymph node status is an important prognostic factor of breast cancer survival (14). The proposed model identifies lymph node status as a significant factor influencing the survival time of breast cancer patients (p-value <0.002). This observation could be explored further in more robust future studies that include the number of regional lymph nodes involved.
The present study revealed that grade of cancer is a crucial element in determining the survival of patients. Similar results were reported by Rezaianzadeh A et al., in one of their studies from Southern Iran (15). In our study, the risk for patients with higher grade cancer was 1.48 times that of patients with lower grade cancer (HR=1.48).
Triple-negative breast cancers account for 15% to 20% of breast cancer diagnoses. According to a recent study conducted in Kerala, 16.3% of breast cancers are triple-negative (16). A higher proportion (23.96%) was found in the present study. Even though the proportion of triple-negative breast cancers is higher, triple-negative status does not show any relation with the survival time of patients (p-value=0.832). In the current study, the cure rate of breast cancer patients with distant metastasis in the presence of regional lymph node involvement is 29.8%. Investigations from different regions worldwide have reported varying rates (17),(18). The metastasis site is an important cause of these variations in rates (19).
In survival studies in medical science, investigators are mainly interested in finding the factors associated with the survival of patients with respect to events of interest such as death, disease recurrence, relapse, etc. Even though parametric and nonparametric models are available for the regression analysis of survival data, the Cox proportional hazards model is the most common regression technique for handling survival data. But the assumption of proportionality of hazard functions may not be valid for all lifetime data. For example, in a study comparing the treatment effectiveness of autologous and allogeneic bone marrow transplants for acute myelogenous leukaemia patients, the patients in the autologous group show rapid progress in survival at the initial stage and the trend gradually changes. In such cases, the Cox model cannot be used and a parametric model is the only choice for the analysis of the data. Moreover, if a parametric distribution is found to be the best fit for a given lifetime data set, the corresponding model will provide more efficient regression estimates than those obtained from a nonparametric or a semi-parametric model (20). Deviation from the proportionality assumption of hazard rates, the presence of immune subjects, and the occurrence of covariates together in survival data add to the burden of analysis; in this setting, parametric cure models perform well and give better results to the investigators.
Nair N et al., conducted a wide survey on breast cancer survival outcome analysis (21). They adopted the Cox proportional hazards model to explain the role of covariates in the survival of patients, and their results are very similar to those of the current study. Compared to their work, the main advantage and differentiating feature of the present study is that the proposed model could separate the effect of covariates on the incidence of the event of interest (death) from their effect on the survival of patients. No such information is available in their report. Hence, the present study recommends using cure rate regression models in future investigations on the survival experience of breast cancer patients with the possibility of cure.
Cure rate regression models are useful not only for analysing breast cancer but also for various other diseases with the possibility of cure. Mirzaee M et al., used the Cox mixture cure model to assess the role of independent variables in the prediction of short-term and long-term allograft survival after kidney transplantation (22). Akhlaghi AA et al., explained the long-term and short-term survival of patients undergoing Continuous Ambulatory Peritoneal Dialysis (CAPD) with various parametric cure models (23). The model provides information on survival with which to measure the progress and effectiveness of various treatments against breast cancer, and it helps to implement preventive strategies for cancer control and increase the possibility of cure. Breast cancer survival studies that pay special attention to the significant covariates identified in the current study can yield stronger results and will help clinicians to plan their treatment strategies and patient follow-up. The resulting prolonged survival benefits the patients and their community as a whole.
This article may help clinicians and researchers to select the appropriate model for the analysis of medical and health related data. In the present study, post treatment effects and the role of competing risks were not considered in the prediction of the survival time of patients. Studies in this direction are ongoing and will be reported in a future article.
Limitation(s)
A study with a prospective cohort design can provide more accurate results, even though it is less feasible because long-term follow-up of study subjects is necessary for data collection. The proposed model is useful if the lifetime data follow the Weibull distribution. Sometimes the adequate form of the lifetime distribution may not be known, and in such cases semi-parametric regression models are more useful for the analysis of survival data. Also, the model described in this work is applicable only if a cured proportion is present in the lifetime dataset.
Cure rate regression models can separate the short and long term survival of patients and determine the factors influencing the survival of study subjects. The cure rate for the various study factors was estimated between 29.8% and 69.6%, with the minimum value reported for distant metastasis. Regional lymph node involvement and distant metastasis drive mortality among breast cancer patients over the long run, while the cancer grade determines their survival over the short term.
The authors are thankful to the referee and editor for their constructive comments and suggestions on an earlier version of this manuscript, which appreciably improved the article. The authors are thankful to all research assistants for the services provided throughout the study. The first author expresses sincere thanks to the State Board of Medical Research for funding this research.
DOI: 10.7860/JCDR/2022/51739.16194
Date of Submission: Aug 08, 2021
Date of Peer Review: Nov 15, 2021
Date of Acceptance: Dec 23, 2021
Date of Publishing: Apr 01, 2022
AUTHOR DECLARATION:
• Financial or Other Competing Interests: None
• Was Ethics Committee Approval obtained for this study? Yes
• Was informed consent obtained from the subjects involved in the study? NA
• For any images presented appropriate consent has been obtained from the subjects. NA
PLAGIARISM CHECKING METHODS:
• Plagiarism X-checker: Aug 11, 2021
• Manual Googling: Dec 22, 2021
• iThenticate Software: Jan 18, 2022 (5%)
ETYMOLOGY: Author Origin