Joint modeling of longitudinal change in tumor cell level and time to death of breast cancer patients: The case of Ayder Comprehensive Specialized Hospital, Tigray, Ethiopia

Abstract: Breast cancer is a major public health problem throughout the world, causing serious physical harm and death. This work uses a joint model to study breast cancer in patients of Ayder Hospital. The primary motivation is to contribute to the understanding of tumor cell progression in breast cancer at Ayder Hospital, using a joint model that accounts for possible serial correlation among observations from the same subject, collected from September 2015 to December 2018. The general aim of this study was to investigate the effect of longitudinal change in tumor cell level on time to death due to breast cancer among breast cancer patients. A hospital-based retrospective cohort study was conducted among breast cancer patients. A joint model for the longitudinal and time-to-death data was fitted using the JM package in R. Results from the joint model showed that longitudinal tumor cell progression was significantly associated with the survival probability of these patients: the estimated association parameter (α) was 0.84, corresponding to a hazard ratio of exp(0.84) = 2.32 (95% CI: 2.28, 2.37). A comparison between the parameter estimates from the joint model and those from independent survival and longitudinal analyses leads us to conclude that independent analyses yield biased parameter estimates. There is a strong association between the progression of log(TCL) and the risk of mortality due to breast cancer.

The main research topic of our group is breast cancer, a major public health problem in developing countries. We investigate risk factors and pathophysiological mechanisms that cause or are associated with breast cancer. Additionally, we work on improving current breast cancer diagnostics. The basis for all of this research is the joint modeling of survival and longitudinal data on breast cancer. Thus, we perform studies intended to investigate joint modeling of survival and longitudinal data.

PUBLIC INTEREST STATEMENT
This study concerns breast cancer at Ayder Referral Hospital, Ethiopia. Breast cancer destroys the lives of many young individuals in developing countries. Because the main predictive factors for breast cancer patients are health-related variables, data on the pathophysiological variables of the disease are important for interpreting it. The study presents data from the normal cohort of the Ayder registry. In many cases, owing to women's lack of awareness and to health workers' practices, these diseases claim the lives of many people in the community who should not otherwise have died. With regular research and dissemination, among other interventions, it is possible to contain these diseases. More research is encouraged in this area, including at other hospitals.

Background
Cancer is a group of diseases caused by the uncontrolled growth and spread of abnormal cells anywhere in the body; if the spread is not controlled, it can result in death. Most types of cancer cells eventually form a lump or mass called a tumor, and are named after the part of the body where the tumor originates (Diabate et al., 2018).
Breast cancer is a malignant tumor that starts in the cells of the breast tissue, which is made up of glands for milk production, called lobules, and the ducts that connect the lobules to the nipple. The majority of breast cancer cases are classified as either invasive or noninvasive. The invasive type can spread throughout the body, whereas the noninvasive type does not (Abay et al., 2018). Cancer constitutes an enormous burden in both more and less economically developed countries. Studies also suggest that, over the years, the burden has shifted to less developed countries, which currently account for about 57% of cases and 65% of cancer deaths worldwide (Torre et al., 2015).
Breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer death among African women. In Africa, there were 92,600 cases and 50,000 deaths in 2008. In sub-Saharan Africa, breast cancer is one of the non-communicable diseases and the most commonly diagnosed cancer in women (Brinton et al., 2014). It is estimated that around 10,000 Ethiopian women and men have breast cancer, with thousands more cases unreported, as women living in rural areas often seek treatment from traditional healers before seeking help from the government health system (Lemlem et al., 2013).
The average age at diagnosis of breast cancer among African women tends to be young, with estimates that a majority of cancers develop among women 50 years or younger, a considerably younger age than seen in Caucasian populations (Sighoko et al., 2013). Genetic or environmental factors, or an interplay of the two, may be additional factors involved (Brinton et al., 2014). Age has a significant effect on mortality among women with breast cancer: patients aged 65 years and older were 1.35 times (95% CI: 1.12, 2.22) more likely to die. Educational status and marital status are also prognostic factors for mortality of women with breast cancer (Lan et al., 2013). A study of patients diagnosed with a malignant tumor at Braga's hospital, using a linear mixed model, showed that tumor stages III and IV were associated with increasing tumor cell values compared with tumor stages 0, I, and II (Borges et al., 2015). Covariates such as time, age, tumor stage, and tumor size had a positive, significant effect on the linear mean progression of tumor cell level. Tumor stage (III or IV versus 0, I, or II) also had a significant effect on patients' survival: patients with tumor stage III or IV were 3.52 times more likely to die than patients with tumor stage 0, I, or II (Borges et al., 2015). Tumor size, lymph node metastasis status, and tumor extension had a significant effect on breast cancer survival (Desantis et al., 2013). Stage and pathology type had a significant effect on breast cancer survival (Kantelhardt et al., 2014).
A high proportion of breast cancer morbidity and mortality was observed in the age category of 15 years and above in both men and women in Tigray, Northern Ethiopia. Overall breast cancer mortality was 2.3% during the study period (Ajemu et al., 2019). Translating Raman spectroscopy technology by using the RESpect probe as a point-of-care screening instrument has the potential to change the paradigm of screening for cancer, serving as an initial step to determine when a definitive tissue biopsy would be necessary (Agsalda-Garcia et al., 2019). The profile of a cancer patient includes summarization and visualization of the results of WES and RNAseq analysis (specific variants and significantly expressed genes, respectively) and the clinical profile, integration and comparison of these results, and a prediction regarding the disease trajectory (Kosvyra et al., 2019).
Joint models reduce bias in the estimates of the overall treatment effect, that is, the treatment effect on survival and on the longitudinal marker (J. G. Ibrahim et al., 2010). An excellent general review of joint modeling of longitudinal and survival data is given by Tsiatis and Davidian (2004). JG Ibrahim et al. (2001) also give an overview of joint modeling methods in their book. Joint models for longitudinal and survival data in which the survival component is a cure rate model are also useful in cancer research; Law et al. (2002) and Brown (2003) consider such models for cancer clinical trials. The number of people living with breast cancer is increasing from year to year, and the disease has emerged as one of the most rapidly increasing non-communicable diseases and a major public health challenge in developing countries like Ethiopia (Kantelhardt et al., 2014), with consequences of chronicity, disability, and death. Therefore, we formulate a method that can jointly estimate the effect of baseline covariates on tumor cell level and time to death for breast cancer patients, using risk factors suggested in the literature: socio-demographic variables (age, gender, educational status, marital status, residence, and alcohol drinking) and clinical factors (treatment used, pathology type, stage, baseline tumor cell count, stage of metastasis, baseline tumor size, and time in months). To obtain the estimates, we use a joint model of longitudinal and time-to-death data, fitted with the JM package in R, to determine the effect of longitudinal change in tumor cell level on time to death due to breast cancer among breast cancer patients.
The general aim of this study was to investigate the effect of longitudinal change in tumor cell level on time to death due to breast cancer among breast cancer patients from September 2015 to December 2018 at Ayder Comprehensive Specialized Hospital. The specific objectives were (1) to estimate the effect of baseline covariates on tumor cell level and time to death jointly; (2) to observe the association between tumor cell level and time to death due to breast cancer among breast cancer patients; and (3) to demonstrate the advantage of joint model analysis over separate survival and longitudinal analyses of the data.

Significance of the study
Any concerned legal body may use the output of this research to allocate proper human and material resources to breast cancer health care institutions. In turn, health care institutions could provide fast and effective services for women with high-risk factors for breast cancer. This research also increases the education and awareness of individuals, society, the government, and any legal body about the risk factors of women with breast cancer and their life expectancy.

Study area and period
The study was conducted at Ayder Comprehensive Specialized Hospital (ACSH), in the Tigray region of northern Ethiopia, from September 2015 to December 2018.
Tigray region has 18 public hospitals and 170 health centers with a total population of 4,316,988.

Data source
The source of data for this study was the patients' charts from the data record office at Ayder Comprehensive Specialized Hospital. The patients' charts were used to extract the necessary information from the different BC recording formats.

Inclusion criteria
The study included all BC patients who started anti-cancer treatment from September 2015 onward and had at least two follow-up visits at ACSH by December 2018.

Exclusion criteria
This study excluded patients with incomplete registration cards, patients who died from accidents rather than breast cancer, and patients who started anti-cancer treatment at other healthcare institutions.

Study design
A facility-based retrospective cohort study was conducted at ACSH. The study focused on investigating the determinant factors that affect time to death among breast cancer patients, jointly modeled with the longitudinal change in tumor cell level.

Study population
The study population consisted of all BC patients who had started anti-cancer treatment, were under follow-up, and had at least two follow-up visits by December 2018. Of all the BC patients registered at the hospital, only 186 were included in the study.

Method of data collection
The data were collected from patient charts according to the variables considered in this study. Both the longitudinal and the survival data were extracted from the patients' charts, which contain the demographic and clinical variables of BC patients under treatment follow-up.

Operational definition
Censoring: observations are called censored if the information about their survival time is incomplete.
Survival time: the period that a patient remains alive after starting anti-cancer treatment.
Time: the time of each hospital visit during follow-up, which takes place every 3 months.
Minimum follow-up time: the period within which a patient had two longitudinal measurements after starting anti-cancer treatment.
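The definitions above can be illustrated with a small data-preparation sketch. The patient dates, the function name, the choice of 31 December 2018 as the administrative censoring date, and the days-to-months conversion are all assumptions made for illustration; the source does not state them, and censoring due to loss to follow-up is not handled here.

```python
from datetime import date

# Assumed end of follow-up (the study ran until December 2018).
STUDY_END = date(2018, 12, 31)
DAYS_PER_MONTH = 30.4375  # average month length; an assumption


def survival_record(start, death=None, study_end=STUDY_END):
    """Return (time_in_months, event) for one patient.

    event = 1 if death from breast cancer was observed during follow-up,
    event = 0 if the patient is censored at the end of the study.
    """
    if death is not None and death <= study_end:
        end, event = death, 1
    else:
        end, event = study_end, 0
    months = (end - start).days / DAYS_PER_MONTH
    return round(months, 1), event


# Hypothetical patients, not taken from the study data.
print(survival_record(date(2015, 9, 1), death=date(2017, 3, 1)))
print(survival_record(date(2016, 1, 15)))
```

The pair (time, event) is the form in which the survival outcome enters the Kaplan-Meier, Cox, and joint models described later.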

Outcome variables
• Survival case
The main outcome variable for the survival data analysis is time to death due to breast cancer or censoring (in months).

• Longitudinal case
The main outcome variable for the longitudinal data analysis is tumor cell level, measured in ng/mL every 3 months.

• Joint case
In this case, the two outcome variables are the longitudinal change in tumor cell level and time to death due to breast cancer or censoring (in months) among breast cancer patients (Figure 1).

Independent variables
Several predictors were considered in this study for both the survival and the longitudinal cases.

Data processing and analysis
The analysis consists of exploratory data analysis and three different models: a linear mixed model (LMM) for the longitudinal data, the Cox proportional hazards model for the time-to-death data, and a joint model of the two. STATA 12 was used to produce the descriptive statistics for categorical variables, and R version 3.6.1 was used to fit the Cox and LMM models and to produce the graphics. A joint model of the longitudinal and time-to-death data was fitted with the JM package in R version 3.6.1 to determine the effect of longitudinal change in tumor cell level on time to death due to breast cancer among breast cancer patients.

Data exploration
Exploratory data analysis (EDA) was conducted to investigate the associations, structures, and patterns exhibited in the data set. Summary statistics, individual profile plots, and plots of the mean, correlation, and variance structures were obtained to gain insight into the data. Because our data are unbalanced, with unequal observation times, we used smoothing techniques that highlight the typical response as a function of an explanatory variable without relying on specific parametric models.

Statistical models
We used a survival sub-model for the length of time from the start of anti-cancer treatment until death or censoring (measured in months), a longitudinal sub-model for tumor cell level, and a joint survival-longitudinal model in which the true, unobserved longitudinal value enters the survival sub-model as a covariate.
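For reference, these sub-models can be written in the standard shared-parameter form commonly used with the JM package. The notation is an assumption, since the source does not state its equations: y_i(t) is the observed log tumor cell level of patient i at time t, m_i(t) is its true unobserved value, x_i(t) and z_i(t) are fixed- and random-effects design vectors, w_i holds the baseline covariates of the survival sub-model, and α is the association parameter.

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim N(0, \sigma^2), \\
m_i(t) &= x_i^\top(t)\,\beta + z_i^\top(t)\, b_i, \qquad b_i \sim N(0, D), \\
h_i(t) &= h_0(t)\, \exp\{\gamma^\top w_i + \alpha\, m_i(t)\}.
\end{aligned}
```

Under this form, exp(α) is the hazard ratio for a one-unit increase in the true log tumor cell level at time t, and α = 0 would mean that the two processes are unrelated.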

Data quality management
The supervisor and the principal investigator performed supervision on a daily basis, and every questionnaire was checked for completeness and consistency.

Descriptive statistics
The descriptive statistics for the categorical variables are given in Tables 1 and 2. A total of 186 breast cancer patients were included in this study, of whom 22 (11.83%) died. In addition, 119 (64%) patients lived in rural areas. Among the 186 breast cancer patients, 47 (97.92%), 42 (97.67%), 53 (82.81%), and 22 (70.97%) were censored in stage 1, stage 2, stage 3, and stage 4, respectively. Considering deaths by educational status, 17 (18.09%) of the illiterate patients, 3 (6.38%) of the primary-educated, 1 (14.29%) of the secondary-educated, and 1 (2.63%) of the tertiary-educated patients died. Regarding type of treatment, 121 (65.1%) of the patients received surgery and 33 (17.7%) chemotherapy, whereas the remaining patients received both surgery and chemotherapy. Regarding pathology type, 94 (50.5%) of the patients had ductal carcinoma in situ, 71 (38.2%) had lobular carcinoma in situ, and 8 (4.3%) had ductal invasive carcinoma; the rest had lobular invasive carcinoma.
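The percentages quoted above can be re-derived from the counts. In the sketch below, the counts for combined treatment and for lobular invasive carcinoma are not stated in the text; they are obtained by subtraction, on the assumption that the categories are exhaustive and mutually exclusive.

```python
N = 186  # total patients reported in the text

counts = {
    "dead": 22,
    "rural": 119,
    "surgery": 121,
    "chemotherapy": 33,
    "DCIS": 94,
    "LCIS": 71,
    "DIC": 8,
}

for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / N:.2f}%)")

# Remainders implied by the text ("the remaining", "the rest").
both_treatments = N - counts["surgery"] - counts["chemotherapy"]
lic = N - counts["DCIS"] - counts["LCIS"] - counts["DIC"]
print("surgery + chemotherapy:", both_treatments)
print("LIC:", lic)
```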
From Table 3, the median age, baseline tumor cell level, baseline tumor size, and B_HGL of the BC patients were 47, 246, 12, and 12.8, respectively. The median survival time of the BC patients was 34.50 months.

Kaplan-Meier estimates and log-rank tests
We compared the survival time of patients using Kaplan-Meier estimates and the log-rank test. By educational status, illiterate patients had a lower survival probability than patients in the primary, secondary, and tertiary categories.
The survival probability of BC patients from urban areas was higher than that of patients from rural areas.
Similarly, BC patients who received both surgery and chemotherapy had a lower survival probability than patients who received surgery alone or chemotherapy alone. BC patients whose pathology type was LCIS had a higher survival probability than patients with DCIS, DIC, or LIC. Patients in stage 4 had a lower survival probability than patients in stage 1, stage 2, or stage 3.
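The Kaplan-Meier comparisons above rest on the product-limit estimator, which can be sketched in a few lines. This is a minimal illustration on hypothetical follow-up times, not the study's data or code; in practice the curves would be computed per group (for example, urban versus rural) and compared with the log-rank test.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times  : follow-up times (e.g. months)
    events : 1 = death observed, 0 = censored
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times in months; not the study data.
times = [6, 12, 12, 20, 30, 34]
events = [1, 0, 1, 1, 0, 0]
print(kaplan_meier(times, events))
```

Censored patients contribute to the risk set up to their censoring time but never cause a drop in the curve, which is why ignoring censoring would bias the survival estimate.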
Based on the log-rank test results in Table 4, there were statistically significant differences in the risk of mortality among BC patients by educational status, treatment, pathology type, and residence at the 25% level of significance. However, there is no significant difference in the risk of mortality

Variable selection and Cox PH assumption
To determine the variables to be included in the multivariable Cox survival model, we used the variables with a p-value < 0.25 in Tables 4 and 5 above. Educational status, age, baseline tumor size, baseline tumor cell level, pathology type, treatment, and residence were the candidate variables for the multivariable Cox regression model. Marital status, stage, metastasis, and B-HGL were not included in the multivariable Cox regression model.
The proportional hazards assumption asserts that the hazard ratios are constant over time. That is, the relative risk of failure between any two covariate patterns must be the same no matter how long the subjects have been followed. The GLOBAL test was used to test this assumption, and its result is displayed in Table 6. The p-value of the GLOBAL test is not significant, which indicates that the PH assumption is not violated. Therefore, we can take the proportionality assumption to be met.
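In symbols, the assumption can be stated as follows. The notation is ours rather than the source's: h_0(t) is the baseline hazard, x the covariate vector, and β the vector of log hazard ratios.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta^\top x),
\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\{\beta^\top (x_1 - x_2)\}.
```

The ratio on the right does not depend on t, which is exactly what a global test of proportionality examines. Tests of this kind are typically based on scaled Schoenfeld residuals, for instance the GLOBAL row reported by cox.zph in the R survival package; the source does not name the function it used, so this is an assumption.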

Cox proportional hazards model
After checking the proportional hazards assumption, the survival data were analyzed with the Cox proportional hazards model. The multivariable results in Table 7 give the estimated adjusted hazard ratios (AHR), 95% CIs, and p-values of the covariates in the final Cox regression model. Age, educational status, residence, baseline tumor size, and pathology type (LIC) had a statistically significant effect on the risk of mortality due to breast cancer at the 5% level of significance. The remaining variables were not significant at the 5% level in the final multivariable Cox regression model.

Separate analysis of longitudinal data
Q-Q plots were used to check the normality of the longitudinal tumor cell measurements in ng/mL. Figure 2 shows the Q-Q plots of the original and the logarithm-transformed tumor cell measurements. The normal quantile-quantile plot indicates that the log tumor cell level is approximately normally distributed, since the points concentrate around the reference line and follow an approximately straight line. Thus, the analysis in this study used the logarithm-transformed tumor cell level data.
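The visual check described above can be complemented by a numerical summary: the correlation between the ordered data and the theoretical normal quantiles, which approaches 1 when the Q-Q plot is close to a straight line. The sketch below uses synthetic right-skewed values, not the study's measurements, and the plotting-position formula is one common convention among several.

```python
import math
import random
from statistics import NormalDist, mean


def qq_correlation(sample):
    """Correlation between sorted data and theoretical normal quantiles.

    Values close to 1 indicate that the Q-Q plot is close to a straight line.
    """
    n = len(sample)
    observed = sorted(sample)
    # Plotting positions (i - 0.5) / n, a common convention.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mo, mt = mean(observed), mean(theoretical)
    cov = sum((o - mo) * (t - mt) for o, t in zip(observed, theoretical))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_t = sum((t - mt) ** 2 for t in theoretical)
    return cov / math.sqrt(var_o * var_t)


# Synthetic right-skewed "tumor cell levels" (log-normal); not the study data.
rng = random.Random(0)
raw = [rng.lognormvariate(5.5, 1.0) for _ in range(200)]
logged = [math.log(v) for v in raw]

r_raw = qq_correlation(raw)
r_log = qq_correlation(logged)
print(f"Q-Q correlation, raw scale: {r_raw:.3f}")
print(f"Q-Q correlation, log scale: {r_log:.3f}")
```

For skewed data of this kind, the log scale gives a Q-Q correlation noticeably closer to 1, which mirrors the reasoning for modeling log(TCL) rather than the raw tumor cell level.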

Exploratory Data Analysis (EDA)
Exploratory data analysis was conducted to investigate the associations, structures, and patterns exhibited in the data set. In addition, individual profile plots, mean structure plots, and variance structure plots were obtained to gain insight into the data (Verbeke and Molenberghs, 2009). Figure 3 shows the individual profiles of the log tumor cell measurements of patients over time. It demonstrates the variability within and between patients in the log tumor cell measurements, noting that the measurements were not equally spaced across subjects. There is large variability between patients in log tumor cell level, with some measurements being extreme and deviating over time. Figure 4 indicates that there is variability in tumor cell level within and between BC patients, which our LMM must accommodate.

Exploring the mean structure
We explored the mean plot of log tumor cell level over time to describe the longitudinal change in log tumor cell level of breast cancer patients. The figure revealed that there seems to be a slight increase in the average log tumor cell level and whereas

Exploring the variance structure
Similarly, Figure 5 confirms an increase in the mean log tumor cell level, with an approximately constant increase expected over follow-up time. The line plot of log tumor cell level over time was used to observe the variability of the data. It indicates that log tumor cell variability at the first visit differs from that at visit 2. The variability increases slightly from visit 3 to visit 4 and decreases very fast from visit 4 to visit 5, then increases slightly to visit 6 and decreases slightly to visit 7. After that, it increases rapidly, except for a decrease at the last follow-up time (Figure 6).

Linear mixed model
After exploring the data, we examined whether the assumption of heterogeneous within-subject variance for the log tumor cell level is supported, and we identified the random effects (random intercept, random slope, or random intercept and random slope) to be included in the model.
Comparing the three models, the large reduction in AIC for the random intercept and random slope (RI + RS) model, which incorporates subject-specific variances, is evidence that subject-specific log tumor cell variances must be considered in the analysis. The random-effects table also shows that there is subject-specific variation. Hence, the assumption of heterogeneous variance for the repeated log tumor cell measurements is supported.
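The three candidate random-effects structures and the criterion used to compare them can be written out as below. The notation is an assumption, since the source does not give its equations: y_ij is the log tumor cell level of patient i at visit j, t_ij the visit time, b_0i and b_1i the patient-specific random intercept and slope, L-hat the maximized likelihood, and k the number of estimated parameters.

```latex
\begin{aligned}
\text{RI:} \quad & y_{ij} = x_{ij}^\top \beta + b_{0i} + \varepsilon_{ij}, \\
\text{RS:} \quad & y_{ij} = x_{ij}^\top \beta + b_{1i}\, t_{ij} + \varepsilon_{ij}, \\
\text{RI + RS:} \quad & y_{ij} = x_{ij}^\top \beta + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij}, \\
& \mathrm{AIC} = -2 \log \hat{L} + 2k .
\end{aligned}
```

A lower AIC indicates a better trade-off between fit and complexity, so a marked drop for the RI + RS model supports letting both the starting level and the rate of change vary between patients.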

Univariate analysis of longitudinal data
From the univariate analysis of the linear mixed models, time, age, residence, educational level, baseline tumor cell level, pathology type, stage, metastasis, and treatment are important factors that can affect the log(TCL) of BC patients and should be included in the multivariable analysis. However, B-HGL, baseline tumor size, and marital status were not statistically significant for log(TCL) at the 25% level of significance. Therefore, these variables were not included in the multivariable LMM analysis.
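The screening rule applied here, and earlier for the Cox model, amounts to a simple filter on the univariate p-values. The sketch below uses placeholder variable names and hypothetical p-values; they are not the study's results, and only the 0.25 threshold comes from the text.

```python
SCREENING_LEVEL = 0.25  # threshold stated in the text


def screen(p_values, level=SCREENING_LEVEL):
    """Split variables into those kept for and dropped from the multivariable model."""
    kept = [name for name, p in p_values.items() if p < level]
    dropped = [name for name, p in p_values.items() if p >= level]
    return kept, dropped


# Hypothetical univariate p-values with placeholder names; not the study's results.
univariate_p = {"var_a": 0.003, "var_b": 0.18, "var_c": 0.41, "var_d": 0.25}
kept, dropped = screen(univariate_p)
print("kept:", kept)
print("dropped:", dropped)
```

A liberal threshold such as 0.25 is used at this stage so that variables which become important only after adjustment are not discarded too early.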

Multi-variate analysis of longitudinal data
After transforming the response variable, tumor cell level, the most important variables were selected using the purposeful variable selection method. All variables that were significant in the univariate analysis at the 25% level of significance were included in the multivariable analysis of log(TCL). From the result of the multivariable analysis (Table 11), age, educational status, stage 3, stage 4, baseline tumor cell level, time*age, time*stage-3, and time*stage-4 were statistically significant factors affecting the change in log(TCL) of BC patients. The remaining covariates were not statistically significant at the 5% level of significance (Tables 9 and 10).

Data analysis for joint longitudinal and survival analysis
The result of the joint model of longitudinal change in log tumor cell level and time to death (Table 12) shows that age, educational status, baseline tumor cell level, stage 3, stage 4, time*age, time*stage-3, and time*stage-4 were significant factors for log(tumor cell level), and that age, educational status, pathology (DIC), pathology (LIC), and residence were significant factors affecting the mortality rate of BC patients. The association between the longitudinal change in log tumor cell level and the mortality rate was significant, showing a strong positive relationship between the two sub-models (association (HR) = 2.321, 95% CI: 2.26, 3.091; p < 0.0001). This indicates that as the tumor cell level increases, the mortality rate also increases.

Interpretation of joint model
Comparing the separate and joint models, the joint model had narrower confidence intervals than the separate models, which indicates that it is more precise. The statistical significance of the association parameter is also evidence that the joint model is better than the separate models (Borges et al., 2015). The parameter estimates of the separate and joint models are not identical. The estimate of the association parameter in the joint model is significantly different from zero, providing strong evidence of an association between the two sub-models. The estimated association due to the trend in log tumor cell level is positive (HR = 2.32, 95% CI: 2.28, 2.37; p < 0.0001), which indicates that the log tumor cell level is positively associated with the hazard of death of BC patients on follow-up anti-cancer treatment. This implies that an increase in the log tumor cell level of BC patients on follow-up treatment significantly increases their risk of death.
Hence, the joint model of the longitudinal and time-to-death data is the better model.
Baseline age had a significant positive association with tumor cell level (Borges et al., 2015). Similarly, in our study the average log TCL increases as patient age increases. From the longitudinal sub-model of log(TCL), as the baseline age of patients increases by 1 year, the average log(TCL) increases by 1.012 (p-value < 0.031), adjusting for other variables. The progression change in log tumor cell level decreased by 5%, 4%, and 7% for female BC patients whose educational status was primary, secondary, and tertiary, respectively, compared with illiterate patients, controlling for other independent variables.
Clinical stage has a significant effect on the progression change of log TCL of breast cancer patients, such that a more advanced stage is associated with a more aggressive rise in log tumor cell level. The progression change in log tumor cell level for stage III and stage IV patients increased by 1.54 and 2.1, respectively, compared with stage I patients, holding the other variables constant; this difference is statistically significant since the 95% confidence interval does not include zero. This indicates that a later stage at diagnosis is associated with a higher mean change in log tumor cell level. This finding is similar to that of a study conducted in South Africa (Dickens et al., 2014). The study by Borges et al. (2015) also shows that a late stage at diagnosis contributes to an increase in log TCL.
Baseline tumor cell level was found to have a significant impact on the progression change of log tumor cell level, which increased by 1.11 when the baseline tumor cell level increased by one unit, controlling for other independent variables. The stage-by-time interaction also had a significant effect on the progression change of patients' log tumor cell level: the stage III by time and stage IV by time interactions decreased the progression change of log tumor cell level by 33% and 43%, respectively, compared with the stage I by time interaction. The age-by-time interaction had a significant effect on log TCL, with a decrease in progression change of about 0.6%, controlling for the other covariates.
Baseline age had a significant positive association with the hazard of death (Rezaianzadeh et al., 2009). Our finding also indicates that the mortality rate of breast cancer patients increases 1.09 times for each 1-year increase in age, controlling for other independent variables. Another significant variable for patient survival is educational status. The hazard rate of patients with primary, secondary, and tertiary education was lower by 93.6%, 91.7%, and 97.2%, respectively, compared with illiterate patients. This result agrees with research from Vietnam on survival probability and prognostic factors for breast cancer (Lan et al., 2013).
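The percentages above can be related back to hazard ratios with a one-line conversion. The sketch assumes that each reported reduction equals (1 − HR) × 100, which is the usual convention but is not stated explicitly in the source; the function names are ours.

```python
def hazard_ratio_from_reduction(percent_reduction):
    """Hazard ratio implied by a stated percentage reduction in the hazard."""
    return 1 - percent_reduction / 100


def percent_change(hazard_ratio):
    """Percentage change in the hazard implied by a hazard ratio (negative = reduction)."""
    return (hazard_ratio - 1) * 100


# Reductions quoted in the text for primary, secondary, and tertiary education.
for level, reduction in [("primary", 93.6), ("secondary", 91.7), ("tertiary", 97.2)]:
    print(level, round(hazard_ratio_from_reduction(reduction), 3))

# Age: hazard ratio of 1.09 per additional year, as quoted in the text.
print("age:", round(percent_change(1.09), 1), "% change per year")
```

Read this way, a 93.6% reduction corresponds to a hazard ratio of about 0.064 relative to illiterate patients, and a hazard ratio of 1.09 corresponds to a 9% higher hazard per additional year of age.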
In this study, we found that pathology type is associated with the risk of mortality: invasive pathology at diagnosis is associated with a higher risk of mortality than in situ pathology. The hazard rates of patients with pathology DIC and pathology LIC were 4.01 and 7.4 times those of patients with pathology DCIS, respectively. Residence is an important predictor of the survival time of breast cancer patients. Breast cancer patients from rural areas had a lower probability of surviving than patients from urban areas: the hazard rate of urban patients was 63.9% lower than that of rural patients. This shows that, in Tigray regional state, rural breast cancer patients were more severely affected than their urban counterparts; this survival difference arises because they reside further from treatment centers, presented at later stages, and were less compliant with treatment. This result is similar to that of the study conducted in South Africa (Dickens et al., 2014).
Joint models for longitudinal and survival data are particularly relevant to breast cancer and to observational studies in which longitudinal biomarkers (e.g., circulating tumor cells, immune response to a vaccine, and quality-of-life measurements) may be highly associated with time to event. This is consistent with the study conducted at the University of North Carolina by J. G. Ibrahim et al. (2010) and the study conducted in Portugal by Borges et al. (2015). Our research also shows that there is a strong positive association between the progression change in log(TCL) and the risk of mortality due to breast cancer. In particular, a unit increase in the value of log(TCL) corresponds to an exp(0.84) = 2.32-fold increase (95% CI: 2.28, 2.37) in the risk of death at the same time point.
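The conversion from the association parameter to the hazard ratio can be checked directly. The sketch below uses only the figures quoted in the text; the implied standard error is an extra step that assumes the interval is a symmetric Wald interval computed on the log scale, which the source does not state.

```python
import math

alpha = 0.84          # association parameter reported in the text
ci_hr = (2.28, 2.37)  # 95% CI reported on the hazard-ratio scale

hr = math.exp(alpha)
ci_log = tuple(math.log(bound) for bound in ci_hr)
# Standard error implied by a symmetric Wald interval on the log scale (an assumption).
implied_se = (ci_log[1] - ci_log[0]) / (2 * 1.96)

print(f"exp(alpha) = {hr:.2f}")
print(f"CI on the log-hazard scale: ({ci_log[0]:.3f}, {ci_log[1]:.3f})")
print(f"implied standard error of alpha: {implied_se:.4f}")
```

The interval 2.28 to 2.37 therefore belongs to the hazard ratio exp(α), not to α itself, whose value of 0.84 lies inside the corresponding log-scale interval.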

Conclusion
The results indicate a strong positive association between log(TCL) and the hazard of death of BC patients. According to our findings, age, educational status, pathology (DIC), pathology (LIC), and residence (urban) were important baseline covariates with a significant effect on time to death due to breast cancer among breast cancer patients. In addition, age, educational status, baseline tumor cell level, stage 3, stage 4, time*age, time*stage-3, and time*stage-4 were important covariates with a significant association with the progression change of TCL.
From our findings, we conclude that the relationship between log(tumor cell level) and the hazard of death was positive and significant. Thus, a patient with a higher tumor cell trend is less likely to survive. The results of the separate and joint analyses are consistent. However, unlike the independent models, the joint analysis adjusts for the correlation between the two responses, which means that more adequate and efficient inferences can be made from the joint model estimates. Joint modeling therefore benefits the analysis of correlated data, and an assumption of association between the two processes is necessary in a joint model of breast cancer data.