Comparison of classical tumour growth models for patient derived and cell-line derived xenografts using the nonlinear mixed-effects framework

In this study we compare seven mathematical models of tumour growth using nonlinear mixed-effects, which allows for the simultaneous fitting of multiple growth curves and the estimation of both mean behaviour and variability. This is performed for two large datasets: a patient-derived xenograft (PDX) dataset consisting of 220 PDXs spanning six different tumour types, and a cell-line derived xenograft (CDX) dataset consisting of 25 cell lines spanning eight tumour types. Comparison of the models is performed by means of visual predictive checks (VPCs) as well as the Akaike Information Criterion (AIC). Additionally, we fit the models to 500 bootstrap samples drawn from the datasets to extend the comparison to perturbations of the datasets and to understand which growth kinetics are best fitted by each model. Through qualitative and quantitative metrics the best models are identified and the effectiveness and practicality of simpler models is highlighted.


Introduction
In the process of preclinical evaluation of drug effects on tumours, an essential first step is the accurate modelling of tumour kinetics and the appropriate choice of tumour growth model to do so. Inferring which mathematical model best describes tumour growth kinetics is performed by comparing the fit of the model to the data, which is very often done by fitting each model to single growth curves individually. Previous studies have compared tumour models in that manner over the past decades [3,4,16,24], with a few considering a plethora of mathematical models including empirical, functional and structural tumour models. The results from such studies are comparable, with a few exceptions owing to different fitting optimization methods and variance assumptions [4,17]. More specifically, in [17], with the exception of the logistic and von Bertalanffy models, which were unable to describe the data, the other 12 models captured the dynamics sufficiently, with the Gompertz model being one of the best performing empirical models and superior to the exponential, particularly in the case of spheroid dynamics. Similarly, in [4] the Gompertz model is among the top two performing models for both the breast cancer data, along with the exponential-linear model (the Simeoni model used in our study), and the lung data, together with the power law model. The paper by Benzekry et al. [4] is particularly relevant because the comparison of the models is performed using both individual fits and mixed-effects, with data from 20 mice for lung cancer and 34 mice for breast cancer (2 cell lines), and several goodness-of-fit metrics.
Fitting models to individual curves might limit our understanding of which model best captures a given dataset as a whole and best simulates xenograft growth in general. That has led to the use of nonlinear mixed-effects, which allows for a simultaneous fit of the model to the totality of the data and is the most popular approach for pharmacokinetic/pharmacodynamic (PK/PD) analysis in the pharmaceutical industry. Nonlinear mixed-effects (NLME) models are statistical models that allow for both fixed effects, which stay constant across measurements, and random effects, which change depending on the sample/level (individuals) [20]. As a result they can account for different types and levels of variability in addition to residual errors and hence give rise to multilevel/hierarchical structures. Due to the ability of NLME models to capture correlations between data, such as multiple measurements from the same study or from the same individual, they are very suitable for modelling multiple tumour growth curves, since they can fit the whole dataset without the need to fit each curve individually or assume independence [6], and hence lead to more generalizable results. NLME models are rarely used for large preclinical datasets, with the work by Parra-Guillen et al. [22] being the first systematic modelling of a large xenograft dataset using this framework and the paper by Benzekry et al. [4] being the only model comparison paper using mixed-effects. To the best of our knowledge, our paper is the first to perform a model comparison on such extensive datasets for both patient-derived xenografts (PDXs) and cell-line derived xenografts (CDXs), and the first to attempt to generalize the comparison by assessing the performance of tumour growth models under perturbations of the original dataset.
This paper is focused on xenograft data, in particular patient-derived and cell-line derived xenografts. These come either from direct engraftment of tumour tissue from the patient into immuno-deficient mice (PDX) or from in vitro cultivation of tumour cell lines that are then engrafted into immuno-deficient mice (CDX). Xenografts have advantages over classic cell-line studies as they are considered to be more heterogeneous and their microenvironment more similar to that of the patient tumour [27]. That has led to xenograft studies becoming very popular in cancer research, as they are deemed a more appropriate model for comparison to clinical studies. Among these, PDXs are considered an even better approximation, since CDXs have already been exposed to an artificial culture environment, perhaps leading to loss of their characteristics [27]. On the other hand, the direct tissue engraftment of PDXs preserves both the original micro-environment and cell-cell interactions [15].

Data
In this project we explore the performance of seven mathematical models, extensively used in the context of tumour growth, by fitting them on PDX data from the Novartis PDX encyclopedia found in the supplementary material of Gao et al. [12] and internal AstraZeneca CDX data.
Data used for the comparison of the mathematical models came from pre-existing studies; no animal experiments were performed for the current study. The data come from two different sources, the first being the Novartis PDX encyclopedia, the biggest publicly available database of PDX data. As the primary aim of this study is the fitting of the models to pure growth, we considered only the untreated (control) cases, which comprise volume time-courses from 220 PDXs without replicates (n = 1) across six different tumour types, namely breast carcinoma (BRCA), non-small cell lung carcinoma (NSCLC), gastric cancer (GC), colorectal cancer (CRC), cutaneous melanoma (CM) and pancreatic ductal carcinoma (PDAC), as shown in Figure 1. There were 2442 datapoints in total. Gao et al. [12] showed that the effect of genetic drift due to passaging was minimal, with the PDXs demonstrating phenotypic stability and PDXs from the same patient having almost perfect expression correlation. To make sure the drift was minimized, the study in Gao et al. was conducted with PDXs between 4 and 10 passages.
The second source was internal CDX studies. All animal studies in the United Kingdom were conducted in accordance with UK Home Office legislation, the Animals (Scientific Procedures) Act 1986, and the AstraZeneca Global Bioethics Policy. For PD studies, tumour cells were subcutaneously inoculated into athymic nude mice and measured at regular intervals (initially several times a week) using calipers. These data originate from independent studies. Mice were sacrificed when the total tumour burden reached the specified welfare limit (2.5 cm³). There were 25 cell lines in total (with multiple replicates in each), spanning several different tumour types. The data as well as the tumour distribution per cell line can be seen in Figure 2.
The duration of measurements varied widely between individual mice, with the longest series extending to 165 days for the Novartis dataset and 45 days for the internal one. Due to the different growth rates, we observe dropout, where fast-growing cancers are terminated early, which biases the fitting towards a slower average growth rate [13]. Hence, there is a need to introduce a cut-off for the measurements, one that is long enough to provide enough data points but short enough to minimize the effect of dropout and keep as many mice as possible. Figure 3 shows a comparison of the different cut-off times for the Novartis data.
We want to keep the framework for the comparison as similar as possible between studies and hence we introduce the same cut-off to the CDX dataset as well. A similar analysis for that dataset is not required as the cut-off does not decrease the number of samples.
We can see that mice are taken off the study at every possible cut-off, but perhaps the most dramatic change is the difference in growth curves between the cut-offs of 20 and 30 days, with 186 animals remaining after 20 days but only 120 after 30. This can also be seen when the step is 20 days. That indicates that a cut-off of around 20 days is both a reasonable time-frame for the curves to have evolved and a point at which the majority of mice remain, giving a good representation of both fast- and slow-growing tumours. Although an exploration of the optimal cut-off around the 20-day mark was not performed, we selected 21 days to represent exactly three weeks of measurements, reducing the dataset to 1359 datapoints.
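The cut-off step described above can be sketched as follows; the data layout (mouse identifier mapped to a list of (day, volume) pairs) is hypothetical and not the actual format of either dataset.

```python
# Sketch of the 21-day cut-off: keep only measurements taken at or before
# the cut-off day. Mouse IDs and values below are illustrative only.

def apply_cutoff(series, cutoff_days=21):
    return {mouse: [(t, v) for (t, v) in obs if t <= cutoff_days]
            for mouse, obs in series.items()}

series = {
    "m1": [(0, 0.20), (7, 0.55), (14, 1.10), (21, 1.90), (28, 3.00)],
    "m2": [(0, 0.25), (10, 0.60), (20, 1.40)],
}
trimmed = apply_cutoff(series)   # m1 loses its day-28 point; m2 is unchanged
```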
We did not find strong evidence to suggest that different tumour types demonstrate different growth kinetics (see Figure 4) and hence we aggregated the data from all tumour types together for the fitting and comparison of the models.

Tumour growth models
Various mathematical models have been formulated to capture tumour growth, each with different assumptions and simplifications. We focus on seven models which have been widely used for this purpose.

Exponential model
The exponential model is the simplest model used for this purpose. It assumes that a tumour grows proportionally to its volume at a constant rate given by the growth parameter k, leading to unbounded growth. Although this holds in the early stages of tumour growth [5], evidence suggests a slowing of the initial rate after a certain size [21], for various reasons; one is the inability of oxygen to reach the inner part of the tumour, resulting in a hypoxic centre that does not contribute to the overall growth. This translates into the growth law deviating from an exponential for larger tumours.
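The model equation did not survive extraction; the standard form of the exponential model, consistent with the description above, is:

```latex
\frac{\mathrm{d}V}{\mathrm{d}t} = kV, \qquad V(t) = V(0)\, e^{kt}
```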

Gompertz model
The Gompertz model was formulated in 1825 by Benjamin Gompertz and has since found many applications in biology where the curves take an S-shaped form. In tumour dynamics in particular, the equation accounts for the progressive decrease of the growth rate as the tumour gets larger. For small volumes the first term dominates and the growth law is almost exponential, whereas for higher values growth slows down significantly, leading to a steady state for V. Two possible explanations have been given for the second term in the Gompertz model: one is a possible aging/differentiation of cells which increases their doubling time, and the other is the diffusion-limited dynamics of oxygen inside the tumour, assuming a critical concentration below which hypoxia occurs, giving rise to a hypoxic radius [7]. The disadvantage of this model is that such a steady state has not been observed experimentally.
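The equation itself is missing from the extracted text; one common two-term parameterisation consistent with the description above (the symbols α and β are illustrative, not necessarily the paper's notation) is:

```latex
\frac{\mathrm{d}V}{\mathrm{d}t} = \alpha V - \beta V \ln V,
\qquad V_{\mathrm{ss}} = e^{\alpha/\beta}
```

For small volumes the behaviour is close to exponential, while the logarithmic term progressively retards growth towards the steady state V_ss.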

Simeoni model
The exponential-linear model, which for simplicity we will call the Simeoni model [23], is built around a similar idea to the Gompertz model: tumour growth starts as exponential and switches to linear, with the difference that the volume does not reach a steady state but keeps growing at a constant rate after the switch. The critical volume at which this happens can be expressed in terms of the two growth parameters in order to preserve continuity (V_th = λ1/λ0). The parameter ψ was kept fixed at the value 20, as in the original paper, in order not to introduce a third parameter into the model. This corresponds to the switching parameter, and a high enough value, as tested in [23], offers a smooth and rapid switch comparable to the two-equation system.
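The combined single equation referred to above did not survive extraction; in the original Simeoni et al. [23] formulation, the unperturbed growth reads:

```latex
\frac{\mathrm{d}V}{\mathrm{d}t} =
\frac{\lambda_0 V}{\left[ 1 + \left( \frac{\lambda_0}{\lambda_1} V \right)^{\psi} \right]^{1/\psi}}
```

For V ≪ V_th = λ1/λ0 this reduces to exponential growth at rate λ0, and for V ≫ V_th to linear growth at rate λ1.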

Conger model
This model first appeared in the paper by Conger and Ziskin [8], following the observation that tumour spheroids went from geometric to linear growth with a constant proliferating shell radius, and it was later formulated to account for drug effects [10]. It is based on the hypoxic-core idea for limiting the growth rate, where the growth rate is multiplied by the proliferating fraction, i.e. the outer layer of the tumour that contributes to proliferation. The parameter r_d is the difference between the total radius of the tumour and the radius of the hypoxic core, and is considered to be constant and specific to each cancer type. As the tumour grows the proliferating fraction becomes smaller and, for very large volumes, approximates only the surface of the tumour. The growth rate, although decreasing, never becomes zero and hence the model admits no steady state. The radius of the whole tumour is given by r = (3V/(4π))^(1/3).
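A reconstruction of the model equation consistent with the description above (growth rate K multiplied by the proliferating-shell fraction; the exact notation may differ from the paper's):

```latex
r = \left( \frac{3V}{4\pi} \right)^{1/3}, \qquad
\frac{\mathrm{d}V}{\mathrm{d}t} = K V \, \frac{r^{3} - (r - r_{d})^{3}}{r^{3}} \quad (r > r_{d})
```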

Mayneord model
In 1932 W. V. Mayneord [18] formulated a model to describe Jensen's rat sarcoma; it was later rederived and adjusted to account for PD effects [14]. The basic assumption of this model is that the rate of growth of the tumour radius is constant, which gives rise to a 2/3-power growth law for the volume. It can be shown, using asymptotic analysis for large tumour volumes, that this model represents a special case of the Conger model in which growth originates mainly from the surface of the tumour, hence the 2/3 exponent.
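The missing volume equation follows from the stated assumption of a constant radial growth rate c:

```latex
\frac{\mathrm{d}r}{\mathrm{d}t} = c
\;\Longrightarrow\;
\frac{\mathrm{d}V}{\mathrm{d}t} = 3c \left( \frac{4\pi}{3} \right)^{1/3} V^{2/3} = k V^{2/3}
```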

Logistic model
The Logistic model was formulated by Pierre-François Verhulst [25]. Similar to the Gompertz model, it assumes a slowing down of the growth rate that is linear with respect to the increasing volume of the tumour. The model assumes a carrying capacity for the tumour volume at which the growth rate goes to zero.
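The standard logistic form, matching the description above:

```latex
\frac{\mathrm{d}V}{\mathrm{d}t} = kV \left( 1 - \frac{V}{V_{\max}} \right)
```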

von Bertalanffy
The von Bertalanffy model [26] was derived from the idea that growth is a balance between anabolic and catabolic processes. The model assumes that the former follows an allometric scaling whereas the latter is proportional to the total volume. While the model often appears with γ = 2/3, we include the more general form for the comparison.
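The general von Bertalanffy form described above (allometric production minus first-order loss):

```latex
\frac{\mathrm{d}V}{\mathrm{d}t} = aV^{\gamma} - bV
```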

Fitting and comparison
A nonlinear mixed-effects model was fitted to the data, which allows for a more complete capture of the data by taking into account all individual data, inter-individual variability and dependence of measurements, while accommodating missing values. In NLME the parameters of the model are made up of two components that are estimated in this setting: fixed effects, which represent the average dynamical behaviour and average parameter value (population value), and random effects, which represent the variability of the parameters between individuals in the population. Mixed-effects models can be summarized with the equations y_ij = f(p_i, t_ij) + ε_ij and p_i = A_i θ + B_i η_i, where y_ij is response j for individual i, p_i is the parameter vector for individual i given by the sum of the fixed effects θ and the random effects η_i, A_i and B_i are design matrices for the fixed and random effects, and ε_ij are residual errors. Both the residual errors and the random effects are assumed to be normally distributed with mean zero and variances σ² and ω², respectively. Hence, in contrast to typical model fitting, NLME estimates a distribution for each parameter centred around θ with variance ω². The fitting was done in the programming environment R using the nlmixr package v1.1.0.9 [11] with a proportional error model, no covariance between different parameters, and the FOCEi method in order to account for the interaction between the known and unknown variability [2]. The log-likelihood for the FOCEi method can be found in [2] (Equations (10)-(11)) and depends on both the residual errors and the inter-individual variability. The initial tumour volume was fitted as a parameter. Moreover, the covariance matrix was kept fully diagonal, assuming no correlations between the parameters. This was done for the sake of simplicity; any strong correlation, if present, would manifest itself in the diagnostic plots as an inadequacy of the model to fit the data.
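As a minimal, self-contained illustration of the hierarchical structure just described (not the actual nlmixr fit, which was done in R), the sketch below simulates a population under the stated assumptions: a population parameter θ, normally distributed random effects with standard deviation ω, and proportional residual error with standard deviation σ. The exponential model and all numerical values are illustrative.

```python
# Minimal sketch of the NLME data-generating assumptions (illustrative only):
# individual parameter = fixed effect (theta) + random effect (eta ~ N(0, omega^2)),
# observation = model prediction * (1 + proportional error ~ N(0, sigma^2)).
import math
import random

rng = random.Random(0)
theta_k, omega, sigma = 0.10, 0.02, 0.10   # population rate, BSV sd, residual sd
times = [0.0, 3.5, 7.0, 10.5, 14.0, 17.5, 21.0]  # days, up to the 3-week cut-off
v0 = 0.24                                  # initial volume (PDX estimate)

def simulate_individual():
    k_i = theta_k + rng.gauss(0.0, omega)          # individual growth rate
    return [v0 * math.exp(k_i * t) * (1.0 + rng.gauss(0.0, sigma))
            for t in times]

cohort = [simulate_individual() for _ in range(220)]   # 220 PDX-like curves
```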
Before fitting the models using the nonlinear mixed-effects framework we needed to provide initial estimates for the initial conditions and parameters, as well as their between-subject variability. To do that we performed individual curve fits with an additive least-squares error (lsqcurvefit) in the programming language MATLAB. That gave us estimates of the mean parameter values as well as their variance. Additionally, it showed that the different tumour types have similar growth characteristics for these PDXs, as seen in Figure 4. A similar conclusion was drawn for the CDXs, especially since half of the data (with replicates treated separately) are from CRC.
The performance of the NLME fitting was assessed using both a qualitative goodness-of-fit criterion, namely VPCs, to assess the ability of each model to fit the actual data and the degree of over- or under-prediction, and the quantitative metric of the Akaike Information Criterion (AIC) to compare the models against one another [1]. The AIC is derived from information theory, and the smaller it is, the better the model performs. It penalizes overfitting by introducing a positive term in the number of parameters and rewards the quality of the fit. It is given by AIC = 2k - 2 ln(L), where k is the number of parameters of a given model and L is the maximized likelihood.
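The AIC comparison can be sketched as below; the log-likelihood values are made up purely for illustration.

```python
# AIC = 2k - 2 ln(L): smaller is better; the 2k term penalizes extra parameters.

def aic(n_params, log_likelihood):
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (made-up) fits: model -> (no. of parameters, log-likelihood)
fits = {
    "exponential": (2, -900.0),
    "gompertz":    (3, -820.0),
    "logistic":    (3, -830.0),
}
ranking = sorted(fits, key=lambda m: aic(*fits[m]))  # best model first
```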
Since we wanted to generalize our results beyond our datasets, it was not enough to derive the AIC for the fitting of the models to our data alone. There is no statistic or distribution associated with the AIC that allows a p-value to be generated in order to assign significance to differences in AIC scores. To that end we fitted the models to 500 bootstrap samples of the original datasets, drawn with replacement [9]. Each bootstrap sample is a perturbation of the original dataset, so comparing the models across all 500 perturbations is a form of sensitivity analysis on the model comparison, i.e. a way of seeing how changes to the initial dataset affect the results, with the hope that one model will outperform the others in all or most samples. Moreover, the 500 fits provide 500 AICs for each model, giving rise to AIC distributions. Additionally, using visual interpretation of the individual objective-function values for each model on each growth curve, we aimed to explain the characteristics that favour one model over another.
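The resampling scheme can be sketched as follows: draw the individual growth curves (by ID) with replacement to form each bootstrap dataset; every model would then be refitted to each sample and its AIC recorded. The refitting step itself is omitted here.

```python
# Sketch of the bootstrap: resample individuals (growth-curve IDs) with
# replacement to build 500 perturbed datasets. IDs are illustrative.
import random

def bootstrap_id_samples(ids, n_samples=500, seed=1):
    """Each sample has the same size as the original, drawn with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(ids) for _ in ids] for _ in range(n_samples)]

ids = list(range(220))                     # 220 PDX growth curves
samples = bootstrap_id_samples(ids)        # 500 perturbed datasets
# For each sample, one would refit every model and record its AIC,
# yielding a 500-value AIC distribution per model.
```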
Finally, for the Novartis PDX dataset, where we have an almost even distribution of the data among six tumour types, we fitted the models to each type individually to assess whether the optimal model is tumour-type dependent and how the models perform on a study-sized dataset. For that purpose we used the AIC directly.

Results
Tables 1 and 2 show the fitted values of the model parameters for the Gao et al. and internal datasets, respectively. The initial estimate for the starting tumour volume was 0.24 for all models in the PDX dataset and 0.4 for all models in the CDX dataset. The parentheses next to each estimated value contain the relative standard error as a percentage, and those next to the BSV contain the coefficient of variation (CV). The initial tumour volume, V(0), was fitted as a parameter and attained very similar estimates and variability in all models, as expected.
From the fitting of the PDX dataset (AIC & error) we observe that the Gompertz and von Bertalanffy models provide the best fit, followed by the Logistic and then the Simeoni, Mayneord and Conger models, which have similar AIC numbers, and finally the exponential, which as expected is the worst performing. This can also be seen from the proportional error, with the Gompertz and von Bertalanffy having approximately 11.1% error. Interestingly, the Gompertz model also demonstrates the greatest variability in its parameters between different PDXs, as seen from the large BSV (random effect). In contrast, the Conger model required the smallest variability among all models in order to describe the data. For the Conger model we also see the largest estimation error for the growth parameter, which is attributed to the high correlation between K and r_d, making it challenging to estimate one precisely apart from the other.
In the fitting of the CDX dataset the Logistic model has the lowest AIC, with the Gompertz and von Bertalanffy models having slightly larger AIC values but lower errors. In contrast to the PDX dataset, the Simeoni model performs differently from the Conger and Mayneord models, which have both a higher AIC and a higher error. Despite the large number of individual curves in the CDX dataset, we observe a decrease in the variability needed to describe the data for all models, in addition to a significant increase in the growth rate. These differences are evident in Figures 1(a) and 2(a) of the growth curves. Figures 5 and 6 compare the VPCs between the different models, showing the mean growth curves as well as the 95% prediction intervals for each dataset; the complete VPCs with the 97.5 and 2.5 percentiles as well as the 95% confidence intervals are included in Figures 7 and 8. All of these were produced from 500 simulated datasets. Supplementary Figures S1-S6 show further goodness-of-fit plots, namely observations vs IPRED, the residuals plots and the residuals versus observations. IPRED corresponds to the 'individual prediction', i.e. the simulation of the model using the fixed effects as well as the individual random effects, in contrast to PRED, which corresponds to the 'population prediction', i.e. the simulation of the average model using only the fixed effects. These plots serve as further indication of the ability of a model to capture the data, satisfy the normality of residuals and confirm there is no underlying trend in the residuals with varying volume. They are in agreement with the conclusions made so far regarding the ability of the models to fit the data. The residuals are normal-like in all cases, and no mis-specification pattern is observed in the weighted residuals vs time or vs PRED plots.
In the observations vs IPRED plots we notice that most models follow the data quite well, with some over- and under-predictions at higher tumour sizes for certain models.
In the VPCs we look for prediction intervals that follow the data closely. The von Bertalanffy and Logistic models seem to offer the best fit (VPC) in both datasets, with the Simeoni model showing a similar performance to them for the CDX dataset only. The exponential model seems to over-predict the data variability and the other three models are almost indistinguishable. From the observations vs IPRED plots (S1, S4) we can see that the individual predictions of the Gompertz and von Bertalanffy models capture the observations best; the other models show no distinct differences between them except in the higher-volume regions, where the Gompertz and von Bertalanffy remain true to the observations. The Simeoni and Logistic models predict lower volumes than those observed, and the other three models predict higher volumes than the observations, with the exponential being the worst. Finally, the WRES vs observations plots (S3, S6) are almost indistinguishable, mainly showing no trend that would indicate a mis-specification of the model, with the exception of a negative residual trend observed for large volumes in the CDX dataset for the Simeoni and exponential models; the same holds for the residuals plots. So while there were some differences in the VPCs, no clear picture can be painted regarding the best model, and most models perform very adequately for both datasets. That raised the need for a quantitative metric.
Moving from the VPCs to the AIC numbers from 500 bootstrap samples the comparison becomes clearer.
From Figure 9(a) and Table 3 we can see that the Gompertz and von Bertalanffy models are the best performing models (AIC distribution) in the PDX dataset, outperforming all others. They are followed by the Logistic, Mayneord, Conger and Exponential in order of performance. In the case of the CDX dataset, Figure 9(b), the Logistic model has the lowest AIC distribution, followed closely by the Gompertz and then the von Bertalanffy. In contrast to the PDX dataset, here the Simeoni model outperforms both the Conger and Mayneord models. This highlights the importance of comparing the models on multiple samples. Most of the distributions seem normally distributed, but we can observe a few outliers. That might be due to the fact that there were some odd growth curves in both datasets, e.g. tumour shrinkage, which growth models cannot account for. To try to understand why the models perform the way they do, we used the objective-function value for each individual sample (unique ID) to see whether there is a pattern that explains which model fits which dynamics better. Figure 10 illuminates some of the reasons why certain models perform better. For example, it is clear in the Novartis dataset that there were many slow-growing tumours that are captured well by the Gompertz model. The Exponential, Mayneord and Logistic models seem to be good at fast-growing tumours. The Simeoni model does well across the board, and finally the von Bertalanffy captures the extremes well, i.e. the very fast and very slow growing tumours.
In the internal dataset we used a log scale to make the results clearer. Here we observe that the exponential model fits curves that are almost straight lines, as expected, whereas the rest of the models do well for curves that deviate from the typical straight-line log-curve. There is also a significantly larger number of curves fitted best by the Simeoni model. Other than that, no additional information can be drawn from these plots. Another interesting observation is the good performance of the Gompertz model (as seen from Figure 10) despite the absence of the classical sigmoidal curves that it is so good at capturing. Possible explanations are the significant number of slow-growing tumours that the Gompertz model is able to capture due to the flexibility and variability of the retardation term in Equation (2), and the fact that the slowing down of the tumour occurs continuously, in contrast to the Simeoni model, which features a switch, or the Mayneord model, which has the same form throughout.
As a final exploratory step into the performance of the models, and to check whether it is tumour- and dataset-size dependent, we fitted the models to each tumour type. This was performed only for the Novartis dataset, as only there do we have an almost even split of curves per tumour type (the internal dataset is CRC dominated). Table 5 summarizes the results of the fitting on individual tumour types. The initial estimates were kept the same as before. The shrinkage for each parameter of the model was added to aid in identifying potential issues. Shrinkage, defined as 1 - SD(η)/ω, approaches zero when the data are informative and 1 otherwise, with high shrinkage indicating an issue in estimating the inter-individual variation of the parameter. In agreement with the previous results, the Gompertz model is once again consistently one of the top performers (AIC & error). It outperforms all the other models in 3 out of 6 tumour types and is the second best in CRC and PDAC. Surprisingly, it performs quite poorly in NSCLC. The von Bertalanffy and Logistic models generally perform well but have variable performance and often high shrinkage, indicating a difficulty in estimating some parameters in the presence of fewer data. Another expected result is the exponential model being consistently one of the worst models, similar to what we saw earlier.
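The shrinkage diagnostic quoted above can be sketched as follows; the random-effect values are made up for illustration.

```python
# eta-shrinkage: 1 - SD(individual random effects) / estimated omega.
# Near 0 when the etas are spread on the scale of omega (informative data);
# near 1 when the etas collapse towards zero (uninformative data).
import statistics

def shrinkage(etas, omega):
    return 1.0 - statistics.stdev(etas) / omega

# Informative case: estimated etas spread out on the scale of omega.
informative = shrinkage([-0.4, -0.1, 0.2, 0.5, -0.2], omega=0.35)
# Uninformative case: etas collapse towards zero -> shrinkage near 1.
collapsed = shrinkage([-0.001, 0.0005, 0.0, 0.001, -0.0005], omega=0.35)
```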
Despite some similarities with the previous results, there were a few noteworthy observations. The first is the Simeoni model's performance. While there were two tumour types in which it performs well (compared to the other models), in the rest it ranks in the last positions according to the AIC. While the difference in performance is not clear from the dynamics observed in Figure 11, the very high shrinkage of the linear growth term might indicate a difficulty in estimating that parameter. That is in agreement with [22], where the authors reported a difficulty in evaluating the BSV of the linear growth term, which also had the largest error compared to the other model parameters. That was not the case when all tumours were combined in a single large dataset. This in turn might indicate a potential issue with using this model in smaller, normal study-size datasets where there are not enough data from the linear phase of the growth. This difficulty in estimating parameters (high shrinkage) is also evident in the Logistic and von Bertalanffy models, which might suffer from similar estimation issues in smaller datasets.
The second observation is the performance of the Mayneord model, which is the top performer in two cases without ever dropping below fifth position. That is a surprising result, taking into account that this is the only single-parameter model that performs well, even outperforming more complicated models. That is somewhat in agreement with our results from the fitting of the full dataset (AIC bootstrap), where the Mayneord model's performance was superior to that of the Conger and Simeoni models.
Additionally, we can see some very large errors associated with the parameters of the Conger model, which stem from the strong correlation between them. This can be seen in Figure 12 for the NLME fitting of the Conger model to the PDX dataset, where there is a very strong trend between the two parameters. Additionally, in the tumour-type-specific fittings a strong correlation between the fitted values of K_c and r_d was observed, even reaching a value of 1, making them indistinguishable. That is again an issue that needs to be considered when using the Conger model with smaller datasets.
Finally, it should be stressed that although our analysis was performed on the original data scale, we also fitted the models to the log-transformed data for both datasets, which yielded the same results with respect to model comparison and similar errors and parameter estimates. For that reason we deemed that the choice of scale would not strongly affect the subsequent analysis. Similarly, the Bayesian Information Criterion (BIC) was evaluated as an alternative to the AIC and yielded the same results and conclusions. Hence, we deemed it unnecessary to include both AIC and BIC in the analysis.

Conclusions
In this paper we compared seven mathematical models of tumour growth using nonlinear mixed-effects on two large datasets. To our knowledge this is the first model comparison in this setting and with different xenograft models. Both the data and the analysis are novel compared to previous studies, namely the two large datasets of different xenografts and the use of the bootstrap, which generalizes the comparison of the seven models under dataset perturbations. Two datasets were used in order to assess the models in different situations, as PDXs and CDXs might have different growth patterns. The models selected for this comparison are simple empirical models and one structural model (Conger), with most of them admitting an initial exponential growth and a subsequent decrease in growth rate. The comparison of the models was performed both quantitatively and visually. The visual comparison was essential to confirm that all models are capable of capturing the basic dynamics of the data, and the quantitative comparison using the AIC on bootstrap samples was necessary to formalize and generalize the results.
The Gompertz model was consistently one of the best performing by all means of comparison, including the fitting of the models to the original datasets, the AIC bootstrap, the VPCs and the tumour-type-specific fittings. This is in agreement with several studies of model fitting to both in vitro and in vivo data [4,17,19]. From the objective function values of the individual curves we were able to determine that the niche of the Gompertz model lies in the slower-growing tumours. Although the CDX data admit faster-growing curves, the Gompertz model remained one of the top performing models, possibly owing to the second term in its equation, which allows for a continuous slowing down of the growth rate and hence a more flexible model. The analysis shows that this model is not only appropriate for sigmoidal dynamics but also has the potential to capture varying tumour growths. The von Bertalanffy and logistic models also proved to be two of the best performing, although their performance was somewhat more variable than that of the Gompertz and, in the case of fewer data (fitting of individual tumour types), showed difficulties in the estimation of some model parameters. The worst performing model is almost always the exponential model, with the exception of the fitting of some tumour types; although it captures the mean dynamics effectively, it overpredicts the overall growth due to the lack of slowing down at later time points. That was expected, as it is the simplest model and the one on which most of the others are based. The other three models have variable performance depending on the dataset, with the Mayneord model showing a performance comparable to more complicated models in the VPCs and, in the PDX dataset, even better performance with respect to the AIC distributions than two-parameter models such as the Simeoni and Conger.
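To make the "continuous slowing down" concrete: in a common closed-form parameterisation of the Gompertz model (the parameter names here are illustrative and may differ from those used in the main text), the specific growth rate decays exponentially in time, so the curve flexibly interpolates between near-exponential early growth and a plateau:

```python
import numpy as np

def gompertz_volume(t, v0, alpha, beta):
    """Closed-form Gompertz growth curve:
        V(t) = V0 * exp((alpha/beta) * (1 - exp(-beta * t)))
    The specific growth rate alpha * exp(-beta * t) decreases
    continuously over time, and V(t) plateaus at V0 * exp(alpha/beta).
    """
    return v0 * np.exp((alpha / beta) * (1.0 - np.exp(-beta * t)))
```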
Moreover, the Conger model parameter estimates have the smallest variability in the PDX dataset and one of the smallest in the CDX dataset, in contrast to the Gompertz, which has the largest.
The distinction between the CDX and PDX data is understandable, as the CDX growth curves are faster growing and more homogeneous, as can be seen from the parameter estimates as well as the respective figures. That favours models such as the Simeoni, whose transition from exponential to linear volume change is more suitable for fast dynamics, over the Mayneord, a model whose growth is sub-exponential from the beginning and linear in radial change. The small proliferating shell thickness found in the Conger model means that the majority of the tumours are fitted as if they had a necrotic core from the beginning, and hence with sub-exponential growth as well. The flexibility of the Simeoni model's switching becomes evident in Figure 10(a), where it is clear that the model captures the widest range of growth curves, from almost stationary tumours to very fast growing ones, better than the other models.
It is worth mentioning that a comparison of the models using only the original datasets is not comprehensive, as the errors are comparable and the VPCs are in most cases indistinguishable between models such as the Gompertz, Conger and Mayneord. The bootstrap analysis of the AIC is what provides evidence on the ranking of the models. With both these results in mind, it may be important to consider other factors when selecting a model for tumour growth. For example, despite the differences found through the AIC, the Mayneord model appears from the VPCs to perform well and similarly to the other models, despite having only one parameter compared to the two of the Simeoni, Conger, Gompertz, logistic and von Bertalanffy models; this may become important when extending the models to treated tumours, which will add further parameters. This is highlighted even further when comparing the models by fitting them to each tumour type separately, which allows for a comparison at a more realistic scale, similar to what one would have in a tumour study.
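The bootstrap scheme referred to above resamples whole subjects (growth curves) with replacement before refitting every candidate model to each perturbed dataset. A minimal sketch of the resampling step, illustrative only and not the pipeline used in this study, is:

```python
import numpy as np

def bootstrap_subject_samples(subject_ids, n_boot=500, seed=0):
    """Draw n_boot bootstrap datasets by resampling subject IDs with
    replacement. Each returned sample lists the subjects whose full
    growth curves form one perturbed dataset; every candidate model
    is then refitted to each sample and its AIC recorded, giving a
    distribution of AIC values per model."""
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    ids = np.asarray(subject_ids)
    return [rng.choice(ids, size=ids.size, replace=True) for _ in range(n_boot)]
```

Resampling at the subject level (rather than individual observations) preserves the within-curve correlation structure that the mixed-effects model relies on.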
With regard to the fitting of individual tumour types, no conclusions can be drawn as to which model best captures a specific type without additional datasets to explore, and no such conclusion was clear from the dynamics of the curves themselves. The AIC values in many cases were comparable, and the ranking of the models is not meant to state categorically which model is statistically better; it serves mainly as an additional point to the aforementioned argument that considerations other than strict statistical performance may be important when deciding which model to select for further analysis.
Moving on from the comparison of these models, potential future work could include comparing further models, including other empirical ones as well as complex mechanistic models, to obtain a fuller picture of the performance of tumour models on PDX and CDX data. Another potential future direction is a similar analysis fitting the models on a log scale with an additive error, which would allow any differences in the results obtained under different fitting approaches to be explored. Additionally, on the log scale the growth curves will be compressed into a tighter range, which might lead to model niches different from the ones observed in Figure 9. Whether that would yield better insight into the performance of these models is uncertain, but different assumptions on the data fitting process have shown different model preferences on some occasions [17].