Modelling height in adolescence: a comparison of methods for estimating the age at peak height velocity

Abstract Background: Controlling for maturational status and timing is crucial in lifecourse epidemiology. One popular non-invasive measure of maturity is the age at peak height velocity (PHV). There are several ways to estimate age at PHV, but it is unclear which of these to use in practice. Aim: To find the optimal approach for estimating age at PHV. Subjects and methods: Methods included the Preece & Baines non-linear growth model, multi-level models with fractional polynomials, SuperImposition by Translation And Rotation (SITAR) and functional data analysis. These were compared through a simulation study and using data from a large cohort of adolescent boys from the Christ’s Hospital School. Results: The SITAR model gave close to unbiased estimates of age at PHV, but convergence issues arose when measurement error was large. Preece & Baines achieved close to unbiased estimates, but shares similarity with the data generation model for our simulation study and was also computationally inefficient, taking 24 hours to fit the data from Christ’s Hospital School. Functional data analysis consistently converged, but had higher mean bias than SITAR. Almost all methods demonstrated strong correlations (r > 0.9) between true and estimated age at PHV. Conclusions: Both SITAR or the PBGM are useful models for adolescent growth and provide unbiased estimates of age at peak height velocity. Care should be taken as substantial bias and variance can occur with large measurement error.

The model has been reformulated using a multi-level model notation, where parameters have random effects u with mean 0 and variances Ω u . Model parameters have the following population average values and interpretations: h 1 =180 (final height), h θ =165 (height at PHV), γ =1.3 (a dimensionless constant), S 0 =1.5 and S 1 =0.1 are rate constants, and θ=13.5 (age at PHV). Random variables u 0j , u 1j, .... , u 5j correspond to the deviations from the population average parameters described previously, and e ij represents the occasion level (level 1) residual for the height for individual j at occasion i.
Population average variances and covariance of Ω u are listed in below. n.b. it was not clear how γ relates to other parameters of interest therefore was simulated independently from other parameters. Key features of the simulated data including the true age at PHV; height at age 10; and final height are shown in Table S1. Statistics fall within expected ranges, indicating that our simulation procedure is realistic. In all methods (except PACE) 1000 simulations were attempted for each method. The computational time required to implement the PACE method when measurement occasions were unbalanced and infrequent (I=10 and I=5) was extensive without the use of distributed computing, so the number of simulation runs was reduced to 500 in both scenarios.

Preece & Baines growth model
The PBGM(32) is an individually estimated non-linear growth model. Parameters were suggested to have the following interpretations: h 1 = final height, h θ height at PHV, S 0 and S 1 are rate constants, and θ is the age at PHV. We have recently described analytical solutions for the age at PHV .(39)

Multi-level models using fractional polynomials
The model with the best fitting set of fractional polynomials is found using an iterative procedure, where all possible combinations of powers and degrees in the fixed and random effects are fitted.
Patrick Royston's user written Stata program fracpoly_powers is used to generate a list of powers for models of any degree. Models are fitted using the runmlwin(43) interface with increasing complexity, adding 1 random effect at a time for models up to 3 degrees. i.e. a 3 degree model may be formulated with no random effects, a random effect on the first power, random effects on first and second power, and random effects on all three powers. Models are built successively increasing the complexity with the addition of each random effect. // Decide model fitting sequence local rand_parms `" "nore" "`x'_1" "`x'_1 `x'_2" "`x'_1 `x'_2 `x'_3" "' // Use no initial values for a fixed effects only model local inits = "" // Loop over fitting sequence, building model complexity foreach random in `rand_parms' { if "`random'"=="nore" local random = "" if "`random'"!="" local inits "initsprevious" runmlwin ht cons `fracvars' , /// level1(time:cons) level2(id: cons `random') /// igls `inits' nopause } The computational burden is greatly increased in comparison to single level models, and model convergence is not always guaranteed. There are a number of possible methods for choosing the model with the best fit, with the log-likelihood being the method suggested in the original paper. In this simulation study we selected the best model using the AIC statistic, as the AIC statistic reflects model fit, and penalizes model complexity (i.e. number of parameters).
The age at PHV was then derived from the equation of the best fitting polynomial. In order to do this it was necessary to build on the earlier work of Long et al.(48) and to determine first and second derivatives of any degree fractional polynomial, by differentiating both the fixed and random effects.
The general form of a multi-level model with polynomials is described below, where FD represents the degree of the fixed effects model, RD represent the degree of the random effects model, and P indicates the power. Parameters of the fixed effects are indicated by β d and empirical Bayes random effects u dj . With no repeated or zero powers the model may be expressed as equation [4].

[4]
However, when powers are repeated the model must be modified and the second occurrence of the power is expressed as Age p (Log(Age)) r-1 . The number of occurrences of the same powers is indicated by r i.e. when there is 1 occurrence r=1 and when its repeated for the first time r=2, and when there are no zero powers the linear model is given by: When including powers equal to zero and only repeated powers of zero, the model is rewritten as: However, the best fitting model may include a mixture of non-zero and zero powers and is rewritten as equation [7] to reflect this.
The expanded form of the multi-level model allows for consistent differentiation of the linear expression as summations of fixed and random effects, the first derivative (velocity) of any degree fractional polynomials is written as equation And second derivative (acceleration) as equation [9] respectively.
Age at PHV is then found by identifying the age at which acceleration is equal to zero using an iterative grid search procedure. The confidence interval for age at PHV is found by setting acceleration to the limits of the height acceleration confidence interval using an iterative grid search.

Superimposition by translation and rotation
SuperImposition Translation And Rotation (SITAR)(31) is a shape invariant non-linear multi-level model with natural cubic splines suggested by Cole et al. The model estimates the population average growth curve and departures from it as random effects. The SITAR model uses three parameters which are suggested to have direct interpretations: α j adjusts for difference in mean height; β j adjusts for timing of the pubertal growth spurt (which is thought to correlate with age at PHV); γ j adjusts for the duration of the growth spurt, where h represents the natural cubic spline constraint. . PHV is identified using numerical differentiation of the individually predicted growth curves, the age at PHV is the age which corresponds to the maximum velocity. The SITAR model was fitted in R using an adaptation of the nlme library written by Tim Cole.

Principal Analysis by Conditional Expectation (PACE)
In standard functional principal components analysis, the observations are seen as discrete representations of a collection of underlying smooth functions . The data have a mean function up to some threshold of variance such that functional principal components are chosen.
Using the Kahrunen-Loève expansion, each individual curve can be represented as where the are uncorrelated random variables with mean zero and variance .