Human height: a model common complex trait

Abstract
Context: Like other complex phenotypes, human height reflects a combination of environmental and genetic factors, but is notable for being exceptionally easy to measure. Height has therefore been commonly used to make observations later generalised to other phenotypes, though the appropriateness of such generalisations is not always considered.
Objectives: We aimed to assess height’s suitability as a model for other complex phenotypes and review recent advances in height genetics with regard to their implications for complex phenotypes more broadly.
Methods: We conducted a comprehensive literature search in PubMed and Google Scholar for articles relevant to the genetics of height and its comparability to other phenotypes.
Results: Height is broadly similar to other phenotypes apart from its high heritability and ease of measurement. Recent genome-wide association studies (GWAS) have identified over 12,000 independent signals associated with height and saturated height’s common single nucleotide polymorphism based heritability within a subset of the genome in individuals similar to European reference populations.
Conclusions: Given the similarity of height to other complex traits, the saturation of GWAS’s ability to discover additional height-associated variants signals potential limitations to the omnigenic model of complex-phenotype inheritance, indicates the likely future power of polygenic scores and risk scores, and highlights the increasing need for large-scale variant-to-gene mapping efforts.


Introduction
Ever since antiquity, scholars have studied human height and the factors that influence it (Tanner 1981). The subject has received extensive attention for several main reasons. Firstly, there have been efforts to predict and/or influence adult height. For example, many of the first growth studies were started in the eighteenth century in order to ensure a suitable supply of tall soldiers (Tanner 1981). In modern times, there is keen interest in early identification and subsequent hormone intervention for those trending towards acutely short stature (Dauber et al. 2020; Lu T et al. 2021). A second motivation is the serious medical conditions associated with height, most notably cancer and cardiovascular disease. The positive relationship with cancer incidence has been well established and is believed to be a function of the greater number of cell divisions in taller individuals (Nunney 2018; Choi et al. 2019). In contrast, the inverse relationship with cardiovascular disease is less well understood, and multiple hypotheses have been proposed in this context (Paajanen et al. 2010; Nüesch et al. 2016; Pes et al. 2018; Moon and Hwang 2019; Marouli et al. 2019; Yano 2022). As such, a better understanding of the factors influencing height, in particular genetic contributors, should elucidate the biology of these associated diseases and provide new avenues for therapeutic intervention.
A final key reason for the abundance of research into height is that the trait is relatively easy and inexpensive to observe in comparison to others. It can therefore serve as a model to test new methods and gain insights relevant to other phenotypes with lower incidence rates and higher investigational costs. Taking this approach, our review will focus on summarising what is known about the genetic contributions to height, with particular attention to the ways in which our current understanding of height may reflect and inform the biology of complex phenotypes overall.

Establishing height as a model common complex phenotype
The use of human height as a model common trait has a long history in genetics dating back to the nineteenth century. The first major work in this vein was Galton's famous demonstration that average parental height predicts that of offspring (Galton 1886). A few decades later, in another seminal work, R.A. Fisher proposed a model that used height as its illustrative phenotype to show how polygenic inheritance could explain the observed variation in a continuous trait (Fisher 1919). However, when both these papers were written, neither author knew the validity of extrapolating results from height to other traits. Before we consider the same possibility of generalising results for height, we will first consider how height's characteristics compare to those of other common complex phenotypes.
Apart from the ease of measurement, height is also a very useful model common complex trait due to its above-average heritability relative to other complex phenotypes (Polderman et al. 2015; Watanabe et al. 2019). Twin and family-based analyses estimate that between 30% and 90% of human height variation is determined by genetic factors, with most estimates towards the upper end of that range (Preece 1996; Silventoinen et al. 2000; Silventoinen et al. 2001; Macgregor et al. 2006; Perola et al. 2007). This proportion is lower at birth and rises during childhood, reaching a peak post-puberty (Burk et al. 2006; Mook-Kanamori et al. 2012; van Soelen et al. 2013; Jelenkovic et al. 2016; Silventoinen et al. 2019).
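Twin-based heritability estimates of the kind cited above are often derived from the difference between monozygotic (MZ) and dizygotic (DZ) twin correlations. The sketch below illustrates one classical estimator, Falconer's formula, h² = 2(r_MZ − r_DZ), which assumes purely additive genetic effects and equal shared environments for both twin types. The function name and the correlations are hypothetical and are not taken from any of the cited studies, which may rely on other (for example, variance-component) models.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate of narrow-sense heritability from twin correlations.

    r_mz: phenotypic correlation between monozygotic twin pairs
    r_dz: phenotypic correlation between dizygotic twin pairs
    Assumes additive effects and equal shared environments (an assumption).
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


# Hypothetical adult-height correlations, chosen only for illustration
h2 = falconer_h2(r_mz=0.9, r_dz=0.5)
print(f"Estimated heritability: {h2:.2f}")
```

Because the estimate doubles a difference of two sampled correlations, it is sensitive to noise in either one, which may partly explain why published estimates span a wide range.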
Since it has been well documented that environmental factors, such as childhood net nutrition and socioeconomic status (Tanner 1981; Silventoinen 2003; Deaton 2007; Perkins et al. 2016), have a strong effect on adult height, high heritability likely means that the effect sizes and/or the quantity of genetic variants that impact the phenotype must be higher for height than for the vast majority of other traits. Both possibilities would make it relatively easy to identify genetic factors influencing the trait.
Evaluating the likelihood of each of these possibilities is a question of polygenicity, that is, the degree to which height's genetic modulators are scattered throughout the genome in many variants of low effect or concentrated in a smaller number of high-effect loci. Several approaches have been used to quantify the polygenicity of height relative to other complex phenotypes, and the results have been somewhat mixed. For a given sample size, the number of loci and variants associated with height is comparable to those of other complex traits (Chakravarti and Turner 2016). Similarly, an analysis of 588 complex phenotypes estimated that the proportion of independent and causal single nucleotide polymorphisms (SNPs) for height is near the 40th percentile (Watanabe et al. 2019). In contrast, a more targeted analysis of 5 of the 588 complex traits found that height had nearly double the next highest proportion of putative causal SNPs, even though the other four traits had been scored as more polygenic in the prior analysis (Johnson et al. 2021). From the available evidence, it is difficult to precisely quantify the polygenicity of height, but it is generally presumed that it is broadly similar to most other complex traits.
A final element to consider when evaluating the suitability of height as a model for other complex phenotypes is the number of tissues and cell types involved in its aetiology. Studies have consistently shown that height-associated genetic variants are enriched in expressed genes, functional genomic elements and pathways relevant to musculoskeletal and connective tissues (Wood et al. 2014; Finucane et al. 2015; Finucane et al. 2018; Luo Y et al. 2021). Recent analyses have even found enrichments in cardiovascular-relevant tissues and the uterus (Finucane et al. 2015; Finucane et al. 2018; Luo Y et al. 2021). Although the tissues relevant to a given phenotype are distinct, the studies that compared height to other complex traits and diseases found it had a comparable number of relevant cell types and tissues (Finucane et al. 2015; Finucane et al. 2018; Luo Y et al. 2021). In summary, height appears similar to other complex phenotypes for the key metrics outlined above, apart from its relative ease of study and high heritability, both of which give it an advantage as a model common complex trait.

Identifying genetic factors for height
Before the advent of genome-wide appraisals, candidate gene studies and family-based linkage analyses had been widely employed to study complex phenotypes, though with much less success. In the case of height, most of the "discovered" regions failed to replicate (Perola et al. 2007; Weedon and Frayling 2008) and the handful of true genetic factors identified were principally driven by mutations in genes yielding rare, monogenic forms of extreme stature (Godfrey and Hollister 1988; Vissing et al. 1989; Shiang et al. 1994; Rao et al. 1997; Maheshwari et al. 1998; Hasegawa et al. 2000; Tartaglia et al. 2001; Durand and Rappold 2013). The key development of genome-wide association studies (GWAS) in the mid-2000s drastically expanded insight into the genetics of complex traits, including height. Initial GWAS uncovered tens of common genetic variants contributing to height (Weedon et al. 2007; Sanna et al. 2008; Gudbjartsson et al. 2008; Lettre et al. 2008; Weedon et al. 2008; Soranzo et al. 2009) and subsequently led to a flood of many thousands more loci being reported.
The first GWAS for height was published in 2007 and involved approximately 5,000 subjects of European ancestry (Weedon et al. 2007). The main observation in that initial study was a signal at HMGA2, which encodes the high mobility group-A2 oncogene. This result was confirmed in a follow-up effort in approximately 10,000 additional subjects (Yang T-L et al. 2010). The next major GWAS of height revealed strong evidence for a second signal, namely at the GDF5-UQCC locus (Sanna et al. 2008). These first reports were followed by a set of meta-analyses, where datasets from various investigative groups were combined to increase statistical power through a larger collective sample size. Together these efforts uncovered in excess of 40 more loci for height (Gudbjartsson et al. 2008; Lettre et al. 2008; Weedon et al. 2008; Soranzo et al. 2009). However, differences were observed in the outcomes of these meta-analyses, likely partly due to differences in statistical power (Weedon and Frayling 2008), but perhaps also due to underlying co-morbidities in the study participants that were not fully accounted for in the analyses (Yaghootkar et al. 2017).
Over the next decade, GWAS sample sizes continued to increase in both size and diversity, leading to an ever-growing number of loci associated with height. To illustrate, a 2010 study in 130,000 European-ancestry individuals more than quadrupled the number of associated loci to over 180 (Lango Allen et al. 2010). Four years later, the next major GWAS of over 250,000 European-ancestry individuals identified 423 loci (Wood et al. 2014), a number that would be dramatically surpassed just another four years later by a GWAS in ~450,000 British individuals that identified over 700 loci (Yengo et al. 2018). Though not all the loci identified in each of these GWAS would replicate in later studies, each study showed an increasing proportion of phenotypic variation captured by the significantly associated variants as well as increasing numbers of secondary signals at associated loci. Apart from simply increasing European-ancestry sample sizes, several GWAS over this period sought to identify associated loci in samples drawn in part or exclusively from non-European individuals (N'Diaye et al. 2011; He et al. 2015; Akiyama et al. 2019; Graff et al. 2021). In addition to replicating many of the loci identified in the European-ancestry GWAS, these analyses also identified several ancestry-specific loci and variants and leveraged differing patterns of linkage disequilibrium across ancestries to fine-map the location of putative causal variants.
In comparison to other complex phenotypes, height has not always been at the leading edge of the GWAS field, as initial focus was on disease outcomes directly. Indeed, it was not the first trait to be studied in a GWAS (Ozaki et al. 2002; Klein et al. 2005; Ikegawa 2012; Loos 2020), nor was it the first to pass major milestones such as having a sample size over 1 million individuals (Loos 2020). Height also suffers from many of the same problems that affect other phenotypes, most obviously an under-representation of non-European ancestry individuals that limits statistical power and the subsequent portability of results across populations (Mills and Rahal 2020; Loos 2020). In contrast to this history of height as a typical but non-leading GWAS trait, the latest GWAS of height has presented ground-breaking results that could have important implications for all common complex phenotypes.

The first saturated GWAS
This latest GWAS of height (Yengo et al. 2022) is the first of what is referred to as a "saturation" GWAS. Such a GWAS is so highly powered that further increases in the sample size without increases in participant diversity or variant inclusion are unlikely to reveal any additional genetic insights. In this landmark study conducted in 5.4 million individuals, the largest sample ever assembled for a GWAS, Yengo et al. identified an unprecedented 12,111 independent signals associated with height clustered into 7,209 non-overlapping loci that are enriched near genes with known Mendelian effects on skeletal growth. Together these loci comprise 21% of the genome, a result consistent with previous estimates (Shi et al. 2016). The results are notable not only because of their potential use in the discovery of novel height-modifying genes but also because they showed for the first time near exhaustion of GWAS's ability to discover further associations, particularly in European-ancestry populations. Such an outcome had been previously hypothesised to be possible but never actually demonstrated (Visscher et al. 2017; Kim et al. 2017; Wray et al. 2018).
The authors primarily showed saturation of their GWAS by examining the proportion of estimated single nucleotide polymorphism (SNP) based heritability captured by the associated loci. SNP-based heritability is distinct from the total heritability estimated in twin studies and represents the portion of total heritability attributable to SNPs under an additive GWAS model (Yang J et al. 2017). In their analysis, Yengo et al. found that in individuals of European ancestry, associated regions accounted for approximately 100% of the estimated common SNP-based heritability, and across the other four tested ancestries the proportion was over 90%, even in the more genetically diverse African ancestry group. Like most other published GWAS, this study was heavily dominated by individuals of European ancestry (75.8% of the sample), so the higher saturation of European-ancestry results was expected. The results appeared robust, as the enrichment of heritability within these loci did not transfer to another tested trait, namely body mass index (BMI), and random SNPs matched on allele frequency and linkage disequilibrium score did not show similar enrichment. Based on these results, the authors concluded that height-associated loci are largely shared across ancestries and that the generated list of loci is exhaustive, i.e. saturated, for European ancestry.
While estimated SNP-based heritability enrichment strongly supported the comprehensiveness of the catalogued height loci for individuals of European ancestry, it was unknown whether the entire sample had been necessary to achieve such completeness. To better understand when saturation occurs, Yengo et al. reanalysed prior height GWAS (Lango Allen et al. 2010; He et al. 2015; Graff et al. 2021) and down-sampled their own study to produce five samples of individuals of European ancestry ranging from ~130,000 to ~4 million people. Using these samples, and their trans-ancestry meta-analysis, the authors found different points of saturation across various metrics of interest. For example, they found that the number of genes prioritised by summary-data-based Mendelian randomisation (Zhu et al. 2016) of GTEx (Aguet et al. 2020) and expression quantitative trait loci only plateaued when all 4 million individuals of European ancestry were analysed. Moreover, the number of loci discovered and the proportion of the genome covered by GWAS loci could not be saturated without including non-European ancestry individuals.
While the latest study by Yengo et al. is the first GWAS to show saturation of associated loci in European ancestry, it should be mentioned that a complementary effort at building a prediction model of height using an L1-penalised regression (LASSO) captured a comparable proportion of the total heritability as Yengo et al. (Lello et al. 2018). The proportion captured by the LASSO model was slightly lower, but it is not known whether this difference is significant. However, even putting this difference aside, Yengo et al.'s contributions are still novel and significant in four main ways. Firstly, Yengo et al. were able to demonstrate near saturation in non-European ancestries, whereas the LASSO model only considered European individuals. Secondly, Yengo et al. analysed other metrics of saturation beyond heritability. Thirdly, Yengo et al. only required 21% of the genome to capture the total estimated SNP-based heritability, whereas the activated variants in the LASSO model spanned the entire genome roughly uniformly. Lastly, and most importantly, the effect size estimates from a LASSO model depend upon the specific variants retained in the model, whereas GWAS does not have that issue. This distinction makes the results of GWAS useful for downstream analyses such as fine mapping, for which LASSO results would be unsuitable.
An issue always worth considering when discussing any GWAS results is the effect of assortative mating. A shortcoming of Yengo et al.'s work is that they did not do so. There is an extensive literature documenting human assortative mating in general (Burgess and Wallin 1943; Price and Vandenberg 1980; Mascie-Taylor 1989; McLeod 1995; Allison et al. 1996; Du Fort et al. 1998; Maes et al. 1998; Hippisley-Cox et al. 2002; Stimpson and Peek 2005; Jurj et al. 2006; Meyler et al. 2007; Di Castelnuovo et al. 2009; Alford et al. 2011; Ask et al. 2012; Peyrot et al. 2016; Luo S 2017; Jeong and Cho 2018; Horwitz et al. 2023) and, specifically, on the basis of height (Stulp et al. 2017; Torvik et al. 2022). Assortative mating can inflate SNP-based heritability estimates. For example, an analysis of ~335,000 individuals of British ancestry found that assortative mating inflated the heritability of height by 14-23% (Border et al. 2022). However, that same study also demonstrated that sufficient sample sizes should cause the heritability estimates derived from the restricted maximum likelihood approach employed by Yengo et al. (Yang J et al. 2011, 2012) to converge to the true SNP-based heritability. It is plausible that the large sample size of this most recent GWAS is sufficient to ensure such convergence.
Though Yengo et al. did not set out to study phenotypes apart from height, given the demonstrated similarity of height to other complex traits, the potential implications of this study for the broader field of complex phenotypes are worth reflection. The remainder of this review considers these possible impacts, especially as regards the "omnigenic model", polygenic risk scores, and causal gene identification.

Relevance to the omnigenic model
Perhaps the most important implication of the Yengo et al. saturation results is for the validity of the widely cited omnigenic model (Boyle et al. 2017). This model was originally proposed to explain the challenge of "missing heritability" (Manolio et al. 2009), that is, the proportion of total heritability not captured by significant GWAS loci and variants uncovered at a specific point in time. Under the model, this heritability was attributed to a large set of variants dispersed widely across the genome, each with individually small effects. The model was predicated upon the hypothesis that all genes expressed in phenotype-relevant cells impact in some capacity the function of genes central to the phenotype and therefore the phenotype itself. A more extreme version of the model also emerged, which hypothesised that with sufficient power, GWAS would effectively implicate every gene and genomic region (Flint and Ideker 2019).
The results presented by Yengo et al. seemingly contradict the more extreme version of the omnigenic model. The authors showed that only a subset, albeit a large subset, of genomic regions appear to be implicated in the trait in individuals of European ancestry. Though expanding non-European ancestry representation or incorporating new variants from databases of whole genome sequences would likely increase the proportion of the genome associated with the phenotype in future height GWAS, it seems highly improbable that the whole remaining ~79% of the genome will also be implicated.
Rare and structural variant associations could also implicate additional regions of the genome, but it seems similarly doubtful that the proportion would grow considerably. Regarding rare variants, Yengo et al. found suggestive evidence that variants with minor allele frequency between 0.1% and 1% concentrate in the same 21% of the genome as the common variants. This result is consistent with those obtained by recent rare variant analyses for height. Approximately 70% of the 83 low-frequency coding variants identified in a 2017 study (Marouli et al. 2017) lie within loci identified by Yengo et al., and a later analysis of 492 traits showed strong colocalisation of rare and common variants (Backman et al. 2021). Comprehensive results for structural variation are more limited than for rare variants, but we similarly note that 80% of the height-related copy number variants in a recent cataloguing effort (Hujoel et al. 2022) overlapped the loci identified by Yengo et al.
As for the original conception of the model, it could be the case that the associated 21% of the genome captures the complete set of genes expressed in height-relevant cell types, but this too seems unlikely. As described, several tissues and a large number of cell types are believed to play a role in height aetiology. It seems improbable that 21% of the genome captures all the genes they express, but more research would be needed to confirm this hypothesis. Should the patterns observed with height be repeated in the saturation of other complex phenotypes, a highly polygenic model seems a more valid explanation of complex trait inheritance than a truly omnigenic one.

Implications for polygenic scores
Aside from exploring how well their GWAS saturated the discovery of height genetics, Yengo et al. also investigated the ability of their results to accurately predict height across ancestries using polygenic scores. Polygenic scores and risk scores predict individuals' values and risks of complex traits and diseases respectively. Generally, these terms refer to statistical models that make genetic-based predictions using GWAS-estimated variant effect sizes (Sugrue and Desikan 2019; Wang et al. 2022), though under some definitions they may denote any genetics-based statistical or machine-learning model designed to predict phenotypes (Wand et al. 2021). Prior to the latest height GWAS, polygenic scores and risk scores had shown relatively modest success at predicting traits and disease risk for individuals drawn from the population used to create the scoring metrics (Schrodi et al. 2014; Hu et al. 2017). Though generally used on unrelated individuals, at least in the case of height, polygenic scores can differentiate between siblings (Lello et al. 2020). It has been similarly shown that in order for these scores to obtain their maximum accuracy, they should include both coding and non-coding associated variants (Yong et al. 2020).
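In its simplest additive form, a polygenic score of the kind described above is a weighted sum of an individual's effect-allele counts, with GWAS-estimated effect sizes as weights. The following is a minimal sketch under that assumption; the function name, variant effects and genotype are hypothetical illustrations, and real pipelines typically add steps such as variant selection, linkage disequilibrium adjustment and ancestry-aware calibration.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], effect_sizes: Sequence[float]) -> float:
    """Additive polygenic score: sum over variants of dosage * effect size.

    dosages: effect-allele counts per variant (0, 1 or 2; imputed values may be fractional)
    effect_sizes: per-allele effect estimates, in the same variant order
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical per-allele effects (in cm) for three made-up variants
effects_cm = [0.5, -0.25, 0.125]
# Hypothetical genotype: two, one and zero copies of each effect allele
person = [2, 1, 0]

print(polygenic_score(person, effects_cm))  # 2*0.5 + 1*(-0.25) + 0*0.125 = 0.75
```

The score is only meaningful relative to other individuals scored with the same variants and weights, which is one reason portability across populations matters.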
Measuring the accuracy of their polygenic score based on the 12,111 genome-wide significant signals, Yengo et al. obtained an accuracy of around 40% in individuals of European ancestry. As could also be expected based on its comparable heritability saturation, the previously mentioned LASSO model also obtained an accuracy around 40% (Lello et al. 2018). These accuracies were not significantly different from that obtained by taking the average of parental heights and were comparable to that obtained by using all the SNPs input into the GWAS (42% accuracy). That the gap was so small between the accuracies of the polygenic score based on all input SNPs and the polygenic score based on only the 12,111 significant SNPs was expected given the saturation of common-SNP heritability observed in individuals of European ancestry. Building on these results, Yengo et al. also demonstrated that a weighted combination of parental average height and the 12,111-SNP polygenic score achieved an accuracy above 54%. This result showed how polygenic scores can be used jointly with family history to appreciably improve predictions and gives an insight into what gains could be expected when polygenic score and risk score accuracy for other phenotypes achieve eventual saturation.
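Why two predictors of similar accuracy can combine into a better one can be illustrated with the standard two-predictor regression identity, R² = (r₁² + r₂² − 2·r₁·r₂·r₁₂) / (1 − r₁₂²), where r₁ and r₂ are each predictor's correlation with the trait and r₁₂ is the correlation between the predictors. The sketch below assumes that "accuracy" is read as variance explained (R²), which is an interpretation rather than something stated in the text; the function name and the between-predictor correlation of 0.5 are hypothetical, and this is not the weighting procedure used by Yengo et al.

```python
import math


def combined_r2(r1: float, r2: float, r12: float) -> float:
    """Variance explained by the best linear combination of two predictors.

    r1, r2: correlation of each predictor with the outcome
    r12: correlation between the two predictors (must satisfy |r12| < 1)
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)


# Hypothetical inputs: each predictor alone explains 40% of trait variance
r_pgs = math.sqrt(0.40)      # polygenic score vs. height
r_parent = math.sqrt(0.40)   # parental average vs. height
r_between = 0.5              # assumed correlation between the two predictors

print(f"Combined R^2: {combined_r2(r_pgs, r_parent, r_between):.3f}")
```

The less the two predictors overlap (smaller r₁₂), the more a combination gains over either alone, which is consistent with family history carrying information that the score does not.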
Lamentably, Yengo et al.'s polygenic score showed reduced performance in the non-European ancestries studied. Given the large skew of Yengo et al.'s sample towards European ancestry and the well-known poor portability of polygenic scores across ancestries (Duncan et al. 2019; Lewis and Vassos 2020; Adeyemo et al. 2021), this result was not surprising. In fact, the sensitivity to ancestry differences is so great for polygenic scores that the level of deterioration can be seen to correlate with any admixture (Bitarello and Mathieson 2020) and can be observed even in subtly stratified groups within a single continental ancestry (Berg et al. 2019; Isshiki et al. 2021).
Adding individuals of non-European ancestry to capture the missing 5-10% of common SNP-based heritability in these populations should improve the accuracy of polygenic scores in non-European populations by allowing for discovery of disease-associated variants that are rare in Europeans but common in other groups (Wojcik et al. 2019; Bentley et al. 2020). Increasing diversity is a necessary step for ensuring that polygenic prediction techniques work well for all people and should be a goal for future research in height as in other complex traits. In addition to benefiting polygenic scores and variant discovery, building cohorts of greater diversity should also enable future studies to more precisely fine-map putative causal variants by leveraging differing patterns of linkage disequilibrium across ancestries; cohorts of African ancestry, with their smaller haplotype blocks, are especially useful in this regard (Hutchinson et al. 2020; Lu Z et al. 2022; Yuan et al. 2023). There are several large-scale ongoing efforts, including All of Us (The "All of Us" Research Program 2019), the Million Veteran Program (Gaziano et al. 2016), and H3Africa (Owolabi et al. 2019), that aim to recruit participants from these underrepresented ancestry groups. These efforts clearly should continue and even expand.

The ongoing challenge of gene discovery
Despite the fact that many of the loci identified by Yengo et al. reside near known genes relevant to cartilage and bone biology, such as HHIP, BMPs, GDF5 and ACAN, it is now clear that a systematic approach is required to more accurately characterise the causal effector genes at each of the 12,111 height loci. Indeed, the genetics community is now grappling broadly with the issue of whether the nearest gene to a GWAS-implicated signal is actually always the main culprit "effector" gene at a particular locus (Broekema et al. 2020; Li and Ritchie 2021). Some key lessons can be learned from the FTO obesity locus.
Obesity is principally defined by the relationship between height and weight, as captured by BMI. GWAS for BMI and obesity have consistently shown the signal at the FTO locus to be the strongest association with the trait (Frayling et al. 2007; Rosen and Ingelfinger 2015; Yengo et al. 2018; Huang et al. 2022). This signal lies within an intron of the FTO gene (Frayling et al. 2007) and has been widely replicated across multiple populations (Hassanein et al. 2010; Okada et al. 2012; Wen et al. 2012). Despite the FTO gene product receiving wide attention from the research community, two key papers presented compelling evidence that IRX3 and IRX5 were actually the main effector genes at this particular genomic location (Smemo et al. 2014; Claussnitzer et al. 2015). As such, the findings suggested that the strongly associated obesity variant was embedded in one gene but was driving the expression of other, neighbouring genes. These seminal findings have now moved the interpretation of GWAS findings forward, such that it is no longer presumed that the nearest gene(s) to a given GWAS signal is the causal effector gene(s). Indeed, this should not come as a surprise given that gene expression can be driven by long-range regulatory elements, such as enhancers and silencers, which can exert their effects over distances as great as hundreds of kilobases.
Combined with the results from this latest GWAS of height, these findings indicate the magnitude of the challenge that now faces the genetics community. If the associated proportion of the genome is approximately consistent across complex traits, then each complex phenotype has on average thousands of genes that modulate pathogenesis. Identifying the causal gene(s) at each locus and validating the observations is, and will continue to be, a challenging task requiring multiple approaches, particularly algorithmic and high-throughput experimental techniques.
In summary, height is broadly similar to other complex phenotypes apart from its ease of measurement and high heritability. These factors have made it a widely employed model trait for studying the topic of complex phenotype inheritance. However, throughout the GWAS era, height has not always been at the leading edge of variant and gene discovery, that is, until its most recent GWAS by Yengo et al. In having at last closed the gap of missing common SNP-based heritability for a common trait, Yengo et al. may have signalled the beginning of the end of the GWAS era. Their work demonstrates the limits to endlessly increasing GWAS sample sizes and highlights the need for greater diversity in study populations. Moreover, their results directly contradict the most extreme form of the omnigenic model and imply that highly polygenic inheritance is likely a more appropriate model for complex traits. The analysed polygenic score results also suggest that when sample sizes across complex phenotype GWAS efforts increase to the point of heritability saturation across all ancestries, polygenic risk scores will become powerful tools for the prediction of disease risk. However, the implications of this study for the identification of individual effector disease genes are less optimistic. Should the GWAS era be drawing to a close, the era of gene identification that follows will surely be one of both great challenges and opportunities.

Disclosure statement
No potential conflict of interest was reported by the author(s).