Nutrition Matters: Numeracy, Child Nutrition and Schooling Efficiency in Sub-Saharan Africa in the Long Run

Abstract
School enrolment has increased at an unprecedented scale in Sub-Saharan Africa but learning and the associated education efficiency have not. Given that resources are limited, the efficient use of inputs is of utmost importance for sustainable development. Hence, we investigate whether improvements in children’s nutrition can improve learning and hence efficiency. To assess this relationship, we employ average female height as our proxy for nutrition during childhood. For learning, we estimate numeracy and efficiency using a linearized version of the Whipple Index. Our data is at the subnational level focusing on the birth decades from 1950 to 1999. To deal with the endogeneity of nutrition, we use an instrumental variable approach. Our instrument is negative rainfall shocks during childhood which can adversely affect nutrition. We find that better nutrition increases education efficiency. Therefore, investments in nutrition can advance self-sustaining long-term growth based on human capital in Sub-Saharan Africa.


Introduction
Providing access to high-quality education to each child is of utmost importance to achieve sustainable development in the long run. While in the last decades, more children than ever have been enrolled in primary school in substantial parts of Sub-Saharan Africa, the region has not experienced the expected progress in learning outcomes (Angrist, Djankov, Goldberg, & Patrinos, 2021). The World Bank has even argued that Africa faces a severe 'schooling crisis' (Bashir, Lockheed, Ninan, & Tan, 2018). For sustainable growth the learning outcomes are essential, not the time spent in school (Hanushek & Woessmann, 2008). Therefore, policymakers and scholars have been searching for tools to increase education efficiency.
We contribute to this discussion by taking a long-term perspective. First, we evaluate how numerical skills and schooling efficiency have evolved at the sub-national level between 1950 and 1999. Second, we investigate the underlying mechanisms focusing on nutritional quality because previous research found that lower nutrition substantially reduces the ability to learn (Bryan et al., 2004; Currie & Vogl, 2013; Paxson & Schady, 2007). Following this evidence, both international organizations and governments have invested in school-feeding programs, which provide children with meals at school, to improve not only nutrition in general but also learning. Hence, we estimate the impact of children's nutrition and health on schooling efficiency, measured as the ratio between acquired numeracy (output) and years of schooling (input). We employ average female height as our main explanatory variable.
Correspondence Address: Jörg Baten, Department of Economics, Tübingen University, Tuebingen, Germany. Email: joerg.baten@uni-tuebingen.de
Supplementary Materials are available for this article which can be accessed via the online version of this journal available at https://doi.org/10.1080/00220388.2024.2322974.
Stature is an indicator of the quality of nutrition during childhood (Baten & Blum, 2014). While height is highly dependent on genetic factors at the individual level, averaging across individuals within a given group can indicate whether they suffered from malnutrition during childhood. However, we must confine our averages to female height because of constraints in the available data. Nevertheless, we can use this to estimate the effect of childhood nutrition on schooling efficiency.
For a causal interpretation, we use an instrumental variable (IV) approach with negative rainfall shocks as the instrument. We chose this instrument due to its local variations and its importance for livelihoods, as more than half of the population in Sub-Saharan Africa is employed in agriculture (World Bank, 2022) and little land is irrigated (Barrios, Bertinelli, & Strobl, 2010). Moreover, there are several mechanisms that can link rainfall shocks with adverse child health. First, insufficient rainfall can reduce food supply, agricultural income as well as breastmilk production (Banerjee, Duflo, Postel-Vinay, & Watts, 2010; Burlando, 2014; Hidalgo et al., 2010). Second, anticipation of a poor harvest induces maternal stress, harming foetal development (e.g., Aizer, Stroud, & Buka, 2016; Currie & Rossin-Slater, 2013; Lee, 2014). Lastly, insufficient rainfall is also linked to conflict, posing additional risks to child well-being (Hsiang, Burke, & Miguel, 2013). We carefully assess the instrumental variable (exclusion restriction and other issues) below.
For our outcome variable 'schooling efficiency', we estimate numeracy, because numerical skills are among the most relevant outputs of the schooling process (alongside literacy). We follow the age-heaping methodology, as less numerate people are more likely to misreport their age (A'Hearn, Baten, & Crayen, 2009).1 This is visible in censuses and other surveys, as an unusually large number of people state, for example, 'I am 40 years old' when in reality they are 39 or 41. Societies in regions and periods in which many people are innumerate tend to be societies in which many persons do not know their exact age or cannot calculate it from existing documents. To quantify this phenomenon, we employ a linearized version of the Whipple Index, the ABCC Index. It compares the frequency of age statements with the terminal digits zero or five to a uniform distribution. We estimate this index for each birth decade from the 1950s to the 1990s per admin I subnational area in Sub-Saharan Africa.
This indicator has been characterized as a reliable proxy for numerical skills. For example, several contributions show the negative correlation of heaped ages with numerical test scores (Baten, Benati, & Ferber, 2022; Baten & Nalle, 2022; Ferber & Baten, 2023). However, a certain degree of measurement error remains, as with any proxy indicator. The key advantages of this method are as follows: first, the data requirements are comparably low, as only the respondent's reported age is necessary (compared to a demanding math test); and second, the birth cohort method allows estimating numerical abilities over long time horizons and different geographical areas while remaining comparable.2 For the numeracy estimation, we combine reported age data from censuses, UNICEF's Multiple Indicator Cluster Surveys and the Afrobarometer to achieve a large coverage of Sub-Saharan Africa. Thus, this study fills an important data gap about adult numeracy and provides the first comprehensive overview of basic numerical abilities at the subnational level in Sub-Saharan Africa for the birth decades from the 1950s to the 1990s.
As a preview to our findings, we observe that children's nutrition proxied by height is a strong predictor of the efficiency of the education system. The finding is robust across several specifications. This might also explain the stagnation of schooling efficiency, as height has stagnated over this period and declined in several countries of Sub-Saharan Africa (Baten & Blum, 2014). From the 1960s to the 1980s, African heights fell modestly from the range of 169.7-169.9 cm to the one of 169.1-169.0 cm, while all other world regions increased in height.3
The contributions of this paper to the literature are twofold. As a first value-added, it is the first paper to systematically estimate numeracy for Sub-Saharan Africa for the second half of the 20th century. We obtain important insights into the spatial differences by considering subnational evidence and tracing the development of numeracy in the region. Some evidence on the numeracy of school children exists, but none on adults: data from internationally comparable tests have become an integral part of evaluating the quality of schooling. For Sub-Saharan Africa, the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) and the Programme d'Analyse des Systèmes Éducatifs de la Confemen (PASEC) have started to provide test scores for many countries. However, data from these internationally comparable tests have two important drawbacks. First, they cover only the in-school population, and given that many children in Sub-Saharan Africa do not attend school at all or only irregularly, the test scores will overestimate the overall skills of a given population (Lilenstein, 2020). Second, since both organizations only started operation in the late 1990s, assessing the long-run development of schooling outcomes is impossible. Furthermore, only very few studies estimate numerical abilities in Sub-Saharan Africa.4 The most closely related study is by Cappelli and Baten (2021), who estimate numerical abilities with the age-heaping method for 43 Sub-Saharan African countries between 1730 and 1970 using a panel data model. However, their study does not cover most of the post-colonial period.
As a second value-added item, our study contributes to the literature about the efficiency of the educational systems. The relevance of nutrition for numerical learning has been hypothesized and studied for individual case studies, such as Kenya (Hulett et al., 2014), Ecuador (Paxson & Schady, 2007), and England around 1810 (Baten, Crayen, & Voth, 2014). We are the first to study the relevance of nutrition for the efficiency of the educational sector for a whole world region, Sub-Saharan Africa. In sum, our work uses the educational efficiency of numeracy acquisition for the first time. Using this method broadens data availability considerably. We gain significant insights on the determinants of educational effectiveness, especially nutrition. If the causes of low educational efficiency are fully understood, political decisions can be taken to improve educational efficiency.
The remainder of this paper is organized as follows: Section 2 presents the background; Section 3 discusses the data sources, the methods to estimate numeracy, and our strategy to evaluate the efficiency of the education systems; Section 4 details the results; Section 5 includes the robustness checks; and Section 6 concludes the discussion.

Background
To design effective policies that raise the overall level of education, understanding the current state and potential determinants of educational achievement is of utmost importance. Many studies and policy reports focused on literacy rather than numeracy until recently (for example, the Millennium Development Goals only considered literacy in their targets). However, Hanushek and Woessmann (2012) used a measure of math and science skills based on test scores from numerous international studies between 1964 and 2003 to estimate the effect of numeracy on economic growth. They showed that numeracy was an even more important driver of economic growth over the past decades than literacy, although both are necessary.
Even in agriculture, skills for comparing numerical proportions are essential. For example, a farmer might need to carry 50 stacks of wood. The question arises whether the farmer should walk home to get a wheelbarrow or walk more often. Another example would be a microentrepreneur who could hire an employee and needs to compare costs and benefits. Both examples illustrate the far-reaching effects of basic numeracy skills on labor productivity and living standards. Thus, an overall increase in numeracy can substantially increase incomes in Sub-Saharan Africa. Crayen and Baten (2010) show the impact of numeracy on GDP growth on a global scale, especially in many African countries. While literacy naturally remains a foundational skill, obtaining numeracy evidence is also relevant to enable populations to lift themselves out of poverty.
The study's second main question pertains to how to achieve higher educational output. Thus, over the past decades, many studies investigated which inputs into the schooling system improve its efficiency. This literature has focused both on increasing student enrolment and attendance as well as actual educational output. The theoretical framework for retrospective and (quasi-)experimental studies in this field is a schooling production function (Glewwe & Kremer, 2006). Figure 1 represents a stylized version of such an education production function. The achievement of a student is determined by numerous factors, such as the time spent in school (S), the quality of schools and teachers (Q), the student's innate ability (C), and the socioeconomic status of the household (H).
Much earlier work has focused on retrospectively estimating the parameters of such an education production function (Glewwe & Jacoby, 1994; Glewwe, Grosh, Jacoby, & Lockheed, 1995; Tan, Lane, & Coustere, 1997). However, as most components of this function are highly endogenous, there are severe concerns about bias in the results (Glewwe & Kremer, 2006). Therefore, later research employed experimental or quasi-experimental designs to estimate the influence specific inputs have on attainment and hence, overall schooling efficiency. Several reviews (Evans & Mendez Acosta, 2021; Kremer, Brannen, & Glennerster, 2013; McEwan, 2015) have discussed the findings of this literature. However, these studies lack a comprehensive long-run perspective on the schooling efficiency for numeracy, a gap addressed by our study.

Data
Our main data sources are four different types of surveys: censuses provided by the Integrated Public Use Microdata Series (IPUMS), Multiple Indicator Cluster Surveys (MICS), the Demographic and Health Surveys (DHS), and Afrobarometer (AB), all of which are representative at the national level. As a caveat, we should note that not all our datasets are designed to be representative at a regional level. Hence, we show in the Supplementary Material whether the regional estimates of education of the different datasets reconfirm each other. Our code and data are available in the Supplementary Material figshare files.
Figure 2 displays a map indicating which data sources are available for each country, with (a) showing the countries for which age data is available and (b) the countries for which height data is available. Overall, the different sources allow us to cover almost the entire region. A more detailed overview of the surveys can be found in the Appendix.
1024 S. Ferber and J. Baten
Moreover, we use data from various sources as geographic and historical control variables in our analysis. A complete list of the controls, their definitions, and sources is provided in Table A.2.

Method
While data on educational inputs such as enrolment or years of schooling are available for Sub-Saharan Africa for the period between 1950 and 1999, these do not measure educational output (i.e., the skills a student learns in school, as argued above) (Pritchett, 2013; Angrist et al., 2021). Therefore, to estimate numeracy, we employ age heaping.
3.2.1. Numeracy. Age heaping is commonly used in the economic history literature to proxy for numerical abilities, as statistics on numeracy are not available for most periods. Age heaping is the tendency of individuals to round their age to preferred terminal digits such as zero or five. For example, an individual might state 40 as his or her age when in fact it is 38 (as mentioned before). While this phenomenon is a serious problem for demographers, as these false age statements complicate estimations such as population forecasts, it allows us to estimate the numerical abilities of a given population. We employ the Whipple Index, the ratio of the observed frequency of ages that end in zero or five to a uniform distribution where ages ending in zero or five should only constitute one-fifth of the entire population. To compare the development of numerical abilities over time and across space, we estimate this index for each birth decade from the 1950s to the 1990s per admin I subnational area in Sub-Saharan Africa. Admin I areas are the largest subnational divisions of a country. In a few cases, the availability of geographic indicators forced us to group some regions. Moreover, we employ the newest administrative borders in each country. In most cases, these boundaries have changed over time; however, using the newest borders allows us to divide the sample into consistent subareas comparable over time and to contemporary statistics.
$$ W_{it} = \frac{\sum_{a \in \{\text{ages ending in 0 or 5}\}} n_{a,it}}{\frac{1}{5} \sum_{a} n_{a,it}} \times 100 $$
where $n_{a,it}$ is the number of individuals reporting age $a$, $i$ denotes the subnational area, and $t$ the birth decade. The index ranges from 0 to 500, where a value of 500 indicates all individuals reporting an age that is a multiple of five. A value of 100 shows no heaping, and a value of 0 means that no individual in the respective population stated an age ending in zero or five. This implies that a five-point increase in the Whipple Index equals a one-percentage-point increase in the share of heaped ages. To make the index easier to interpret, we utilize the ABCC Index (which is a simple linear transformation of the Whipple Index).5 It displays the approximate share of individuals who correctly report their age.
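The two indices can be illustrated with a minimal Python sketch. It assumes the standard definitions from the age-heaping literature (A'Hearn, Baten, & Crayen, 2009): the ABCC Index is taken to be $(1 - (W - 100)/400) \times 100$ for $W \geq 100$ and 100 otherwise. The function names are hypothetical, and the adjustment for the age group 23-32 is not included.

```python
def whipple_index(ages):
    """Whipple Index: share of reported ages ending in 0 or 5, relative to
    the one-fifth expected under a uniform distribution, times 100."""
    ages = list(ages)
    if not ages:
        raise ValueError("no ages supplied")
    heaped = sum(1 for age in ages if age % 5 == 0)
    # heaped / (len(ages) / 5) * 100, written to avoid rounding noise
    return heaped * 500 / len(ages)


def abcc_index(whipple):
    """ABCC Index (assumed standard form): linear rescaling of the Whipple
    Index to 0-100; values below 100 are treated as 'no heaping'."""
    if whipple < 100:
        return 100.0
    return (1 - (whipple - 100) / 400) * 100


# One complete ten-year age group reported without heaping (ages 23 to 32):
# two of the ten ages end in 0 or 5, so the Whipple Index is 100.
no_heaping = whipple_index(range(23, 33))
```

Under these definitions, a Whipple Index of 300 maps to an ABCC of 50, and a Whipple Index of 500 (everyone reports a multiple of five) maps to an ABCC of 0.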
We restrict the age range to between 23 and 62 for multiple reasons. First, older people tend to exaggerate their age, which would bias the estimates. Second, individuals younger than 23 are excluded, as age heaping is much less observed among younger people.6 Crayen and Baten (2010) observed that individuals between 23 and 32 are comparably more likely to heap on even numbers than older individuals. Hence, we apply their adjustment method for the age group 23-32 (subtracting 25%).7 It is also important to consider that in a regular age distribution, there will be fewer individuals at age 54 than 50 because some people die. To avoid a potential mortality bias, we estimate numeracy separately for age groups beginning with terminal digit 3 (i.e., from 23 to 32, 33 to 42, 43 to 52, and 53 to 62). We assign each age group the birth decade that most individuals in that age group belong to. If several surveys from one type of source are available for a birth decade at the subnational level (e.g., there is data for the 1960 birth cohort in each of the three Mali censuses), we calculate the average weighted by the respective sample size.
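The grouping and pooling steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the birth decade of an age group is taken to be the decade covering the most birth years in that group (ties resolved towards the earlier decade, which is our assumption), and all function names are hypothetical.

```python
from collections import Counter

# Ten-year age groups starting with terminal digit 3, as described in the text
AGE_GROUPS = [(23, 32), (33, 42), (43, 52), (53, 62)]


def age_group(age):
    """Return the (lower, upper) age group for an age, or None if outside 23-62."""
    for lower, upper in AGE_GROUPS:
        if lower <= age <= upper:
            return (lower, upper)
    return None


def birth_decade(survey_year, group):
    """Decade (e.g. 1970) that contains most birth years of the age group.
    Assumption: ages are evenly spread; ties go to the earlier decade."""
    lower, upper = group
    decades = Counter((survey_year - age) // 10 * 10 for age in range(lower, upper + 1))
    most = max(decades.values())
    return min(decade for decade, count in decades.items() if count == most)


def weighted_mean(estimates):
    """Sample-size-weighted average of (estimate, sample_size) pairs."""
    total = sum(size for _, size in estimates)
    return sum(value * size for value, size in estimates) / total
```

For instance, respondents aged 23 to 32 in a 2005 survey were born between 1973 and 1982, so the group is assigned to the 1970 birth decade; two ABCC estimates of 80 and 90 with sample sizes 100 and 300 pool to 87.5.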
Previous literature (A'Hearn, Delfino, & Nuvolari, 2022; Földvári, Van Leeuwen, & Van Leeuwen-Li, 2012) has discussed several potential biases when employing the ABCC Index as a proxy for numerical abilities. These potential biases are a respondent bias (men answering on behalf of women), a marriage bias (wives adapting their age to their husbands' age), an ageing bias (increased heaping as individuals age) and an enumerator bias (counterchecking of age statements by enumerators). Ferber and Baten (2023) validate the African data and check for these potential biases. We provide a summary of their results in the appendix. They provide evidence that the ABCC Index is well suited to capture numerical abilities in Sub-Saharan Africa and does not fall prey to substantial biases. Hence, the conclusion is that the ABCC Index is an appropriate method for our endeavor.
3.2.2. Schooling efficiency for numeracy. We calculate our schooling efficiency measure by estimating the ratio of the ABCC index of numeracy (= the educational output) to the average years of schooling (= the educational input) per region and birth decade:
$$ \mathit{ScEff}_{it} = \frac{\mathit{ABCC}_{it}}{\text{Years of schooling}_{it}} $$
This ratio between the output in numeracy skills and the year-of-schooling input helps to assess the efficiency of the educational system. We need a strategy of standardizing both variables because numeracy does not equal zero for individuals who have not been to school. Basic numeracy is partly acquired in the family and other social environments of young children (Benavides-Varela et al., 2016; Niklas, Cohrssen, & Tayler, 2016). Thus, we first standardize both components of our efficiency measure to consider this. We loosely follow the methodology of the Human Development Index that expresses an indicator between its minimum and maximum, setting the former to zero and the latter to one.
Another change we need to make is to not employ the observed maximum years of schooling. Basic numerical skills measured by the ABCC index should be achieved at the latest by the age of finishing primary schooling. The median length of primary schooling in Sub-Saharan Africa is six years; thus, we set the maximum number of years of schooling to this. If we did not make this adjustment, regions with very high levels of numeracy and a high number of average years of schooling would appear overly inefficient compared to regions that also have high levels of numeracy but fewer years of schooling. Yet, the former regions probably achieved basic numeracy years before they ended schooling and moved on to more advanced mathematics. However, we would not be able to capture this in our data without our standardization strategy.
For clarity, we discuss a numerical example. Assume that for Eq. (4) we only have the following $ABCC_{it}$ values in our dataset: {5, 33, 42, 69, 88}. So $ABCC_{min} = 5$ and $ABCC_{max} = 88$. If we plug these values into the min-max standardization, $(ABCC_{it} - ABCC_{min}) / (ABCC_{max} - ABCC_{min})$, we get approximately 0, 0.34, 0.45, 0.77, and 1, respectively.
Please note that our efficiency measure considers the ratio between inputs and outputs. It does not focus on the underinvestment in school-year inputs. For example, a poor region with only one year of schooling might reach a high efficiency even with numeracy output below the average. In such a situation, the parents might support the numerical learning in the family, and other factors, such as child nutrition, might be of sufficient quality. This is not necessarily a region of high development status (because of a lack of inputs), but it will inform our analysis of why the large increase in inputs in SSA overall did not lead to a corresponding increase in skill output. In other words, we would like to emphasize that we need to understand why the significant investment increase in schooling years in Africa did not result in more numeracy output. A potential issue may be the quality of nutrition, which we discuss in the following section.
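The standardization and the resulting efficiency ratio can be sketched in a few lines of Python. The sketch assumes a plain min-max rescaling for both components, caps years of schooling at six as described above, and assumes a minimum of zero years of schooling, which the text does not state. All names are hypothetical.

```python
MAX_YEARS = 6.0  # median length of primary schooling, used as the cap


def min_max(value, lowest, highest):
    """Rescale value so that lowest maps to 0 and highest maps to 1."""
    return (value - lowest) / (highest - lowest)


def schooling_efficiency(abcc, years, abcc_min, abcc_max,
                         min_years=0.0, max_years=MAX_YEARS):
    """Ratio of standardized numeracy (output) to standardized schooling (input).
    min_years=0 is an assumption made for this sketch."""
    output = min_max(abcc, abcc_min, abcc_max)
    capped_years = min(years, max_years)  # years beyond the cap are not counted
    schooling_input = min_max(capped_years, min_years, max_years)
    if schooling_input <= 0:
        raise ValueError("efficiency is undefined without schooling input")
    return output / schooling_input


# The ABCC values from the numerical example
abcc_values = [5, 33, 42, 69, 88]
standardized = [round(min_max(v, min(abcc_values), max(abcc_values)), 3)
                for v in abcc_values]
```

With these inputs, `standardized` is `[0.0, 0.337, 0.446, 0.771, 1.0]`. A region with an ABCC of 69 and nine years of schooling is treated exactly like one with six years, which is the purpose of the cap.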
3.2.3. Nutrition. We use the average height of a given population as an indicator of nutrition during childhood. Numerous studies have demonstrated that high-quality nutrition is key for the development of cognitive abilities (Bryan et al., 2004) and adult height (Baten & Blum, 2014). A child that does not grow enough during childhood due to a lack of adequate nutrition will not catch up later in life to reach its potential height (Leroy, Ruel, Habicht, & Frongillo, 2015). Therefore, we use female height per admin I area as the main explanatory variable in our model.9 Our baseline specification is a pooled OLS model in which we cluster standard errors by admin I region.
$$ \mathit{ScEff}_{it} = \beta_0 + \beta_1 \mathit{Height}_{it} + Z_{it}'\gamma + X_{it}'\delta + \varepsilon_{it} $$
where $\mathit{ScEff}_{it}$ is the estimated efficiency of schools per region per birth decade, and $\mathit{Height}_{it}$ is height per region and birth decade. $Z_{it}$ is a vector of control variables, and $X_{it}$ are birth decade and larger regions fixed effects ('larger' regions are the regions of East, West, Central, and South Africa). Besides the geographic control variables, we include age at first marriage, the share of Muslims, and a measure of religious fractionalization as control variables (available at the admin I level).10 We opted not to use a fixed effects panel data model because schooling efficiency is highly persistent. Thus, there is not enough variation over time to find any effects in a fixed effects model.
A potential problem of the regression could be an omitted variable bias. Thus, we turn to a quasi-experimental strategy and use an instrumental variable (IV) approach. The instrument we propose for Height is the cumulative monthly negative percentage deviation of rainfall (Ahmed & Ray, 2018) in the district of residence in the year stated as birth year plus the two years before and after.11 Another potential issue could be survivor bias of height from cohort data. A survivor bias can occur if some individuals have already died, in this case those of lower stature, and are hence not included in the data, biasing the average height upwards. In the supplement, we discuss this possibility and find that it is not substantial. In addition, the IV strategy also circumvents potential measurement error issues.
There are two main arguments for the choice of IV. First, there is an extensive literature on critical period programming (Knudsen, 2004) that finds that nutritional conditions during the in-utero period and early infancy are crucial determinants of not only child health but also health during adulthood and adult height (see also Barker, 1990; Behrman & Rosenzweig, 2004; Black, Devereux, & Salvanes, 2007; Alderman, Hoddinott, & Kinsey, 2006; Oreopoulos, Stabile, Walld, & Roos, 2008). Second, according to the World Bank (2022), more than 60 percent of the total employment in Sub-Saharan Africa was in agriculture in 2000, with even higher shares in earlier periods. Little land is irrigated in Sub-Saharan Africa (Barrios et al., 2010) such that sufficient rainfall is a key input for a good harvest (Schlenker & Lobell, 2010). A comprehensive overview of the link between climatic conditions and the economy is provided by Dell, Jones, and Olken (2012). Further information on the reasoning and potential mechanisms is provided in the supplementary materials.
We use Version 4 of the Climate Research Unit gridded Time Series by the University of East Anglia (CRU data) for 1948-2001 (Harris, Osborn, Jones, & Lister, 2020). The dataset provides monthly rainfall data at a 0.5° × 0.5° resolution for the entire world, except Antarctica. The data is collected from an extensive network of weather station observations and is interpolated using angular-distance weighting. We calculate the mean rainfall per admin I region for each month between January 1948 and December 2001 and the long-term average. Next, we define the rain shock variable as the cumulative monthly negative percentage deviation for the year of birth plus the two years before and after (Ahmed & Ray, 2018). We calculate the value for each individual in our dataset before taking the average for each district and birth decade.
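One way to read this definition is sketched below in Python. Several details are assumptions made for illustration rather than the authors' exact procedure: the long-term average is taken per calendar month, only months with rainfall below that average contribute, and shortfalls are summed as positive percentages so that a larger value means a more severe shock. Function names and the input layout are hypothetical.

```python
def derived_birth_year(interview_year, stated_age):
    """Year of birth derived from the stated age, as described in the text."""
    return interview_year - stated_age


def rain_shock(monthly_rain, long_term_mean, birth_year, buffer_years=2):
    """Cumulative monthly negative percentage deviation of rainfall.

    monthly_rain:   {(year, month): rainfall in mm} for one region
    long_term_mean: {month: long-term mean rainfall in mm} (assumed per calendar month)
    The window covers the birth year plus buffer_years before and after.
    """
    shock = 0.0
    for year in range(birth_year - buffer_years, birth_year + buffer_years + 1):
        for month in range(1, 13):
            rain = monthly_rain.get((year, month))
            mean = long_term_mean[month]
            if rain is None or mean <= 0:
                continue  # skip missing observations and months without a usable baseline
            deviation = (rain - mean) / mean * 100
            if deviation < 0:
                shock += -deviation  # only shortfalls accumulate
    return shock
```

For a region whose rainfall equals the long-term mean in every month of the window except one month at half the mean, the function returns 50; months with above-average rainfall do not offset the shortfall.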
We add the two-year buffer for several reasons. First, we calculate the year of birth for each individual by subtracting the stated age from the year of the interview. Since some individuals have rounded their age, the derived year of birth will not be accurate for all observations. If non-numerate individuals heap to the closest number with the terminal digit zero or five, the margin of error is approximately two years. Second, we do not know the month of birth. It might be January or December; adding the year before and after ensures that we also cover the in-utero period and early infancy for those who stated a correct age. Naturally, there are some drawbacks given the wide margin. However, at worst, if the true year of birth is in the first year of our margin, we cover more of the infancy period, whereas if it is in the last year, we cover the pregnancy and conditions for the mother pre-pregnancy, which can be relevant as well for in-utero health exposure. Thus, we believe that overall, we can capture a critical period of a child that has a lasting impact on adult outcomes.
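Thus, our first-stage regression is
$$ \mathit{Height}_{it} = \pi_0 + \pi_1 \mathit{Rainshock}_{it} + Z_{it}'\pi_2 + X_{it}'\pi_3 + u_{it} $$
where $\mathit{Rainshock}_{it}$ is the negative rainfall shock of region $i$ during birth decade $t$.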
An omnipresent potential concern with IVs is the validity of the exclusion restriction. However, since rainfall is arguably exogenous and is not serially correlated over time (Paxson, 1992), rainfall at birth is unrelated to rainfall during later ages. Moreover, we only consider within-district variation and do not compare rainfall across regions. Therefore, as long as the negative shock caused by insufficient rainfall does not last over several years (other than its long-lasting impact on health), the exclusion restriction is fulfilled. An effect of a negative rainfall shock that could potentially last over several years is that it affects wealth. For example, a household living in subsistence agriculture might have to sell their assets accumulated over years of hard work to survive. If this shock lasts until the child starts to acquire numerical skills, then this could violate the exclusion restriction. However, for this to be the case, the rainfall shock must be very severe. Yet, our data shows that more than 70 percent of observations never experienced a negative rainfall shock that is two standard deviations below the long-term average of a district. And in more than 95 percent of observations, the average number of months with a severe negative rainfall shock during pregnancy and infancy is less than one. Thus, while no observational study can ever be sure that the exclusion restriction does not pose a problem, we can be reasonably sure that this problem is not of substantive dimensions here.
3.2.4. Descriptives. Summary descriptive statistics for our main variables are provided in Table A.3. All estimates are at the subnational region level and per birth decade. On average, the ABCC index is about 83. However, numeracy varies between less than 30 and 100 percent. Our measure of education efficiency (in logs) is on average about 0.23. Moreover, years of schooling has a mean of 5 years in the sample.
The relationship between the ABCC Index and average years of schooling in our sample is assessed in Figure A.1. We observe a positive relationship indicating that more education does increase numerical skills. However, the error bars indicate that there is quite some variation in each category: the efficiency of the education system differs substantially between regions.

Numeracy
The spatial distribution of numeracy has remained highly similar and exhibited a strong path dependency. Figure 3 demonstrates the numeracy level for each region per birth decade. Southern Africa has achieved high levels of numeracy for the 1950 birth decade. Thus, our measure is not that useful anymore for studying this area since it cannot detect differences at higher levels of numerical abilities. Other areas with high levels of numeracy are Madagascar and the DR Congo. However, the DR Congo and Madagascar experienced a decrease in numerical abilities over time. In contrast, we observe the lowest level of numeracy in Western Africa, the Sahel area, and Ethiopia. Northern Nigeria has especially low values in all birth decades. Overall, the spatial distribution does not change much over time, and areas that only

Schooling efficiency
While showing a slight downward trend, given the increase in years of schooling over our observational period, the spatial distribution of schooling efficiency has also remained remarkably stable. Figure 4 illustrates our estimates for schooling efficiency for each region per birth decade. The countries with the highest efficiency levels according to our measure are Burkina Faso and Chad, while at the bottom end, we find Nigeria in each period and some areas of Ethiopia and Guinea. We observe strong and persistent spatial differences over time. A contributing factor to this observation could be increased class sizes and lower average ability in schools, as children who were only enrolled in later decades tend to be the weaker students.
Next, we turn to the regression analysis results that investigate nutritional status as a driver of schooling efficiency. Table 1 synthesizes the baseline results. Column (1) shows the raw correlation between height and our measure of education efficiency. This is significant at the one percent level. Subsequently, we add birth decade and African regions fixed effects in column (2), age at marriage, the share of Muslims and a measure of religious fractionalization in column (3), and our set of geographic controls in column (4). The height coefficient is of substantial size if we include the geographic controls in column (4) and remains statistically significant at the one percent level. Our results suggest that increasing average height by one centimeter increases schooling efficiency by about 6 percent.
As mentioned before, we need to consider potential endogeneity, arising, for example, from measurement error or omitted variable bias. Therefore, we now present the results from our instrumental variable approach in Table 2. Odd-numbered columns show the first-stage results and even-numbered columns the second-stage results. The first-stage results are highly statistically significant in all specifications and exhibit F-statistics above 10 (Stock & Yogo, 2005). Thus, we do not face a weak instrument problem. The instrumental variable results confirm our previous results that height as a proxy for nutrition during early childhood is significantly related to education efficiency. The higher the average height within a population, the more efficient the education system.
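The two-stage logic described above can be sketched on synthetic data. This is a minimal illustration, not the authors' estimation code: the data-generating process, the variable names (`rain_shock`, `confounder`, `log_eff`) and the true effect of 0.06 are all assumptions, and the sketch omits the controls, fixed effects and clustered standard errors used in Table 2.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y = X b + e (X must already contain a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_least_squares(y, x_endog, z):
    """Just-identified 2SLS with one endogenous regressor and one instrument.

    Returns the second-stage slope and the first-stage F-statistic.
    Note: standard errors from a manual second stage would be wrong,
    so only point estimates are computed here.
    """
    n = len(y)
    const = np.ones(n)
    Z = np.column_stack([const, z])

    # First stage: regress the endogenous regressor on the instrument.
    pi_hat = ols(Z, x_endog)
    x_fitted = Z @ pi_hat

    # F-test of the single excluded instrument (1 restriction, n - 2 d.o.f.).
    rss_unrestricted = np.sum((x_endog - x_fitted) ** 2)
    rss_restricted = np.sum((x_endog - x_endog.mean()) ** 2)
    f_stat = (rss_restricted - rss_unrestricted) / (rss_unrestricted / (n - 2))

    # Second stage: regress the outcome on the fitted values.
    beta = ols(np.column_stack([const, x_fitted]), y)
    return beta[1], f_stat

# Synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
n = 5000
rain_shock = rng.normal(size=n)   # instrument: negative rainfall shock
confounder = rng.normal(size=n)   # unobserved factor causing endogeneity
height = 158.0 - 1.0 * rain_shock + confounder + rng.normal(size=n)
log_eff = 0.06 * height - 0.05 * confounder + rng.normal(scale=0.1, size=n)

beta_iv, f_stat = two_stage_least_squares(log_eff, height, rain_shock)
beta_ols = ols(np.column_stack([np.ones(n), height]), log_eff)[1]
print(f"OLS slope: {beta_ols:.3f}, IV slope: {beta_iv:.3f}, first-stage F: {f_stat:.0f}")
```

In this constructed example the OLS slope is biased downward by the confounder, while the IV slope recovers the assumed effect; the direction of any bias in real data is not known in advance.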
Overall, we find that height as a proxy for children's nutrition is significantly related to education efficiency in Sub-Saharan Africa. Given that we find this based on an IV approach, we may claim causality. There can be two underlying mechanisms. On the one hand, better nourished and healthier children are more likely to attend school, and on the other hand, healthier and well-nourished children in school are better able to learn. We cannot distinguish between these two mechanisms yet provide evidence that improving children's nutrition can be an important contributor to increasing education efficiency in Sub-Saharan Africa.

Notes (Table 2): These regressions show our IV model which estimates the relationship between adult height (a proxy for health during childhood), and schooling efficiency. Height is instrumented by rainfall during pregnancy and early childhood. The underlying dataset is a panel of African admin I regions for birth decades 1950-1999. The first set of regressions is without further controls, the second adds birth decade and regional fixed effects, the third sociodemographic controls and the fourth geographical controls. The sociodemographic controls include age at marriage, share of Muslims, and religious fractionalization. The geographical controls include the length of colonization, colonial railways, ancient trade routes, diamond mines, explorer routes, missions, nutrient availability, soil workability, malaria ecology index, petroleum sites, ruggedness, tsetse fly suitability, the ratio of the suitability for nomadic pastoralism to sedentary agriculture, distance to capital, distance to coast, population density and area. Standard errors are in parentheses and clustered at the admin I level. Asterisks denote significance levels at *** p < 0.01; ** p < 0.05.

Robustness checks
Our main specification provides estimates for which we use the whole range of ABCC values. As there is a larger number of top-coded observations in our data, we also provide estimates using an ABCC of 95, 90, or 85 as cut-off values for a robustness check. In Supplementary Table S.2, we show that our results are not driven by a sample composition effect and hold if we use the different cut-off values. The estimates remain significant and similar in size. Moreover, we provide alternative estimates if we alter our outcome variable slightly. As discussed, we limit years of schooling to six years as this is the median length of primary school education. Supplementary Table S.3 summarizes our results, which are virtually unchanged if we use four, five, seven, or eight years of schooling.
We also assessed the robustness of the instrument by excluding the most arid and most rain-intensive parts of the distribution (Table 3). Furthermore, in Table 3 we provide estimates for predominantly rural and urban areas separately. Interestingly, we do find a significant difference in our estimates for rural and for urban areas, indicating that the link between nutrition and schooling efficiency is stronger in rural areas. Since rural regions are on average affected more severely by poverty, this could indicate that undernutrition is a more severe problem in rural areas. Again, the results of Table 2 can be confirmed in these alternative specifications. Additionally, Supplementary Tables S.4 and S.5 provide estimates excluding extreme IV values and using alternative definitions of our IV. We observed that the instrument works in all robustness tests. Lastly, we show in Supplementary Table S.6 that our findings are robust to spatial autocorrelation.
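The trimming step behind the first robustness check can be sketched as follows. This is only an illustration under stated assumptions: the function name `trim_extremes` is hypothetical, and the sketch assumes that "the 10 % rainiest and driest" means dropping 10 percent in each tail, which the text does not spell out.

```python
import numpy as np

def trim_extremes(rainfall, share=0.10):
    """Boolean mask keeping observations between the lower and upper quantile.

    Assumption: `share` is removed from EACH tail of the rainfall distribution.
    """
    rainfall = np.asarray(rainfall, dtype=float)
    lower, upper = np.quantile(rainfall, [share, 1.0 - share])
    return (rainfall >= lower) & (rainfall <= upper)

# Hypothetical regional rainfall values 0, 1, ..., 99.
rain = np.arange(100.0)
keep = trim_extremes(rain, share=0.10)
print(int(keep.sum()), "of", rain.size, "regions retained")
```

The IV model would then be re-estimated on the retained regions only.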

Concluding discussion
Ensuring every child has access to quality education is crucial for achieving long-term sustainable development. Despite increased primary school enrolment in significant portions of Sub-Saharan Africa over the past decade, the region has not witnessed the anticipated improvement in learning outcomes (Angrist et al., 2021). Consequently, policymakers and scholars are actively seeking tools to address this challenge.
Our paper is the first to provide long-term evidence on developing numerical skills at the sub-national level in substantial parts of Sub-Saharan Africa for the birth decades 1950 to 1990. Throughout the observational period, we notice high levels of numeracy in Southern Africa and parts of Central Africa compared to the remainder of the region. Moreover, we observe very little progress in line with contemporary findings of low educational achievement despite increasing enrolment and years of schooling (Angrist et al., 2021).
Therefore, we contribute to this discussion by adopting a long-term perspective. First, we assess the evolution of schooling efficiency over the observed period and observe minimal change. Persistent regional disparities further underscore the importance of understanding the underlying mechanisms. Second, we examine the impact of children's nutrition and health on schooling outcomes, using average adult height as the primary explanatory variable. Our findings affirm the significance of children's nutrition as a key predictor of schooling efficiency. Additionally, we employ an instrumental variable (IV) approach, using the average months of exposure to below-average district rainfall during pregnancy and infancy as our instrument. The IV results substantiate our findings, and our results remain robust across various specifications.
While the quantity of education input has increased over the last decades in SSA (Baten & Maravall, 2021), this does not necessarily imply that educational efficiency increased. While the input in years of schooling increased from 4.4 to 6.6 (1960s-1990s), the output has increased at a modest rate: for example, numeracy increased only slightly from 82.1 in both the 1950s and 1960s to 84.9 in the 1980s, and 86.2 in the 1990s. Hence, efficiency, which is the ratio between the two, stays at a similar level or shows slightly declining tendencies. Specifically, in some countries, efficiency decreased, while in others, it increased or remained constant. Similarly, le Nestour, Moscoviz, and Sandefur (2022) observed a decrease in literacy production efficiency for several African countries, while other countries experienced literacy production efficiency increases. Our results suggest that nutritional quality mattered for these outcomes.
How can we improve the nutritional situation of school children? One of the most popular interventions worldwide to incentivize children to come to school and increase their chances of learning something at school is school-feeding programs (Aurino, Gelli, Adamba, Osei-Akoto, & Alderman, 2020). The rationale behind this type of intervention is that, on the one hand, children receive food conditional on attending school, and, on the other hand, well-fed children are able to concentrate and participate better in school. Moreover, given the high level of malnourishment in Sub-Saharan Africa, providing children with extra food can itself be considered a goal. Most evidence about school feeding interventions suggests positive effects on enrolment and educational outcomes. The recent literature moves toward the view that school feeding is also cost-effective (Aurino et al., 2020; for a different nuance, see Parker et al., 2015). We find that better-nourished children acquire better numerical skills, which is important for sustainable growth. This can change the cost-benefit analysis of school feeding programs substantially. Our analysis suggests that school-feeding programs intended for the youngest school children around age 6 or 7 can be beneficial via the nutrition effect, as at this age, children might partly compensate for earlier malnutrition. Even better would be preschool protein supplement programs (Hulett et al., 2014), although we admit that this might be prevented by financial constraints. Protein would be particularly relevant, as Case and Paxson (2008) identified these as predictors of later-life abilities (and protein intake correlates with height; see also Baten et al., 2014, Baten & Blum, 2014). If these programs were targeted specifically at the most problematic regions that we identify in the maps of Figure 4 (and at the poorest families within these regions), they could contribute to solving the 'schooling crisis'.
Notes
33-42, and so on, as the age heaping-education relationship was much closer for age groups. This does not imply that rounding on age 20 was negligible, but rounding on 18 and 22 was also very substantial, and the shifts to these ages have not been modelled yet in a way to obtain a reliable indicator. Also, taking all round ages such as 18 or 22 as "heaped" does not result in a reliable proxy that can be compared with other age groups.
7. Crayen and Baten (2010) have studied a large, global sample for the birth decades of the 1870s to 1940s using a country-decade panel of 1549 observations to identify the degree to which individuals of age group 23-32 rounded less on multiples of five, compared to later age groups who were born in the same birth decade, but were interviewed in later censuses. For example, those born in the 1880s, age 23-32 in 1910, were compared to the same persons born in the 1880s, but interviewed in the 1920s, when they were 33-42. Crayen and Baten (2010) estimated an adjustment of the ABCC by −25% for the age groups 23-32. This resulted in a quite similar numeracy level for the same birth cohorts, independent of their age during the census. Clearly, the correct adjustment might be −24% or −26% in some cases, but we cannot identify the subtle differences. The average reduction of 25 percent moves the estimates for this age group closer to the true value.
8. We acknowledge that years of schooling as our educational input may not fully capture the educational input in a nonformal environment (i.e., the household). However, the years a child ideally spends in a school are the same in which learning in a nonformal environment needs to take place. Thus, we believe that using years of schooling is a good approximation for the time a child spends learning, whether in a school or a nonformal environment. Moreover, the average years of schooling in a region naturally declines the more people never attend school, thus reflecting the overall availability of schooling and attendance at schools, which are crucial inputs at the early stages of educational expansion.
9. The DHS has good coverage of height data for women, but only very few surveys have data on male heights. Therefore, we opted to only use the female height data to ensure comparability between samples. Moreover, given that women and girls are the more marginalized group compared to their male counterparts, using female height data as a proxy for children's undernutrition might even be the more reliable indicator for undernutrition. If food becomes scarce, it is often first redistributed within the household from females to males (Doss, 2013). Hence, female height might be more 'sensitive' to periods of malnutrition during childhood.
10. We include age at first marriage as a proxy for female empowerment (Baten & de Pleijt, 2022). We include the share of Muslims as a control variable as the educational attainment of Muslims is on average lower than that of Christians. Similarly, we include religious fractionalization to control for the homogeneity of the religious community, as religious competition has been linked to higher educational outcomes (Gallego & Woodberry, 2010). To calculate religious fractionalization, we follow Alesina, Devleeschauwer, Easterly, Kurlat, and Wacziarg (2003).
11. We acknowledge that individuals may have moved over the course of their lives such that the conditions in the district of residence might not resemble conditions in the actual birth district. However, there is only limited information about the district of birth for a small subset of respondents. Thus, we can only calculate the rainfall shock for the current district of residence.
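Note 10 refers to religious fractionalization in the sense of Alesina et al. (2003), which is commonly computed as one minus the Herfindahl index of group shares. The sketch below illustrates that formula; the function name and the example shares are hypothetical and not taken from the paper's data.

```python
def fractionalization(shares):
    """Fractionalization index: 1 minus the sum of squared group shares.

    `shares` may be population shares or raw counts; they are normalized
    to sum to one before squaring.
    """
    total = float(sum(shares))
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return 1.0 - sum((s / total) ** 2 for s in shares)

# Hypothetical region with three religious groups.
example = fractionalization([0.6, 0.3, 0.1])
print(round(example, 2))
```

The index is zero for a fully homogeneous region and approaches one as the population splits into many small groups.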
regional and the household level. A positive and highly significant correlation emerges, demonstrating that regions with low numeracy among parents also exhibit poorer math performance among their children. The correlation coefficient is as large as 0.67 (p = 0.00). Moreover, for the least numerate countries (Chad, Sierra Leone and Togo) an analysis at the household level shows that caretakers' age statements with the terminal digits 0 or 5 are significantly correlated with lower math test performance of children. These findings confirm the suitability of age heaping-based numeracy estimation in SSA.
We now turn to the potential biases that have been discussed in the literature before.

Respondent bias
A potential concern in surveys is that a husband or male relative might answer questions on women's behalf. However, MICS surveys, designed to capture women's and children's insights, provide confidence in women's self-responses, while IPUMS or AB do not specifically target either gender. To verify, Ferber and Baten (2023) compare the ABCC Index based on IPUMS and AB data against MICS data, re-estimating the index separately by gender. A regression equation incorporates data source and gender dummies. The findings show no significant respondent bias in IPUMS, and only a minor bias in AB data.

Marriage bias
Another concern is that married women may adjust their reported ages to match their spouses', potentially leading to an unintentional boost in the estimated numeracy levels of married women due to men's on average higher numeracy. The authors re-estimate the ABCC Index by gender and marital status, comparing single, in union, separated/divorced, and widowed individuals. Contrary to concerns, the results do not support a marriage bias.

Ageing bias
Thirdly, there is some concern that people heap more as they age such that they would appear to be less numerate in their fifties than in their thirties.If this were accurate, older birth cohorts would inherently seem less numerate (and not because they went to school in a period of lower education).To address this concern, the authors estimate the ABCC Index for different age groups within birth decades.Comparing numeracy estimates at various life stages, the results show no significant downward trend.

Enumerator bias
Enumerators' potential age-checking bias is considered by comparing the ABCC Index across overlapping IPUMS, AB and MICS data. A model with a data source dummy is estimated to detect bias. Results reveal a small bias in MICS data but not in AB data. Consequently, the MICS-derived ABCC Index is corrected by the estimated bias.

Alternative heaping patterns
Concerns about the Whipple Index's sensitivity to heaping patterns beyond zero and five are explored. Manual inspection of the data reveals some heaping on the digits two and eight in some countries. The authors estimate an alternative index that accounts for this heaping pattern and compare it to the original one. They find only minor differences between the two ABCC Indices, and the original ABCC Index also correlates more strongly with education indicators such as years of schooling and literacy.
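For orientation, the baseline index that these bias checks refer to can be sketched as follows. This is a minimal illustration using the standard definitions from the age-heaping literature (Whipple Index over ages 23 to 62, linearized into the ABCC and top-coded at 100); the function names are hypothetical, and the sketch includes neither the alternative index for heaping on two and eight nor the age-group adjustment described in the notes.

```python
def whipple_index(ages, low=23, high=62):
    """Whipple Index: reported ages ending in 0 or 5 relative to the
    one-fifth share expected without heaping, scaled so that 100 = no heaping."""
    in_range = [a for a in ages if low <= a <= high]
    if not in_range:
        raise ValueError("no ages within the reference range")
    heaped = sum(1 for a in in_range if a % 5 == 0)
    return 100.0 * 5 * heaped / len(in_range)

def abcc_index(ages, low=23, high=62):
    """ABCC: linear transformation of the Whipple Index, read as the share
    of people reporting a non-rounded age (0 to 100, top-coded at 100)."""
    w = whipple_index(ages, low, high)
    if w < 100.0:
        return 100.0
    return (1.0 - (w - 100.0) / 400.0) * 100.0

# No heaping: every age from 23 to 62 reported once.
print(abcc_index(list(range(23, 63))))
# Complete heaping: every reported age is a multiple of five.
print(abcc_index([30, 40, 50]))
```

Because the index is capped at 100, it cannot distinguish populations once age heaping has disappeared, which is the limitation noted for Southern Africa.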

Descriptives
Table A.3 provides summary descriptive statistics for our main variables. All estimates are at the subnational region level and per birth decade. On average, the ABCC index is about 83. However, numeracy varies between less than 30 and 100 percent. Our measure of education efficiency (in logs) is on average about 0.23. Moreover, years of schooling has a mean of 5 years in the sample. The relationship between the ABCC Index and average years of schooling in our sample is assessed in Figure A.1. We observe a positive relationship. The error bars indicate that there is quite some variation in each category: the efficiency of the education system differs substantially between regions.
Numeracy across the continent and in some sample countries increased only modestly (Figure A.2). Niger even shows a declining trend. This seems to contradict Figure A.1, as the increase in school years should have resulted in more numeracy. However, the relationship in Figure A.1 depends mostly on cross-sectional variation, while the effect on numeracy over time was quite limited (and similarly the effect on literacy, see Ferber et al., 2023).

Figure 3. ABCC Index per admin I level for each birth decade between 1950 and 1990. Data from IPUMS, MICS and Afrobarometer. Authors' own representation.

Figure 4. Schooling efficiency per admin I level for each birth decade between 1950 and 1990. Data from IPUMS, MICS, Afrobarometer and DHS. Authors' own representation.

Figure A.1. Relationship of average years of schooling and numeracy (ABCC Index). Data from IPUMS, MICS and Afrobarometer. Authors' own representation.

Figure A.2. Development of numeracy over time for selected countries: Benin, Niger, Kenya, and Zimbabwe. The average refers to all Sub-Saharan African countries included in the data set. Source: IPUMS, MICS and Afrobarometer. Authors' own representation.
the minimum value is standardized to zero. For ABCC_it = 88: ABCC stand:

Table 1. Baseline regression: correlation of schooling efficiency and height
Notes: These regressions show our baseline model which estimates the relationship between adult height (a proxy for health during childhood), and schooling efficiency. The underlying dataset is a pooled panel of African admin I regions for birth decades 1950 to 1999. Due to the modest change over time in our outcome variable, we opted to use our data in a pooled panel format rather than using the panel structure to estimate fixed effects, for example. The fixed effects model would have too little variation over time. Column 1 shows the raw correlation, column 2 adds birth decade and regional fixed effects, column 3 sociodemographic controls and column 4 geographical controls. The sociodemographic controls include age at marriage, share of Muslims, and religious fractionalization. The geographical controls include the length of colonization, colonial railways, ancient trade routes, diamond mines, explorer routes, missions, nutrient availability, soil workability, malaria ecology index, petroleum sites, ruggedness, tsetse fly suitability, the ratio of the suitability for nomadic pastoralism to sedentary agriculture, distance to capital, distance to coast, population density and area. Standard errors in parentheses are clustered at the admin I level. Asterisks denote significance levels at *** p < 0.01.

Table 2. Instrumental variable regression: the impact of nutrition on schooling efficiency

Table 3. Robustness checks: exclusion of the low or high extreme rain values and by rural/urban areas
Notes: These regressions show robustness checks for the IV model of Table 2 (see notes to Table 2). The first set of regressions excludes the 10% rainiest and driest regions and the second set the 20% rainiest and driest. All regressions include birth decade and regional fixed effects, sociodemographic controls and geographical controls. The sociodemographic controls include age at marriage, share of Muslims and religious fractionalization. The geographical controls include the length of colonization, colonial railways, ancient trade routes, diamond mines, explorer routes, missions, nutrient availability, soil workability, malaria ecology index, petroleum sites, ruggedness, tsetse fly suitability, the ratio of the suitability for nomadic pastoralism to sedentary agriculture, distance to capital, distance to coast, population density and area. Standard errors are in parentheses and clustered at the admin I level. Asterisks denote significance levels at ***