The intertemporal connection between preschool delay of gratification and later academic performance in primary schools: evidence from China

ABSTRACT A child's ability to resist temptation is an important non-cognitive skill associated with lifetime benefits. Using a longitudinal dataset, this study empirically examines the link between Chinese preschoolers' delay of gratification and their later scholastic performance during primary education. The results show that this personality trait, revealed at age 4–5 years by the marshmallow experiment, makes a long-lasting, positive contribution to later academic performance, even after accounting for students' cognitive performance and other non-cognitive skills measured at age 10–11 years. Our findings in a developing country context are consistent with evidence from developed countries.


Introduction
Human decision making often involves intertemporal trade-offs between costs and benefits occurring at different points in time. Self-control is an important skill, defined as "altering one's own responses, especially to bring them into line with standards such as ideals, values, morals, and social expectations, and to support the attainment of long-term goals" (Baumeister, Vohs, & Tice, 2007, p. 351). Various terms have been used to describe this personality trait, such as self-regulation, planning ability and use of self-instruction (Agbaria & Bdier, 2020). More specifically, a preschooler's ability to restrain her/his impulses for the sake of a future (greater) reward is closely related to "delay of gratification", a concept introduced by Mischel and his colleagues (e.g., Mischel, Ebbesen, & Zeiss, 1972; Mischel, Shoda, & Rodriguez, 1989; Shoda, Mischel, & Peake, 1990) using the famous marshmallow experiment.1 In economic psychology, this ability is discussed in terms of temporal discounting or time-inconsistent preferences, and it plays an important role in overcoming present bias (see Frederick, Loewenstein, & O'Donoghue, 2002; Koch, Nafziger, & Nielsen, 2015 for overviews).
CONTACT Zheng Li lzesse@hotmail.com; zheng_li@xjtu.edu.cn School of Economics and Finance, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China

1 Children aged around 4 years were left alone with one marshmallow after the experimenter told them that they could have two marshmallows if they waited to eat the first one until she/he returned.
Delay of gratification in children may lead to lifetime benefits (Duckworth, Tsukayama, & May, 2010; Moffitt et al., 2011; Oreopoulos, 2007). Understanding its influence on educational outcomes is of particular importance (Koch et al., 2015), given the predictive power of academic achievement for future human capital formation and economic development (Mendez, 2015). The existing evidence on the connection between preschoolers' delay of gratification and their later scholastic performance has focused on developed countries (see Smithers et al., 2018 for an overview). There is a dearth of empirical evidence on whether a similar link holds in a developing country setting, and it is this gap that motivates our paper.
Using the education production function (Hanushek, 1979; Todd & Wolpin, 2003) as a theoretical framework, this study explores this intertemporal link using a Chinese longitudinal dataset. The long-term role of preschool delay of gratification in shaping schooling performance is quantified while accounting for individuals' cognitive skills, non-cognitive skills, studying behaviours and parental inputs. A key finding is that Chinese children aged 4-5 years who delayed their gratification in the marshmallow experiment are, all else being equal, more likely to be among the top students in their class during primary education, which is in line with international evidence. The remainder of the paper is organised as follows. First, we introduce the data used in this study. This is followed by the modelling framework. After presenting the findings and discussion, we draw some conclusions and suggestions.

Data
This study uses the China Family Panel Studies (CFPS) of the Institute of Social Science Survey (ISSS) at Peking University. The CFPS, a nationally representative survey, covers a variety of themes on Chinese society, economy, families and individuals (Institute of Social Science Survey [ISSS], 2015). Its baseline survey was conducted in 2010, and three follow-up surveys (2012, 2014 and 2016) are publicly available to date. The CFPS operates at three levels. At the community level, it collects information on population structure, policies and social services. The family-level data include sociodemographic status, family interaction and relationships. At the individual level, separate questionnaires are used to collect individual attributes, behaviours and attitudes for adults and for children/adolescents. For detailed information on its survey design, sampling procedure and data collection, see Xie and Hu (2014). For recent applications of the CFPS in economics, see Hsieh and Qin (2018), Porto et al. (2019) and Zhang et al. (2020), among others.
Based on two CFPS waves (2010 and 2016),2 the link between survey-based information on preschool delay of gratification and students' scholastic performance is established. Mischel's delay of gratification paradigm is useful for understanding between-individual heterogeneity in the self-control abilities of preschoolers; however, it may not be suitable for studying older age groups, because purposeful self-distraction grows with age and makes delaying less difficult (Otto, 2013). Therefore, this study is limited to children who (1) took the marshmallow test at 4-5 years of age in CFPS 2010 3 and (2) also reported their rank in class in the 2016 survey.4 Under these eligibility criteria, the study sample consists of 493 observations. Of these, 51.9% delayed their gratification in the 2010 survey, and the distribution of the reported rank across the sample is 6.3% for the bottom 25% in class, 8.3% for 51-75%, 27.4% for 26-50%, 27.8% for 11-25% and 30.2% for the top 10%.

Cognitive skills and non-cognitive skills are important to schooling performance and human capital accumulation (Cunha, Heckman, & Schennach, 2010; Heckman & Rubinstein, 2001; Kaufman, Reynolds, Liu, Kaufman, & McGrew, 2012). The former (abilities in reading, writing, reasoning and mathematics) are typically measured using standardised tests, while the latter may be captured by observing behaviour or through self-reported items (Kajonius & Carlander, 2017). The cognitive and non-cognitive skills assessed in this study are given in Table 1.
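As a quick consistency check on the sample description, the reported shares can be converted into approximate cell counts for N = 493. The exact counts are not reported in the text, so the rounded figures below are a back-of-envelope illustration only; the dictionary keys are simply the rank labels used above.

```python
# Back-of-envelope check: convert the reported sample shares into
# approximate cell counts for N = 493. Exact counts are not reported,
# so these rounded figures are illustrative only.
N = 493

rank_shares = {
    "bottom 25%": 0.063,
    "51-75%": 0.083,
    "26-50%": 0.274,
    "11-25%": 0.278,
    "top 10%": 0.302,
}


def implied_counts(shares: dict[str, float], n: int) -> dict[str, int]:
    """Round each share of n to the nearest whole student."""
    return {label: round(share * n) for label, share in shares.items()}


counts = implied_counts(rank_shares, N)
share_total = sum(rank_shares.values())  # should be 1 up to rounding
delayed = round(0.519 * N)  # children who waited in the 2010 experiment

print(counts)
print(round(share_total, 3), sum(counts.values()), delayed)
```

The five shares add up to 100%, and the implied rounded counts add back up to the full sample of 493, with roughly 256 children having delayed gratification.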

Modelling framework
In this study, the dependent variable is the student's overall rank in class, measured in five ordinal categories (bottom 25%, 51-75%, 26-50%, 11-25% and top 10%). Therefore, an ordered choice model is employed, in which a set of thresholds defines the ranges of the categories on an underlying latent scale. The ordered choice model accommodates ordered outcomes whose categories need not be equally spaced on that scale. It has a wide range of applications to ordinal outcomes such as bank ratings, investor beliefs and performance standards (see e.g., Bellotti, Matousek, & Stewart, 2011; Hoffman & Post, 2014; Li & Hensher, 2020). The standard ordered choice model is based on the following specification:

$$ y_i^* = \boldsymbol{\beta}'\mathbf{x}_i + \varepsilon_i \tag{1} $$

The latent preference variable, $y_i^*$, is continuous, and its observed counterpart, $y_i$, takes the discrete form shown in Equation (2):

$$ y_i = j \quad \text{if } \mu_{j-1} < y_i^* \le \mu_j, \qquad j = 0, 1, \dots, 4, \tag{2} $$

with $\mu_{-1} = -\infty$, $\mu_0 = 0$ (a normalisation discussed in the empirical section) and $\mu_4 = +\infty$. Here $\boldsymbol{\beta}$ is the vector of parameters of the explanatory variables $\mathbf{x}_i$; $\mu_j$ are the threshold parameters, estimated jointly with $\boldsymbol{\beta}$ by maximum likelihood; and $\varepsilon_i$ is the disturbance term, whose logistic distribution defines the ordered logit model. For detailed information on the ordered choice model, see Greene and Hensher (2010).
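Under the logistic assumption, the ordered logit model implies P(y_i = j) = Λ(μ_j − β'x_i) − Λ(μ_{j−1} − β'x_i), where Λ is the standard logistic CDF. The Python sketch below computes these category probabilities and the mapping from a latent value to an observed rank category. The threshold values and index value are hypothetical numbers chosen for illustration, not estimates from Table 3.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF Lambda(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordered_logit_probs(xb: float, mu: list[float]) -> list[float]:
    """Category probabilities P(y = 0), ..., P(y = J) for an ordered logit model.

    xb : value of the linear index beta'x_i (including the constant)
    mu : thresholds [mu_0, ..., mu_{J-1}] in increasing order, with mu_0 = 0
    """
    cuts = [-math.inf] + list(mu) + [math.inf]
    # Lambda(-inf) = 0 and Lambda(+inf) = 1 close the two open-ended categories.
    cdf = [0.0 if c == -math.inf else 1.0 if c == math.inf else logistic_cdf(c - xb)
           for c in cuts]
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]


def latent_to_category(y_star: float, mu: list[float]) -> int:
    """Observed category j such that mu_{j-1} < y_star <= mu_j."""
    return sum(y_star > m for m in mu)


# Hypothetical thresholds and index value, for illustration only.
mu = [0.0, 1.2, 2.5, 3.6]
labels = ["bottom 25%", "51-75%", "26-50%", "11-25%", "top 10%"]
probs = ordered_logit_probs(1.5, mu)
for label, p in zip(labels, probs):
    print(f"{label:>10}: {p:.3f}")
print(labels[latent_to_category(2.0, mu)])  # 1.2 < 2.0 <= 2.5, so "26-50%"
```

Raising the index β'x_i shifts probability mass from the lower categories towards the top category, which is why a positive coefficient on a regressor raises the chance of the highest rank and lowers that of the lowest rank.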
To conduct a meaningful statistical analysis, a theoretical justification for selecting candidate explanatory variables is important. The education production function (Hanushek, 1979; Todd & Wolpin, 2003) provides a useful framework that relates schooling outcomes to (1) individual skills/abilities, (2) family inputs, (3) school characteristics and (4) peer effects. Given that the rank in class (from the bottom group to the top group) is the dependent variable, school characteristics (public vs. private school, and class size) are used as controls in this paper, along with the student's schooling year (grade), age at test, age at school start and gender. Koch et al. (2015) excluded peer effects from the function, explaining that they may also be determined by the family/school. Moreover, the CFPS has not collected information on peer effects. Therefore, this component is omitted from the present study.
The key hypothesis of this paper is that the delay of gratification ability of Chinese children may play a long-run, positive role in their formal schooling performance. The other non-cognitive skills considered in the empirical model are self-reported by the sampled students. These self-reported variables are measured on a 1-5 Likert scale (from "strongly disagree" to "strongly agree"). For each non-cognitive skill, two dummy variables are constructed, "strongly agree" (=1) and "agree" (=1), both taking the value 0 for the other three levels ("neutral", "disagree" and "strongly disagree"). Both dummy variables are included in the model simultaneously to test for potential nonlinearity. The age (in months) at which the child was able to speak in complete sentences and to count from 1 to 10, together with the results of standardised word recall and number series tests, are used to represent the sampled individuals' cognitive skills at two different stages of development.
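The dummy coding of the Likert items can be sketched as follows, assuming the responses are stored as integers with 1 = "strongly disagree" and 5 = "strongly agree" (the integer coding and the function name are assumptions for illustration). Entering the two dummies separately lets "strongly agree" and "agree" have different coefficients, which is what allows a nonlinear effect across response levels.

```python
def likert_dummies(response: int) -> tuple[int, int]:
    """Split a 1-5 Likert response into (strongly_agree, agree) dummies.

    Assumed coding: 1 = strongly disagree, 2 = disagree, 3 = neutral,
    4 = agree, 5 = strongly agree. Levels 1-3 form the omitted reference group.
    """
    if response not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a 1-5 Likert response, got {response!r}")
    strongly_agree = int(response == 5)
    agree = int(response == 4)
    return strongly_agree, agree


responses = [5, 4, 3, 2, 1]
print([likert_dummies(r) for r in responses])
# [(1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
```

If the two estimated coefficients were equal, the response scale would collapse to a single "agree or above" indicator; a larger coefficient on "strongly agree" would indicate an effect that strengthens at the top of the scale.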
Family inputs typically consist of the money and time that parents invest in the child's education. The available variables in this study include (a) monetary inputs: the child's kindergarten fee in 2010 and primary education fee in 2016; (b) time-related inputs: whether the child is read to every day (1/0) and the number of heart-to-heart conversations per month; and (c) indirect inputs: the mother's and father's education. Several studying behaviours are also included in the main specification: non-weekend studying time (hours), weekend studying time (hours) and whether the student had ever skipped a class (1/0). Table 2 summarises the descriptive statistics for the explanatory and control variables included in the empirical analysis presented in the following section.
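To make the grouping of regressors explicit, the latent index of the ordered logit model can be written schematically by production-function component. The symbols below (DG_i for the delay of gratification dummy and the vectors C_i, N_i, F_i, B_i and S_i) are hypothetical notation introduced only for illustration, not the authors' own; in this notation the key hypothesis corresponds to α > 0.

```latex
y_i^* = \beta_0 + \alpha\,\mathrm{DG}_i
      + \boldsymbol{\gamma}'\mathbf{C}_i   % cognitive skills: early milestones, word recall, number series
      + \boldsymbol{\delta}'\mathbf{N}_i   % other non-cognitive skills: "strongly agree" and "agree" dummies
      + \boldsymbol{\theta}'\mathbf{F}_i   % family inputs: fees, daily reading, conversations, parental education
      + \boldsymbol{\kappa}'\mathbf{B}_i   % studying behaviours: study hours, skipping class
      + \boldsymbol{\phi}'\mathbf{S}_i     % school characteristics and individual controls
      + \varepsilon_i
```

The constant β_0 is identifiable only because the first threshold is normalised to zero.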

Empirical results and discussion
A normalisation is needed so that a constant can be identified: the threshold between Level 0 and Level 1 is set to zero (Mu(0) = 0), and the thresholds between Levels 1 and 2 (Mu(1)), Levels 2 and 3 (Mu(2)) and Levels 3 and 4 (Mu(3)) are estimated. These are the threshold values on the latent scale for the corresponding ranks in class: value < 0: bottom 25%; 0 < value < Mu(1): 51-75%; Mu(1) < value < Mu(2): 26-50%; Mu(2) < value < Mu(3): 11-25%; and value > Mu(3): top 10%. The specification with the full list of explanatory and control variables introduced in Section 3 is estimated as the empirical model of this study. As a robustness check, an alternative specification is also estimated. The comparison of the two models (Model 1: full specification; Model 2: without other non-cognitive and cognitive skills) suggests that the core parameter estimate of this study, namely that of preschool delay of gratification, is robust to the alternative specification (see Table 3). The empirical model suggests that preschool delay of gratification may play a long-run role in later schooling performance. Other significant skills include two types of cognitive functioning, word memory and mathematics (entered as dummy variables),5 as well as conscientiousness (the tendency to be hardworking, responsible and organised, Smithers et al., 2018), attention (focusing awareness on certain aspects of the environment, Smithers et al., 2018) and low self-esteem (a global appraisal of the self in terms of a feeling of worthlessness, Axelsson & Ejlertsson, 2002). Because ordered logit coefficients are expressed on the latent scale and do not directly give changes in outcome probabilities (for the intermediate categories, not even their sign), partial effects are more informative than the point estimates in Table 3. For a continuous variable, the partial (marginal) effect is the derivative of the probability of a particular outcome with respect to that variable, which approximates the change in that probability associated with a one-unit increase. For a dummy variable, the partial effect is the change in each outcome probability when the dummy changes from 0 to 1.

Table 4 presents the important partial effects on the response probabilities. We find a strong association between preschool delay of gratification and scholastic performance during primary education. When facing a choice between (1) immediately obtaining one gift and (2) obtaining two gifts after the interview, children who chose the second option had a higher probability of being among the top 10% of students in class (+9.41 percentage points). This increase is much larger than that for being in the second-best group (+1.28 percentage points). This self-control ability is also associated with a reduced likelihood of being a middle-ranked student (−6.42 percentage points for being ranked among 26-50% and −2.99 percentage points for 51-75%) and of being among the worst-performing students (−2.14 percentage points). The corresponding partial effects estimated from Model 2, which excludes the other non-cognitive and cognitive skills, are −0.0221 (0.009), −0.0267 (0.011), −0.0520 (0.021), 0.0154 (0.008) and 0.0853 (0.035) for the response probabilities from "bottom 25%" to "top 10%", respectively. These are similar to those reported in Table 4, which provides some support for the robustness of our empirical findings on the intertemporal link of interest.
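The two kinds of partial effect described above can be sketched in Python. This is a minimal illustration, assuming effects are evaluated at a single hypothetical index value with hypothetical thresholds and coefficient; whether Table 4 evaluates effects at sample means or averages them over individuals is not stated here. For a dummy, the sketch takes the discrete change in probabilities; for a continuous regressor, it uses the analytic derivative coef · [λ(μ_{j−1} − β'x) − λ(μ_j − β'x)], where λ is the logistic density.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probs(xb: float, mu: list[float]) -> list[float]:
    """Ordered logit probabilities P(y = j) = Lambda(mu_j - xb) - Lambda(mu_{j-1} - xb)."""
    cuts = [-math.inf] + list(mu) + [math.inf]
    cdf = [0.0 if c == -math.inf else 1.0 if c == math.inf else logistic_cdf(c - xb)
           for c in cuts]
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]


def dummy_partial_effects(xb_at_zero: float, coef: float, mu: list[float]) -> list[float]:
    """Discrete change in each category probability when a dummy switches from 0 to 1.

    xb_at_zero : linear index with the dummy set to 0 (other regressors held fixed)
    coef       : the dummy's ordered logit coefficient
    """
    p0 = category_probs(xb_at_zero, mu)
    p1 = category_probs(xb_at_zero + coef, mu)
    return [b - a for a, b in zip(p0, p1)]


def continuous_partial_effects(xb: float, coef: float, mu: list[float]) -> list[float]:
    """Derivatives dP(y = j)/dx = coef * [lambda(mu_{j-1} - xb) - lambda(mu_j - xb)],
    where lambda(z) = Lambda(z) * (1 - Lambda(z)) is the logistic density."""
    def density(c: float) -> float:
        if math.isinf(c):
            return 0.0  # the density vanishes at both open ends
        F = logistic_cdf(c - xb)
        return F * (1.0 - F)

    cuts = [-math.inf] + list(mu) + [math.inf]
    return [coef * (density(lo) - density(hi)) for lo, hi in zip(cuts[:-1], cuts[1:])]


# Hypothetical index value, coefficient and thresholds, for illustration only.
mu = [0.0, 1.2, 2.5, 3.6]
effects = dummy_partial_effects(1.5, 0.4, mu)
print([round(e, 4) for e in effects])

# Because the category probabilities sum to one, the effects sum to zero.
# The Model 2 values reported in the text satisfy this up to rounding.
model2 = [-0.0221, -0.0267, -0.0520, 0.0154, 0.0853]
print(round(sum(model2), 4))
```

The adding-up property is a convenient sanity check when reading a table of partial effects: the five entries for any one variable should sum to zero apart from rounding.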
In addition to individuals' non-cognitive skills, cognitive abilities and studying behaviours, we find that parental education is positively related to their offspring's schooling outcomes. This connection is referred to as the intergenerational schooling association (see Holmlund, Lindahl, & Plug, 2011; Pronzato, 2012; and Fleury & Gilles, 2018 for an overview). According to Cunha and Heckman's (2007) life-cycle model of learning, parental schooling may add to children's initial ability endowment, which in turn contributes positively to their learning productivity. In China, Magnani and Zhu (2015) and Dong, Luo, Zhang, Liu, and Bai (2019) found empirical evidence of the intergenerational transmission of schooling. These findings highlight the long-run benefits of improvements in education, which are associated with compounding economic and social returns.

Conclusions
Using two waves of the CFPS, this paper investigates the intertemporal connection between Chinese children's self-control abilities and their schooling outcomes within the theoretical framework of the education production function. The model outputs demonstrate long-run associations between Chinese preschoolers' gratification delay at age 4-5 years and their academic outcomes at age 10-11 years. In the USA, Shoda et al. (1990) found that the ability to delay gratification at four years of age is positively correlated with Scholastic Aptitude Test (SAT) results; Watts, Duncan, and Quan (2018) also found that this ability predicted a gain in school achievement at age 15. According to the cross-country findings of Pearce et al. (2016), children in Australia (aged 6-7 years) and in the UK (aged 5 years) with low self-regulation are more likely to have worse academic outcomes at 7-9 years of age. Another Australian study (Sawyer et al., 2015) found that self-regulation ability in early childhood is positively correlated with formal schooling performance. This evidence from developed countries strengthens the credibility of our conclusion that preschool self-control may be a predictor of academic achievement for Chinese primary school students.

However, children's delay of gratification outcomes could be influenced by the trustworthiness of the experimenter (Michaelson & Munakata, 2016), environmental reliability (Kidd, Palmeri, & Aslin, 2013) and children's emotions (Shimoni, Asbe, Eyal, & Berger, 2016). The marshmallow experiment may therefore measure the ability to exercise self-control with error (Cunha et al., 2010). Information on these factors is not available, so their influence cannot be separated from the observed outcomes; this is a major limitation of the present study.
Besides its contribution to educational success, this non-cognitive skill yields substantial returns in other domains, such as health, social functioning and wealth, according to some longer-term follow-up studies (Golsteyn, Grönqvist, & Lindahl, 2014; Moffitt et al., 2011). This important ability is malleable and can be improved (Drobetz, Maercker, Spiess, Wagner, & Forstmeier, 2012; Piquero, Jennings, & Farrington, 2010). However, there is no evidence that any one type of intervention is always superior to another (Smithers et al., 2018). When designing interventions, it is imperative to investigate the potential effects of genetic and environmental factors on its malleability (Koch et al., 2015).

Table 1. Cognitive skills and non-cognitive skills assessed in this study.
3 CFPS 2010 conducted the delay of gratification experiment as follows. The interviewer verbally told the child: "I can only give you one candy if you want it now. However, I can give you two if you wait until we complete the interview. Do you want it now or will you wait until we complete the interview?" The child then chose one of the two options.

4 The absolute measure of performance is not available in the 2016 survey. The rank outcome is obtained across different schooling years, according to each sampled student's academic performance prior to the 2016 survey.

Table 2. Descriptive statistics of the variables included in the main specification.

Table 3. The ordered logit modelling results.

Table 4. Important partial effects for the empirical model (Model 1).