Investigating survivorship bias: the case of the 1918 flu pandemic

ABSTRACT Estimates of the effect of foetal health shocks may suffer from survivorship bias. The foetal origins literature seemingly agrees that survivorship bias is innocuous in the sense that it induces a bias towards zero. Arguably, however, selective mortality can imply a bias away from zero. In the case of the 1918 flu pandemic, a suppressed immune system may have been protective against the most severe consequences of infection. We use historical birth records from the maternity hospital of Bern, Switzerland, to evaluate this possibility. Our results suggest that a careful consideration of survivorship bias is imperative for the evaluation of the 1918 flu pandemic and other foetal health shocks.


I. Introduction
A health shock in utero may kill some foetuses, a situation often referred to as 'culling'. The very notion of culling implies a negatively selected 'cull' or, conversely, positively selected survivors. It further implies that '[...] estimates of the effects of fetal health shocks are generally conservative when the shock also increases mortality' (Almond and Currie 2011, 165-166). We discuss this assumption in the context of the 1918 flu pandemic. Almond (2006) uses the 1918 flu pandemic as a natural experiment. His findings (strong negative effects on socioeconomic status, education, and labour market outcomes) proved highly influential. To date, the effects of foetal exposure to the 1918 flu pandemic and other influenza strains have been studied comprehensively (recent contributions include e.g. Brown and Thomas 2018; Schwandt 2018). A sizable body of research studies various other foetal health shocks (Almond, Currie, and Duque 2017). Survivorship bias concerns many studies in this literature (Nobles and Hamoudi 2019).
The 1918 flu pandemic featured unusually high mortality rates among young and healthy individuals. Childhood exposure to related flu strains helps explain this pattern (Worobey, Han, and Rambaut 2014) and 'vigorous immune responses directed against the virus in healthy young persons could have caused severe disease in 1918' (Morens et al. 2010, e13). The role of the immune system in influenza infection is highly complex and incompletely understood (Iwasaki and Peiris 2013), but it is plausible that women with a suppressed immune system were partly protected against the most severe consequences of the virus. If this holds true, the estimates of Almond (2006) and similar studies of the 1918 flu pandemic might be inflated.
To shed light on survivorship during the 1918 flu pandemic, we create a new data set from historical birth records in the maternity hospital of Bern, Switzerland. These data provide us with detailed information on all women delivering in the maternity hospital in Bern between 1913 and 1922. We find a 4.8 percentage points higher stillbirth probability at peak exposure in trimester 1 and slightly smaller effects in trimesters 2 and 3. We further find that the 1918 flu pandemic increases stillbirths among married mothers, but not among single mothers. Contemporary reports affirm that marital status predicts mothers' socio-economic status and children's later life outcomes (Kraft 1908). Thus, our finding provides suggestive evidence for negatively selected survivorship. We conclude that careful consideration of the nature of survivorship is paramount for the evaluation of foetal health shocks.

II. Survivorship during the 1918 flu pandemic
The 1918 flu pandemic was the most devastating of all known pandemics. The influenza A virus infected approximately one-third of the world's population in three waves in 1918 and 1919, with total deaths being estimated at 50 million or more.
The pandemic likely increased maternal mortality, miscarriages, stillbirths, and abortions. Harris (1919) documents a mortality rate of 27% among infected pregnant women and pregnancy termination among 39% of infected pregnant patients. With pregnancy as an important risk factor for influenza mortality, the scientific community discussed whether abortions should be recommended (Titus and Jamison 1919). Recent evidence suggests that the 1918 flu pandemic decreased birth rates and increased the risks of miscarriage and stillbirth (Nishiura 2009; Bloom-Feshbach et al. 2011; Dahal et al. 2018).
In contrast to other flu pandemics, the mortality pattern of the 1918 pandemic was highly unusual. Influenza mortality is usually concentrated among the very young and the very old, but many young and otherwise healthy individuals were affected in 1918. Worobey, Han, and Rambaut (2014) argue that certain cohorts were protected by childhood exposure to genetically similar influenza viruses, and excessive immune responses may explain the high mortality among young and healthy individuals. Cytokines play an important role in the human body's innate immune response to influenza A infection. While these proteins are crucial for the human body's protection against influenza A, they can cause serious harm if the immune reaction is unregulated (Fukuyama and Kawaoka 2011). So-called cytokine storms might have had deleterious effects on the young and healthy who were so strongly affected during the 1918 flu pandemic. Mounting evidence shows that the 1918 virus triggered a vigorous and pathogenic immune response (e.g. Liu, Zhou, and Yang 2016).
It is plausible that those with the most vigorous immune system suffered the most. A suppressed immune system during a pregnancy characterized by stress, malnutrition, or other environmental factors might have been lifesaving in 1918. However, it must be noted that the pathogenicity of a virus infection is determined in complex ways by both virus and host factors (Fukuyama and Kawaoka 2011). Moreover, even if a suppressed immune system is protective against cytokine storms, it might still be detrimental in other ways, e.g. through increased susceptibility to bacterial infection.

III. Stillbirths at the maternity hospital in Bern
We use transcribed data from the original birth records of the maternity hospital in Bern. These records include the following individual information for all deliveries that took place at the hospital: admission date, birth date, infant sex, singleton/multiple birth, mother's age, parity, marital status, date of last menstruation (used to calculate influenza exposure), and stillbirth/live birth. We use data from 7,711 deliveries admitted in the years 1913 to 1922.
Exposure to the influenza pandemic is measured by the weekly reported numbers of new influenza cases in the city of Bern. The existing legal obligation for medics to report infectious diseases to the city authorities was extended to influenza no later than 16 July 1918, in the second week of the pandemic (Simonin, von Erlach, and Burren 1918). Exposure variables for each trimester are calculated as the sum of all infections during the respective trimester. The exposure variables are normalized, such that the highest possible exposure in each trimester is 1. Table 1 shows summary statistics for our estimation sample. 48.2% of the mothers in our sample give birth to a girl, 3.1% of deliveries are multiple deliveries, and the average mother in our sample gives birth to her third child at 27.7 years. The maternity hospital of Bern provided medical care especially for women in need, and served as a training hospital for the university and as a midwifery school (Guggisberg 1931, 6-8; Dübi and Berger 1976, 12-19). 73.4% of the mothers in our data set are married and the stillbirth rate in our sample is 6.4%. For comparison, 91.4% of mothers giving birth in the city of Bern in the same period were married and 3.2% experienced stillbirth (Eidg. statistisches Bureau, 1924). Table 1 also shows the exposure variable for each trimester, with a minimum of 0 and a maximum of 1.
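The construction of the trimester exposure variables described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the 13-week trimester windows counted from the date of last menstruation, the normalization by the largest case total any such window can contain, the function names, and the weekly counts are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: each trimester is a 13-week (91-day) window counted from the
# date of last menstruation. The text does not state the exact cut-offs.
TRIMESTER_DAYS = 91


def window_cases(weekly_cases, start, days=TRIMESTER_DAYS):
    """Sum reported cases over reporting weeks that begin in [start, start + days)."""
    end = start + timedelta(days=days)
    return sum(n for week, n in weekly_cases.items() if start <= week < end)


def peak_window_cases(weekly_cases, days=TRIMESTER_DAYS):
    """Largest case total that any window of this length can contain.

    Because counts are non-negative, the maximum is always reached by a
    window that starts on a reporting week, so only those starts are checked.
    """
    return max(window_cases(weekly_cases, week, days) for week in weekly_cases)


def trimester_exposure(weekly_cases, last_menstruation):
    """Return normalized exposure (between 0 and 1) for trimesters 1, 2 and 3."""
    peak = peak_window_cases(weekly_cases)
    starts = [last_menstruation + timedelta(days=k * TRIMESTER_DAYS) for k in range(3)]
    return tuple(window_cases(weekly_cases, s) / peak for s in starts)


# Made-up weekly counts for illustration only (not the Bern data).
ILLUSTRATIVE_CASES = {
    date(1918, 10, 7): 10,
    date(1918, 10, 14): 30,
    date(1918, 10, 21): 40,
    date(1918, 10, 28): 20,
}

print(trimester_exposure(ILLUSTRATIVE_CASES, date(1918, 7, 15)))  # (0.1, 0.9, 0.0)
```

In this toy series, a pregnancy dated from 15 July 1918 catches the first reporting week at the end of trimester 1 and the remaining three weeks in trimester 2, so its exposure is concentrated in the second trimester.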
We estimate logistic regression models of the following form and present marginal effects.
$$\text{stillbirth} = \alpha + \beta\, e_{trim} + \gamma_1\, \text{girl} + \gamma_2\, \text{multiples} + \delta_{age} + \delta_{parity} + \delta_{month} + \delta_{neighborhood} + \varepsilon$$

Our outcome of interest is stillbirth, a dummy variable equal to 1 for stillbirths and 0 for live births. Each of our models includes one of the three trimester exposure variables, $e_{trim}$. The variable girl indicates the infant's sex and multiples indicates multiple pregnancies. $\delta_{age}$, $\delta_{parity}$, $\delta_{neighborhood}$, and $\delta_{month}$ denote fixed effects for age categories, parity categories, neighbourhood, and month of the year. The first three columns in Table 2 present estimates for the full sample, where each model includes one of the three trimester exposure variables. We find substantial effects on the probability of a stillbirth. Mothers at peak exposure in trimester 1 carry a 4.8 percentage points higher risk of stillbirth, as compared with unaffected mothers.
The effects are slightly smaller for exposure during the second trimester (4.2 percentage points) and the third trimester (3.3 percentage points).
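The marginal effects reported above follow from the logistic form of the model. As a sketch, let $\eta_i$ denote the linear index for delivery $i$ (the right-hand side of the estimating equation without the error term) and $\Lambda$ the logistic function. Both symbols, and the averaged effect labelled AME, are notation introduced here for illustration and do not appear in the original.

```latex
\begin{align}
\Pr(\text{stillbirth}_i = 1) &= \Lambda(\eta_i) = \frac{1}{1 + e^{-\eta_i}}, \\
\eta_i &= \alpha + \beta\, e_{trim,i} + \gamma_1\, \text{girl}_i + \gamma_2\, \text{multiples}_i
          + \delta_{age} + \delta_{parity} + \delta_{month} + \delta_{neighborhood}, \\
\frac{\partial \Pr(\text{stillbirth}_i = 1)}{\partial e_{trim,i}}
  &= \beta\, \Lambda(\eta_i)\,\bigl(1 - \Lambda(\eta_i)\bigr), \\
\text{AME} &= \frac{1}{N} \sum_{i=1}^{N} \beta\, \Lambda(\eta_i)\,\bigl(1 - \Lambda(\eta_i)\bigr).
\end{align}
```

The text does not state whether the reported effects are averaged over the sample or evaluated at sample means; the averaged form is shown here as an assumption. Because $e_{trim}$ is scaled to run from 0 to 1, the marginal effect can be read as the approximate change in stillbirth probability when moving from no exposure to peak exposure.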
We aim to assess whether the stillbirths induced by the 1918 flu pandemic are indeed concentrated among those with poor potential outcomes. Of course, potential outcomes are not observable: we will never know what would have happened to those who did not survive. We use marital status as a proxy variable for potential outcomes. Single mothers are more likely to be economically disadvantaged and to have babies with low birth weight, which in turn predicts long-term outcomes like income and education (Aizer and Currie 2014). Social reform circles in the early 20th century report that single mothers and their children suffered from discrimination, that single motherhood was strongly correlated with poor child health, and that children of single mothers received worse occupational education (Kraft 1908).
Columns 4 to 6 and 7 to 9 in Table 2 show the estimation results for married and single mothers, respectively. The effects on married mothers are larger than the effects on the overall sample in all three trimesters and statistically significant at the 95% level or higher. Conversely, the effects for single mothers are smaller (especially in the third trimester) and statistically insignificant at conventional levels.
Although the reported effects are imprecisely measured, they are quantitatively important. In the census data used by Almond (2006), roughly 50% of the population finish high school. Our estimates suggest that the 1918 flu pandemic led to stillbirths in roughly 4% of the married sample. If these 4% would all have obtained a high school degree had they survived, the surviving sample features a high school completion rate of 47.92% (46 out of 96). This corresponds to a negative effect of 2.08 percentage points, which is roughly the size of the effects in Almond (2006). This back-of-the-envelope calculation clearly depicts an extreme case, but it illustrates that the effect sizes we find can lead to quantitatively important bias.
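The back-of-the-envelope calculation above can be reproduced, and varied, with a short Python sketch. The function names and the two alternative scenarios are illustrative assumptions; only the first scenario (50% baseline completion, 4% lost, all of whom would have completed high school) comes from the text.

```python
def surviving_rate(baseline_rate, lost_share, lost_rate):
    """Outcome rate among survivors of a cohort.

    baseline_rate: outcome rate the full cohort would have had without the shock
    lost_share:    share of the cohort lost to the shock
    lost_rate:     counterfactual outcome rate among those lost
    """
    return (baseline_rate - lost_share * lost_rate) / (1.0 - lost_share)


def selection_bias(baseline_rate, lost_share, lost_rate):
    """Spurious 'effect' created purely by who survives (survivors minus baseline)."""
    return surviving_rate(baseline_rate, lost_share, lost_rate) - baseline_rate


# Scenario from the text, plus two hypothetical alternatives for comparison.
scenarios = [
    ("all lost would have completed (text)", 1.0),
    ("lost resemble the cohort (hypothetical)", 0.5),
    ("none lost would have completed (hypothetical)", 0.0),
]
for label, lost_rate in scenarios:
    rate = surviving_rate(0.50, 0.04, lost_rate)
    bias = selection_bias(0.50, 0.04, lost_rate)
    print(f"{label}: survivors {100 * rate:.2f}%, bias {100 * bias:.2f} pp")
```

The first scenario returns the 46 out of 96 figure from the text. The third shows the mirror image, in which positively selected survivors would push the estimate towards zero, as the culling argument assumes.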

IV. Conclusions
Survivorship bias plays a relatively minor role in the foetal origins literature, as many studies assume positively selected survivors. We argue that this seemingly natural assumption does not necessarily hold for the 1918 flu pandemic. In particular, the medical literature suggests that women may have seen some degree of protection against the virus if their immune system was suppressed. The term 'culling' disregards such complexities, as it implies positively selected survivors. In this sense, the more neutral term 'survivorship bias' may be preferable. We collect data from the maternity hospital in Bern, Switzerland, and find that the large effects of the 1918 flu pandemic on stillbirths are driven by married mothers in our sample. We interpret this result as suggestive evidence for negatively selected survivorship. Future work may use richer data to evaluate whether our results hold in full-population samples; whether our results generalize to measures of potential outcomes other than marital status; and whether survivorship bias is a serious concern for the evaluation of other foetal health shocks.