A survival analysis of the last great European plagues: The case of Nonantola (Northern Italy) in 1630

This paper develops the first survival analysis of a large-scale mortality crisis caused by plague. For the time-to-event analyses we used the Cox proportional hazards regression model. Our case study is the town of Nonantola during the 1630 plague, which was probably the worst to affect Italy since the Black Death. Individual risk of death did not depend on sex, grew with age (peaking at ages 40–60 and then declining), was not affected by socio-economic status, and was positively associated with household size. We discuss these findings in light of the historical–demographic and palaeo-demographic literature on medieval and early modern plagues. Our results are compatible with the debated idea that ancient plague was able to spread directly from human to human. Our methods could be replicated in other studies of European plagues to nuance and integrate the findings of recent palaeo-biological and palaeo-demographic research on plague.


Introduction
Demographers and historians have long been aware of the fundamental role played by plague in shaping medieval and early modern European population dynamics. Having disappeared from the continent during the early Middle Ages, plague returned in 1347. It was the beginning of the terrible Black Death pandemic, which killed between 30 and 60 per cent of Europe's population (Del Panta 1980;Benedictow 2004). Such a catastrophic event had an enormous demographic impact, as well as important social, cultural, and economic consequences (Alfani and Murphy 2017). What is more, plague had come back to stay, as recently confirmed by palaeo-biological studies (Bos et al. 2016;Seifert et al. 2016) and, in the following three centuries or so, outbreaks of the disease were a fairly common occurrence (Biraben 1975;Livi Bacci 2000. By itself, plague was able to deeply alter European demography, establishing a 'high-mortality' regime (Clark 2007). Only from the late seventeenth century did plague begin to retreat from Europe and its final disappearance is usually counted among the factors explaining the observed decline in mortality at the beginning of the demographic transition (Livi Bacci 2000).
Although the importance of plague for pre-transitional demography can hardly be overstated, in many regards it continues to be a mysterious disease. Nevertheless, there is a scarcity of new studies, especially those of the demographic kind, providing new data. From a macro-demographic perspective, one such study recently focused on the last great plagues of the seventeenth century, suggesting fundamental differences across Europe, both in the epidemiology and the overall demographic impact of the disease (Alfani 2013a). The research situation is better at the micro level, where recent works have covered such diverse aspects as the household structure of plague mortality and its patterns of transmission Whittles and Didelot 2016), the within-city diffusion of plague and the seasonal pattern of mortality (Galanaud et al. 2015;Cummins et al. 2016), the age and sex structure of excess mortality from plague (Manfredini et al. 2002), and the post-crisis 'economic recovery' that went hand in hand with the 'demographic recovery' (Alfani 2010a). However, research on plague has been steady in other fields, like general and global history, history of medicine, and palaeo-biology/palaeo-demography (Cohn 2002(Cohn , 2010aHenderson 2003;Green 2015).
Regarding palaeo-biology, recent research has confirmed the presence of DNA traces of Yersinia pestis (the pathogen responsible for contemporary plague) in the dental pulp of skeletons coming from a range of European plague cemeteries (Haensch et al. 2010;Bos et al. 2011Bos et al. , 2016. However, the way in which the infection was transmitted remains somewhat mysterious: the rapidity with which the Black Death and many subsequent plague waves spread across Europe does not fit well with the traditional description of a complex process involving two vectors (rats and fleas), a fact which continues to puzzle palaeo-biologists and historical demographers alike (Bolton 2013;Raoult et al. 2013). Moreover, again on the grounds of skeletal sources, palaeo-demographers have proposed a number of tentative findings about relevant matters, such as the differential susceptibility to plague according to sex, age, and frailty (DeWitte and Wood 2008; DeWitte 2010; DeWitte and Hughes-Morey 2012) and the overall impact of plague prevalence on survival, living standards, and stature (Barbiera and Dalla Zuanna 2009;DeWitte 2014).
Skeletons, however, might not be the best source for answers to questions about the demographic and epidemiological characteristics of ancient plagues. More traditional sources of historical demography can be explored in novel ways to provide evidence relevant to the aforementioned debates, while avoiding many of the limitations common to studies based on archaeological data (Séguy and Buchet 2013). This paper makes use of the largest existing micro-demographic database for any European plague, which we have built from archival sources. The database covers the rural town of Nonantola during the 1630 plague epidemic, the last affecting the area and possibly the most severe after the Black Death, killing about 2 million people in Northern Italy alone (30-35 per cent of its overall population; Alfani 2013a).
We perform a statistical analysis of the survival time of all residents of Nonantola, describing the effect of some baseline covariates, including age, sex, and socio-economic status, while taking into account the possible household-induced association among survival times. We provide original findings about the factors influencing the risk of death, as well as some insights into the possible ways in which the contagion was transmitted.

Data and methods
Based on tax records, 3,451 people lived in Nonantola in 1629. Located in the Po Plain, not far from the city of Modena, Nonantola belonged to the Estense State. The vast majority of its population was involved in agricultural activities, a fact that is also reflected in the household structure and the settlement pattern. Apart from a main settlement where the communication routes converged, its territory comprised three hamlets physically removed from the main centre: Redù, Bagazzano, and Rubiara. Nonantola had two additional distinctive characteristics. First, it hosted a Benedictine abbey, which throughout the Middle Ages had played a key role in improving the quality of the land. Second, the community had access to sizeable commons (woods and meadows), parts of which have survived until today: a decidedly uncommon feature in the densely populated and intensively used North Italian plain. These peculiarities explain why Nonantola has been the object of a significant amount of research, especially about its commons (Debbia 1990;Alfani 2011a). From the historicaldemographic perspective, there have been some microanalyses of the local impact of the terrible 1590s famine (Alfani 2010b(Alfani , 2011b(Alfani , 2013b and earlier studies of the effects of the 1630 plague Cohn and Alfani 2007). These studies, however, adopted entirely different analytical methods and were based on a database not nearly as complete as the one used here.
Our data come from two main sources: parish books of baptisms and burials, and a salt tax register (Boccatico del sale). The Boccatico (dating from 1629, the year before the plague) is a fiscal source and was used to determine the salt tax due from each household, according to the number of 'heads' (individuals). The register provides us with the complete household structure of Nonantola's population, as divided into six main neighbourhoods: Castello and Borgo, which together constituted the main settlement (Castello was a smaller neighbourhood, encircled by medieval defensive walls; Borgo was a larger built-up area immediately beyond the northeastern walls); Panara, which was a sparse group of large rural houses located on the strip of land bordering the Panaro river; and finally, the three aforementioned hamlets of Redù, Bagazzano, and Rubiara (see Figure 1). The Boccatico also informs us about the ages of the very young and the old (as children under five and adults aged 60 and over were exempt from the salt tax, their ages were recorded) and about the title of the head of household.
The books of burials provide us with a list of people who died in Nonantola during the plague. Nonantola's burial records are exceptional in many respects. First, they are particularly well-preserved and start very early for a rural community-in 1574. Second, the priest noted the first plague victim (buried on 9 June 1630) with the annotation qui comincia il contaggio ('here starts the contagion'): a fairly uncommon kind of clarification, which he was not required to give. Third, from 1587 the age at death was also recorded: a truly exceptional feature. However, for unknown reasons, the registration of burials of children aged four or younger stopped almost entirely during 1600-31. Consequently, children under five are excluded from this study. During the plague, the number of people buried increased so much that we can safely treat them all as victims of the epidemic. Indeed, over the roughly six months of the epidemic, the number of recorded burials was about 25 times the average total for an entire 'normal' year, as seen in Figure 2. The figure also confirms that 9 June 1630 is a reliable date for the beginning of the epidemic, as in the four preceding weeks only three burials were celebrated.
Perhaps some under-registration of burials occurred, but we can presume that this problem is not serious, considering: (1) the overall high quality of the records; (2) the fact that the recording of burials never stopped during the plague; and (3) that the records seem to follow the expected pattern of rise and decline of the contagion closely. A potentially more serious issue is the possibility that part of the population fled the plague and therefore died elsewhere. Although we have no way to fully account for this problem, we have reason to believe that out-migration from Nonantola during the epidemic was at most very limited. First, we know from historical evidence and specific studies that the community was immediately quarantined. Nonantola was prepared for such an event, following a precautionary ducal decree aimed at preventing the spread of the plague within the State. Consequently, when the epidemic began in town, all the roads Figure 1 The territory of Nonantola, Italy, during the early modern period: main settlement and hamlets Source: Our elaboration from an eighteenth-century map preserved at the National Archive of Modena (Mappatorio estense, Serie generale, n. 50).
leading out of Nonantola were already blocked by armed guards and patrols controlled its boundaries, so leaving must have been extremely difficult. A plague ward was immediately established to take care of the infected (Sperandini 1997). Additionally, there is evidence that population flights at the very beginning of the epidemic in the Estense State, when some sort of movement was still possible, were in a different direction: from the city (Modena) to the countryside (Nonantola included).
We were able to link 503 out of the 760 burials (66.2 per cent) that took place between 9 June and 31 December 1630 to entries in the Boccatico. Some of the burials that we could not link relate to families of which there was no trace in earlier sources (75 burials at least, related to 16 distinct families), presumably refugees from Modena who were blocked in town by the quarantine (an appeal to the duke to remove the quarantine, dated 12 February 1631, explicitly mentioned the presence of refugees who wanted to return to their homes; Sperandini 1997). As we include in the analysis only the burials of Nonantola residents listed in the Boccatico, the presence of these extra families is of no consequence. Also, of no consequence for the analysis is the presence of the Benedictine abbey, as monks were not recorded in either the Boccatico or the parish books.
The parish books of burials provide us with information on the ages of the plague victims, while the Boccatico provides information on the ages of the very young and the old. To discover the exact ages of the individuals in the 5-59 age group who were spared by the plague, we looked forwards in time, collecting information about those who died between 1 January 1631 and 31 December 1700. We also looked backwards in time, for the dates of baptism of the people recorded in the Boccatico. We collected information on all the baptisms celebrated between 1 January 1615 and 1 January 1629, to which we added information about baptisms celebrated in the late sixteenth century from an earlier study dedicated to Nonantola's social networks (Alfani and Munno 2012). In total, we collected complete information on an additional 4,258 burials and 3,681 baptisms, to reconstruct the ages of the Nonantola residents who were spared by the plague. Note that the absence of information on the ages of plague survivors places serious constraints on all earlier works on the 1630 plague.
Our data set originally included all 3,451 individuals (from 625 households) who, in 1629, resided in one of the six neighbourhoods as recorded in the Boccatico. From this data set we removed 60 individuals who died between being recorded in 1629 and the beginning of the epidemic (up to 8 June 1630), and 341 individuals under the age of five (as such burials were not usually recorded). The final data set thus contains all of the 3,050 recorded individuals across 623 households who were: (1) exposed to the risk of dying of plague starting from 9 June and observed for the potential event until 31 December 1630; and (2) of an age to be recorded in the parish books of burials (also see section A1 of the supplementary material). All (and only) the survival times of the individuals still alive at the end of 1630 were right censored. For such individuals the observed censoring time was set equal to 206 days (i.e., the exact number of days from 9 June to 31 December inclusive).
For each individual, the following information was available: neighbourhood, household of residence, household size (i.e., number of people residing together), date of burial (if the event occurred in the time frame considered), and social status. The latter, categorized as high or low, was attributed based on the title denoting status (e.g., messere or 'gentleman') with which some household heads were recorded. All household members were given the same status as the head. Moreover, we had almost complete information on age and sex. Information on age was missing for 148 individuals and the sex of two individuals was unknown (for one of them, age was also unknown). Consequently, an additional 149 individuals were dropped from the regression analyses, but for completeness they are included in the descriptive analyses.
For the time-to-event analyses, we use the Cox proportional hazards (PH) regression model (Cox 1972). The hazard function h(t) is a function that describes the individual-specific instantaneous risk of experiencing the event of interest (here, burial) right after time t, given that one has not experienced the event up to time t. Such a function takes nonnegative values and it provides a complete description of the distribution of the time-to-event random variable (T ). Time is measured with respect to the specified origin, which in our case is 9 June 1630. From now on we will refer to the time-to-event (burial) distribution as the 'survival' distribution. Also, we will sometimes indicate the events of interest as 'deaths' even though they are really 'burials'. Deaths and burials can safely be assumed to coincide without much error (Del Panta and Rettaroli 1994). The Cox PH model is a semi-parametric model, where the non-parametric part is a baseline hazard function h 0 (t) with a set of k baseline covariates (X 1 , X 2 , … , X k ) that modify it multiplicatively. Parameters are estimated by maximization of the socalled partial likelihood function (Cox 1972(Cox , 1975. The predictors that we consider here are all baseline covariates. We performed graphical tests for the hypothesis of proportionality of the hazards functions across values of the covariates (results not shown) and, overall, we are comfortable with the support that they provide to our analyses.
We fit a series of models with increasing complexity. Specifically, we apply both the basic model just described and stratified versions of the same, which seems appropriate given the rather different patterns of burial distributions across the neighbourhoods. In addition, it seems important to analyse the survival times while taking into account their possible dependence within the same household. We apply a robust method for consistent estimation of the standard errors of the regression parameters, which makes no specific hypotheses about the dependence structure of the survival times (Lin and Wei 1989). We also explored a gamma frailty model, in which the association among failure times is modelled through a frailty term, that is, a random-effect term, shared by all members of a household (see, e.g., Vaupel et al. 1979).

Descriptive statistics and overall event rates
Out of a population of 3,050 individuals, 503 (16.5 per cent) died during the 1630 plague. The epidemic spread rapidly, peaking at Week 8 (between 28 July and 3 August) when 48 people died. Table 1 shows that plague struck with particular intensity in the main settlement (Borgo), where most of the trades and crafts of the town were clustered. In Castello, located similarly in the main settlement but possibly protected to a degree by the defensive walls encircling it, the percentage dying was nearly half that observed in Borgo.
The cumulative distribution function (CDF) of the death (burial) times over the follow-up period shows the progression of the infection by neighbourhood. Borgo experienced a sharp increase in the number of deaths between the 50th day (28 July) and the 75th day (22 August), and a slower but steady increase thereafter ( Figure 3). In Bagazzano the infection spread with a similar pattern, but here the initial outbreak occurred towards the end of August. In Castello and Panara the epidemic spread at a more constant rate; however, by the end of the period the proportion who had died was similar to that of Bagazzano. Redù and Rubiara experienced late outbreaks, with a slower spread of the infection, resulting in lower overall mortality.
The proportions of males and females who died were similar: 252 females out of 1,546 (16.3 per cent) and 251 males out of 1,502 (16.7 per cent). Regarding age, at the onset of the epidemic the average age for the individuals included in our data set was 27.4 years, whereas the median age was 23 years (children aged 0-4 are excluded).
A survival analysis of the last great European plagues 105 The average age did not vary significantly across neighbourhoods (p-value = 0.659). The greatest absolute number of deaths was reported among those aged 10-14, but the proportion of people of that age group who died was not particularly high ( Figure 4). The highest mortality can be observed among individuals aged around 45-59 years. Notice that the population age structure implicit in Figure  4(a) closely corresponds to Nonantola's population history pre-1630. The fact that there were fewer children aged 5-9 than 10-14 reflects the small cohorts born in the famine-stricken early 1620s (839 baptisms were celebrated in 1616-20 vs. just 762 in 1621-25). Similarly, the sharp drop in the size of the age groups 40-44 and higher reflects the fact that all those aged 41 or above in 1630 had had to survive the terrible famine of 1589-91 (moreover, the cohorts born during the famine, aged 39-41 in 1630,   Table 1. were exceptionally small in number). Instead, the relatively large size of the age groups 60-64 (and older) results mostly from the fact that the Boccatico gives us the ages of all those aged 60+ who were exempt from the salt tax. Consequently, linkage attrition is zero. Some age inflation may also have occurred, possibly fuelled by the fiscal exemptions given to those aged 60+. Finally, we cannot exclude that the individuals listed in the Boccatico and whose age is unknown (n = 149) came predominantly from the age groups 40-44 to 55-59; however, this would not change our main conclusions (see later, and section A1 of the supplementary material). A total of 207 individuals (6.8 per cent) were of high social status (Table 1). The residual 2,843 (93.2 per cent) fell in the low social status group, as expected in a pre-industrial rural community. The neighbourhood with the greatest prevalence of high-status individuals was Castello (46.5 per cent), followed by Borgo (16.7 per cent).
In Table 2 we provide additional household-level descriptive information. Households tended to be smaller in the main settlement (Borgo and Castello) than elsewhere: mean household size was 6.8 in Panara, 5.8 in Bagazzano, 5.3 in Redù, 4.8 in Rubiara, 4.3 in Borgo, and 3.0 in Castello (differences are statistically significant). Table 2 also reports on the observed proportions of households affected by plague. Among the 623 households, 224 (36.0 per cent) experienced at least one death.
Social status is a household-level variable, being defined here from the title of the head of the household. As seen in Table 2, the highest prevalences of high-status households were found in Borgo and Castello (13.2 and 38.7 per cent, respectively). Moreover, high status is associated with smaller household size. The average high-status household size was 5.3 compared with 5.5 for low-status households, and none of the high-status households consisted of more than eleven members, while 8 per cent of low-status households did (with up to 27 members). Household infection rates show that 40 per cent of high-status households experienced at least one event, a slightly higher proportion than for low-status households (35.6 per cent).

Results from the survival models
In the initial models, the conditional hazard function for the jth individual is equal to: where β is the vector of the coefficients associated with the vector X j , which contains the values of the k covariates for individual j, and h 0 (t) is the unspecified baseline hazard function. We included the covariates sex, household size, age, and social status in the models as modifiers of the hazard function. We allowed age to enter the model with both a linear and a quadratic term. To ease interpretation of the estimated effects, household size was included either as a numerical variable (models (1), (2), and (4)) or as a categorical variable after classification in five categories (1-3, 4-5, 6-7, 8-12, and >12 members). Indicators for the different neighbourhoods were also included (models (2) and (3)), with Panara being the baseline (reference) category.

Figure 4
Absolute frequencies (a) and proportions (b) of deaths and survivors by age group, Nonantola, 1630 Source: As for Table 1. Table 3 shows the results obtained from the estimation procedure applied to three such models (with 95 per cent confidence intervals for the individual hazard ratios), obtained by clustering the individuals by household (the original parameters are reported in section A4 of the supplementary material). We use robust estimation, since failure (death) of one individual might be linked to an event experienced by another member of the same household, meaning that death times within a group (cluster) are likely to be dependent. This fact was also confirmed by fitting a multiplicative  gamma shared frailty model that included sex, household size, age, and social status, as well as neighbourhood modifiers (results not shown). In that model, the standard deviation of the frailty term was found to be significantly greater than zero (1.45; p-value < 0.001), thus confirming the presence of dependence among survival times within the household structure. The sex effect is not found to be statistically significant in any of the models in Table 3, so the epidemic apparently struck men and women equally. This is true also if we focus on people in the age group 17-35, which, for women, is characterized by intense childbearing, as discussed later. Instead, the size of the household matters when we adjust for a neighbourhood effect (models (2) and (3)), but its effect seems to be relevant only for households with more than seven members (baseline = 1-3 members). For example, living in a household with more than twelve members increases the hazard by 68 per cent when compared with the baseline group (model (3)). Living in Borgo significantly increases the risk, by more than 2.4 times when compared with Panara (the baseline neighbourhood) in both models (2) and (3), while residing in Redù and Rubiara seems to offer some protection from the risk of death. High social status fails to have a significant effect on the hazard in any of the models.
The hazard ratios of both age and age squared are significant at the 0.01 level in all three models. Figure 5 shows the estimated hazard functions from model (2) for different ages. The highest hazards are observed for baseline ages between 40 and 65, with a peak in all the (proportional) hazard functions at about 60 days from the beginning of the epidemic.
Note that clustering the observations by household does not change the point estimates of the model parameter; it only affects the estimated variance-covariance matrix of the parameter estimator. Thus, this option can only produce a change in the statistical significance of the estimated coefficients. We have also performed these same analyses (not shown) with non-robust estimation of the variance-covariance matrix, and both social status and household size were (incorrectly) detected as being statistically significant at the 0.05 level when using model (1). This is interesting because it indicates that the robust standard errors better reflect the sampling uncertainty, while suggesting and taking into account the dependence of survival times within households.
In models (2) and (3) the hazard functions for the different neighbourhoods are assumed to be proportional to one another for any given value of the covariates. However, we know from Figure 3 that different neighbourhoods were struck by the plague with different timings and intensities. Consequently, we next extend the models to their stratified versions by neighbourhood (models (4), (5), and (6)). This Figure 5 Model-based estimated hazard functions for burial/death, for specific ages (other covariates set to zero), from model (2), Nonantola, 1630 Note: Day 0 corresponds to the start of the plague epidemic (9 June 1630). Source: Statistical estimates based on the same historical sources indicated for Table 1. allows each neighbourhood to have a different, yet still unspecified, baseline hazard function, while borrowing strength from the whole data set for the estimation of the parameters of the regression part of the survival model.
The conditional hazard function for a subject in neighbourhood N is: with the β vector parameter common across neighbourhoods and no neighbourhood indicators in X. We still implement the robust estimation of the parameter's variance-covariance matrix by clustering the individuals by household. Table 4 shows the results obtained from fitting two different versions of this model, in which household size is included either as a numerical variable (model (4)) or through four indicator functions for the five categories (model (5)). In model (6) we treat age as categorical rather than continuous. Apart from some relatively small changes in the values of the covariate effects, the qualitative results do not vary much from the unstratified models (1) to (3). We recommend interpreting the more robust stratified models because of the clearly different neighbourhood-specific baseline hazard functions, thus allowing for a more flexible estimation of the survival distribution within neighbourhoods ( Figure 6).
In model (6) we also include four interaction terms between age groups and sex. We performed a likelihood ratio test for the four interaction terms all being equal to zero. The p-value was 0.0463, suggesting that the overall effect of the four interaction terms is borderline significant. However, the effect is due essentially to the 40-59 age group. More importantly, the likelihood ratio test can only be performed without using robust standard errors (as it is not valid otherwise) and, as a consequence, its results should be interpreted conservatively when data are clustered. We also explored the possibility of a global differential age effect on mortality by sex by adding two interaction terms to models (4) and (5), with age and age squared (not shown). The tests did not reject the null hypothesis that both interaction terms have coefficients equal to zero (p = 0.49 and 0.48 for the two models, respectively). Adjusted neighbourhood-specific hazard functions, which more closely follow the mortality patterns of the different neighbourhoods, can now be estimated. Figure 6(a) shows the estimated baseline hazard functions for the different neighbourhoods when setting all covariates to zero. Figure 6(b) shows the corresponding estimated neighbourhoodspecific cumulative baseline hazard functions Λ 0, N (t), that is, the baseline hazard functions Λ 0,N (t) integrated from zero until the argument t. The differential intensity of the death process across the different neighbourhoods is clear.
The stratified models (4) and (5) allow for a common age effect to be quantified on the complete data set, while taking into account different baseline trends for mortality across different neighbourhoods. Such a pooling of the age effects across neighbourhoods is justified by the results that we obtain by fitting individual neighbourhood-specific models involving sex, age, and household size. Figure 7 shows the age modifier exp(β Age,1 Age + β Age,2 Age 2 ) in such models, which now also allow for the effects of household size and sex to vary across neighbourhoods (social status was excluded from neighbourhood-specific models due to the scant nature of that variable).
These models confirm rather consistent effects of age in Bagazzano, Borgo, and Redù. As a consequence, pooling the information across neighbourhoods in model (4) seems convincing and produces the overall age effect represented in Figure 8. The trend and the values of age group effects in model (6) also support the smooth shape shown in Figure 8.
As a final consideration, note that the shape of the overall age effect is not the result of known faults in our database. In particular: . It is not the result of a greater prevalence of younger individuals among the unlinked deaths, as the age structure of the unlinked deaths follows that of the linked deaths quite closely (see Figure A1 in the supplementary material). . Neither is it the result of under-representation of adults aged 40-59 in the complete database (which might lead to artificially higher mortality rates). To test this, we calculated age-specific mortality rates for the complete database (not reported) and compared them with those resulting from an 'extreme' hypothesis according to which all 149 individuals listed in the Boccatico whose age was unknown were attributed proportionally to the age groups 40-44 to 55-59. Even under this extreme hypothesis, mortality rates peaked around age 50-54 and then declined (consistent with the shape in Figure 8).

Discussion
Our findings are relevant for many ongoing debates about medieval and early modern plague. A first aspect to consider is the effects of sex and age on plague mortality. In a seminal paper based on the London 'Bills of Mortality' for the 1603 and 1625 Figure 6 Predicted neighbourhood-specific baseline hazard functions (a) and cumulative baseline hazard functions (b) for burial/death, from model (4), Nonantola, 1630 Notes: Day 0 corresponds to the start of the plague epidemic (9 June 1630). From these curves, using the relationship S(t|N, X ) = exp(−Λ(t|N, X )) = exp(−Λ 0,N (t)exp(X T β)), which expresses for each neighbourhood (N) the survival function S(t|N, X ) in terms of the cumulative baseline hazard function and the linear term X T β, the survival probabilities can be obtained for any value of the covariates. See section A5 of the supplementary material for an example. Source: Statistical estimates based on the same historical sources indicated for Table 1. plagues, Hollingsworth and Hollingsworth (1971, p. 144) concluded that it was 'very likely that the men were much more affected by plague than the women'. In contrast, a study of the six plagues occurring in Milan from 1452 to 1523 showed a marked prevalence of women among the victims, although this might have resulted from the concentration of immigrant women and poor widows in the overcrowded neighbourhoods that tended to be affected with particular severity by epidemics of any kind . However, the clearly prevalent view in the literature is that plagues affected both sexes in about the same way. This was the case for the famous plagues of Colyton, Eyam, and Penrith in sixteenth-and seventeenth-century England (Bradley 1977;Schofield 1977;Scott and Duncan 2001;Whittles and Didelot 2016), as well as for Mantua (Italy) in 1575-76 (Del Panta 1980). Additionally, skeletal sources have been used to assess whether sex affected an individual's risk of death during the fourteenth-century Black Death, but no significant effects have been found (DeWitte 2009). Overall, our findings for Nonantola are consistent with such a conclusion, as in all models the sex variable was not found to affect mortality significantly. Other studies of the 1630 plague in Italy have Figure 7 Age effects from fitting model (4) separately on each neighbourhood, Nonantola, 1630 Notes: The coefficients for the linear and the quadratic terms in age are both significant for Bagazzano, Borgo, and Redù. The linear term only is significant for Panara, and for Castello and Rubiara neither the linear nor the quadratic term is significant. Note that the original parameter estimates for β Age,1 and β Age,2 can be obtained by taking the natural logarithm of the corresponding hazard ratios (the parameters are also reported in section A5 of the supplementary material). Source: Statistical estimates based on the same historical sources indicated for Table 1.
Regarding the effect of age on mortality, our study adds significant nuance to the conventional wisdom, as it includes a reconstruction of the complete age structure of the population exposed to the risk of dying of the plague-not just the age structure of the deceased, as is usually the case in the literature. This led to somewhat surprising results. In fact, an earlier study on Nonantola that analysed excess mortality by age group concluded that the plague affected the group aged 11-20 with the greatest intensity, followed at a distance by those aged 21-30  and section A2 of the supplementary material). Such findings are in agreement with the aforementioned studies of the 1630 plague that also analysed excess mortality (Abrate 1972;Manfredini et al. 2002), as well as with a study of the 1575-76 plague in Mantua (Belfanti 1981). Our study demonstrates that: (1) the aforementioned literature detected correctly the fact that the young (<10 years of age) were less affected by the plague; but that (2) the risk of death increased with age, peaking much later than earlier believed, that is, between ages 40 and 60 (Table 4 and Figure 8). As our data are more complete and precise than those used by other studies, it seems reasonable to conclude that we have detected a recurrent distortion in earlier analyses, determined by the kind of information they used.
The same problem could have occurred in studies relating to other areas of Europe, which also tend to suggest that plague struck the young preferentially from the fifteenth century onwards. Schofield (1977) tried to overcome the limitations implicit in analyses of excess mortality by comparing the actual number of plague victims to model populations obtained from stable population tables. He compared London (St Botolph), Eyam, and Colyton and found that the risk of dying of plague seemed to decline with age-contrary to what had been suggested by earlier works on excess mortality from plague, like Hollingsworth and Hollingsworth (1971). Schofield (1977), however, admitted that '[the] simple method we have employed … makes a number of assumptions that may or may not be true in reality. Our confidence in the validity of our conclusions would therefore be increased if we could show that a direct calculation of age-specific mortality during the epidemic produces essentially the same results' (p. 115). Our study provides exactly this kind of direct calculation of age-specific mortality, but unfortunately our results are in clear contrast with Schofield's (although not for the very old ages). We know of only one other study where age-specific mortality rates were computed from real data: the work of Séguy et al. (2006) on Martigues in Provence (France), during the so-called 'Great Plague of Marseille' of 1720-21. These authors' results resemble our own quite closely as, overall, plague mortality rates increased with age. However, they did not report a decline in the risk of dying of plague at the older ages; this may be partly due to differences in the age classifications used (they clustered together all deaths occurring at ages 70+).
Additional evidence that the very old were less susceptible to dying of plague comes from plague treatises of the sixteenth and seventeenth centuries. Doctors suggested that old and 'dry' bodies were less susceptible to the disease-indeed, in sixteenthcentury Venice, older people were elected as officers of the plague wards due to their presumed resistance to the disease . Another hypothesis is that the reduced impact of plague mortality on the older population resulted from the selection process that had seen an exceptionally severe famine affecting the area 35-40 years earlier (Alfani 2011b(Alfani , 2013b. This may have left only the stronger individuals, who constituted the older population in 1630. Such stronger individuals may also have been more resistant to plague. The age pattern of mortality may also have been influenced by age-specific behaviours. It seems plausible that middle-aged adults, especially women, were more likely to visit other households and help with nursing care, whereas the older and maybe also the younger adults stayed inside more, looking after infants and children, thus experiencing a reduced risk of contracting the disease outside their household. We found some empirical support for this, as the interaction term between sex (with female as the reference category) and the age group 40-59 in Table 4 shows a statistically significant greater risk of dying for women. As a final point, note that early modern plagues differed from the Black Death, which reportedly affected all age groups in a similar way (Biraben 1975;Scott and Duncan 2001;Alfani 2009).
A different perspective on the relationship between age, sex, and plague mortality is provided by studies of frailty based on skeletal sources, which have concluded that plague (in particular, the Black Death) was probably selective with respect to pre-existing health conditions, more easily killing people who had been in poor health since before the epidemic. These 'frail' people were identified by a variety of skeletal lesions (DeWitte and Wood 2008) or by shorter-than-average stature (DeWitte and Hughes-Morey 2012). As the probability of being in poor health increases with age, this might help to explain the rising part of our age modifier curve (Figure 8). Note that this may be different when one looks at older subjects who are still alive at a given age, due to selection (see earlier). Frailty has also been studied with respect to sex, as in modern populations women are favoured in terms of morbidity and survival. The results of such studies are, however, unclear-plague excess mortality of frail men was found to be greater than that of frail women, but this could have resulted from: (1) a biologically determined greater resistance of women; or (2) a greater ability of plague to kill healthy women than healthy men (DeWitte 2010).
We could not test for frailty in a way similar to the aforementioned literature, due to the lack of information about pre-epidemic health status. However, women aged 17-35 were more likely than younger or older women to be pregnant at the time of the epidemic (mean age at first marriage was 17 for women in the years immediately preceding the plague), and earlier works have suggested that this might explain some of the reported excess mortality for young women compared with men . Particularly high mortality among pregnant women has also been reported for some English plagues (Scott and Duncan 2001). We tested for a possible higher risk of dying of plague for young women aged 17-35 but the results were not significant. However, it is possible that the models did not have sufficient power to identify such interactions.
We also analysed the risk of death according to socio-economic status. The historical literature on Italian and European late medieval and early modern plagues seems to have clearly established that, in contrast to the Black Death and other early plague waves, the disease acquired a social character from the fifteenth century, striking the poor preferentially (Slack 1985;Alfani 2009Alfani , 2013bCohn 2010b). This has been confirmed by recent studies of specific plague outbreaks (Galanaud et al. 2015;Whittles and Didelot 2016). Doctors of the early modern period were of the same view (Cohn 2010a). According to Pietro Parisi, the Sicilian author of a famous plague tract in 1593, 'it is absolutely true that for poverty and privation, … the sticky illness, that is the contagion, which easily passes from one person to another, damages commoners more than the nobility' (cited from Alfani 2013b, p. 104). However, the Italian seventeenth-century epidemics are the notable exception to this, since the infection then spread so efficiently that the social and economic elite were also severely affected (Alfani 2013a). In Venice, 17 per cent of the members of the Great Council were killed by plague in 1630, while in Genoa, 40 per cent of the members of the Great and Low Councils died in 1656-57 (Pullan 1992). In this paper, we tested for a possible impact of high vs. low social status, but this factor proved non-significant in all models.
This provides confirmation of the exceptional character of the 1630 plague, which was so severe and widespread that it acquired the characteristics of a 'universal contagion', similar to the Black Death. Our findings also provide additional support for recent literature that suggests that the 1630 epidemic, unlike other seventeenth-century European plagues, had a deep and negative impact on the areas it affected and particularly on Italy-one possible reason being the particularly severe loss of human capital resulting from non-selectivity by socio-economic status (Alfani 2013a;Alfani and Percoco 2018).
However, socio-economic status could have affected the individual risk of death through the size of the household, as: (1) high-status individuals belonged to smaller households than peasants (as was usual in pre-industrial Europe); (2) no highstatus household contained more than ten members; and (3) exceptionally large households (>12 members) experienced a 68 per cent increase in the hazard compared with small households (<4 members) (Table 3, model (3)). Another way of looking at this is to consider how the specific environment of residence may be connected to the individual risk of death. For London in 1560-1665, Cummins et al. (2016) reported a lower susceptibility to plague among the richest parishes and suggested that this was due to higher living standards. Additionally, plague outbreaks usually started in the poorest parts of the city, outside the walls. The same was found by Cohn and Alfani (2007) for Milan in 1523. For Dijon in 1400 and 1428, Galanaud et al. (2015) found that plague had more effect on the richest parts of the city, where most commercial and craft activities were concentrated (as they argue, 'In a city where garbage collection was left to the inhabitants, the proximity of food stocks, crop market, second-hand clothing dealers … and/or an open area used for slaughter … created conditions suitable for rats to pullulate and favored disease transmission by fleas' (p. 20)).
In Nonantola too, we found that the plague started in the richest part of the town, Borgo, where all communication routes converged. In this specific setting, the physical separation of the peasant hamlets from the main settlement provided some relative protection to at least part of the poor population: as our stratified analysis suggests, areas affected later experienced a lower final plague intensity ( Figure  6). This was presumably the consequence of a combination of factors: first, the arrival of winter, as we know empirically that medieval and early modern plagues tended to wane in the coldest months, possibly due to lower flea activity, reduced human interaction, or both (Cohn 2002;Alfani and Cohn 2007;Whittles and Didelot 2016); and second, the sparser settlement pattern of the hamlets compared with Borgo, and maybe even the actions taken to try to contain the spread of the epidemic from the town centre to the periphery. All in all, the distinctive layout of the town may have limited the possible advantage provided by better living standards to high-status individuals only-an advantage that in the 1630 plague was, however, probably very limited to begin with, as discussed earlier.
Our findings about the higher risk of death experienced by people living in larger households are also relevant to the long-lasting debate concerning the nature of the disease and its method of transmission. As is well known, the Yersinia pestis bacillus responsible for contemporary plague does not easily spread from human to human but spreads mostly through vectors (rats and rat fleas). This seems too complex a mechanism to account for the speed and effectiveness with which medieval and early modern plague spread, as has been suggested by historians (Cohn 2002(Cohn , 2008Bolton 2013;Alfani 2013b) and as is openly recognized also by the palaeo-biologists who identified DNA markers of Yersinia pestis in medieval plague cemeteries (Raoult et al. 2013). To solve the riddle, simpler ways of transmission have often been put forward, either through a different vector (a variety of human fleas instead of rat fleas ;Biraben 1975;Del Panta 1980;Audoin-Rouzeau 2007) or borne by air, although the latter hypothesis is dubious as a general explanation, given that pneumonic plague has demonstrated an ability to spread effectively only under very specific environmental conditions (Cohn 2002(Cohn , 2008Raoult et al. 2013). Notice that both airborne transmission and transmission through human parasites can be considered 'human-to-human', as already argued by Biraben (1975) and recently by Whittles and Didelot (2016). While it is beyond the aims of this paper to provide an answer to such a complex and debated question as the means of plague transmission, our findings about household size and hazard of death nevertheless seem to be compatible with the idea that plague may indeed have been able to spread directly from human to human.
In the case of a disease transmitted from human to human, the individual risk of contracting the disease would be directly related to household size, as the risk would be the cumulative result of: (1) the risk of all household members catching the disease outside the home; and (2) the risk of catching it from an infected household member. Indeed, under the somewhat simplistic assumption that each household member has the same probability of catching the disease outside the home, each additional household member increases the risk of the other household members becoming infected. Our findings clearly suggest that individuals living in large households (>7 members) experienced a risk of dying of plague that increased steeply with household size, compatible with the hypothesis of human-to-human transmission. This does not exclude the possibility that, during the same epidemic, the 'traditional' transmission route of rats to rat fleas to humans also occurred. A recent analysis of the Eyam plague of 1665-66 using stochastic models estimated that 73 per cent of infections may have been caused by human-to-human transmission, while just 27 per cent involved rodents (Whittles and Didelot 2016). Finally, an earlier study of Nonantola focused on the time interval between deaths within each household and found that most of them happened very close to each other: in 14 per cent of cases on the same day and in 40 per cent of cases within three days (see illustrative examples in section A3 of the supplementary material). Even greater time clustering of within-household deaths was reported for plague outbreaks in Milan from 1452 to 1523 . This might signal transmission of pneumonic plague, whose incubation period is significantly shorter than that of bubonic plague (1-3 days instead of 2-6). However, the environmental conditions in which the 1630 plague occurred do not fit those considered typical for pneumonic plague. Consequently, we need to be cautious in making assumptions about which kind of human-tohuman transmission might have occurred.

Conclusion
By using a new individual-level data set of a unique kind, this paper has developed the first detailed survival analysis of a large-scale mortality crisis caused by plague. We found that the risk of dying of plague was similar for the two sexes. The age effect followed a flat inverted U curve, with the risk of A survival analysis of the last great European plagues 115 dying peaking at ages 40-59. Socio-economic status did not affect survival significantly, although it could have played a role through household size. Members of large households (usually peasant and low-status) experienced a much higher risk of dying of plague. Stratification of the analysis by neighbourhood allowed us to highlight differences in the baseline hazard functions and to explore, to some degree, the spatial diffusion of the plague. This showed that the population residing in Borgo, where the epidemic started, was at greater risk than neighbourhoods infected later. Finally, some of our findings, particularly those related to the impact of the size of the household of residence on the individual risk of death, constitute evidence in support of the idea that medieval and early modern plague was able to spread from human to human.
The historical-demographic techniques that we used to build the data set and the bio-demographic methods that we developed could be replicated in other studies of European plagues. This would allow scholars to produce comparable and detailed information that could be vital for finding the hitherto elusive answers to crucial questions about the disease; in particular, how plague was able to cause the greatest mortality crises ever reported in human history.