Modelling determinants, impact, and space–time risk of age-specific mortality in rural South Africa: integrating methods to enhance policy relevance

Background: There is a lack of reliable data in developing countries to inform policy and optimise resource allocation. Health and socio-demographic surveillance sites (HDSS) have the potential to address this gap. Mortality levels and trends have previously been documented in rural South Africa. However, complex space–time clustering of mortality, determinants, and their impact has not been fully examined.
Objectives: To integrate advanced methods to enhance the understanding of the dynamics of mortality in space–time, to identify mortality risk factors and population attributable impact, to relate disparities in risk factor distributions to spatial mortality risk, and thus, to improve policy planning and resource allocation.
Methods: Agincourt HDSS supplied data for the period 1992–2008. Advanced spatial techniques were used to identify significant age-specific mortality ‘hotspots’ in space–time. Multivariable Bayesian models were used to assess the effects of the most significant covariates on mortality. Disparities in risk factor profiles in identified hotspots were assessed.
Results: Increasing HIV-related mortality and a subsequent decrease possibly attributable to antiretroviral therapy introduction are evident in this rural population. Distinct space–time clustering and variation (even in a small geographic area) of mortality were observed. Several known and novel risk factors were identified, and population impact was quantified. Significant differences in the risk factor profiles of the identified ‘hotspots’ included ethnicity; maternal, partner, and household deaths; household head demographics; migrancy; education; and poverty.
Conclusions: A complex interaction of highly attributable multilevel factors continues to demonstrate differential space–time influences on mortality risk (especially for HIV). High-risk households and villages displayed differential risk factor profiles. This integrated approach could prove valuable to decision makers. Tailored interventions for specific child and adult high-risk mortality areas are needed, such as preventing vertical transmission, ensuring maternal survival, and improving water and sanitation infrastructure. This framework can be applied in other settings within the region.

or reliable in developing countries, including those in sub-Saharan Africa (3). In many instances, health and sociodemographic surveillance systems (HDSS), though not representative at the national level, are often the only means to assess and more clearly understand population levels, trends, and determinants on a prospective basis (4,5).
Recent advances in data availability and analytic methods have created new opportunities to improve the analysis and modelling of diseases on a local, national, or regional basis (6,7). Spatial analysis and, for example, Bayesian geostatistical modelling are powerful and statistically robust tools for identifying high-mortality areas in a heterogeneous and imperfectly known environment and associated determinants (6,8). An increasing body of literature on spatial analysis of health outcomes in developing countries has been motivated by the availability of geo-referenced data and by the recent advances in methods and software that can implement such complex models (7,9). The identification of geographical clusters of high-risk mortality is an important policy issue that has received limited attention, especially the ability to identify individuals, households, and villages at elevated risk. This study contributes to other literature that investigates mortality and its risk factors that are important from a public health perspective (10). The study also provides guidance regarding the distribution of health services and other spatially-targeted interventions for disease control, mortality reduction, and resource allocation in rural South Africa and has application to broader sub-Saharan Africa.
Addressing health inequities in populations is a major challenge (11), and research that documents and quantifies inequities is needed to inform policies to close health gaps in the developing world. Evidence on reducing inequities within countries is growing. Successful approaches include those that improve geographic access to health interventions in poor communities, subsidize health care and health inputs for the poor, and empower poorer communities (12).
This study aims to describe and develop a framework that captures the space–time dynamics and determinants of age-specific mortality in rural South Africa.

Study area and population
The Agincourt HDSS is located in a sub-district in northeast South Africa (Fig. 1). There was a baseline census in 1992 that collected data on all individuals and households in the population (13). This has been followed by annual updates of births, deaths, and in- and out-migrations. It is a poor rural sub-district that includes former Mozambican refugees, temporary migrant workers, and a more stable permanent population (13). The site at present covers an area of about 400 km² and contains 25 villages, 13,500 households, and 84,000 individuals. There is a full geographic information system (GIS), containing locations of all households within the site, which is updated annually. A household is defined as a group of people who reside and eat together, plus the linked temporary migrants who would eat with them on return. Verbal autopsies (VAs), a method of determining individuals' causes of death in populations without a complete vital registration system, were introduced in 1993. A full VA is conducted on every death recorded during the annual census update and is administered to the closest caregiver of the deceased by a trained fieldworker (14). Three medical practitioners assess VAs to determine likely cause of death. Causes of death (main, immediate, and/or contributing) are coded to be consistent with the International Classification of Diseases (ICD-10). The main cause of death was used in these analyses.
The study population comprised infants (<1 year), children (1–4 years), young adults (15–49 years), and older adults (50–64 years) in the original 21 villages for the period 1992–2008. Children aged 5–14 were not included in the more detailed analyses given their very low mortality rate (<1 death per 1,000 person-years) and corresponding absolute number of deaths. Data from four new villages added to the site since 2007 were also not included in the analysis as they contributed minimal data to the study period.

Outcome and explanatory variables
The dichotomous age-specific mortality outcomes were defined as follows:
- Infants: mortality within the first 365.25 days of life
- Children: mortality between 1 and 4 years of age
- Young to middle-age adults: mortality between 15 and 49 years of age
- Older adults: mortality between 50 and 64 years of age
Person time was defined as time (in years) contributed by an individual during the study period until right censoring (0) or death (1). The time to right censoring was set to either the date of permanent out-migration during the study period or 31 December 2008 if the individual was present and alive. Demographics (gender, nationality), time period, season, maternal factors (former refugee status, age at pregnancy, death of mother during their offspring's infancy or childhood, education) and fertility factors (parity, birth intervals, sibling death), household factors (size, mortality experience, household head demographics, socio-economic status based on household assets, food security), health seeking (distance to nearest health facility, antenatal clinic attendance), migration patterns, and household elevation (climatic proxy) were included as explanatory variables. Household socioeconomic status (SES) was based on living conditions, assets and services including building materials of main dwelling, water and energy supply, ownership of modern appliances and livestock, and means of transport. These assets were used to construct an SES index using a multivariate statistical technique for categorical data, namely multiple correspondence analysis (MCA) (15).
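The person-time and censoring definitions above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, argument names, and the tie-breaking rule (a death recorded on the same day as another exit counts as a death) are assumptions made for illustration, and age-band boundaries are ignored for brevity. Only the censoring date of 31 December 2008 and the 365.25-day year are taken from the text.

```python
from datetime import date

STUDY_END = date(2008, 12, 31)  # administrative censoring date stated in the text
DAYS_PER_YEAR = 365.25


def person_time(entry, death=None, out_migration=None, study_end=STUDY_END):
    """Return (person_years, event) for one individual.

    event is 1 if a death is observed during follow-up and 0 if the
    individual is right censored (permanent out-migration or study end).
    Assumes entry is not later than the exit date.
    """
    candidates = [(study_end, 0)]
    if out_migration is not None:
        candidates.append((out_migration, 0))
    if death is not None:
        candidates.append((death, 1))
    # The earliest exit date ends follow-up; on a tie, death takes precedence.
    exit_date, event = min(candidates, key=lambda c: (c[0], -c[1]))
    years = (exit_date - entry).days / DAYS_PER_YEAR
    return years, event


def rate_per_1000(deaths, person_years):
    """Crude mortality rate per 1,000 person-years."""
    return 1000.0 * deaths / person_years
```

Summing the person-years and events returned by such a function over all individuals gives the denominators and numerators of rates expressed per 1,000 person-years.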

Risk factor analysis
A preliminary bivariate risk factor analysis was conducted to assess the relationship between mortality and each covariate. Covariates significant at the 10% level were then incorporated into the multivariable model. Given the inherent spatial and temporal correlation of longitudinal HDSS data, problems arise when using standard statistical methods as they assume independence of outcome measures (e.g. mortality). Objects in close proximity are often more alike, and common exposures (measured or unmeasured) may influence adult mortality similarly in households of the same geographical area, introducing spatial correlation in mortality outcomes. Including the spatial effect of proximity is important for an efficient estimation of parameters and prediction (16). Ignoring this correlation introduces bias in the risk factor analysis as the standard error of the covariates is underestimated, thereby overestimating the significance of the risk factors. Geostatistical models relax the assumption of independence and assume that spatial correlation is a function of distance between locations. They are highly parameterised models, and their full estimation has only become possible in the last decade by formulating them within a Bayesian framework (17) and estimating the parameters via Markov chain Monte Carlo (MCMC) simulation. With the development of MCMC methods and software such as WinBUGS (18), Bayesian approaches are being applied to the analysis of many social and health problems in addition to disease mapping and modelling or kriging (19). Thus, Bayesian geostatistical multivariable models are needed to analyse longitudinal data in order to address these problems.
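To make the idea of spatial correlation as a function of distance concrete, the sketch below builds a covariance matrix in which covariance decays exponentially with the straight-line distance between locations. The exponential form, the parameter names `sigma2` and `phi`, and the coordinates are assumptions chosen for illustration; the text does not state which parametric form was used.

```python
import math


def exponential_covariance(coords, sigma2, phi):
    """Covariance matrix with entries sigma2 * exp(-phi * d_ij).

    coords: list of (x, y) locations; d_ij is the Euclidean distance.
    sigma2: spatial variance (value on the diagonal).
    phi:    decay rate; larger values mean correlation fades faster.
    """
    n = len(coords)
    return [
        [sigma2 * math.exp(-phi * math.dist(coords[i], coords[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical locations in km on an arbitrary grid, not real villages.
locations = [(0.0, 0.0), (3.0, 4.0), (12.0, 5.0)]
cov = exponential_covariance(locations, sigma2=1.0, phi=0.2)
```

Under this assumed form, nearby locations share most of their spatial variance, while distant locations are nearly independent, which is the behaviour that standard independence-based models cannot represent.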
Different analytical dataset structures were used for the various age groups. This dictated the corresponding modelling approach to examine the multivariable association between the significant covariates and age-specific mortality. For infants, a negative binomial model (selected due to overdispersion) was used with an offset of time in days contributed in the first year given their higher risk earlier on. For children (1–4 years), a monthly discrete-time logistic or event history approach was used to track any changes of selected covariates in the given intervals. A monthly time interval was used as it was a better approximation of the risk than using a yearly interval. For the adult models, a continuous time-to-event or survival approach (Weibull parametric model) was adopted that split episodes of time for any relevant changes in selected covariates, for example, change of location or household. For a detailed comparison of the strengths and weaknesses of each analytical approach, please see Appendix 1. Details of the infant and child statistical models and selected results have been published previously (20,21). However, the infant results in this paper include important additional variables (mother death due to HIV or non-HIV, breastfeeding) not used previously. A spatial random effect at the village level was included to take account of spatial correlation and was modelled using a multivariate Gaussian distribution with a covariance matrix expressed as a parametric function of the distance between pairs of village centroid points (17). Furthermore, an unstructured household-level random effect was included to take into account repeated household observations where time episodes were split to incorporate any time-varying issues such as change of household physical location. MCMC simulation (22) was employed to estimate the model parameters.
Detailed formulation of the models as well as the WinBUGS codes to implement each can be found in Appendices 2 and 3, respectively.
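As a rough illustration of the monthly discrete-time structure described for the child model, the sketch below expands one child's follow-up into one record per month at risk, with the event indicator set only in the month of death. The function and field names are hypothetical and this is not the dataset preparation used in the study; time-varying covariates would be attached to each monthly record in the same way.

```python
def expand_to_person_months(child_id, months_observed, died):
    """Return one record per month at risk for a single child.

    months_observed: number of whole months of follow-up.
    died: True if follow-up ended in death, False if censored.
    The event indicator is 1 only in the final month of a child who died.
    """
    rows = []
    for month in range(1, months_observed + 1):
        event = 1 if (died and month == months_observed) else 0
        rows.append({"child_id": child_id, "month": month, "event": event})
    return rows


# A child followed for 3 months who died in the third month.
example_rows = expand_to_person_months("child_A", 3, died=True)
```

A logistic model fitted to records of this shape estimates the monthly probability of death conditional on survival to the start of that month.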

Model assessment
Model comparison in STATA was based on the Akaike information criterion (AIC). The deviance information criterion (DIC) was used to assess the various Bayesian multivariable models (23). Both the AIC and the DIC are measures of the relative goodness of fit of a statistical model, penalised for model complexity. Generally, the smaller the AIC or DIC, the better the model fit.
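For readers unfamiliar with these criteria, the sketch below writes out their standard textbook definitions: AIC as twice the number of parameters minus twice the maximised log-likelihood, and DIC as the posterior mean deviance plus the effective number of parameters. The function names and the numbers in the example are hypothetical and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


def dic(mean_deviance, deviance_at_posterior_mean):
    """Deviance information criterion.

    p_d (effective number of parameters) is the posterior mean deviance
    minus the deviance evaluated at the posterior mean of the parameters.
    Returns (DIC, p_d), where DIC = mean_deviance + p_d.
    """
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d, p_d


# Hypothetical values for two candidate models.
aic_value = aic(log_likelihood=-100.0, n_params=3)
dic_value, p_d = dic(mean_deviance=210.0, deviance_at_posterior_mean=204.0)
```

In both cases only differences between candidate models fitted to the same data are meaningful; the absolute values carry no interpretation.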

Spatial analysis
The Kulldorff spatial scan statistic (24) was used to identify significant spatial clustering of mortality. Simulation-based Bayesian Poisson kriging (25) was also used to produce smoothed maps of all-cause mortality risk within the whole HDSS area. The all-cause and cause-specific baseline models included a constant and a spatial random effect only. All identifying features (such as village centroids, boundaries) have been removed from the maps to ensure confidentiality and avoid stigmatisation of potentially high-risk villages. The HIV and tuberculosis mortality risk map is also not shown for the abovementioned reason. Model spatial estimates were exponentiated to give relative risk (RR). Risk maps were developed using a heat scale of the location-specific RR prediction. Darker areas reflect increasingly higher RR, while increasingly lighter areas indicate lower RR. A simple map showing potentially high-risk areas as a function of straight-line distance to nearest health facility was constructed using a circular buffer zone around health facilities based on the significant cut-off found in the risk factor analysis.
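The core quantity behind a spatial scan statistic can be sketched as follows. The example computes the log-likelihood ratio for a single candidate zone under a Poisson model, which is one common formulation of the Kulldorff statistic; the text does not state which probability model was used, so the Poisson choice, the function name, and the counts are assumptions. In a full analysis this ratio is maximised over many candidate zones and its significance is assessed by Monte Carlo replication, which is not shown here.

```python
import math


def poisson_scan_llr(cases_in, pop_in, cases_total, pop_total):
    """Log-likelihood ratio for excess risk inside one candidate zone.

    Expected cases inside the zone are proportional to its share of the
    population. Zones without an excess of cases score 0.
    """
    expected = cases_total * pop_in / pop_total
    if cases_in <= expected:
        return 0.0
    cases_out = cases_total - cases_in
    llr = cases_in * math.log(cases_in / expected)
    if cases_out > 0:  # a term with zero cases outside contributes nothing
        llr += cases_out * math.log(cases_out / (cases_total - expected))
    return llr


# Hypothetical zone holding 10% of the population but 30% of the deaths.
llr_excess = poisson_scan_llr(cases_in=30, pop_in=100, cases_total=100, pop_total=1000)
```

A zone whose observed deaths match its population share scores zero, and the score grows as the excess inside the zone becomes larger.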

Software
Data were extracted from the Agincourt database using Microsoft SQL Server. The analysis was carried out in STATA version 10.0 SE (26) and WinBUGS (18). Risk maps were constructed in MapInfo Professional version 9.5.

Results
Demographic and mortality profile
The demographic and mortality profile of the study samples is provided in Table 1. Overall, 9,035 deaths occurred during 1992–2008, based on 1,110,166 person-years contributed, at an overall crude mortality rate of 8.1 per 1,000 person-years. The highest mortality rates occurred among infants followed by the older adult (50–64 years) age group (29 and 19 per 1,000 person-years, respectively). The mortality rates among children and younger adults (15–49 years) were similar at 5.7 and 6.9 per 1,000 person-years, respectively. Among infants, 216 deaths occurred during the perinatal¹ period and 251 in the neonatal² period, that is, the majority occurred in the perinatal or early neonatal phase. The overall perinatal and neonatal mortality rates were 7.6 and 8.8 per 1,000 person-years, respectively. The lowest mortality rates were observed in the 5–14 year age group. Among adults (15–64 years), mortality rates showed a steady increase by 5-year grouping with a non-linear excess in the 30–34 and 35–39 age groups due to HIV/AIDS.

Leading causes of death by age group, 1992–2008
The leading cause-of-death in all age groups (Table 2) was HIV/TB. Among children, the second most prominent cause-of-death was diarrhoea or malnutrition. Among younger adults (15–49 years), external causes of death, namely assault and transport accidents, featured as the second and third top causes-of-death, with lifestyle-related diseases following. In the older adult age group (50–64 years), following HIV/TB, chronic non-communicable diseases featured prominently.

Temporal trends in age-specific mortality
A significant increase in mortality rates in all age groups was observed over the study period (Fig. 2), especially due to the impact of the HIV epidemic from the late 1990s to mid-2000s. All-cause mortality began to plateau around 2004 following rollout of the antiretroviral therapy (ART) programme in 2003 and reduction of HIV-related mortality. The temporal trend terms included in each multivariable model confirmed the significant increase in mortality across all age groups over the study period both linearly and by period (Table 3).

Major risk factors for age-specific mortality
Infants: maternal death in the infant's first year (especially due to HIV/TB), higher number of cumulative household deaths, no breastfeeding, and previous birth interval less than 1 year emerged as highly significant risk factors for all-cause infant mortality (Table 3). Mother being a migrant remained significantly protective. Male gender, increasing parity, and death of previous child were no longer significant risk factors following multivariable adjustment. No significant association was observed between infant mortality and household SES, increasing distance to nearest health facility and climate (using elevation as a proxy which corresponds to the rainfall gradient in the sub-district).
Children: maternal death between the child's first and fifth birthdays, particularly due to HIV/TB, was the most prominent risk factor from the multivariable analysis (Table 3), followed by father's death due to HIV/TB, four or more children aged less than 5 years living within the household, Mozambican origin of the mother, and winter season. Increasing age of the child remained highly protective. No significant association was observed between mortality risk and increased distance to nearest health facility. In contrast to infants, however, a significant and increasing trend of protective association was observed with increasing household SES based on the bivariate analysis.
¹ Perinatal period: last period of gestation up to first 7 days of life.
Adults: The most prominent risks for 15–49 year mortality following multivariable adjustment were male gender, being a migrant, increasing number of other household deaths, household head death, and distance to nearest health facility (>6 km) (Table 3). Increasing wealth of household and household head being male and older than 40 years were significant and prominent protective factors. Villages with a mortality proportion of HIV/TB above the median value remained at a significantly higher risk. Mozambican ethnicity and education were no longer significant after multivariable adjustment.
The most prominent risks for 50–64 year mortality following multivariable adjustment were male gender, being a migrant, and death of household head (Table 3). Households headed by older males again reduced older adult mortality risk. Mozambicans appeared to have significantly lower risk in this age group when compared to South Africans. In contrast to the findings for younger adults, following multivariable adjustment in the 50–64 year model, distance to nearest health facility (>6 km) and household SES were no longer significant risk factors.
Based on the risk prediction for straight-line distance to health facility, we can see that two villages in particular, one in the upper and the other in the lower south-east region, appear to have a higher mortality risk as a function of increased distance to the nearest local clinic in the Agincourt sub-district (Fig. 3). We also observe that there are other villages that appear to be far from the nearest health facility.

Spatial distribution of age-specific mortality
Spatial risk estimates based on a Bayesian kriging model suggest a higher risk of infant mortality on the eastern border of the site, while child mortality was concentrated in two distinct foci: the upper central and south-east corner of the sub-district. Five distinct foci of higher mortality in the 15–49 year age group were observed using Bayesian kriging (Fig. 4). Three are in the central to upper central region of the site and two in the south-east. These correlate to areas with higher risk of infectious disease mortality in this age group, largely HIV/TB. A very similar pattern was seen in the 50–64 year age group when compared to 15–49 years, though with one minor difference in that one village in the south-east was no longer at higher risk and one additional village in the upper central region emerged as high risk. Similarly, this distribution is largely driven by HIV/TB mortality. Higher non-communicable disease mortality risk was observed in one particular village in the upper central region of the site.

Potential proximate reasons for the observed high-risk clusters
The high age-specific mortality clusters (Kulldorff spatial cluster scan statistic, p<0.05), when compared to the lower mortality clusters, had significantly (p<0.05):
- more deaths due to HIV/TB and diarrhoea or malnutrition
- lower duration of breastfeeding
- higher number of deaths of previous children
- higher number of mothers dying of HIV/TB
- lower mean maternal education years
- higher number of cumulative household deaths
- younger household heads
- higher incidence of household heads dying
- more Mozambican household heads
- lower SES
- higher temporary migration rates
- lower proportion of individuals with secondary or higher level education.

Discussion
This study has demonstrated the usefulness of advanced epidemiological modelling in assessing risk factors and producing smooth maps of mortality risk in a population. Earlier work in South Africa and Agincourt has shown the profound impact of HIV/TB on mortality across most age groups (27–29), with higher than normal mortality rates evident. The results show that infectious diseases (particularly HIV/TB) were the most prominent cause-of-death over the study period and have largely contributed to the observed mortality trends. The levelling out of mortality around 2005 is possibly linked to the ART rollout, which began in South Africa around 2004. Current studies in this area are assessing the impact of ART rollout on mortality, as well as specific villages or areas where equity of access may be an issue. Mortality from non-communicable disease has also increased significantly in adults 30 years and older in the rural Agincourt sub-district (29) and has implications with regard to the epidemiological transition.
Results suggest that strong and significant space–time mortality disparities exist, even within a small geographic area. This distribution is being driven by a complex web of multilevel interacting factors that have likely increased communicable disease mortality (HIV) and non-communicable disease mortality (in the older age group) in specific risk areas. According to the spatial analyses, the south-east and upper central regions of the site were consistently identified as high-risk areas for most age groups, thus indicating a definite non-random element to the mortality distribution in this rural sub-district. A strong geographical pattern of higher infectious disease mortality risk (particularly HIV/TB and diarrhoea/malnutrition) and former Mozambican settlements lying to the east of the site was also generally observed. Mozambican settlements in the south-east have generally been shown to have poorer access to water, sanitation, and waste disposal; in addition, they had fewer schools and poor quality of housing and were particularly isolated from public transport (30,31).
Key individual and household level determinants have been confirmed and certain novel determinants have emerged. The results confirmed the importance of infant and child mortality risk factors such as maternal age, birth spacing, season, village and ethnic group (32,33). Lack of breastfeeding in infancy and maternal death during infancy or childhood (1–4 years) were major risk factors, as were a higher number of cumulative household deaths.
Other reports indicate that infants who survive the death of the mother have a 10% chance or less of living past the   (35,36), and temporary or labour migrants are more vulnerable to HIV than more settled populations. This has been shown in other African and southern African countries (37). One village in the upper central region appears to be at consistently higher risk across all age groups and has a significantly younger and highly mobile population, potentially engaging in higher risk behaviour with more time spent away as described by Collinson (31). Health education messages are also needed that focus on high-risk sexual behaviour that increases risk for HIV infection, and its consequences. Death of household members also appeared to be a significant risk in all age-groups. Previous research indicates that HIV has arguably had the greatest impact at the household level in terms of dissolution and reduced economic status (38).
There are several studies relating geographical access to use of health facilities. As one would expect, members of communities that are more distant use the facilities less than those that live nearer, but this does not necessarily translate into increased mortality risk (39). In this study, we did not observe significant excess risk associated with increased distance to nearest health facility among infants and children, and this has been demonstrated previously in this setting (20,21). Conversely, larger distance from the nearest health facility had a significantly higher risk associated with adult (15Á49 years) mortality. This has been shown in a previous study on adult mortality in China (40).
A limitation of the study is the potential to miss infant deaths, particularly neonatal deaths, which would underestimate the overall infant mortality burden. However, infant death ascertainment has improved in the study site, especially towards the end of the study period (41). Determination of cause-of-death through VA is more problematic for diseases that have less specific symptoms such as HIV/AIDS (42). Thus it is likely that the HIV burden may have been underestimated. However, previous validation studies of the VA in Agincourt HDSS have shown that it performs well in this high HIV prevalence setting (43). Levels of stigma associated with HIV are high in South Africa particularly prior to the introduction of HAART and may have also contributed to this underestimation.

Fig. 3. Geographic risk of adult mortality adjusting for distance to nearest health facility at prediction point using a univariate Bayesian spatial kriging model that includes distance to health facility as the risk factor (white and black scale represents lowest and highest risk, respectively).

Conclusion
This work has contributed through the testing, refinement, and application of various advanced spatial–temporal analyses and statistical modelling of risk factors to large longitudinal cohorts such as an HDSS. The novel application of methodologies in public health contributes to our understanding of factors related to mortality and how to quantify them accurately for correlated geostatistical and longitudinal data. This study also contributes to the development of public health interventions by targeting clusters of adverse health outcomes that appear to aggregate geographically and in time as well as the tracking and targeting of other emerging (or re-emerging) communicable diseases that are compromising achievements made in developing countries (44). In particular, space–time modelling and mapping can be an effective tool in public health by showing and monitoring diffusion patterns of communicable diseases and in searching for infectious agents. The identification of disparities in the distribution of mortality and related risk factors in space and time can guide effective policy interventions and programmes. The methods developed, assessed, and used in this thesis contribute to our understanding of risk factor modelling of large correlated longitudinal data.
This study should be regarded as a first step in prioritising specific areas for follow-up public health efforts and evaluating their impact in this rural setting. Targeting prevention of HIV/TB and antiretroviral rollout in significant child and adult mortality clusters and ensuring maternal survival appear key to improving infant and child mortality rates. Further spatial assessment of antiretroviral therapy (ART) rollout that started in this area in 2007 as well as identifying any villages or areas not accessing ART equitably is also critical. The provision of adequate water and sanitation is needed in the mortality clusters particularly in the south-east where diarrhoeal mortality appears high.