The impact of natural hazards on migration in the United States and the effect of spatial dependence

.


Introduction
Recent literature shows that natural hazards have caused massive displacements of people towards safer areas (IMDC 2022).Some of these people returned home after the event, but others moved away permanently.When focusing on the US, there is quite some empirical literature on the effects of natural disasters on displacements in the US; however, the results are not always consistent.Some studies find that natural hazards lead to higher net out-migration in US counties (e.g.Boustan et al. 2020), while others find higher net in-migration due to hazard damage (e.g.Schultz and Elliott 2013).
Recent meta-analyses on the empirical literature on climate change and migration have attempted to identify why studies have found inconsistent results (Beine and Jeusette 2021;Hoffmann, Sedov a, and Vinke 2021; Sedov a, Cizmaziov a, and Cook 2021).One reason is that migration patterns are driven by a complex interplay of economic, social, environmental, political and demographic factors, and evolve over different time and spatial scales (Black et al. 2011).Consequently, researchers have employed a wide range of methods and data sources to analyze the unknown underlying migration process.Beine and Jeusette (2021), Hoffmann, Sedov a, and Vinke (2021), and Sedov a, Cizmaziov a, and Cook (2021) show that the methodological and modelling choices significantly affect the results.For example, the meta-analyses suggest that climatic hazards are more likely to be significant drivers of migration in internal migration studies than in international migration studies.
An important observation by Hoffmann, Sedov a, and Vinke ( 2021) is the lack of spatial models in migration studies, which can explicitly address spatial dependence in the data.Spatial dependence occurs because, for example, disaster damage in one region tends to be positively correlated with disaster damage in regions nearby (see Figure 1 in the online Supplementary Information).In other words, observations exhibit spatial dependence when they are not randomly distributed over geographic units.When spatial correlation is not accounted for in statistical models, then this could cause biased and inconsistent estimators, as well as inefficient estimates (i.e.large standard errors) (LeSage and Pace 2009).This means that the point estimates and statistical significance of the estimates cannot be trusted.Hoffmann, Sedov a, and Vinke (2021) and Sedov a, Cizmaziov a, and Cook (2021) therefore recommend that researchers must have a stronger focus on space in statistical modeling and account for spatial dependencies in the data.
Some of the existing studies on migration and natural hazards in the US consider spatial dependence, but in different ways: Boustan et al. (2020) use Conley standard errors (Conley 1999) that correct for spatial and temporal dependence; Winkler and Rouleau (2020) include a dummy variable when a neighboring county has experienced a wildfire; and Schultz and Elliott (2013) include a spatial lag of the dependent variable in the regression.This spatial lag is the average value of the dependent variable (in their case, population growth) in neighboring counties.Strobl (2011) adopted a spatial regression model that accounts for spatial autocorrelation in the error terms.We suspect that these different treatments of spatial dependence could have contributed to the mixed findings in this literature.In other fields, scholars have already found that accounting for spatial dependence can significantly impact research outcomes, as demonstrated by studies on child poverty rates in the US (Voss et al. 2006), the influence of urbanization and environmental pollution on public health (Wu et al. 2023), and child malnutrition in Africa (de Sherbinin 2011).
In this paper, our goal is threefold: i) we examine the impact of economic damage caused by natural hazards on migration in the US, ii) we aim to improve the assessment of internal migration dynamics by accounting for spatial dependence using spatial econometrics models, and iii) we examine whether and how the results change when two different measures of migration are used.With our approach, we add to the existing literature by trying to clarify some of the mixed evidence in the climate-migration literature.
We aim to do this in several ways.First, we adopt spatial econometrics techniques specifically designed for modeling data collected over geographic regions.The advantage of using these techniques is that we can explicitly take into account the spatial dependence in the data.Second, following the suggestions by Beine and Jeusette (2021), we estimate two separate models using either migration rates or migration flows to examine the effect of different drivers of migration.The first regression model uses migration rates and is relatively similar to the previous climate-migration literature in the US, except for the spatial econometrics approach.The second regression model is the gravity model using migration flows between counties (Poot et al. 2016).Through applying the gravity model and using flow data, the origin and destination of migration are known, whereas with migration rates, only the origin (in case of out-migration rates) or the destination (in case of in-migration rates) is known.Compared to the regression models using migration rates, the gravity model could therefore give more accurate results because we employ more information about the mechanisms behind the effects of natural hazards on migration.
Finally, we test whether the migration response to natural hazards is different in counties with a high risk of natural hazards and counties with a low risk of natural hazards.On the one hand, we could expect that the migration response in high-risk counties is lower due to familiarity with natural hazards and a higher disaster preparedness.On the other hand, the response could be higher because the occurrence of another disaster could be the trigger point to relocate to safer areas.

Methodology
The flowchart in Figure 1 presents an overview of the data used in the two modeling approaches.From the Internal Revenue Service (IRS) data, described in Section 2.1.3,we derive in-and out-migration rates and migration flows, which function as the dependent variables of the regression models.The dependent variable for the spatial econometric models is either out-migration rates or in-migration rates and the dependent variable for the gravity model is migration flows between counties.The explanatory variable in which we are most interested, damage caused by natural disasters, is derived from Federal Emergency Management Agency (FEMA) disaster data, described in Section 2.1.1.FEMA reports yearly damages per capita due to natural hazards in each of the 2,767 counties in the sample.Additional socio-economic data are available to act as control variables, which are discussed in Section 2.1.2.The two models are discussed in Sections 2.2 and 2.3.

FEMA damage data
In our migration models, we include hazard damage from different natural hazards in the US as a predictor for migration in response to a natural hazard (e.g.Fussell et al. 2017).These natural hazards comprise hurricanes, flooding, severe storms, tornadoes, wildfires, snow storms, ice storms and earthquakes.For declared disaster areas, FEMA provides i) individual assistance to households that are not insured by the National Flood Insurance Program (NFIP) and ii) public assistance to counties and states to restore infrastructure and public buildings.Because the amount of individual assistance depends on NFIP insurance rates, for which we do not have data, we only employ public assistance data from FEMA as a measure of economic damage from natural hazards.FEMA public assistance data at the county level is available from 1998 onwards up to the present.Additionally, we use the national risk index of natural hazards provided by FEMA to identify high-risk and low-risk counties.

Socioeconomic data
We control for some socio-economic variables that may explain migration patterns.For this, we include unemployment rates and personal income levels obtained from the US Bureau of Labor Statistics (2021) and the US Bureau of Economic Analysis (2021a), respectively.The personal income includes income from wages and salaries, Social Security and other government benefits, dividends and interest, business ownership, and other sources (US Bureau of Economic Analysis 2021a).The two traditional variables in the gravity model (population and distance between counties), are obtained from the US Bureau of Economic Analysis (2021b) and the National Bureau of Economic Research (2021), respectively.The distance is measured in miles from the centroids of the counties.

Migration data
We use yearly county-to-county migration flows that are derived from annual changes in addresses as indicated on individual income tax returns filed with the IRS.Hence, migrants are defined as those who change county of residence from one calendar year to the next.Counties are the administrative units within a state in the US and, on average, a state comprises around 60 counties.Although not all households file taxes, still 87% of the US households are present in the dataset (Molloy, Smith, and Wozniak 2011;Winkler and Rouleau 2020).Specifically, retired people, students and the poor could be underrepresented in the data, as these groups might not file taxes.Still, the percentage of households that does file taxes is consistent over time, which suggests that the underrepresentation of these specific groups is constant over time and should not affect trends (Molloy, Smith, and Wozniak 2011).We exclude international migrants from the data, as we focus on internal migration in this study.Following suggestions on data quality by DeWaard et al. (2021), we only use the data before 2011.Due to privacy reasons, the destination of migration flows lower than 10 people is classified as unknown.As a result, counties with smaller populations are likely to have reported lower migration inflows, because they are more prone to experiencing migration flows of less than 10 people.We therefore omit counties with a low population and, thus, potentially with low migration destination flows smaller than 10 people.More specifically, we remove counties with a population lower than 5,000 people in 1998 and which leaves 2,767 out of 3,100 counties in the data sample.Similar to Winkler and Rouleau (2020), the out(in)-migration rate per county is calculated as the number of out(in)-migrants divided by the county's population.Here, the county's population is the number of stayers plus the number of out-migrants.

Description of the data
To gain some insight into the data, Table 1 reports the measurement unit, as well as the minimum, mean, median, maximum and variance for each variable.Table 1 reveals that the migration flow data is skewed to the right, because the median is less than the mean.The same holds for the household damage data, where we see that for at least half of the observations, there is no disaster damage.

The spatial Durbin model
Data collected over geographic regions often contain spatial dependence, because there is usually spatial interaction between two geographically close areas.Consequently, observations from a region often exhibit trends similar to the region's neighbors.When spatial correlation is ignored, this could lead to inefficient estimates and biased estimators (Elhorst, 2014).
To account for dependence among observations in space, the spatial econometrics literature introduces a weighting matrix W that describes the interactions between geographic units (Elhorst 2014).The weighting matrix can take different forms, such as an inverse distance matrix or a binary matrix, where 1 indicates that two regions are neighbors.Equation (1) shows how the weighting matrix is integrated into a conventional panel model to account for three types of interaction or spillover effects (Elhorst 2014).
Y t represents the dependent variable and is an N Â 1 column vector, where N is the number of cross-section units, and subscript t denotes the time dimension, which runs from 1 to T. X t is a vector of explanatory variables, u t is the composite error term, and e t is a random variable that is independently and identically distributed with a zero mean and finite variance.The other Greek letters (i.e.a, q, b, h, k) are parameters to be estimated.Endogenous interaction effects are represented by qWY t : observations of the dependent variable depend on each other (i.e.q 6 ¼ 0), which means that migration processes are not bound to one region and spill over to nearby regions.Exogenous interaction effects are included through WX t h : observations of a region's dependent variable depend on the observations of the explanatory variables in nearby regions (i.e.h 6 ¼ 0).This effect arises, when, for example, a natural hazard in a region increases out-migration in a nearby region, because people realize that this disaster could have happened in their own region.Finally, u t ¼ kWu þ e t represents interaction effects among the error terms: There remains spatial correlation in the composite error term (i.e.k 6 ¼ 0), which stems from omitted variables and unobserved spatial shocks.
The model in Equation ( 1) is a general spatial econometric model but, in practice, only one or two interaction effects are included.The model including all three interaction effects is often overparameterized and unable to outperform simpler models with fewer interaction effects.Schultz and Elliott (2013), for example, estimate population growth using a spatial autoregressive model, which includes only the qWY term.Saldaña-Zorrilla and Sandberg ( 2009) apply the spatial Durbin model, which includes endogenous and exogenous interaction effects.They show that the spatial Durbin model provides the best fit for explaining out-migration in Mexican municipalities.Their results indicate that natural hazards in neighboring municipalities actually reduce out-migration.They argue that natural hazards drive up agricultural prices due to destroyed harvests, which increase farmers' income and thereby reduce the incentive to migrate.
In this study, we first test for spatial dependence by estimating a spatial error model, which only corrects for spatial correlation in the error term.If the null hypothesis of no spatial autocorrelation is rejected, we proceed with the spatial Durbin model.The advantage of the spatial Durbin model is that it produces unbiased coefficient estimates, even when the true model includes a spatial error term, that is, k 6 ¼ 0 (LeSage and Pace 2009).We therefore believe that our results are more reliable than the results published by other scholars who have considered spatial dependence but applied less rigorous approaches.A disadvantage of the spatial Durbin model is that coefficients may be estimated less efficiently (i.e. standard errors are higher), especially for small sample sizes.However, we do not expect this to be a problem because our sample size is relatively large.
A weighting matrix W describing the spatial relationship between geographic units must be specified before estimating a spatial econometrics model.Following Saldaña-Zorrilla and Sandberg (2009), we adopt a binary N Â N matrix, in our case, a 2767-by-2767 matrix, where 1 indicates that two counties are neighbors and 0 indicates otherwise.We define two counties as neighbors when they share a border.Two advantages of this specification are (i) this type of weighting matrix is relatively sparse, which makes estimation faster, and (ii) the coefficients of the spatial terms are easily interpreted.For estimation, W is row normalized such that every row in W sums to 1.This way, the number of neighbors does not affect the estimation results.
Whereas Saldaña-Zorrilla and Sandberg ( 2009) estimate a spatial model on crosssectional data, our dataset has a time dimension of 13 years, allowing us to control for both spatial and temporal heterogeneity.We, therefore, estimate a fixed effects model with spatial and time fixed effects.We also considered the random effects model, but the Hausman test indicated that this model produces biased coefficients.This bias occurs when the spatial effects, time effects, and error terms are correlated, which is not an issue in the fixed effects model (Elhorst 2014).
The spatial Durbin model for out-and in-migration rates is as follows: The terms with W i are the spatial terms.q 1 W i Rate t is the spatial lag, which can be interpreted as the weighted average over the migration rates of the neighboring counties, because W i is the i-th row of the row-normalized matrix W. Similarly, h 1 W i Income t , h 2 W i Unemployment t and h 3 W i Damage t can be interpreted as the averages of income, the unemployment rate and household damage in neighboring counties.

The gravity model
In the gravity model, migration flows (Flow ij ) are positively related to the population size in the origin and destination (Pop b i and Pop c j ), but negatively related to the distance between them (Dis d ij ), see Equation ( 3) below.G is a scaling factor.
In the gravity model, a person migrates to the destination that maximizes their utility, given that this utility is higher than the current utility in the origin.To estimate the multiplicative gravity model given in Equation (3), it is common practice to first log linearize the model (Santos Silva and Tenreyro 2006).The log-linearized version of the gravity model is provided in Equation (4): where a replaces ln G.However, Santos Silva and Tenreyro (2006) find overwhelming evidence that the error terms, ln e ij , are heteroskedastic in the usual log-linearized model of the gravity equation, which could lead to inconsistent OLS estimates of the parameters.They, therefore, propose a Pseudo Poisson Maximum Likelihood (PPML) estimation technique that is consistent under heteroscedasticity.The Poisson regression is shown in Equation ( 5).Additionally, because we take the exponent on both sides of the equation, the PPML technique circumvents the problem of taking the logarithm of zero flows, which are often present in migration flow data.
In the gravity model, migration flows are assumed to be independent of each other, but literature shows that they can exhibit network autocorrelation (Chun 2008).Consider, for example, a highly urbanized region that generates a large influx of migrants where some of the migrants move into neighboring regions of the urban region.This leads to spatial dependence between the migration flow to the urban region and the migration flows to regions surrounding the urban region.Again, this dependence could lead to inefficient and biased estimators.To account for spatial dependence, spatial econometrics techniques can be applied.However, the weighting matrix of the gravity model is not N Â N, that is, 2767 Â 2767, because the weighting matrix needs to specify the spatial relationship between pairs of counties.This means that W has a size of N(N-1) Â N(N-1), approximately 7.65 million by 7.65 million, which makes estimation infeasible.
To circumvent this issue, we calculate a spatial lag of the dependent variable (i.e. the migration flow) in advance and include it as an explanatory variable.Following LeSage and Pace (2008), we calculate the logarithm of the average of flows from the origin to the destination's neighbors, referred to as destination-based spatial dependence, and the logarithm of the average of flows from the origin's neighbors to the destination, referred to as origin-based spatial dependence.Both explanatory variables are included in the regression.In Figure 2, we show the two sources of spatial dependence: The dashed lines are the flows from the origin's neighbors to the destination county, and the dotted lines are the flows from the origin to the destination's neighbors.
The Poisson regression for migration flows that takes into account the spatial correlation is given by We include time fixed effects, / t and origin-destination fixed effects, l ij , in the gravity model to control for spatial and temporal heterogeneity.We decided to use origin-destination fixed effects because 98% of the county pairs experienced no migration between them during the time span of this study.Removing these county pairs could result in biased estimates, as they are not randomly distributed.However, we cannot include them either because the size of the dataset would explode, rendering estimation infeasible.Therefore, we resolve the issue by including origin-destination fixed effects, which are essentially (unobserved) constants per origin-destination pair.When the constant of an origin-destination pair goes towards minus infinity, the expected flow is zero, regardless of the value the other variables in the model take.As a result, we can safely remove the county pairs without migration from the dataset and without introducing a bias to the model.Also note that the origin-destination fixed effects capture the distance between two counties, so distance does not need to be included. 1Following Gr€ oschl and Steinwachs (2017), we do not take the natural logarithm of the disaster variables.The reason lies in the interpretation of the coefficients.If one takes the natural logarithm of household damage, then an increase in disaster damage from $10 to $100 has the same impact on migration flows as an increase in damage from $1,000 to $10,000.This is not realistic, because especially large and devastating disasters are expected to have an impact on migration, whereas small disasters are likely to have no or little impact.

Results
In the first two subsections, we present the results of the spatial Durbin model and the gravity model.Moreover, we show the results of similar models when spatial dependence is not, or differently, taken into account.In Section 3.3, we compare the results of the two modeling approaches.

Spatial Durbin model results
First, we test whether spatial autocorrelation is present in the error terms by estimating a spatial error model, which only controls for spatial correlation in the error term.We find that in the spatial error models for out-migration and in-migration, SEM1 and SEM2, spatial correlation in the error terms is significant against a 5% level, and according to Debarsy and Ertur (2010), the null hypothesis of no spatial autocorrelation can be rejected.Therefore, we proceed with the spatial Durbin model for both out-migration (SDM1) and in-migration (SDM2).Table 2 presents the coefficient estimates of the spatial Durbin models and the spatial error models.Additionally, in the final two columns of Table 2, we report the results of two fixed effects models where spatial dependence is not considered (FE1 and FE2).We use standard errors that are robust against heteroskedasticity and are clustered at the county level.
The coefficients can be interpreted as follows: According to SDM1, an increase in income of $1,000 is associated with an out-migration rate that is 0.0187 lower on average, ceteris paribus.In other words, the increase in income induces 19 "extra" inhabitants per 100,000 inhabitants to stay in the county.The effect of income on inmigration is insignificant.Unemployment rates have a significant impact on both out-migration and in-migration rates.An increase in the unemployment rate of 1% is associated with an increase in the out-migration rate of 0.0405, which means that per 100,000 inhabitants, 41 additional people move out.According to SDM2, an increase in the unemployment rate of 1% discourages 38 individuals per 100,000 inhabitants from migrating into the county.
In comparison to income and unemployment, household damage has a much greater effect, especially on out-migration.A $1,000 dollar damage (1 unit) leads to an additional 928 individuals moving out of a county of 100,000 inhabitants.When all neighboring counties have a damage of $1,000 dollars, another 951 individuals per 100,000 inhabitants would move out, irrespective of disaster damage in the county itself.However, it should be noted that both coefficients are only significant at a significance level of 10%.The effect of disaster damage on in-migration appears to be insignificant.
The estimation results of the fixed effects models, where we ignore the spatial correlation, are substantially different from the results obtained by using the spatial Durbin model.According to FE1, a $1,000 dollar damage is associated with the relocation of 1,651 individuals per 100,000 inhabitants, whereas this was 928 individuals according to SDM1.In general, the coefficients in the fixed effects models are not consistent with the spatial Durbin model and sometimes even counterintuitive.In FE1, higher unemployment rates are associated with lower out-migration, and in FE2, higher income levels are associated with lower in-migration.These results are contrary to what the spatial Durbin models predict, and they are not in line with economic logic.
Additionally, we see a large difference in the estimated impact of hazard damage on out-migration in SDM1 and SEM1.As mentioned in Section 2.2, LeSage and Pace (2009) have shown that the spatial Durbin model is the only spatial model that produces unbiased coefficient estimates regardless of the true spatial data-generating process in the data.Hence, even when the correct model is a spatial error model, the spatial Durbin model produces unbiased coefficient estimates.On the other hand, the spatial error model will yield biased estimates when the true model is not a spatial error model.We can actually test whether the spatial error model is the true model, because the spatial Durbin model can be reduced to the spatial error model when h ¼ −qb (LeSage and Pace 2009).In the last row of Table 2, we report the test statistic and p value of the null hypothesis that h þ q Á b ¼ 0. In both tests, we can reject the null hypothesis against a 5% significance level, which suggests that the spatial error model is not the appropriate model, and we should proceed with the Spatial Durbin model.
Finally, we formally test whether the estimated coefficients of the three models (SDM, SEM and FE) are significantly different from each other.The results of the ttests are given in Table S4 in the online Supplementary Information.The test results indicate that the coefficients of disaster damage are not statistically different from each other in the three models (SDM1, SEM1, FE1).In other words, even though we do find considerable difference between the estimated coefficients, and the impact of natural hazards on out-migration is overestimated when spatial correlation is not taken into account (FE1) or not correctly taken into account (SEM1), the differences between coefficients are not statistically significant.

Gravity model results
In Table 3, we report the estimated coefficients of two gravity models.In Gravity1, we account for spatial dependence, and in Gravity2 we do not consider spatial dependence.Again, we use standard errors that are robust against heteroskedasticity and clustered at the county pairs.
The coefficients of the logarithmic variables in the gravity model can be interpreted as elasticities: For example, when the population in the origin increases by 1%, then the flow increases by coefficient $b %, that is, $0.80% according to Gravity1.More specifically, when the population increases by x %, the flow increases by For the damage variables, the interpretation is somewhat different: For example, when hazard damage increases by c US dollars, the flow increases by e bc − 1 ð Þ Ã 100%: When c is $1,000 damage in the origin county, it results in a e 0:0867Ã1 − 1 ð Þ Ã 100% ¼ 9:06% increase in the (outward) flow.A comparison of the two gravity models in Table 3 highlights major differences in the estimates.When spatial dependence is not considered, the gravity model seems to overestimate the parameters.For example, the coefficient of disaster damage in the origin is 0.0867 in Gravity1 and 0.1398 in Gravity2 and a t-test demonstrates that this difference is statistically significant, see Table S4 in the online Supplementary Information.These results show that incorporating spatial dependence can significantly influence the coefficient estimates.

Comparing the results
The spatial Durbin model and the gravity model have different dependent variables, which affect the interpretation of the coefficients and prohibit a direct comparison of the models.To facilitate comparison of the effect of the explanatory variables on the migration flow and migration rates, we convert the coefficients to percentage change effects.The percentage change in the migration rates is calculated at the median value of the out-or in-migration rate.Table 4 presents the percentage change in the flow, out-migration rate and in-migration rate in response to an increase in one of the explanatory variables (ceteris paribus).
Table 4 shows that disaster damage has a relatively large impact on out-migration.For example, in the gravity model, a $1,000 dollar household damage per capita increases the outward flow of migrants by 9.06%.To achieve a similar effect, the unemployment rate has to increase from 5.4% (the median unemployment rate) to 19.98%, or the population has to grow from 29,861 to 34,251 inhabitants (i.e.11.47%). 2Interestingly, an increase in income in the origin county leads to higher outmigration in the gravity model (þ1.51%), whereas SDM1 indicates that an increase in  income leads to lower out-migration (−1.62%).The result of SDM1 seems more plausible: When individuals are able to earn a higher income in their county, they have an incentive to stay.However, it is also worth considering that an increase in income could provide individuals with the financial means to relocate to another county, as suggested by the gravity model.In any case, the impact of income is considerably low in both models.Interestingly, the impact of hazard damage is much higher in the spatial Durbin model than in the gravity model: In SDM1, the out-migration rate increases by 16.04% after a hazard with $1,000 damage per capita, whereas in Gravity1 the migration flows increase by 9.06%.We believe that the reason for this difference lies in the different model specifications; the spatial Durbin model is a linear model, whereas the gravity model has a Poisson specification.
In Figure 3, we plot the impact of different values of hazard damage on the percentage change out-migration according to the two models.Given the linear model specification of the spatial Durbin model, the change in out-migration rates increases linearly with hazard damage.On the other hand, the change in out-migration flows increases exponentially with hazard damage.Hence, for low hazard damage, the impact of natural hazard damage is comparatively larger in the spatial Durbin model than in the gravity model; however, this changes after 13,900 US dollars.

High-risk versus low-risk counties
In this subsection, we evaluate whether the response to a natural hazard varies between counties where the risk of natural hazards is high and counties where the risk is low.On the one hand, we could expect that the migration response is lower in high-risk counties because people are more aware of the risk and therefore better prepared, reducing the need for migration.On the other hand, we could expect that the migration response is higher because the experience of another disaster might be the tipping point to decide to relocate to a safer area.To test these hypotheses, we construct a dummy variable that is one in case of a high-risk county and zero in case of low-risk counties.Then, we create an interaction variable by multiplying the risk dummy with the damage variable.Since we introduce the interaction variable alongside the original damage variable, we can investigate whether there is an "additional" response for high-risk counties compared to counties with low or no risk.To identify high-risk counties, we use the national risk index provided by FEMA.The risk index classifies counties into five categories: very high, relatively high, relatively moderate, relatively low, and very low.We set our dummy to 1 when the county falls in the very high, relatively high, or relatively moderate category.This leads to 538 counties out of the total 2,767 having a value of 1 in the dummy variable. 3 The complete results of the spatial Durbin models and the gravity model are reported in the online Supplementary Information, see Table S2 and S3.The coefficients, along with the standard errors of the damage variables, are given in Table 5 below.We still find that disaster damage is associated with higher out-migration (SDM1 and Gravity1), although the coefficients have dropped considerably compared to the original models without the interaction variable.The coefficients of the interaction variables in SDM1 and Gravity1 are positive and significant, which suggests that disaster damage has a higher impact in high-risk counties compared to low-risk counties.This implies support for the second hypothesispeople in high-risk areas are more inclined to migrate after a natural hazard than in low-risk areas.This is in line with the survey results of Correll et al. (2021) in southern Louisiana: they found that relocation is positively correlated to the number of floods experienced and the perceived risk of floods.Finally, we want to point out that the coefficient estimates of SDM1 and SDM2 are more significant than in the original models where we did not differentiate between high and low-risk counties.This suggests that controlling for high-risk counties leads to more accurate estimates because we are better capturing the dynamics between damage, risk level and migration.

Discussion and conclusion
In this study, we assessed the effect of spatial dependence on estimating the impact of natural hazards on migration in US counties.We apply two modeling approaches to examine whether and how the results change when different measures of migration are used and when spatial dependence is taken into account.In the spatial Durbin model, $1,000 dollar damage per capita is associated with out-migration of roughly 1% of the county's population.This corresponds to an increase in the out-migration rate of 16.0%.In the gravity model, $1,000 dollar disaster damage per capita increases the outflow of migrants by 9.1%.This seems like a large difference in estimated impacts, but the difference is due to the different model specifications: the spatial Durbin model is a linear model, whereas the gravity model follows a Poisson distribution.Hence, for low values of natural hazard damage, the estimated impact of hazard damage is  relatively high in the spatial Durbin model, but this effect reverses for high values of natural hazard damage (see Figure 3).
The impact of natural hazards on in-migration is lower compared to the impact on out-migration.Only the gravity model suggests a significant decrease in in-migration in the event of a natural hazard.However, it is important to acknowledge that existing literature has suggested that natural hazards could actually attract migrants to a disaster-hit area due to employment opportunities in reconstruction work.Nevertheless, these migrants are mostly from Central America and undocumented, so they are not present in the IRS tax records data and, consequently, we are not able to detect this effect (Brown et al. 2022).
Comparing the spatial Durbin models and gravity model, we find that using a gravity model increases the probability of finding significant impacts of climate factors on migration.This is in line with the findings of Beine and Jeusette (2021).The reason could be that the gravity model makes use of more granular data (migration flows instead of migration rates), providing more information about the mechanisms behind the effects of natural hazards on migration.Because we utilize more information in the gravity model, we can expect better accuracy of the estimated coefficients.Therefore, we find that the null hypothesis of no relationship between disaster damage and migration is more strongly rejected in the gravity model compared to the other models employing migration rates and thus utilizing less information.Hence, we can confidently conclude that there is a significant relationship between natural hazards and migration flows.
In addition, when we differentiate between counties with a high risk of natural hazards and counties with a low risk of natural hazards, we obtain more accurate estimates in the spatial Durbin models.This suggests that we previously ignored certain non-linear dynamics in the relationship between hazard damage and migration, with the county's risk levels playing a key role in shaping the migration response.We find that high-risk counties exhibit a higher migration response to natural hazards compared to counties with a low risk of natural hazards.At the same time, there is a more pronounced decline in in-migration in the event of a natural hazard in high-risk counties compared to low-risk counties.These results are encouraging, as it suggests that individuals move away from disaster-prone counties and are deterred from moving to high-risk counties.
Compared to the impact of natural hazard damage, the impact of changes in income levels and unemployment rates is relatively small.For example, an increase of $5,000 in the income level is associated with a decrease in the out-migration rate of 1.61% in the spatial Durbin model.Interestingly, in the gravity model, an increase in the income level is associated with an increase in migration outflows.The estimated impact of income in the spatial Durbin model is more in line with the literature; for example, Piras (2017) and Saldaña-Zorrilla and Sandberg (2009) find that higher income levels provide an incentive to stay in the area.On the other hand, we could argue that higher income levels could provide the financial resources to relocate to other areas (Cattaneo et al. 2019), although we would expect that this effect requires a longer timeframe and does not occur within a year.The estimated impact of income and unemployment on in-migration is intuitive.In both modelling approaches, higher income levels and lower unemployment rates attract migrants to an area (except for the fixed effects model).
While the choice of the dependent variable had a marginal impact on the results, we found that taking spatial dependence into account does have a large impact on the results.As explained in the Introduction, spatial correlation in the data could result in wrong point estimates and invalid confidence intervals, if ignored.When we compared the spatial Durbin model with a regular fixed effects model, the impact of natural hazard damage on out-migration was overestimated by a factor of 1.8.Also, the coefficients of other variables were not in line with the spatial Durbin model.In addition, we found that the choice regarding the method of taking spatial dependence into account affects the results.Using statistical tests, we showed that the spatial error model is not the correct model for the data and, consequently, using this model could result in biased estimates.Indeed, the impact of natural hazards on out-migration was overestimated by a factor of 1.5 when compared to the spatial Durbin model.These results suggest that different treatments of spatial dependence in regression models could also have contributed to the mixed findings in the literature.However, we should add some nuance to our findings, as the coefficients in the models using migration rates did not appear to be significantly different based on t-tests.For the gravity model, we were not able to test several model specifications, but taking a spatial lag into account had a great impact on the estimated coefficients.The impact of natural hazards in the origin on migration flows was overestimated by a factor of 1.6 in the gravity model without considering spatial dependence, and this difference was statistically significant.Given these results, we agree with Hoffmann, Sedov a, and Vinke (2021) and Sedov a, Cizmaziov a, and Cook (2021) that a stronger focus on spatial dependence in statistical modeling is a necessary direction for further empirical research in the climate-migration literature.Notes 1.In the online Supplementary Information, Table S1, we report the coefficient estimates of the gravity model when separate origin and destination fixed effects are included.The results are very similar, suggesting that the bias from removing the county pairs without migration is small.2. A $1,000 dollar damage in the origin leads to a e 0:0867Ã1 − 1 ð Þ Ã 100% ¼ 9:06% increase in the flow.To calculate which percentage increase in the unemployment rate gives the same increase, we have to determine x in the following equation: 1þ ð ð x 100 Þ 0:060 −1ÞÃ100% ¼ 9:06%: This gives an x of 269.77.So $1,000 dollar damage is the same as an increase of 269.77% in the unemployment rate.Evaluated at the median, this is an unemployment rate of 5.4 Ã 3.70 ¼ 19.98%.For population we can do the same.3. We checked whether the 538 counties identified as high-risk counties overlap with the counties experiencing the highest cumulative damage during 1998-2010, but this was not the case.Only 117 high-risk counties are among the 538 with the highest cumulative damage; hence we are not capturing non-linear effects of economic damage that were previously ignored.In fact, when we replace the high risk dummy with a dummy for the highest damage counties, the interaction variables are insignificant.

Figure 1 .
Figure 1.Overview of the data and the models.

Figure 2 .
Figure 2. Spatial dependence in the gravity model.

Figure 3 .
Figure 3.The impact of different levels of hazard damage on out-migration according to the gravity model and the spatial Durbin model.

Table 2 .
Estimated coefficients of the spatial Durbin model, spatial error model and fixed effects model for out-migration (SDM1, SEM1, FE1) and for in-

Table 3 .
Estimated coefficients of the gravity model with and without accounting for spatial dependence.

Table 4 .
Quantitative comparison of the two modeling approaches ( ÃÃ evaluated at the median).