State-Building through Public Land Disposal? An Application of Matrix Completion for Counterfactual Prediction

This paper examines how homestead policies, which opened vast frontier lands for settlement, influenced the development of American frontier states. It uses a treatment propensity-weighted matrix completion model to estimate the counterfactual size of these states without homesteading. In simulation studies, the method shows lower bias and variance than other estimators, particularly in higher complexity scenarios. The empirical analysis reveals that homestead policies significantly and persistently reduced state government expenditure and revenue. These findings align with continuous difference-in-differences estimates using 1.46 million land patent records. This study's extension of the matrix completion method to include propensity score weighting for causal effect estimation in panel data, especially in staggered treatment contexts, enhances policy evaluation by improving the precision of long-term policy impact assessments.


Introduction
The exploration of state development patterns over time and across regions is a growing area of interest for social scientists. A key contribution in this field comes from Bensel (1990, p. 164), who emphasizes the significant role of mid-nineteenth century homestead policies (federal laws aimed at transferring public land to private individuals) in shaping the developmental trajectory of the United States. Additionally, Murtazashvili (2013, p. 250) and Frymer (2017, p. 12) suggest that these policies not only facilitated land distribution but also enhanced the federal government's bureaucratic capacity to manage public lands and secure future revenue streams. This paper examines how homestead policies impacted the size of state governments, which is closely related to state capacity, or the ability of governments to finance and implement policies (Besley and Persson, 2010).
Homesteading is expected to expand the size of state governments by increasing the land values and tax revenue of sparsely populated frontier states. The expansion in state size was historically evident in the adoption of compulsory primary education laws and public education investments by frontier state governments, as a strategy to attract homesteaders (Engerman and Sokoloff, 2005). Contrary to this expectation, I provide evidence that homesteads authorized under the Homestead Act (HSA) of 1862 and the Southern Homestead Act (SHA) of 1866, which opened hundreds of millions of acres of land for homesteading, significantly reduced the size of frontier state governments over the long run. The finding that the homestead acts limited the size of frontier governments aligns with Mattheis and Raz's (2021) findings that regions impacted by the HSA experienced a slower transition from agriculture to other economic sectors and lower housing values. The paper further investigates land inequality as a possible causal mechanism underlying the relationship between homestead policies and state capacity, considering that median voter-based theories of inequality and redistribution predict that inequality increases the size of governments, and shows that exposure to homesteads decreased land inequality over time.
The paper makes a methodological contribution in extending the matrix completion method (Athey et al., 2021) for estimating the causal effects of policy interventions in panel data, by weighting the loss function with estimated unit- and time-varying treatment probabilities (i.e., the propensity score) to correct for imbalances in the covariate distributions between the factual and counterfactual values. This extension, which was proposed by Athey et al. (2021) but has not been implemented, places more emphasis on the loss for factual unit-time values that are most similar to the counterfactual values in terms of pre-treatment covariates. The covariates used in the application control for selective migration to more agriculturally productive land, and for selection bias arising from differences in access to frontier lands.
A standard method for causal inference with panel data is difference-in-differences (DID), which relies on the parallel trends assumption: in the absence of treatment, the average outcomes of treated and control units would have followed parallel paths. Under parallel trends, DID identifies causal effects by contrasting the change in outcomes pre- and post-treatment, between the treated and control groups. However, the parallel trends assumption is generally invalid in the presence of unobserved time-varying confounders.
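As a minimal illustration of the DID contrast described above, the following Python sketch computes the two-group, two-period estimate. The function name `did_estimate` and all outcome values are hypothetical and chosen only so that the common trend and the treatment effect are easy to see; this is not the estimator used in the application.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Under parallel trends, the control change stands in for the
    # change the treated group would have seen without treatment.
    return treated_change - control_change

# Hypothetical outcomes: both groups share a common trend of +1.0,
# and treatment adds +2.0 for the treated group in the post period.
treated_pre = [4.0, 5.0, 6.0]
treated_post = [7.0, 8.0, 9.0]
control_pre = [1.0, 2.0, 3.0]
control_post = [2.0, 3.0, 4.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

If the control group's trend differed from the treated group's untreated trend, for example because of an unobserved time-varying confounder, the same arithmetic would attribute that difference to the treatment.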
DID has been extended to staggered treatment implementation settings, where the time of initial treatment varies among multiple treated units (Callaway and Sant'Anna, 2020; Goodman-Bacon, 2021; Athey and Imbens, 2021).
Another popular method of handling unobserved time-varying confounders in panel data is the synthetic control method (SCM; Abadie et al., 2010). The method constructs a convex combination of control units that are similar to a single treated unit in terms of pre-treatment outcomes or covariates, to help balance unobserved time-varying confounding between treatment and control groups. The SCM estimator assumes there is a stable convex combination of the control units that absorbs all time-varying unobserved confounding. The convexity restriction is equivalent to imposing a restriction of linear dependence between factor loadings in the context of matrix completion or latent factor models (Gobillon and Magnac, 2016; Xu, 2017; Xiong and Pelger, 2020; Bai and Ng, 2021). The SCM can be generalized to settings with multiple treated units (Doudchenko and Imbens, 2016) and staggered treatments (Ben-Michael et al., 2019), and to include features of DID estimation (Ben-Michael et al., 2018; Arkhangelsky et al., 2021) or matrix factorization (Amjad et al., 2018; Agarwal et al., 2021; Fan et al., 2021).
Similar to latent factor models, matrix completion attempts to model unobserved time-varying confounders by decomposing the factual outcomes into matrices of latent factors (i.e., time-varying coefficients) and factor loadings (i.e., unit-specific intercepts). The counterfactual values are then imputed using the estimated factors and loadings. Matrix completion and latent factor models avoid imposing convexity constraints on the factor loadings like the SCM, and typically use matrix norm regularization or factorization to produce a low-dimensional representation of the factual outcomes, which improves generalizability.
Matrix completion offers distinct advantages over latent factor models: first, it does not require fixing the rank (i.e., number of unobserved factors) of the underlying data; second, it is suitable in staggered treatment implementation settings, even when few control units are available; third, it uses all factual data to estimate unobserved factors, while latent factor models use only the pre-treatment data.
This paper is organized as follows. Section 2 provides an overview of the historical context of homestead policies and their relationship to state size and land inequality. Section 3 details the matrix completion method for obtaining the causal estimands of interest and reports the results of simulation studies to evaluate the proposed method. Section 4 describes the data sources used for the application and potential sources of bias in the analysis. Section 5 presents estimates of the long-run impacts of homestead policies on state size using the matrix completion estimator and reports the results of a "no-treatment evaluation" to verify the consistency of the estimator. Section 6 reports DID estimates of the effect of homesteads on state size and land inequality. The final section discusses the study's findings, focusing on the long-term negative effects of homestead policies on state finances and the role of land inequality in state capacity. It also emphasizes the study's methodological advancement in policy evaluation through the matrix completion method with propensity score weighting.

Historical background
The view that the western frontier had long-lasting impacts on the evolution of democratic institutions can be traced to Turner (1956). Turner's frontier thesis posits that homestead policies acted as a "safety valve" for relieving pressure from congested urban labor markets in eastern states. The view of the frontier as a safety valve has been further explored by Ferrie (1997), who finds evidence in a linked census sample of substantial migration to the frontier by unskilled workers and considerable gains in wealth for these migrant workers. Bazzi et al. (2020) expand on this demographic profile, showing that frontier settlers, often illiterate and foreign-born, possessed a distinct individualism suited for the challenging frontier life. The historical experience of the frontier is reflected in modern times through lower property tax rates in counties with a longer frontier history and a prevailing sentiment among residents against taxation and redistribution.
Homestead policies offered greater economic opportunities to eastern migrants. In addition, the sparse population on the western frontier meant that state and local governments competed with each other to attract migrants in order to lower local labor costs and to increase land values and tax revenue. Frontier governments offered migrants broad access to cheap land and property rights, unrestricted voting rights, and more generous provision of schooling and other public goods (Engerman and Sokoloff, 2005). Consistent with this view, Poulos and Zeng (2021) estimate that the long-run impact of homestead policies on public school spending is equivalent to 2.5% of the total per-capita public school expenditures in 1929. García-Jimeno and Robinson (2008) test the frontier thesis in a global context and conclude that the economic effect of the frontier depends on the quality of political institutions at the time of frontier expansion. Frontier expansion promotes equitable outcomes only when societies are initially democratic. When institutional quality is weak, the existence of frontier land can yield worse developmental outcomes because non-democratic political elites can monopolize frontier lands.

Homestead policies
The 1862 HSA opened up hundreds of millions of acres of western public land for settlement.
Any adult household head (including women, immigrants who had applied for citizenship, and freed slaves following the passage of the Fourteenth Amendment) could apply for a homestead grant of 160 acres of land, provided that they lived on and made improvements to the land for five years and paid a $10 filing fee. Under the HSA, the bulk of newly surveyed land on the western frontier was reserved for homesteads, although the law did not end sales of public land. The explicit goal of the HSA was to liberalize the homesteading requirements set by the Preemption Act of 1841, which permitted individuals already inhabiting public land to purchase up to 160 acres at $1.25 per acre before the land was put up for sale. The implicit goal was to promote rapid settlement of the western frontier (Allen, 1991), and to reduce the federal government's costs of defending contested frontiers (Frymer, 2014).
In the pre-Reconstruction South, public land was not open to homesteading but rather to unrestricted cash entry, which permitted the direct sale of public land to private individuals of 80 acres or more for at least $1.25 an acre. The 1866 SHA restricted cash entry and reserved for homesteading over 46 million acres of public land, or about one-third of the total land area in the five southern public land states (Lanza, 1999, p. 13). The Bureau of Land Management (BLM) classified land disposed under the SHA under the same authority as land disposed by the HSA, since the SHA amended the HSA and dictated that public lands be disposed under the stipulations of the HSA (Hoffnagle, 1970).

Public and state-land states
Public land states (PLS) are states that were crafted from the public domain, and where the federal government has the primary authority to distribute public land (Murtazashvili, 2013, p. 4). In the South, these states include Alabama, Arkansas, Florida, Louisiana, and Mississippi. Western PLS include the 25 states that comprise the Midwestern, Southwestern, and Western U.S. (except Hawaii). State-land states, which include the original 13 states, Kentucky, Maine, Tennessee, Texas, Vermont, and West Virginia, were not open to homesteading because the state governments had primary authority to distribute public land. The maps in Figure 1 reflect the impact of these policies, indicating a significant increase in log per-capita cumulative homesteads in Southern and Western PLS by 1900.

Challenges and speculation in homesteading
There were substantial barriers to entry to homesteading, and homesteaders took on enormous risk in the five years required to file a homestead patent. One of the most significant obstacles to entry was the need for capital to build a successful farming operation: contemporary writers estimated that potential homesteaders required $600 to $1000 to start a farm (Deverell, 1988). The high cost associated with starting and maintaining a farm casts doubt on the safety valve hypothesis (Danhof, 1941), and the effectiveness of land policies such as the HSA as a wealth-building tool was limited by the binding capital constraints faced by small farmers (Gates and Bogue, 1996, p. 35). Poulos et al. (2023) further contribute to this discussion by examining the 1901 Oklahoma Land Lottery, demonstrating gender differences in leveraging lottery wealth for land purchases and homestead patents, with female winners more effectively using lottery wealth to overcome liquidity constraints in entrepreneurial activities.
Homesteading was a risky venture: over the period of 1910 to 1919, out of 604,092 homestead entries in the U.S., totaling over 128 million acres, only 384,954 (63.7%) resulted in successful patents (Shanks, 2005). At least part of the discrepancy between homestead entries and filings, however, may be explained by fraudulent filings. Speculators and corporations engaged in the practice of paying individuals to stake a claim in a homestead, with no intention of completing the patent, in order to extract resources from the land (Gates, 1936). In the South, these "dummy entrymen" were used by timber and mining companies to extract resources while the cash entry restriction of the SHA was in effect. When the restriction was removed, there was no need for fraudulent filings because the larger companies could buy land in unlimited amounts at a nominal price (Gates, 1940, 1979). The same pattern of fraudulent filings existed in the West, where Murtazashvili (2013, p. 216) argues that speculators benefited disproportionately from public land policies because the economic balance of power tilted toward the wealthy. Gates (1942) characterizes western speculators who bought land in bulk prior to the 1889 restriction as being influential in state and local governments, resistant to paying taxes, and opposed to government spending.
Figure 1 (caption, partially recovered): [...] (Long, 1995). White-colored counties have no homestead entries. States bordered in blue are state-land states; yellow borders denote western public land states; orange borders denote southern public land states.

Land inequality as a causal mechanism
Inequality is a potential causal mechanism underlying the relationship between homesteads and state size. Median voter-based theories that assume parity in the political influence of voters predict a positive relationship between inequality and the size of governments (Meltzer and Richard, 1981). In settings with high inequality, the median voter is poorer than the average voter, which in turn increases demand for redistribution in majority-rule elections.
However, models that allow for economic differences in the political influence of voters predict a nonlinear or inverse relationship between inequality and government size. In Benabou's (2000) model, for instance, the pivotal voter is wealthier than the median and has the power to block redistribution as inequality increases. But when inequality is too high, the poor can impose redistribution on elites through majority voting (Perotti, 1993; Saint-Paul and Verdier, 1993). In Besley and Persson's (2009) framework, greater economic power of the ruling class reduces government spending and investments in state capacity. Similarly, Galor et al. (2009) propose a model where wealthy landowners block education reforms because education favors industrial labor productivity and decreases the value of farm rents. Inequality in this context can be thought of as a proxy for the amount of de facto political influence elites have to block reforms and limit the size of states (Acemoglu and Robinson, 2008).
To test whether homesteads affected future land inequality in frontier counties, I calculate a commonly used measure of land inequality based on the Gini coefficient of census farm sizes, adjusted by the ratio of farms to adult males, a measure proposed by Vollrath (2013). Gini-based land inequality measures are commonly used as a proxy for the de facto bargaining power of landed elites (e.g., Boix, 2003; Ziblatt, 2008; Ansell and Samuels, 2015).
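To make the construction of such a measure concrete, the Python sketch below computes a Gini coefficient of farm sizes and then adjusts it by the ratio of farms to adult males. The specific adjustment is an assumption for illustration and is not stated in the text: adult males without a farm are treated as holding zero acres, which gives f * G + (1 - f), where f is the ratio of farms to adult males and G is the farm-size Gini. The function names and all input values are hypothetical.

```python
def gini(values):
    """Gini coefficient via the mean absolute difference formula."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    abs_diff_sum = sum(abs(x - y) for x in values for y in values)
    # sum|x_i - x_j| / (2 * n^2 * mean) simplifies to / (2 * n * total)
    return abs_diff_sum / (2 * n * total)

def adjusted_gini(farm_sizes, adult_males):
    """Farm-size Gini adjusted for adult males who hold no farm.

    Assumption (not from the text): landless adult males are counted
    as holding zero acres, so the overall Gini is f * G + (1 - f),
    with f the ratio of farms to adult males.
    """
    f = len(farm_sizes) / adult_males
    return f * gini(farm_sizes) + (1.0 - f)

farm_sizes = [40, 80, 160, 320]  # hypothetical farm sizes in acres
adult_males = 8                  # hypothetical count of adult males

land_inequality = adjusted_gini(farm_sizes, adult_males)
# Same quantity computed directly by appending zero-acre holders.
direct = gini(farm_sizes + [0] * (adult_males - len(farm_sizes)))
print(land_inequality, direct)
```

Under this assumed adjustment, a county where few adult males own farms registers as more unequal than the farm-size distribution alone would suggest.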
Figure A1 in the online Appendix plots the results from bivariate regression models of land inequality and state government finances during the period of 1860 to 1950, demonstrating a positive relationship among groups of PLS and state-land states. This associational evidence is consistent with the predictions of the Meltzer and Richard model; however, it contrasts with recent empirical studies that establish a negative relationship based on cross-sectional analyses. Ramcharan (2010), for instance, finds an inverse relationship between land inequality and county-level property tax revenue in 1890. The author finds that the negative relationship is especially large in rural counties, where landownership tends to be more concentrated. Vollrath (2013) establishes a negative relationship between land inequality and local property tax revenue in northern rural counties in 1890. The present findings, in contrast, are based on state-level expenditure and revenue panel data.

Matrix completion for counterfactual prediction
This paper applies the matrix completion method proposed by Athey et al. (2021) to predict counterfactual outcomes, and extends the method by propensity-weighting the loss function to correct for imbalances in the factual covariate distributions between treatment and control groups.
Following the notation of Athey and Imbens (2021), let $a$ be a length-$N$ vector, where $a_i \in \{1, \ldots, T, \infty\}$ indexes the time of initial treatment, and $a_i = \infty$ denotes control units.
If a unit enters treatment during the panel ($a_i \neq \infty$), it remains treated for the remainder of the panel. There is a nonzero number of control units, $N_C = \sum_{i=1}^{N} \mathbb{1}(a_i = \infty)$, with $\mathbb{1}(\cdot)$ denoting the indicator function, and a nonzero number of treated units, $N_T = N - N_C$. Let the values of the treatment indicator $W_{it} \in \{0, 1\}$ be $W_{it} = 0$ for the control units in all time periods and $W_{it} = 1$ for the treated units when $t \geq a_i$. Let $O$ denote the set of factual outcome values; i.e., the values for which $W_{it} = 0$.
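The staggered adoption structure described above can be sketched in a few lines of Python. The adoption vector, the panel length, and the function name `treatment_matrix` are hypothetical; units are indexed from 0 and periods from 1 purely for convenience.

```python
import math

def treatment_matrix(adoption_times, n_periods):
    """Build the N x T treatment indicator from initial treatment times.

    W[i][t-1] is 1 when period t is at or after unit i's adoption time,
    and 0 otherwise; control units have adoption time infinity.
    """
    return [
        [1 if t >= a_i else 0 for t in range(1, n_periods + 1)]
        for a_i in adoption_times
    ]

# Hypothetical panel: units 0 and 2 adopt in periods 3 and 2,
# units 1 and 3 are never treated.
a = [3, math.inf, 2, math.inf]
T = 4
W = treatment_matrix(a, T)

# Entries with W = 0, i.e. the outcomes observed under control.
observed = {
    (i, t)
    for i, row in enumerate(W)
    for t, w in enumerate(row, start=1)
    if w == 0
}
n_control = sum(1 for a_i in a if a_i == math.inf)
n_treated = len(a) - n_control

for row in W:
    print(row)
```

Once a unit switches to 1 it stays at 1, so each treated row is a block of zeros followed by a block of ones, and the entries to be imputed are exactly those outside `observed`.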
Under the Neyman-Rubin potential outcomes framework (Neyman, 1923; Rubin, 1990), for each unit $i$ and time $t$, there exist potential outcomes $Y_{it}(a_i)$. The fundamental problem is that we can only observe a single potential outcome for each unit-time observation: $Y_{it}(a_i)$ is observed for treated units once they enter treatment, and $Y_{it}(\infty)$ is observed for the control units in all time periods. The potential outcomes framework implicitly assumes treatment is well-defined to ensure that each unit has the same number of potential outcomes. It also requires that the potential outcomes of unit $i$ vary with $a_i$ but not with the other values of $a$, which is often referred to as the no interference assumption.
Two additional assumptions are needed to write potential outcomes as a function of $a$, both of which are made in Athey and Imbens (2021). First, there are no anticipatory effects; i.e., $Y_{it}(a_i) = Y_{it}(\infty)$ for all $a_i > t$. This assumption, which is often implicitly made in panel data studies, states that if a unit has not yet entered treatment, the initial treatment time has no causal effect on potential outcomes in the current period.
Second, potential outcomes in period $t$ are invariant to how long unit $i$ has been exposed to treatment; i.e., $Y_{it}(a_i) = Y_{it}(1)$ for all $a_i \leq t$. This assumption does not rule out causal effects of treatment duration on the outcome, but rather rules out causal effects varying by initial treatment time.

Causal estimands
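The causal estimand of interest is the average treatment effect on the treated units (ATT) of entering treatment in $a'_i$ relative to being control ($a_i = \infty$), on the outcome in period $t$:
$$\tau_{t,\infty a'_i} = \frac{1}{N_T} \sum_{i: a_i \neq \infty} \left[ Y_{it}(a'_i) - Y_{it}(\infty) \right]. \tag{1}$$
In the application, I consider $a'_i = \min_{1 \leq i \leq N_T} a_i$, or the year of the earliest homestead entry among the treated units. The ATT averaged over the counterfactual period, which provides a summary measure of the overall treatment effect, is also of interest:
$$\tau_{\infty a'_i} = \frac{1}{T - a'_i + 1} \sum_{t = a'_i}^{T} \tau_{t,\infty a'_i}. \tag{2}$$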
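The two estimands can be illustrated with a short Python sketch. It assumes, for illustration only, that the per-period ATT is the mean over treated units of the observed outcome minus the imputed outcome under control, and that the summary ATT is the simple average of the per-period values. The function names and every number below are hypothetical.

```python
def att_by_period(observed, counterfactual, treated_units, periods):
    """Per-period ATT: mean over treated units of observed minus imputed."""
    return {
        t: sum(observed[i][t] - counterfactual[i][t] for i in treated_units)
        / len(treated_units)
        for t in periods
    }

def average_att(att):
    """ATT averaged over the counterfactual (post-treatment) periods."""
    return sum(att.values()) / len(att)

# Hypothetical post-treatment outcomes for two treated units in
# periods 3 and 4, keyed as outcome[unit][period].
observed = {0: {3: 5.0, 4: 6.0}, 1: {3: 7.0, 4: 9.0}}
# Hypothetical imputed outcomes had the units remained untreated.
counterfactual = {0: {3: 6.0, 4: 8.0}, 1: {3: 7.5, 4: 10.0}}

att = att_by_period(observed, counterfactual, treated_units=[0, 1], periods=[3, 4])
overall = average_att(att)
print(att, overall)
```

A negative value means the treated units' observed outcomes fall below their imputed no-treatment outcomes.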

Estimation
In the application, the outcome of interest is state size, measured by state government spending and revenue. The goal is to estimate the potential outcomes under control for the treated units; i.e., the counterfactual state size of treated units had they not been exposed to treatment. I model the outcome under control as:
$$Y_{it}(\infty) = L_{it} + \gamma_i + \delta_t + \epsilon_{it}, \tag{3}$$
where $L_{it}$ is a typical element in the unknown matrix $L = UV^\top$, the product of a matrix of factor loadings, $U_{N \times R}$, and a matrix of factors, $V_{T \times R}$. While latent factor models assume the rank, or number of unobserved factors $R$, is fixed, matrix completion methods assume that the rank of $L$ is low relative to $N$ and $T$. The model includes unit-specific fixed effects, $\{\gamma_i\}_{i=1}^{N}$, and time-specific fixed effects, $\{\delta_t\}_{t=1}^{T}$, which are meant to capture unobserved confounders not absorbed by the low-rank matrix. The identifying assumption is that the errors $\epsilon_{it}$ are conditionally mean zero and independent of $a_i$, for all values of $i$ and $t$:
$$\mathbb{E}[\epsilon_{it} \mid a_i, L_{it}, \gamma_i, \delta_t] = \mathbb{E}[\epsilon_{it} \mid L_{it}, \gamma_i, \delta_t] = 0. \tag{4}$$
This assumption rules out correlation between the errors and initial treatment time in any period (Ben-Michael et al., 2019). It is analogous to the strict exogeneity assumption made for the estimation of the ATT using latent factor models (Xu, 2017).
Estimating $L$ involves minimizing the sum of squared errors via nuclear norm regularized least squares:
$$\min_{L, \gamma, \delta} \left[ \sum_{(i,t) \in O} \frac{1}{|O|} \left( Y_{it} - L_{it} - \gamma_i - \delta_t \right)^2 + \lambda_L \|L\|_\star \right], \tag{5}$$
where the nuclear norm, $\|\cdot\|_\star = \sum_i \sigma_i(\cdot)$, or sum of singular values, is used to yield a low-rank solution for $L$. The value of the hyperparameter $\lambda_L$ is chosen among 30 possible values by five-fold cross-validation, where in each fold, 80% of the entries in $O$ are randomly selected to be used for training, while the remaining 20% of entries are used for model validation. The model with the value of $\lambda_L$ that yields the lowest root mean squared error averaged over the validation sets is then fit using all entries in $O$.
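The selection procedure can be sketched as follows in Python. The sketch assumes the five folds are five independent random 80/20 splits of the observed entries, which is one reading of the description above. The scoring function `validation_rmse` is a hypothetical placeholder supplied by the user, and the toy version used here only exists to make the example runnable; it does not fit a matrix completion model.

```python
import random

def random_splits(entries, n_splits=5, train_share=0.8, seed=0):
    """Draw repeated random train/validation splits of observed entries."""
    rng = random.Random(seed)
    entries = list(entries)
    n_train = round(train_share * len(entries))
    splits = []
    for _ in range(n_splits):
        shuffled = entries[:]
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits

def select_lambda(lambdas, splits, validation_rmse):
    """Return the lambda with the lowest RMSE averaged over validation sets.

    validation_rmse(lam, train, valid) is a user-supplied (hypothetical)
    callable that fits the model on `train` and scores it on `valid`.
    """
    def average_rmse(lam):
        scores = [validation_rmse(lam, train, valid) for train, valid in splits]
        return sum(scores) / len(scores)
    return min(lambdas, key=average_rmse)

def toy_rmse(lam, train, valid):
    # Placeholder score with its minimum near lam = 0.3; a real
    # implementation would fit on `train` and evaluate on `valid`.
    return abs(lam - 0.3)

observed_entries = [(i, t) for i in range(10) for t in range(5)]
splits = random_splits(observed_entries)
lambda_grid = [k / 29 for k in range(30)]  # 30 candidate values
best_lambda = select_lambda(lambda_grid, splits, toy_rmse)
print(best_lambda)
```

After selection, the model would be refit on all observed entries with the chosen value, as the text describes.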
To quantify the propensity score, $w_{it}$, I model the probability of treatment as:
$$w_{it} = \Pr\left( W_{it} = 1 \mid X_{i1}, \ldots, X_{ip} \right), \tag{6}$$
where $X_{ip}$ is a typical element in a matrix of $p$ covariates measured prior to $a'_i$. In the application, the covariates include state-level measures of racial composition, prevalence of pre-emancipation slavery, average farm sizes and values, and railroad access. I estimate the treatment model by multivariate lasso logistic regression (Friedman et al., 2010).
The squared loss in Eq. (5) is weighted by estimated propensity scores, $w_{it}$, to place more emphasis on the loss for the values in $O$ most similar to the counterfactual values in terms of pre-treatment outcomes and covariates. Consistent estimation of the ATT (1) relies on the correct specification of the outcome model (3) under the assumption of exogeneity (4). It does not rely on the correct specification of the treatment model (6), although the propensity scores from estimating this model are intended to help balance treated and control units in terms of the pre-treatment outcomes and covariates when fitting the matrix completion model on the factual data.
The algorithm for solving Eq. (5) iteratively replaces missing values with those recovered from a singular value decomposition of the matrix (Mazumder et al., 2010). Once $L$, $\gamma$, and $\delta$ have been estimated, we can predict the counterfactual values for the treated units in the post-treatment period by
$$\hat{Y}_{it}(\infty) = \hat{L}_{it} + \hat{\gamma}_i + \hat{\delta}_t \quad \text{for } (i,t) \notin O.$$
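The iterative imputation idea can be sketched with NumPy as below. This is a simplified illustration in the spirit of the soft-thresholded SVD approach of Mazumder et al. (2010), not the implementation used in the paper: it omits the unit and time fixed effects and the propensity weights, and the function name `soft_impute`, the penalty value, and the simulated rank-one panel are all assumptions made for the example.

```python
import numpy as np

def soft_impute(Y, observed, lam, n_iter=1000, tol=1e-8):
    """Estimate a low-rank matrix L from the observed entries of Y.

    Y: (N, T) array; entries outside the mask are ignored.
    observed: boolean (N, T) mask, True where the outcome is factual.
    lam: nuclear-norm penalty, applied as a threshold on singular values.
    """
    L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # Keep factual values; fill the rest with the current estimate.
        filled = np.where(observed, Y, L)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)  # soft-threshold singular values
        L_new = (U * s_shrunk) @ Vt
        converged = np.linalg.norm(L_new - L) <= tol * max(np.linalg.norm(L), 1.0)
        L = L_new
        if converged:
            break
    return L

# Hypothetical rank-one panel: 10 units, 8 periods.
truth = np.outer(np.linspace(1.0, 2.0, 10), np.linspace(1.0, 2.0, 8))

# The last two units are treated in the last two periods, so those
# four entries are missing under control and must be imputed.
observed = np.ones_like(truth, dtype=bool)
observed[8:, 6:] = False
Y = truth.copy()
Y[~observed] = np.nan  # never read, because the mask excludes them

L_hat = soft_impute(Y, observed, lam=0.1, n_iter=2000)
relative_error = np.abs(L_hat[~observed] - truth[~observed]) / truth[~observed]
print(relative_error.max())
```

Larger values of the penalty shrink more singular values to zero, giving a lower-rank estimate at the cost of more shrinkage toward zero in the imputed entries.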

Simulation studies
I conduct two sets of simulation studies, described in Section A2 in the online Appendix, to assess the performance of the matrix completion estimator: the first on generated data in which the ground-truth treatment effects are controlled; and the second on empirical data, where the focus is on time periods where no treatment effects are expected. The comparison estimators are evaluated on their ability to recover the ground-truth ATT averaged over the counterfactual period, $\tau_{\infty a'_i}$ (2). The comparison estimators include two versions of the matrix completion estimator: with (MC-W) and without (MC) a treatment propensity-weighted loss function. The DID estimator is a regression of outcomes on treatment and unit and time fixed effects. The SCM is a regression of the pre-treatment outcomes of each treated unit on the control unit outcomes during the same periods, with the restrictions of no intercept and non-negative regression weights that sum to one. The SCM with lasso (SCM-L1) relaxes the zero-intercept and weight restrictions and estimates the counterfactual outcomes for each treated unit by lasso regression. I provide the exact form of these estimators in Section A3.
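The SCM restrictions described above (no intercept, non-negative weights that sum to one) amount to least squares over the probability simplex. The NumPy sketch below solves that problem by projected gradient descent, which is one of several possible solvers and is not claimed to be the one used in the paper. The function names, the simulated data, and the true weights are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    cumulative = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > cumulative - 1.0)[0][-1]
    theta = (cumulative[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def scm_weights(control_pre, treated_pre, n_iter=5000):
    """Simplex-constrained least squares by projected gradient descent.

    control_pre: (T0, J) pre-treatment outcomes of J control units.
    treated_pre: (T0,) pre-treatment outcomes of one treated unit.
    """
    n_controls = control_pre.shape[1]
    w = np.full(n_controls, 1.0 / n_controls)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(control_pre, 2) ** 2
    for _ in range(n_iter):
        gradient = control_pre.T @ (control_pre @ w - treated_pre)
        w = project_to_simplex(w - step * gradient)
    return w

# Hypothetical data: the treated unit is an exact convex combination
# of the first two of three control units over 30 pre-treatment periods.
rng = np.random.default_rng(0)
control_pre = rng.normal(size=(30, 3))
true_weights = np.array([0.3, 0.7, 0.0])
treated_pre = control_pre @ true_weights

w_hat = scm_weights(control_pre, treated_pre)
print(np.round(w_hat, 4))
```

The fitted weights would then be applied to the control units' post-treatment outcomes to form the synthetic counterfactual for the treated unit.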

Generated data
Figure 2 provides box and whisker plots summarizing the median, the first and third quartiles, and outlying points of the distribution of the absolute bias (i.e., the absolute difference between the estimated $\hat{\tau}_{\infty a'_i}$ and the actual $\tau_{\infty a'_i}$) and the variance of 399 block bootstrap replicates of $\hat{\tau}_{\infty a'_i}$ for the first set of simulations on generated data. The absolute bias and bootstrap variance increase across all estimators as the rank of $L$ increases, which underscores the importance of the low-rank assumption. The matrix completion estimators and the SCM estimator yield the lowest absolute bias and bootstrap variance, regardless of rank, whereas the DID and SCM-L1 estimators struggle with higher bias and variance.
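The two performance measures can be sketched in Python as follows. The sketch assumes that the blocks being resampled are whole units, so each unit's series of estimated effects stays together; the text does not specify the block structure, so this is an assumption. The function name `block_bootstrap`, the per-unit effects, and the ground-truth value are hypothetical.

```python
import random
from statistics import mean, variance

def block_bootstrap(unit_effects, n_replicates=399, seed=0):
    """Resample whole units with replacement and re-average each time."""
    rng = random.Random(seed)
    n_units = len(unit_effects)
    replicates = []
    for _ in range(n_replicates):
        sample = [unit_effects[rng.randrange(n_units)] for _ in range(n_units)]
        replicates.append(mean(mean(series) for series in sample))
    return replicates

# Hypothetical estimated effects for four treated units over two
# post-treatment periods, and a hypothetical ground-truth ATT.
unit_effects = [[-1.2, -0.8], [-0.5, -0.7], [-1.0, -1.4], [-0.9, -0.3]]
true_att = -0.8

estimate = mean(mean(series) for series in unit_effects)
absolute_bias = abs(estimate - true_att)

replicates = block_bootstrap(unit_effects)
bootstrap_variance = variance(replicates)
print(absolute_bias, bootstrap_variance)
```

Keeping each unit's series intact preserves the within-unit dependence over time that resampling individual unit-period observations would break.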
When the rank is high relative to N and T, MC-W exhibits lower absolute bias and bootstrap variance relative to the unweighted MC estimator, suggesting that propensity score weighting may mitigate the impact of increased model complexity on estimator bias and variance.

Empirical data
In the second set of simulations focusing on the empirical data, I leverage the fact that the true treatment effect is null in the pre-treatment period. I first discard the post-treatment data, and for each of 1000 simulation runs, randomly select half of the control units to be treated and impute their missing values following a placebo treatment time randomly chosen from $\{a'_i, \ldots, T\}$, varying $a'_i$.
In simulation studies on the state government finance datasets, described in Section 4, the matrix completion estimators generally maintain lower absolute bias than the DID estimator while exhibiting higher bias when compared to SCM estimators, across all ratios of the placebo $a'_i$ to $T$ for both expenditure and revenue datasets (Figure A2a). In terms of bootstrap variance, matrix completion estimators demonstrate results on par with synthetic control estimators and markedly outperform the DID estimator (Figure A2b). There is not much efficiency gain from propensity-weighting the matrix completion estimator in the empirical data simulation studies because, unlike the generated data simulations, treatment is assigned at random, rather than as a function of covariates.
In each of three datasets common to the synthetic control literature, the matrix completion estimators outperform DID and the SCM estimators by minimizing absolute bias (Figure A3) and bootstrap variance (Figure A4) across the different ratios of the placebo $a'_i$ to $T$. Together, the simulation results support the preferential use of the MC-W estimator in the application.

Application: Homestead policy and state size
In order to estimate causal impacts of homestead policies on state size, I create measures of total expenditure and revenue collected from the records of 48 state governments during the period of 1789 to 1932 (Sylla et al., 1993) and 16 state governments during the period of 1933 to 1937 (Sylla et al., 1995a,b). The measures are adjusted for inflation according to the U.S. Consumer Price Index (Williamson, 2017) and scaled by the total free adult male population in the decennial census (Haines, 2010). I impute the outcome values that are missing due to lack of data collection using multiple imputation by chained equations (MICE; Buuren and Groothuis-Oudshoorn, 2010). Figure A5 visualizes the extent of the missing data in the entire dataset by state and treatment group, where 40% of values in the dataset are missing (29.9% and 10.1% missing in the control and treated groups, respectively). The majority of the outcome data for treated states prior to the treatment time were missing and have been imputed. To address the concern that the choice of imputation method can influence the estimated treatment effects, Table A1 evaluates the sensitivity of the causal estimates to alternative imputation methods. Lastly, I log-transform the data to alleviate exponential effects.
The staggered treatment implementation setting is appropriate in this application because $a_i$ varies across states that were exposed to homesteads following the passage of the HSA. I aggregate to the state level approximately 1.46 million individual land patent records authorized under the HSA. Using these records, which are made available by the BLM (General Land Office, 2017), I determine that the earliest homestead entries occurred in 1869 in about half of the western frontier states, about seven years following the enactment of the HSA. In 1872, the first homesteads were filed in southern PLS. Figure A6 shows how each state is categorized in the empirical analysis, as a PLS (treated group) or state-land state (control group), as well as the year of the earliest initial homestead entry for the PLS, which informs staggered treatment implementation.
I include the following covariates in the conditioning set of the treatment model (6): per-capita spending or revenue prior to 1869; the ratio of slaves to the total population in 1860; the ratio of free African-Americans, Native Americans, or Whites to the total non-slave population in 1860; average farm sizes in 1860 and average farm values in 1850 and 1860 (Haines, 2010); and the state-level share of total miles of operational railroad track per square mile, which I calculate by overlaying the railroad track map over historical county borders (Atack, 2013). These pre-treatment covariates control for selective migration to more agriculturally productive land, and for differences in the accessibility and availability of frontier lands. Bustos (2017, p. 45) finds that the prevalence of slavery in 1860 is an important predictor of available homestead lands, and reasons that the covariate acts as a proxy for the presence of large plantations.

Accounting for bias
Potential sources of bias include violations of the assumptions of exogeneity (4), no interference, no-anticipation, or invariance to treatment history. The exogeneity assumption would be violated if the error term in the outcome model (3) is correlated with the initial treatment time. While this assumption is not directly testable, the no-treatment evaluation on pre-treatment data reported in Section 5.2 provides indirect evidence that the exogeneity assumption is not violated. Additionally, the simulation results on the state government finances datasets reported in Section 3.4 demonstrate that propensity-weighting the loss function improves the consistency of the matrix completion estimator.
A second potential source of bias arises from interference, or the assumption that control units are unaffected by the effects of treatment.While the no interference assumption cannot directly be tested, it is likely in the present application that the outcomes of state-land states were indirectly affected by the out-migration of homesteaders from frontier states.
When assuming the absence of interference, the use of indirectly affected states as control units would underestimate treatment effects because it would make the counterfactual and factual treated unit observations in the post-treatment period more similar.Interference might also arise if state-land state governments increase public investments in order to dissuade workers from migrating to the frontier in the first place.The historical evidence, however, suggests that labor-scarce frontier states were more strongly motivated to attract migrants and stimulate population growth than long-settled state-land states (Engerman and Sokoloff, 2005).For example, the adoption of compulsory primary education laws and support for public education in general in western states has been considered as a means to attract potential migrants to the frontier (Meyer et al., 1979;Bandiera et al., 2018).
Interference arising from competition among state governments would also underestimate the effect of treatment.
A third potential source of bias arises from violations of the no-anticipation or invariance to treatment history assumptions. The no-anticipation assumption would be violated if there were anticipatory effects on the size of frontier state governments prior to the initial homestead entries. Anticipatory effects are plausible since the first homestead entries occurred in 1869 in western PLS, six years after the HSA went into effect. In Section 5.2, I conduct a no-treatment evaluation on the pre-treatment data and vary the placebo initial treatment year. The estimated placebo ATT is nonsignificant for most settings, which is consistent with the absence of anticipatory effects. The invariance to treatment history assumption rules out variation in treatment effects by the initial treatment time, but does not rule out causal effects of treatment duration. In Section 5.1, I explore whether causal effects on state size differ with respect to year of initial homestead entry.
Lastly, bias may result from misspecification of imputation models. The imputation procedure assumes that after controlling for the available state government finances data, the missing values are Missing At Random (MAR). There are reasons to believe the data are not MAR, which could result in biased estimates. For example, the timing of a state's admission to the Union, which affects the extent of its missing values, may be determined by unobserved political and demographic variables rather than by meeting a population threshold. It is impossible to distinguish whether data are MAR or missing based on unobserved variables, given the observed data (Sterne et al., 2009). In Section 5, I evaluate the sensitivity of the causal estimates to two alternative imputation methods; the choice of imputation method alters the conclusions in one of the two scenarios examined, as detailed in Table A1.

Matrix completion estimates
I estimate the causal impacts of the initial treatment year on the state government finances of the treated units (i.e., PLS). Specifically, I fit the MC-W estimator on the entirety of factual outcomes to recover the counterfactual outcomes of the treated units had they not been exposed to treatment. The top panel of Figure 3 compares the average log per-capita state government expenditure of treated units and control units along with the predicted average expenditure of treated units. The dashed vertical line represents the year of the earliest homestead entry, $a'_i = \min_{1 \le i \le N_T} a_i = 1869$. The treated unit outcomes are generally higher than those of the control units in the pre-treatment period, whereas there is little difference between the treated and control unit outcomes, on average, in the post-treatment period.
The difference between the observed and predicted treated unit outcomes, which is the quantity $\tau_{t,\infty a'_i}$ described in Eq. (1), corresponds to the estimated per-period ATT. These per-period average causal impacts are plotted in the bottom panel, with 95% normal-approximation confidence intervals based on the standard error of the block bootstrap replicates of $\tau_{t,\infty a'_i}$. The bootstrap replicates are constructed by block resampling the columns (i.e., the time dimension) of the observed outcomes, in order to preserve the temporal dependence structure of the original data (Davison and Hinkley, 1997; Politis and White, 2004), and obtaining a set of point estimates from 999 resamples. The estimated per-period effects for both outcomes are essentially zero during the pre-treatment period and within the bounds of the bootstrap confidence intervals, which indicates that the model closely fits the pre-treatment period observations. By 1876, after most PLS had been exposed to homesteads, homestead exposure decreases per-capita state government expenditure by about 0.13 log points, and the trajectory of estimated causal impacts remains negative for the rest of the time series, although the confidence intervals for the per-period effects all contain zero. Similar patterns are observed when the outcome is per-capita state government revenue (Figure A7).
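The resampling scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the estimates: the fixed block length of 5, the moving-block scheme, the scalar statistic `gap`, and the helper names `block_resample_columns` and `normal_interval` are all hypothetical choices made for the example.

```python
import numpy as np

def block_resample_columns(Y, block_len, rng):
    """Rebuild a T-column matrix from randomly chosen contiguous column blocks."""
    T = Y.shape[1]
    n_blocks = int(np.ceil(T / block_len))
    # Block start indices, chosen so that every block fits inside the series.
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    cols = np.concatenate([np.arange(s, s + block_len) for s in starts])[:T]
    return Y[:, cols]

def normal_interval(stat_fn, Y, block_len=5, n_boot=999, seed=0):
    """Point estimate, bootstrap standard error, and 95% normal interval."""
    rng = np.random.default_rng(seed)
    point = stat_fn(Y)
    reps = np.array([stat_fn(block_resample_columns(Y, block_len, rng))
                     for _ in range(n_boot)])
    se = reps.std(ddof=1)  # standard error = spread of the bootstrap replicates
    z = 1.96               # approximate 97.5th percentile of the standard normal
    return point, se, (point - z * se, point + z * se)

# Toy data (assumed): the statistic is the mean gap between the first five rows
# ("treated") and the remaining rows ("control").
toy_rng = np.random.default_rng(1)
Y_toy = toy_rng.normal(size=(10, 40))
gap = lambda Y: Y[:5].mean() - Y[5:].mean()
point, se, (lower, upper) = normal_interval(gap, Y_toy)
```

Resampling whole blocks of adjacent periods, rather than single periods, is what keeps short-run serial dependence intact within each replicate.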
To infer the overall effect of treatment, I estimate the ATT averaged over the counterfactual period of 1869 to 2008, and report the point estimates and bootstrap standard errors in the second and third columns of Table 1. The estimates reveal that per-capita state government expenditure is 0.84 [0.61, 1.08] log points lower than it would have been had the PLS never been exposed to homesteads. Relative to the observed log per-capita state expenditure of the PLS in the same period, the point estimate represents a decrease of 0.22%. The estimated ATT on per-capita state government revenue is similar.
I compare the MC-W estimates with DID and SCM estimates, also reported in the second and third columns of Table 1. Figure A8 illustrates the parallel trends assumption in DID analysis by depicting the log per-capita state government expenditures and revenues over time for both treated and control groups, showing similar trajectories up to the treatment year of 1869, which supports the validity of the DID approach. The point estimates from the binary DID estimator are slightly larger and within the confidence intervals of the MC-W estimates. The ATT estimates from the SCM estimator are positive, but not statistically significant.
Table A1 presents counterfactual period estimates on differently imputed datasets. When estimated on data with missing outcome values imputed by MICE with classification and regression trees (CART) as the imputation method, rather than predictive mean matching (the default method), the conclusions drawn from the estimates do not change. However, when estimating on data with missing values imputed by an expectation-maximization (EM) algorithm based method, the MC-W estimates are much larger in magnitude and no longer statistically significant. In interpreting the results presented in Table A1, it is important to reflect on the implications of imputing a majority of the outcomes in treated states prior to the treatment period. The differences in ATT estimates, especially under the EM imputation method, underscore the sensitivity of the results to the imputation of missing data.

Treatment effect heterogeneity
Recall that under staggered treatment implementation, the time of initial treatment $a_i$ varies across states that were exposed to homesteads, and that the year of the earliest homestead entry among the treated units, $a'_i$, is used to calculate the ATT. Also recall that the SHA opened land for homesteading in the South under the same stipulations as the HSA, which opened land for homesteading in the western frontier. The results above set $a'_i$ at 1869, which is the year of the earliest homestead entry in the western PLS. Among southern PLS, the earliest homestead entry occurred in 1872. Thus, there is a substantive interest in determining whether there is a differential effect of the year of initial homestead entry on state size based on region. Conducting a sub-group analysis by region also makes it possible to detect potential violations of the assumption of invariance to treatment history, since most of the western PLS are treated for a longer period than the southern PLS.
The last four columns of Table 1 decompose the counterfactual period estimates by calculating the ATT with respect to the region of the PLS. The MC-W estimates show that the main effect on all of the PLS (second and third columns) is driven mainly by the effect on the western PLS. The estimated effects on the southern PLS are comparatively smaller in magnitude and significant for state government expenditure, -0.63 [-0.94, -0.33], but not for revenue, -0.42 [-0.73, -0.12]. These results provide indirect evidence that the assumption of invariance to treatment history is not violated, since the conclusions drawn from the main estimates are generally unchanged.

No-treatment evaluation
To assess whether the estimated effects are attributable to the year of initial homestead entry rather than other policy changes or spurious errors during the same period, I conduct a no-treatment evaluation by discarding the post-treatment period observations from the state government finances data and re-running the analysis on the pre-treatment data, when no treatment effect is expected (i.e., the ATT is zero).
Across all estimators, the standard error decreases with a larger ∆, reflecting the greater uncertainty of estimating causal effects in shorter (placebo) post-treatment periods. Compared to the binary DID and SCM estimators, MC-W exhibits lower standard errors in all settings and lower bias in three of the six settings. The placebo ATT estimate from the MC-W estimator is significant only when the outcome is state government expenditure and ∆ is 10 or 25. Similar patterns are observed when conducting no-treatment evaluations on differently imputed datasets (Table A2). These placebo results support the use of the MC-W estimator in the application, and provide evidence supporting the plausibility of the exogeneity and no-anticipation assumptions.

Continuous DID estimation
The matrix completion approach estimates the impact of a binary exposure to treatment on a continuous outcome. In the application, however, a continuous form of treatment is available in the number of homestead patents. I therefore estimate a continuous version of the DID estimator described in Section A3.1, where the first difference comes from variation in the date of initial exposure to homesteads, and the second difference comes from variation in the intensity of homestead entries. The model includes state and year fixed effects, $\{\xi_i\}_{i=1}^{N}$ and $\{\psi_t\}_{t=1}^{T}$, respectively. The covariate $X_{ip}$ controls for average farm sizes and values, and railroad access, when the outcome $Y_{it}$ is log per-capita state government spending or revenue; $X_{ip}$ controls for average farm values when $Y_{it}$ is land inequality. The continuous treatment variable $H_{it}$ measures the log of the per-capita number of patents issued under the HSA in state $i$ and year $t$. The coefficient of interest, $\phi$, corresponds to the interaction term and represents the average causal effect of exposure to homesteads. It is estimated by least squares.
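To make the estimation step concrete, the following Python sketch fits a two-way fixed effects regression by least squares on simulated data. The specification is an assumption made for illustration and may differ from the model estimated in the paper: it takes the form $Y_{it} = \xi_i + \psi_t + \phi\,(D_{it} \times H_{it}) + \epsilon_{it}$, where $D_{it}$ is a hypothetical post-exposure indicator, and it omits the covariate $X_{ip}$. The function name `twfe_interaction` and all simulated values are likewise hypothetical.

```python
import numpy as np

def twfe_interaction(Y, D, H):
    """Least-squares estimate of phi in Y_it = xi_i + psi_t + phi*(D_it*H_it) + e_it."""
    N, T = Y.shape
    # Row-major flattening: observation (i, t) sits at position i*T + t.
    unit_dummies = np.kron(np.eye(N), np.ones((T, 1)))         # one column per state
    time_dummies = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]  # first year is the baseline
    interaction = (D * H).reshape(-1, 1)
    design = np.hstack([interaction, unit_dummies, time_dummies])
    coef, *_ = np.linalg.lstsq(design, Y.reshape(-1), rcond=None)
    return coef[0]

# Simulated panel (assumed values): 8 never-exposed units and 12 units with
# staggered exposure dates.
rng = np.random.default_rng(0)
N, T, phi_true = 20, 30, -0.5
start = rng.integers(5, 15, size=N)
start[:8] = T + 1
D = (np.arange(T)[None, :] >= start[:, None]).astype(float)
H = rng.uniform(0.0, 2.0, size=(N, T))  # stand-in for treatment intensity
Y = (rng.normal(size=(N, 1)) + rng.normal(size=(1, T))
     + phi_true * D * H + 0.01 * rng.normal(size=(N, T)))
phi_hat = twfe_interaction(Y, D, H)
```

In this toy setting the estimate `phi_hat` should land close to `phi_true`, because the simulated outcome follows the assumed specification exactly up to small noise.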

Estimates on state size and land inequality
Table 3 reports DID estimates of the average causal effect of exposure to log per-capita homestead patents, with 95% confidence intervals constructed using 999 state-stratified bootstrap samples. The estimates indicate that a 1% increase in log per-capita homesteads decreases log per-capita state government spending or revenue by about 4%. The point estimates are considerably smaller in magnitude than, albeit in the same direction as, the per-period MC-W estimates presented in Section 5. The bootstrap confidence intervals around the DID estimates are considerably narrower than those for the MC-W estimates displayed in the bottom panels of Figures 3 and A7, and are potentially overoptimistic due to serial correlation in the DID regression errors (Bertrand et al., 2004). The estimates on state size are insensitive to the method used for imputing expenditure or revenue values that are missing due to nonresponse (Table A3).
The third column of Table 3 presents DID estimates of the impact of log per-capita homesteads on land inequality at the state level during the period of 1870 to 1950. Since land inequality is measured once per decade, I aggregate homesteads to the next decennial year; e.g., the number of homesteads measured in 1880 is the total for the years 1871 to 1880. Average farm values are included in the regression as a proxy for agricultural productivity, which might be associated with farm sizes approaching ideal scale and therefore land inequality. I estimate that homesteads significantly decreased land inequality in frontier states: a 1% increase in log per-capita homesteads lowers state-level land inequality by about $10^{-5}$ points.
The direction of the point estimate is consistent with the study of Bustos (2017, p. 3), who conducts a county-level DID analysis and shows treatment based on terciles of Homestead Act acres reduced land inequality measured by the Gini coefficient over a similar post-treatment period, although the magnitude of the estimate in the present work is substantially smaller.The comparatively small coefficient implies that homestead policies did not fundamentally alter the long-run distribution of landownership, which may be explained by qualitative evidence that suggests homestead policies were exploited by land speculators and natural resource companies and that the rents from public land were appropriated by the private sector.

Conclusion
The findings of this paper indicate that mid-nineteenth century homestead policies had long-lasting impacts that can potentially explain contemporary differences in state government size. Estimates using matrix completion with a binary treatment and DID with a continuous treatment provide evidence that homestead policies had significant and negative impacts on state government expenditure and revenue that lasted a century following their implementation.
This finding is in line with recent work documenting the adverse impact of homestead policies on the economic development of regions exposed to homesteading.
I explore land inequality as a possible causal mechanism underlying the relationship between homestead policies and state size, which is closely related to state capacity. First, I provide evidence of a positive relationship between land inequality and state government finances, and show that the slope of this relationship increases at higher levels of inequality. Nonlinearities in the relationship between inequality and state capacity can arise in theoretical models that incorporate economic differences in political influence: greater income inequality reduces government spending and investments in state capacity when elites have a monopoly on political power; when inequality gets too high, however, the poor can impose redistribution through majority voting. Second, I present continuous DID estimates that reveal per-capita homesteads significantly lowered land inequality in frontier states, although the magnitude of the effect is negligible. This finding is in line with previous empirical work showing that exposure to homesteads decreased land inequality. The failure to fundamentally alter the long-run distribution of landownership may be explained by qualitative evidence that suggests homestead policies were de facto corporate welfarism often exploited by land speculators and corporations to amass land and resources during early capitalist expansion.
This paper makes a methodological contribution by extending the matrix completion method for causal effect estimation in panel data with staggered treatment adoption to allow for propensity score weighting of the loss function.The matrix completion estimator with propensity score weighting outperforms regression-based estimators such as the synthetic control method and difference-in-differences in simulation studies and a no-treatment evaluation.This methodological contribution holds implications for policy evaluation, offering a more accurate tool for understanding the effects of policies over time and place.

Funding details
This work was supported by the National Science Foundation under Grants DGE 1106400 and TG-SES 180010.

Supplemental online material
A2. Simulations

A2.1. Generated data
In the first set of simulations, I generate potential outcomes under control according to the outcome model (3). For each of 100 trial runs, I generate the low-rank matrix $L_{it}$ as the product of factors $V_{tr}$ and factor loadings $U_{ir}$. The factors are drawn independently from a multivariate normal distribution with means and variances of 1, and covariances of 0.2; the factor loadings are simulated from a first-order autoregressive model with slope coefficient $\phi = 0.3$. I generate ground-truth propensity scores according to the treatment model (6), where the simulated covariate is also generated using a first-order autoregressive model with $\phi = 0.3$. I focus on the case of square matrices, $N \times T = 60 \times 60$. For each trial run, I sample $N_T = 30$ treated units, and use the ground-truth propensity scores as sampling weights. Each treated unit is assigned an initial treatment period, with the first treatment time set to $a'_i = 3$.
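One trial of this data-generating process might look like the Python sketch below. Several details are assumptions introduced for illustration because they are not pinned down in the text above: the rank of 5, the logistic link used for the propensity scores, running the autoregression across units, uniform sampling of the remaining initial treatment periods, and the function names `ar1` and `simulate_panel`. The error term of the outcome model is omitted.

```python
import numpy as np

def ar1(n, phi, rng):
    """Simulate n draws from a first-order autoregressive process."""
    x = np.empty(n)
    x[0] = rng.normal()
    for k in range(1, n):
        x[k] = phi * x[k - 1] + rng.normal()
    return x

def simulate_panel(N=60, T=60, rank=5, n_treated=30, first_treat=3, seed=0):
    rng = np.random.default_rng(seed)
    # Factors: means and variances of 1, covariances of 0.2.
    cov = np.full((rank, rank), 0.2)
    np.fill_diagonal(cov, 1.0)
    V = rng.multivariate_normal(np.ones(rank), cov, size=T)           # (T, rank)
    # Loadings: AR(1) with slope 0.3 (run across units, by assumption).
    U = np.column_stack([ar1(N, 0.3, rng) for _ in range(rank)])      # (N, rank)
    L = U @ V.T                                                       # low-rank matrix
    # Propensity scores from an AR(1) covariate (logistic link is an assumption).
    covariate = ar1(N, 0.3, rng)
    propensity = 1.0 / (1.0 + np.exp(-covariate))
    # Sample treated units with the propensity scores as sampling weights.
    treated = rng.choice(N, size=n_treated, replace=False,
                         p=propensity / propensity.sum())
    # Initial treatment periods (1-indexed); infinity marks never-treated units.
    a = np.full(N, np.inf)
    a[treated] = rng.integers(first_treat, T + 1, size=n_treated)
    a[treated[0]] = first_treat  # earliest treatment time is fixed at 3
    periods = np.arange(1, T + 1)
    W = (periods[None, :] >= a[:, None]).astype(int)  # staggered adoption mask
    return L, W, a, propensity

L_sim, W_sim, a_sim, e_sim = simulate_panel()
```

The mask `W_sim` has the staggered structure used throughout: once a unit is treated it stays treated, and control units have all-zero rows.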
A2.2. Empirical data

A2.2.1. State government finances datasets
I evaluate the performance of the matrix completion estimator on the state government expenditure and revenue datasets described in Section 4 of the main paper, discarding the treated units and using the control units (N = 18, T = 203). Figures A2a and A2b present box and whisker plots summarizing the absolute bias and variance, respectively, across 1000 simulation runs on the state government finances datasets. The x axis in each figure is the ratio of the placebo initial treatment time to the number of periods in the placebo data, so higher values represent more training data.

A2.2.2. Synthetic control datasets
Next, I evaluate the performance of the matrix completion estimator on three datasets common to the synthetic control literature, with the actual treated unit removed from each dataset.The three synthetic control datasets originate from Abadie and Gardeazabal's [2003]

A3. Benchmark estimators
The following estimators are used for comparison in the no-treatment evaluation (Section 2) and main estimates (Section 5).

A3.1. Difference-in-differences (DID)
The DID model with binary treatment is specified by Athey and Imbens (2021). The outcome under control is modeled as
$$Y_{it}(0) = \xi_i + \psi_t + \epsilon_{it}, \quad (1)$$
where $\{\xi_i\}_{i=1}^{N}$ are unit-specific fixed effects, $\{\psi_t\}_{t=1}^{T}$ are time-specific fixed effects, and $\epsilon_{it}$ is an error term. The model is estimated by least squares (Eq. 2). I use the implementation of DID in the MCPanel R package (Athey et al., 2017).
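As a rough illustration of how a fixed effects model of this kind produces counterfactual predictions, the Python sketch below fits unit and time effects by least squares on untreated cells only and then predicts the untreated outcome for every cell. It is a simplified stand-in written for this example, not the MCPanel implementation; the function name `did_counterfactual`, the toy panel, and the constant effect of -1 are assumptions.

```python
import numpy as np

def did_counterfactual(Y, W):
    """Fit xi_i + psi_t on untreated cells (W == 0) and predict Y(0) for all cells."""
    N, T = Y.shape
    rows, cols = np.nonzero(W == 0)
    design = np.zeros((rows.size, N + T))
    idx = np.arange(rows.size)
    design[idx, rows] = 1.0      # unit fixed effect indicator
    design[idx, N + cols] = 1.0  # time fixed effect indicator
    # The dummies are collinear, so lstsq returns the minimum-norm solution;
    # the fitted sums xi_i + psi_t are still unique when control units are
    # observed in every period.
    coef, *_ = np.linalg.lstsq(design, Y[rows, cols], rcond=None)
    xi, psi = coef[:N], coef[N:]
    return xi[:, None] + psi[None, :]

# Toy panel (assumed): additive untreated outcomes and a constant effect of -1.
rng = np.random.default_rng(0)
N, T = 12, 20
W = np.zeros((N, T), dtype=int)
W[:4, 10:] = 1   # early adopters
W[4:6, 14:] = 1  # later adopters
Y0 = rng.normal(size=(N, 1)) + rng.normal(size=(1, T))
Y = Y0 - 1.0 * W
Y_hat = did_counterfactual(Y, W)
att = (Y - Y_hat)[W == 1].mean()
```

Averaging the gap between observed and predicted outcomes over treated cells gives the ATT estimate, which in this noiseless toy panel equals the assumed effect.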

A3.2. Matrix completion
Matrix completion methods attempt to impute missing values by solving a convex optimization problem via nuclear norm minimization, even when relatively few values are observed in the data matrix (Candès and Recht, 2009; Candès and Plan, 2010; Mazumder et al., 2010; Recht, 2011). I implement two versions of matrix completion estimated via nuclear norm regularized least squares (Athey et al., 2021): with (MC-W) and without (MC) a propensity-weighted loss function. The MC-W outcome model and estimation equation are specified in Eqs. (3) and (5) of the main paper, respectively. The MC outcome model is the same as Eq. (3) of the main paper, while the estimation equation does not weight the loss function. Both versions are implemented using an extended version of the MCPanel package that includes an option for propensity-weighting the loss function.
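To give a sense of what nuclear norm regularized least squares involves, the Python sketch below runs proximal gradient iterations with singular value soft-thresholding, with an optional elementwise weight on the squared-error loss. It is a generic illustration under assumptions, not the MCPanel algorithm or the estimator of the main paper: it omits fixed effects and covariates, uses a fixed penalty `lam` instead of cross-validation, and the weights in the example are arbitrary placeholders, not weights derived from estimated propensity scores.

```python
import numpy as np

def svt(A, threshold):
    """Singular value soft-thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - threshold, 0.0)) @ Vt

def objective(L, Y, M, weights, lam):
    """0.5 * weighted squared error on observed cells + lam * nuclear norm of L."""
    fit = 0.5 * (M * weights * (Y - L) ** 2).sum()
    return fit + lam * np.linalg.svd(L, compute_uv=False).sum()

def complete(Y, M, lam, weights=None, n_iter=300):
    """Estimate a low-rank matrix from the cells of Y flagged as observed in M."""
    Y = np.where(M == 1, Y, 0.0)  # unobserved cells never enter the loss
    # Rescale the weights to at most 1 so that a unit step size is valid.
    w = np.ones_like(Y) if weights is None else weights / weights.max()
    L = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        # Gradient step on the weighted loss, then shrink the singular values.
        L = svt(L + M * w * (Y - L), lam)
        history.append(objective(L, Y, M, w, lam))
    return L, history

# Toy example (assumed): a rank-2 matrix with roughly 30% of cells hidden.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
M = (rng.random(truth.shape) > 0.3).astype(float)
L_mc, hist_mc = complete(truth, M, lam=1.0)
placeholder_w = rng.uniform(0.2, 1.0, size=truth.shape)  # arbitrary weights
L_mcw, hist_mcw = complete(truth, M, lam=1.0, weights=placeholder_w)
```

The only difference between the two calls is the weight matrix, which mirrors the distinction between MC and MC-W: the weighted version changes how much each observed cell contributes to the loss, not the regularizer.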

A3.3. Synthetic control method (SCM)
The synthetic control method (SCM, Abadie et al., 2010) compares a single treated unit with a synthetic control that combines the outcomes of multiple control units on the basis of their pre-intervention similarity with the treated unit. Doudchenko and Imbens (2016) and Athey et al. (2021) show that the SCM can be interpreted as regressing the pre-treatment outcomes of a single treated unit on the control unit outcomes during the same periods. The parameters estimated on the controls are then used to predict the counterfactual outcomes for a single treated unit, $i = 0$:
$$\hat{Y}_{0t}(0) = \sum_{i=1}^{N} \hat{\omega}_i Y_{it},$$
where
$$\hat{\omega} = \arg\min_{\omega} \sum_{t < a'_i} \Big( Y_{0t} - \sum_{i=1}^{N} \omega_i Y_{it} \Big)^2 \quad \text{subject to } \omega_i \ge 0, \ \sum_{i=1}^{N} \omega_i = 1. \quad (4)$$
A separate model is subsequently fit for each of the $N_T$ treated units. Note that Eq. (4) imposes the restrictions of the original SCM, namely zero intercept and non-negative regression weights that sum to one. I use the MCPanel implementation with default settings, except I bound gradient values within [-5, 5] in order to facilitate convergence.
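The constrained regression interpretation can be illustrated with the short Python sketch below, which finds non-negative weights summing to one by projected gradient descent. It is an assumption-laden illustration, not the MCPanel routine: the solver, the step size rule, the iteration count, and the names `project_simplex` and `scm_weights` are choices made for this example, and the toy data are constructed so that the treated unit is an exact convex combination of the controls.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def scm_weights(controls_pre, treated_pre, n_iter=2000):
    """Regress pre-treatment treated outcomes on controls with simplex-constrained weights.

    controls_pre: (T0, J) pre-treatment outcomes of J control units.
    treated_pre:  (T0,)   pre-treatment outcomes of the treated unit.
    """
    J = controls_pre.shape[1]
    w = np.full(J, 1.0 / J)  # start from equal weights
    # Step size from the largest singular value (Lipschitz constant of the gradient).
    step = 1.0 / np.linalg.norm(controls_pre, 2) ** 2
    for _ in range(n_iter):
        grad = controls_pre.T @ (controls_pre @ w - treated_pre)
        w = project_simplex(w - step * grad)
    return w

# Toy data (assumed): 40 pre-treatment periods and 5 control units.
rng = np.random.default_rng(0)
controls_pre = rng.normal(size=(40, 5))
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
treated_pre = controls_pre @ w_true
w_hat = scm_weights(controls_pre, treated_pre)
```

Counterfactual outcomes for the post-treatment periods would then be formed by applying `w_hat` to the control units' post-treatment outcomes.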

A3.4. SCM with lasso (SCM-L1)
The SCM with lasso (SCM-L1, Doudchenko and Imbens, 2016; Athey et al., 2021) relaxes the zero-intercept and weight restrictions of Eq. (4). The counterfactual outcomes for treated unit $i = 0$ are
$$\hat{Y}_{0t}(0) = \hat{\mu} + \sum_{i=1}^{N} \hat{\omega}_i Y_{it}, \quad (5)$$
where a separate model is fit for each of the $N_T$ treated units. Intuitively, the generalized SCM is a weighted combination of control units with intercept $\mu$ and weight $\omega_i$ for control units $i = 1, \ldots, N$. The model is fit with $N + 1$ predictors, including the number of control units and the intercept, and $a', \ldots, T$ observations. Eq. (5) is estimated by lasso regression (Tibshirani, 1996; Tibshirani et al., 2012) in order to reduce the relative size of the predictor set. I use the MCPanel implementation with the strength of the lasso penalty selected among 30 possible values by 5-fold cross-validation.

A4. Empirical application: tables & figures
• Figure A6 shows how each state is categorized in the empirical analysis, and the year of the earliest initial homestead entry for the treated states.
• Figure A5 visualizes the extent of the missing values in the state government finances datasets.
• Figure A1 plots bivariate regression estimates of the relationship between land inequality and state government finances.
• Figure A7 plots matrix completion estimates of treatment exposure on state government revenue.
• Figure A8 visualizes the average outcomes of treated and control groups for the purpose of assessing the parallel trends assumption for DID estimation with a binary treatment variable.
• In the no-treatment evaluation and in the main analysis, the missing values are imputed by multivariate imputation by chained equations (MICE, Azur et al., 2011) with predictive mean matching as the imputation method, implemented using the mice R package (Buuren and Groothuis-Oudshoorn, 2010). Table A1 reports the results of sensitivity analyses on differently imputed datasets using the following two alternative imputation methods:
MICE-CART: MICE with classification and regression trees (CART) from the rpart package (Therneau and Atkinson, 2022) as the imputation method;
EM: an expectation-maximization (EM) algorithm based method for imputing missing values in multivariate normal time series, implemented using the mtsdi package with default settings (Therneau and Atkinson, 2022).
In order to avoid train-test set contamination, each imputation method is fit only on the outcome values of the control units. Table A2 reports the estimated ATT on differently imputed placebo datasets, and Table A3 reports treatment effect estimates using the continuous DID model (Section 6) on differently imputed datasets.

Figure 1: Log per-capita cumulative homesteads in 1870 and 1900, overlaid on 1911 county borders (Long, 1995). White-colored counties have no homestead entries. States bordered in blue are state-land states; yellow denotes western public land states; orange denotes southern public land states.

Figure 2: Absolute bias and bootstrap variance in generated data, varying the rank of $L_{it}$. Estimators: DID; MC; MC-W; SCM; SCM-L1.
, and U.S. Census special reports for the period of 1902 to 2008, covering 48 states (Haines, 2010; U.S. Census Bureau, 2010). The expenditure measure includes state government spending on education, social welfare programs, and transportation. The revenue measure incorporates state government income streams such as tax revenue and non-tax revenue such as land sales. The expenditure and revenue data pre-processing steps are as follows. Removing years with zero or near-zero variance results in outcome matrices consisting of T = 203 observations for N = 48 states, 30 of which are treated. The outcome data are inflation-adjusted

Figure 3: Matrix completion estimates of the effect of the year of initial homestead entry (1869; dashed vertical line) on state government expenditure, 1809 to 1982. Lines show factual treated, factual control, and counterfactual treated outcomes, and the per-period effect $\tau_{t,\infty a'_i}$.
The online Appendix includes simulation results, and describes model specifications and implementation details for each of the comparison estimators used in the simulations. It includes descriptive figures on the extent of missing data in the state government finances datasets, and reports the results of sensitivity analyses on differently imputed datasets. It also includes figures for matrix completion estimates of treatment exposure on state government revenue, a diagnostic plot for the DID parallel trends assumption, and bivariate regression estimates of the relationship between land inequality and state government finances.

Online Appendix for State-Building through Public Land Disposal? An Application of Matrix Completion for Counterfactual Prediction

A1. Empirical application: Descriptive statistics

Figure A1: Land inequality (lagged by 10 years) vs. log per-capita state government revenue and expenditure, 1860-1950. Each point is a state-year observation. Lines represent generalized additive model fits to the data for the two outcomes and shaded regions represent corresponding 95% confidence intervals. The model is fit separately on control states (i.e., state-land states) and treated states (i.e., PLS).
study of the economic impact of terrorism in the Basque Country during the late 1960s (N = 16, T = 43); Abadie et al.'s [2010] study of the effects of a large-scale tobacco control program implemented in California in 1988 (N = 38, T = 31); and Abadie et al.'s [2015] study of the economic impact of the 1990 German reunification on West Germany (N = 16, T = 44). Figure A3 provides box and whisker plots of the absolute bias for each estimator on the three synthetic control datasets.

Figure A2: Absolute bias and bootstrap variance for state government finances datasets, varying the placebo $a'_i$.

Figure A3: Absolute bias for synthetic control datasets, varying the placebo $a'_i$.

Figure A4: Bootstrap variance for synthetic control datasets, varying the placebo $a'_i$.

Figure A5: State-year observations in the state government finances datasets that are missing or present.

Figure A6: Treatment status by state and year corresponding to the state government finances datasets. 'Control' are state-land states and 'treated' are public land states, before and after treatment.

Figure A7: Matrix completion estimates of the effect of the year of initial homestead entry (1869; dashed vertical line) on state government revenue, 1809 to 1982. Lines show factual treated, factual control, and counterfactual treated outcomes, and the per-period effect $\tau_{t,\infty a'_i}$.

(a) State government expenditure. (b) State government revenues.

Figure A8: Visually assessing the parallel trends assumption for DID estimation with a binary treatment variable. Lines show factual treated and factual control outcomes. The dashed vertical line represents the earliest treatment year, 1869.

Table 1: Estimates of the ATT averaged over the counterfactual period of 1869 to 2008 and bootstrap standard errors (in parentheses).

Table 2: Placebo ATT estimates and bootstrap standard errors (in parentheses).

Table 3: Continuous DID estimates of the effect of (log) per-capita homestead patents on state size or land inequality.

Table A1: Estimates of the ATT averaged over the counterfactual period (2) and bootstrap standard errors (in parentheses) on differently imputed datasets.

Table A2: Placebo ATT estimates and bootstrap standard errors (in parentheses) on differently imputed datasets.

Table A3: Continuous DID estimates of the effect of (log) per-capita homestead patents on differently imputed state government finance datasets.