
Original Articles

Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models

Pages 67-80
Received 01 Apr 2010
Accepted 01 Dec 2010
Published online: 22 Feb 2012

This article proposes a new method of obtaining identification in mismeasured regressor models, triangular systems, and simultaneous equation systems. The method may be used in applications where other sources of identification, such as instrumental variables or repeated measurements, are not available. Associated estimators take the form of two-stage least squares or generalized method of moments. Identification comes from a heteroscedastic covariance restriction that is shown to be a feature of many models of endogeneity or mismeasurement. Identification is also obtained for semiparametric partly linear models, and associated estimators are provided. Set identification bounds are derived for cases where point-identifying assumptions fail to hold. An empirical application estimating Engel curves is provided.

1. INTRODUCTION

This article provides a new method of identifying structural parameters in models with endogenous or mismeasured regressors. The method may be used in applications where other sources of identification, such as instrumental variables, repeated measurements, or validation studies, are not available. The identification comes from having regressors uncorrelated with the product of heteroscedastic errors, which is shown to be a feature of many models in which error correlations are due to an unobserved common factor, such as unobserved ability in returns to schooling models, or the measurement error in mismeasured regressor models. Even when this main identifying assumption does not hold, it is still possible to obtain set identification, specifically bounds on the parameters of interest.

For the main model, estimators take the form of modified two-stage least squares or generalized method of moments (GMM). Identification of semiparametric partly linear triangular and simultaneous systems is also considered. In an empirical application, this article's methodology is applied to deal with measurement error in total expenditures, resulting in Engel curve estimates that are similar to those obtained using a more standard instrument. A literature review shows similarly satisfactory empirical results obtained by other researchers who applied this article's methodology, based on earlier working paper versions of this article.

Let Y1 and Y2 be observed endogenous variables, let X be a vector of observed exogenous regressors, and let ϵ=(ϵ1, ϵ2) be unobserved errors. For now, consider structural models of the form:

Y1 = X′β1 + Y2γ1 + ϵ1    (1)
Y2 = X′β2 + Y1γ2 + ϵ2    (2)

Later, the identification results will be extended to cases where X′β1 and X′β2 are replaced by unknown functions of X.

This system of equations is triangular when γ2=0, otherwise it is fully simultaneous (if it is known that γ1=0, then renumber the equations to set γ2=0). The errors ϵ1 and ϵ2 may be correlated with each other.

Assume E(Xϵ)=0, which is the standard minimal regression assumption for the exogenous regressors X. This permits identification of the reduced form, but is of course not sufficient to identify the structural model coefficients. Typically, structural model identification is obtained by imposing equality constraints on some coefficients, such as assuming that some elements of β1 or β2 are zero, which is equivalent to assuming the availability of instruments. This article instead obtains identification by restricting correlations of ϵϵ′ with X. The resulting identification is based on higher moments and so is likely to provide less reliable estimates than identification based on standard exclusion restrictions, but may be useful in applications where traditional instruments are not available or could be used along with traditional instruments to increase efficiency.

Restricting correlations of ϵϵ′ with X does not automatically provide identification. In particular, the structural model parameters remain unidentified under the standard homoskedasticity assumption that E(ϵϵ′∣X) is constant, and more generally, are not identified when ϵ and X are independent.

However, what this article shows is that the model parameters may be identified given some heteroscedasticity. In particular, identification is obtained by assuming that cov(X, ϵj²)≠0 for j=2 in a triangular system (or for both j=1 and j=2 in a fully simultaneous system) and assuming that cov(Z, ϵ1ϵ2)=0 for an observed Z, where Z can be a subset of X. If cov(Z, ϵ1ϵ2)≠0, then set identification, specifically bounds on parameters, can still be obtained as long as this covariance is not too large.

The remainder of this section provides examples of models where these identifying assumptions hold and comparisons to related results in the literature.

1.1 Independent Errors

For the simplest possible motivating example, let Equations (1) and (2) hold. Suppose ϵ1 and ϵ2 have the standard model error property of being mean zero and are conditionally independent of each other, so ϵ1 ⊥ ϵ2 | Z and E(ϵ1 | Z)=0. It would then follow immediately that the key identifying assumption cov(Z, ϵ1ϵ2)=0 holds, because then E(ϵ1ϵ2 | Z) = E(ϵ1 | Z)E(ϵ2 | Z) = 0. This, along with ordinary heteroscedasticity of the errors ϵ1 and ϵ2, then suffices for identification.
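These two covariances are easy to check numerically. The sketch below uses a hypothetical data-generating process (Z uniform, ϵ1 standard normal, ϵ2 heteroscedastic in Z; all values illustrative, not from the article) to confirm that cov(Z, ϵ1ϵ2) is near zero while cov(Z, ϵ2²) is bounded away from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical draws: eps1 and eps2 independent of each other, eps2 heteroscedastic.
Z = rng.uniform(1.0, 2.0, n)
eps1 = rng.normal(size=n)
eps2 = Z * rng.normal(size=n)          # var(eps2 | Z) = Z**2 varies with Z

cov_Z_e1e2 = np.cov(Z, eps1 * eps2)[0, 1]   # identifying condition: near zero
cov_Z_e2sq = np.cov(Z, eps2**2)[0, 1]       # heteroscedasticity: clearly nonzero
```

The second covariance being nonzero is exactly the instrument-strength condition that reappears in Theorem 1.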

More generally, independence or uncorrelatedness of ϵ1 and ϵ2 is not required, for example, it is shown below that the identifying assumptions still hold if ϵ1 and ϵ2 are correlated with each other through a factor structure, and they hold in a classical measurement error framework.

1.2 Classical Measurement Error

Consider a standard linear regression model with a classically mismeasured regressor. Suppose we do not have an outside instrument that correlates with the mismeasured regressor, which is the usual method of identifying this model. It is shown here that we can identify the coefficients in this model just based on heteroscedasticity. The only nonstandard assumption that will be needed for identification is the assumption that the errors in a linear projection of the mismeasured regressor on the other regressors be heteroscedastic, which is more plausible than homoskedasticity in most applications.

The goal is estimation of the coefficients β1 and γ1 in

Y1 = X′β1 + Y*2γ1 + V1,

where the regression error V1 is mean zero and independent of the covariates X, Y*2. However, the scalar regressor Y*2 is mismeasured, and we instead observe Y2, where

Y2 = Y*2 + U.

Here, U is classical measurement error, so U is mean zero and independent of the true model components X, Y*2, and V1, or equivalently, independent of X, Y*2, and Y1. So far, all of these assumptions are exactly those of the classical linear regression mismeasured regressor model.

Define V2 as the residual from a linear projection of Y*2 on X, so by construction

Y*2 = X′β2 + V2, with E(XV2) = 0.

Substituting out the unobservable Y*2 yields the familiar triangular system associated with measurement error models,

Y1 = X′β1 + Y2γ1 + ϵ1, with ϵ1 = V1 − γ1U,
Y2 = X′β2 + ϵ2, with ϵ2 = V2 + U,

where the Y1 equation is the structural equation to be estimated, the Y2 equation is the instrument equation, and ϵ1 and ϵ2 are unobserved errors.

The standard way to obtain identification in this model is by an exclusion restriction, that is, by assuming that one or more elements of β1 equal zero and that the corresponding elements of β2 are nonzero. The corresponding elements of X are then instruments, and the model is estimated by linear two-stage least squares, with Ŷ2 = X′β̂2 being the first-stage regression and the second stage being the regression of Y1 on Ŷ2 and the subset of X that has nonzero coefficients.

Assume now that we have no exclusion restriction and hence no instrument, so there is no covariate that affects Y 2 without also affecting Y 1. In that case, the structural model coefficients cannot be identified in the usual way and so, for example, are not identified when U, V 1, and V 2 are jointly normal and independent of X.

However, in this mismeasured regressor model, there is no reason to believe that V 2, the error in the Y 2 equation, would be independent of X, because the Y 2 equation (what would be the first-stage regression in two-stage least squares) is just the linear projection of Y 2 on X, not a structural model motivated by any economic theory.

The perhaps surprising result, which follows from Theorem 1 below, is that if V2 is heteroscedastic (and hence not independent of X, as expected), then the structural model coefficients in this model are identified and can be easily estimated. The above assumptions yield a triangular model with E(Xϵ)=0, cov(X, ϵ2²)≠0, and cov(X, ϵ1ϵ2)=0 and hence satisfy this article's required conditions for identification.
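These three moment conditions can be verified in a simulation of the measurement-error construction above. The coefficient values and distributions below are made up for illustration; the point is only that ϵ1 = V1 − γ1U and ϵ2 = V2 + U satisfy the required covariance restrictions whenever V2 is heteroscedastic in X:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Hypothetical classical measurement error DGP: Y2* is the true regressor,
# V2 (its projection error on X) is heteroscedastic, U is measurement error.
X = rng.uniform(1.0, 2.0, n)
V2 = X * rng.normal(size=n)            # var(V2 | X) depends on X
Y2s = 2.0 * X + V2                     # linear projection of Y2* on X, plus V2
U = rng.normal(size=n)                 # classical measurement error
V1 = rng.normal(size=n)
g1 = 0.7
Y2 = Y2s + U                           # observed, mismeasured regressor
Y1 = 1.0 * X + Y2s * g1 + V1

# Substituting Y2* = Y2 - U gives eps1 = V1 - g1*U and eps2 = V2 + U.
eps1 = Y1 - 1.0 * X - Y2 * g1
eps2 = Y2 - 2.0 * X

cov_X_e1e2 = np.cov(X, eps1 * eps2)[0, 1]    # should be near zero
cov_X_e2sq = np.cov(X, eps2**2)[0, 1]        # should be clearly nonzero
```

Note that cov(X, ϵ1ϵ2) is zero even though ϵ1 and ϵ2 are correlated through U, which is the whole point: the measurement error induces error correlation without violating the identifying covariance restriction.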

The classical measurement error assumptions are used here by way of illustration. They are much stronger than necessary to apply this article's methodology. For example, identification is still possible when the measurement error U is correlated with some of the elements of X, and the error independence assumptions given above can be relaxed to restrictions on just a few low-order moments.

1.3 Unobserved Single-Factor Models

A general class of models that satisfy this article's assumptions are systems in which the correlation of errors across equations is due to the presence of an unobserved common factor U, that is:

Y1 = X′β1 + Y2γ1 + α1U + V1    (3)
Y2 = X′β2 + Y1γ2 + α2U + V2    (4)

where U, V1, and V2 are unobserved variables that are uncorrelated with X and are conditionally uncorrelated with each other, conditioning on X. Here, V1 and V2 are idiosyncratic errors in the equations for Y1 and Y2, respectively, while U is an omitted variable or other unobserved factor that may directly influence both Y1 and Y2.

Examples:

Measurement Error: The mismeasured regressor model described above yields Equation (3) with α1=−γ1 and Equation (4) with γ2=0 and α2=1. The unobserved common factor U is the measurement error in Y 2.

Supply and Demand: Equations (3) and (4) are supply and (inverse) demand functions, with Y 1 being quantity and Y 2 price. V 1 and V 2 are unobservables that only affect supply and demand, respectively, while U denotes an unobserved factor that affects both sides of the market, such as the price of an imperfect substitute.

Returns to Schooling: Equations (3) and (4) with γ2=0 are models of wages Y 1 and schooling Y 2, with U representing an individual's unobserved ability or drive (or more precisely, the residual after projecting unobserved ability on X), which affects both her schooling and her productivity (Heckman 1974, 1979).

In each of these examples, some or all of the structural parameters are not identified without additional information. Typically, identification is obtained by imposing equality constraints on the coefficients of X. In the measurement error and returns to schooling examples, assuming that one or more elements of β1 equal zero permits estimation of the Y 1 equation using two-stage least squares with instruments X. For supply and demand, the typical identification restriction is that each equation possess this kind of exclusion assumption.

Assume we have no ordinary instruments and no equality constraints on the parameters. Let Z be a vector of observed exogenous variables, in particular, Z could be a subvector of X, or Z could equal X. Assume X is uncorrelated with (U, V1, V2). Assume also that Z is uncorrelated with (U², UVj, V1V2) and that Z is correlated with V2². If the model is simultaneous, assume that Z is also correlated with V1². An alternative set of stronger but more easily interpreted sufficient conditions are that one or both of the idiosyncratic errors Vj be heteroscedastic, cov(Z, V1V2)=0, and that the common factor U be conditionally independent of Z. These are all standard assumptions, except that one usually either imposes homoscedasticity or allows for heteroscedasticity, rather than requiring heteroscedasticity.

Given these assumptions, the errors ϵj = αjU + Vj satisfy

E(Xϵj) = 0 and cov(Z, ϵ1ϵ2) = 0, j = 1, 2,    (5)

which are the requirements for applying this article's identification theorems and associated estimators.

To apply this article's estimators, it is not necessary to assume that the errors are actually given by a factor model such as ϵj = αjU + Vj. In particular, third and higher moment implications of factor model or classical measurement error constructions are not imposed. All that is required for identification and estimation are the moments (5) along with some heteroscedasticity of ϵj. The moments (5) provide identification whether or not Z is a subvector of X.
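The factor-structure argument can also be checked directly. The sketch below uses made-up loadings α1, α2 and a hypothetical DGP in which U is independent of Z and V2 is heteroscedastic in Z; expanding ϵ1ϵ2 term by term shows why the key covariance vanishes:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical factor structure: eps_j = alpha_j*U + V_j with U independent of Z
# and V2 heteroscedastic in Z; the loadings alpha_j are illustrative values.
Z = rng.uniform(1.0, 2.0, n)
U = rng.normal(size=n)
V1 = rng.normal(size=n)
V2 = Z * rng.normal(size=n)            # var(V2 | Z) depends on Z
alpha1, alpha2 = -0.5, 1.0
eps1 = alpha1 * U + V1
eps2 = alpha2 * U + V2

# eps1*eps2 = alpha1*alpha2*U**2 + alpha1*U*V2 + alpha2*U*V1 + V1*V2,
# and each of these terms is uncorrelated with Z, so cov(Z, eps1*eps2) = 0,
# while cov(Z, eps2**2) = cov(Z, V2**2) is nonzero by heteroscedasticity of V2.
cov_Z_e1e2 = np.cov(Z, eps1 * eps2)[0, 1]
cov_Z_e2sq = np.cov(Z, eps2**2)[0, 1]
```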

1.4 Empirical Examples

Based on earlier working versions of this article, a number of researchers apply this article's identification strategy and associated estimators to a variety of settings where ordinary instruments are either weak or difficult to obtain.

Giambona and Schwienbacher (2007) applied the method in a model relating the debt and leverage ratios of firms to the tangibility of their assets. Emran and Hou (2008) applied it to a model of household consumption in China based on distance to domestic and international markets. Sabia (2007) used the method to estimate equations relating body weight to academic performance, and Rashad and Markowitz (2007) used it in a similar application involving body weight and health insurance. Finally, in a later section of this article, I report results for a model of food Engel curves where total expenditures may be mismeasured. All of these studies report that using this article's estimator yields results that are close to estimates based on traditional instruments (though Sabia 2007 also noted that his estimates are closer to ordinary least squares). Taken together, these studies provide evidence that the methodology proposed in this article may be reliably applied in a variety of real data settings where traditional instrumental variables are not available.

1.5 Literature Review

Surveys of methods of identification in simultaneous systems include Hsiao (1983), Hausman (1983), and Fuller (1987). Roehrig (1988) provided a useful general characterization of identification in situations where nonlinearities contribute to identification, as is the case here. Particularly relevant for this article is previous work that obtains identification based on variance and covariance constraints. With multiple-equation systems, various homoscedastic factor model covariance restrictions are used along with exclusion assumptions in the LISREL class of models (Joreskog and Sorbom 1984). The idea of using heteroscedasticity in some way to help estimation appears in Wright (1928) and so is virtually as old as the method of instrumental variables itself. Recent articles that use general restrictions on higher moments instead of outside instruments as a source of identification include Dagenais and Dagenais (1997), Lewbel (1997), Cragg (1997), and Erickson and Whited (2002).

A closely related result to this article's is Rigobon (2002, 2003), which uses heteroscedasticity based on discrete, multiple regimes instead of regressors. Some of Rigobon's identification results can be interpreted as special cases of this article's models in which Z is a vector of binary dummy variables that index regimes and are not included among the regressors X. Sentana (1992) and Sentana and Fiorentini (2001) employed a similar idea for identification in factor models. Hogan and Rigobon (2003) propose a model that, like this article's, involves decomposing the error term into components, some of which are heteroscedastic.

Klein and Vella (2010) also used heteroscedasticity restrictions to obtain identification in linear models without exclusion restrictions (an application of their method is Rummery, Vella, and Verbeek 1999), and their model also implies restrictions on how ϵ1², ϵ2², and ϵ1ϵ2 depend on regressors, but not the same restrictions as those used in the present article. The method proposed here exploits a different set of heteroscedasticity restrictions from theirs, and as a result, this article's estimators have many features that estimators in Klein and Vella (2010) do not have, including the following: This article's assumptions nest standard mismeasured regressor models and unobserved factor models, unlike theirs. This article's estimator extends to fully simultaneous systems, not just triangular systems, and extends to a class of semiparametric models. Klein and Vella (2003) assumed a multiplicative form of heteroscedasticity that imposes strong restrictions on how all higher moments of errors depend on regressors, while this article's model places no restrictions on third and higher moments of ϵj conditional on X, Z. Finally, this article provides some set identification results, yielding bounds on parameters, that hold when point-identifying assumptions are violated.

The assumption used here that a product of errors be uncorrelated with covariates has occasionally been exploited in other contexts as well, for example, to aid identification in a correlated random coefficients model, Heckman and Vytlacil (1998) assumed covariates are uncorrelated with the product of a random coefficient and a regression model error.

Some articles have exploited GARCH system heteroscedastic specifications to obtain identification, including King, Sentana, and Wadhwani (1994) and Prono (2008). Other articles that exploit heteroscedasticity in some way to aid identification include Leamer (1981) and Feenstra (1994).

Variables that in past empirical applications have been proposed as instruments for identification might more plausibly be used as this article's Z. For example, in the returns to schooling model, Card (1995) and others propose using measures of access to schooling, such as distance to or cost of colleges in one's area, as wage equation instruments. Access measures may be independent of unobserved ability (though see Carneiro and Heckman 2002) and may affect the schooling decision. However, access may not be appropriate as an excluded variable in wage (or other outcome) equations because access may correlate with the type or quality of education one actually receives or may be correlated with proximity to locations where good jobs are available (see, e.g., Hogan and Rigobon 2003). Therefore, instead of excluding measures of access to schooling or other proposed instruments from the outcome equation, it may be more appropriate to include them as regressors in both equations and use them as this article's Z to identify returns to schooling, given by γ1 in the triangular model, where Y 1 is wages and Y 2 is schooling.

The next section describes this article's main identification results for triangular and then fully simultaneous systems. This is followed by a description of associated estimators and an empirical application to Engel curve estimation. Later sections provide extensions, including set identification (bounds) for when the point-identifying assumptions do not hold and identification results for nonlinear and semiparametric systems of equations.

2. POINT IDENTIFICATION

For simplicity, it is assumed that the regressors X are ordinary random variables with finite second moments, so results are easily stated in terms of means and variances. However, it will be clear from the resulting estimators that this can be relaxed to handle cases such as time trends or deterministic regressors by replacing the relevant moments with probability limits of sample moments and sample projections.

2.1 Triangular Model Identification

First, consider the linear triangular model:

Y1 = X′β10 + Y2γ10 + ϵ1    (6)
Y2 = X′β20 + ϵ2    (7)

Here, β10 indicates the true value of β1, and similarly for the other parameters. Traditionally, this model would be identified by imposing equality constraints on β10. Alternatively, if the errors ϵ1 and ϵ2 were uncorrelated, this would be a recursive system and so the parameters would be identified. Identification conditions are given here that do not require uncorrelated errors or restrictions on β10. Example applications include unobserved factor models such as the mismeasured regressor model and the returns to schooling model described in the Introduction.

Assumption A1. Y=(Y1, Y2)′ and X are random vectors. E(XY′), E(XY1Y′), E(XY2Y′), and E(XX′) are finite and identified from data. E(XX′) is nonsingular.

Assumption A2. E(Xϵ1)=0, E(Xϵ2)=0, and, for some random vector Z, cov(Z, ϵ1ϵ2)=0.

The elements of Z can be discrete or continuous, and Z can be a vector or a scalar. Some or all of the elements of Z can also be elements of X. Sections 1.1, 1.2, and 1.3 provide examples of models satisfying these assumptions.

Define W = (X′, [Z − E(Z)]′ϵ2)′ as the vector of instruments and R = (X′, Y2)′ as the vector of regressors in Equation (6), and define the matrices ΨZX = E(WR′) and ΨZZ = E(WW′). Let Ψ be any positive definite matrix that has the same dimensions as ΨZZ.

Theorem 1. Let Assumptions A1 and A2 hold for the model of Equations (6) and (7). Assume cov(Z, ϵ2²)≠0. Then, the structural parameters β10, β20, γ10, and the errors ϵ are identified.

Proofs are in the Appendix. For Ψ=ΨZZ⁻¹, Theorem 1 says that the structural parameters β10 and γ10 are identified by an ordinary linear two-stage least squares regression of Y1 on X and Y2, using X and [Z − E(Z)]ϵ2 as instruments. The assumption that Z is uncorrelated with ϵ1ϵ2 means that [Z − E(Z)]ϵ2 is a valid instrument for Y2 in Equation (6), since it is uncorrelated with ϵ1, with the strength of the instrument (its correlation with Y2 after controlling for the other instruments X) being proportional to the covariance of [Z − E(Z)]ϵ2 with ϵ2, which corresponds to the degree of heteroscedasticity of ϵ2 with respect to Z.

Taking Ψ=ΨZZ⁻¹ corresponds to estimation based on ordinary linear two-stage least squares. Other choices of Ψ may be preferred for increased efficiency, accounting for error heteroscedasticity. Efficient GMM estimation of this model is discussed later.

The requirement that cov(Z, ϵ2²) be nonzero can be empirically tested, because this covariance can be estimated as the sample covariance between Z and the squared residuals from linearly regressing Y2 on X. For example, we may apply a Breusch and Pagan (1979) test for this form of heteroscedasticity to Equation (7). Also, if cov(Z, ϵ2²) is close to or equal to zero, then [Z − E(Z)]ϵ2 will be a weak or useless instrument, and this problem will be evident in the form of imprecise estimates with large standard errors. Hansen (1982) type tests of GMM moment restrictions can also be implemented to check validity of the model's assumptions, particularly Assumption A2.
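A Breusch-Pagan style check of this requirement is a one-line computation once the first-stage residuals are in hand. The sketch below simulates a hypothetical first-stage equation with strongly heteroscedastic errors (all parameter values invented for illustration) and forms the LM statistic n·R² from regressing the squared residuals on Z, here taken to be X itself:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Hypothetical first-stage data: Y2 = X'beta2 + eps2, with eps2 heteroscedastic.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta2 = np.array([1.0, 2.0])
eps2 = np.exp(0.5 * X[:, 1]) * rng.normal(size=n)
Y2 = X @ beta2 + eps2

# Squared residuals from regressing Y2 on X.
e2 = Y2 - X @ np.linalg.lstsq(X, Y2, rcond=None)[0]
u = e2**2

# Breusch-Pagan style LM statistic: n * R^2 from regressing e2^2 on Z (here Z = X).
g = np.linalg.lstsq(X, u, rcond=None)[0]
r2 = 1.0 - np.sum((u - X @ g) ** 2) / np.sum((u - u.mean()) ** 2)
lm = n * r2          # compare with the chi-squared(1) 5% critical value, 3.84
```

A large LM statistic indicates that cov(Z, ϵ2²)≠0, so the constructed instrument should have usable strength; a statistic near zero signals the weak-instrument problem described above.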

2.2 Fully Simultaneous Linear Model Identification

Now consider the fully simultaneous model

Y1 = X′β10 + Y2γ10 + ϵ1    (9)
Y2 = X′β20 + Y1γ20 + ϵ2    (10)

where the errors ϵ1 and ϵ2 may be correlated, and again no equality constraints are imposed on the structural parameters β10, β20, γ10, and γ20.

In some applications, it is standard or convenient to normalize the second equation so that, like the first equation, the coefficient of Y 1 is set equal to 1 and the coefficient of Y 2 is to be estimated. An example is supply and demand, with Y 1 being quantity and Y 2 price. The identification results derived here immediately extend to handle that case, because identification of γ20 implies identification of 1/γ20 and vice versa when γ20≠0, which is the only case in which one could normalize the coefficient of Y 1 to equal 1 in the second equation.

Some assumptions in addition to A1 and A2 are required to identify this fully simultaneous model. Given Assumption A2, the reduced-form errors Wj are:

Wj = Yj − X′[E(XX′)]⁻¹E(XYj), j = 1, 2.    (11)

Assumption A3. Define Wj by Equation (11) for j=1, 2. The matrix ΦW, defined as the matrix with columns given by the vectors cov(Z, W1²) and cov(Z, W2²), has rank 2.

Assumption A3 requires Z to contain at least two elements (though sometimes one element of Z can be a constant; see Corollary 1 later). If cov(Z̃, ϵ1ϵ2)=0 for some scalar Z̃, as would arise if the common unobservable U is independent of Z̃, then Assumptions A2 and A3 might be satisfied by letting Z be a vector of different functions of Z̃, for example, defining Z as the vector of elements Z̃ and Z̃² (as long as Z̃ is not binary).

Assumption A3 is testable, because one may estimate Wj as the residuals from linearly regressing Yj on X and then use Z and the estimated Wj to estimate cov(Z, Wj²). A Breusch and Pagan (1979) test may be applied to each of these reduced-form regressions. An estimated matrix rank test such as Cragg and Donald (1997) could be applied to the resulting estimated matrix ΦW, or perhaps more simply one may test whether the determinant of Φ′WΦW is zero, since rank 2 requires that Φ′WΦW be nonsingular.
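The estimated ΦW matrix itself is cheap to compute. The following sketch (hypothetical reduced-form residuals whose variances depend on different elements of Z; all distributions illustrative) builds ΦW from sample covariances and checks its rank through the smallest singular value, which is the quantity a formal rank test would examine:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical reduced-form residuals whose variances are different functions of Z.
Z = np.column_stack([rng.uniform(1, 2, n), rng.uniform(1, 2, n)])
W1 = Z[:, 0] * rng.normal(size=n)      # var(W1 | Z) moves with the first element
W2 = Z[:, 1] * rng.normal(size=n)      # var(W2 | Z) moves with the second element

def cov_vec(Z, s):
    """Column vector cov(Z, s), one entry per element of Z."""
    return (Z - Z.mean(axis=0)).T @ (s - s.mean()) / len(s)

# Phi_W has columns cov(Z, W1^2) and cov(Z, W2^2); Assumption A3 requires rank 2,
# checked here through the smallest singular value of the estimated matrix.
Phi_W = np.column_stack([cov_vec(Z, W1**2), cov_vec(Z, W2**2)])
smallest_sv = np.linalg.svd(Phi_W, compute_uv=False)[-1]
```

If instead both variances were the same function of Z, the two columns would be proportional and the smallest singular value would collapse toward zero, flagging a failure of Assumption A3.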

Assumption A4. Let Γ be the set of possible values of (γ10, γ20). If (γ1, γ2)∈Γ, then (γ2⁻¹, γ1⁻¹)∉Γ.

Given any nonzero values of (γ10, γ20), solving Equation (9) for Y2 and Equation (10) for Y1 yields another representation of the exact same system of equations but having coefficients (γ20⁻¹, γ10⁻¹) instead of (γ10, γ20). As long as (γ10, γ20)≠(1, 1) and no restrictions are placed on β10 and β20, Assumption A4 simply says that we have chosen (either by arbitrary convenience or external knowledge) one of these two equivalent representations of the system. Assumption A4 is not needed for models that break this symmetry either by being triangular, as in Theorem 1, or through an exclusion assumption, as in Corollary 2. In other models, the choice of Γ may be determined by context, for example, many economic models (like those requiring stationary dynamics or decreasing returns to scale) require coefficients such as γ1 and γ2 to be less than 1 in absolute value, which then defines a set Γ that satisfies Assumption A4. In a supply-and-demand model, Γ may be defined by downward sloping demand and upward sloping supply curves, since in that case, Γ only includes elements (γ1, γ2) where γ1⩾0 and γ2⩽0, and any values that violate Assumption A4 would have the wrong signs. This is related to Fisher's (1976) finding that sign constraints in simultaneous systems yield regions of admissible parameter values.

Theorem 2. Let Assumptions A1, A2, A3, and A4 hold in the model of Equations (9) and (10). Then, the structural parameters β10, β20, γ10, and γ20 and the errors ϵ are identified.

2.3 Additional Simultaneous Model Results

Lemma 1. Define Φϵ to be the matrix with columns given by the vectors cov(Z, ϵ1²) and cov(Z, ϵ2²). Let Assumptions A1 and A2 hold and assume |γ10γ20|≠1. Then, Assumption A3 holds if and only if Φϵ has rank 2.

Lemma 1 assumes γ10γ20≠1 and γ10γ20≠−1. The case γ10γ20=1 is ruled out by Assumption A4 in Theorem 2. This case cannot happen in the returns to schooling or measurement error applications, because triangular systems have γ20=0. Having γ10γ20=1 also cannot occur in the supply-and-demand application, because the slopes of supply and demand curves make γ10γ20⩽0. As shown in the proof of Theorem 2, the case of γ10γ20=−1 is ruled out by Assumption A3, because it causes ΦW to have rank less than 2. However, Theorem 2 can be relaxed to allow γ10γ20=−1 by replacing Assumption A3 with the assumption that Φϵ has rank 2, because then Equation (27) in the proof still holds and identifies γ10/γ20, which along with γ10γ20=−1 and some sign restrictions could identify γ10 and γ20 in this case. However, Assumption A3 has the advantage of being empirically testable.

In either case, Theorem 2 requires both ϵ1 and ϵ2 to be heteroscedastic with variances that depend upon Z, since otherwise the vectors cov(Z, ϵ1²) and cov(Z, ϵ2²) will equal zero. Moreover, the variances of ϵ1 and ϵ2 must be different functions of Z for the rank of Φϵ to be 2.

Corollary 1. Let Assumptions A1, A2, A3, and A4 hold in the model of Equations (9) and (10), replacing cov(Z, ϵ1ϵ2) in Assumption A2 with E(Zϵ1ϵ2) and replacing cov(Z, Wj²) with E(ZWj²) in Assumption A3, for j=1, 2. Then, the structural parameters β10, β20, γ10, and γ20 and the errors ϵ are identified.

Corollary 1 can be used in applications where E(ϵ1ϵ2)=0. Theorem 2 could also be used in this case, but Corollary 1 provides additional moments. In particular, if only a scalar Z̃ is known to satisfy E(Z̃ϵ1ϵ2)=0, then identification by Theorem 2 will fail because the rank condition in Assumption A3 is violated with Z=Z̃, but identification may still be possible using Corollary 1, because there we may let Z=(1, Z̃)′.

Corollary 2. Let Assumptions A1 and A2 hold for the model of Equations (9) and (10). Assume cov(Z, ϵ2²)≠0, that some element of β20 is known to equal zero, and that the corresponding element of β10 is nonzero. Then, the structural parameters β10, β20, γ10, and γ20 and the errors ϵ are identified.

Corollary 2 is like Theorem 1, except that it assumes an element of β20 is zero instead of assuming γ20 is zero to identify Equation (10). Then, as in Theorem 1, Corollary 2 uses cov(Z, ϵ1ϵ2)=0 to identify Equation (9) without imposing the rank 2 condition of Assumption A3 and the inequality constraints of Assumption A4. Only a scalar Z is needed for identification using Theorem 1 or Corollary 1 or 2.

3. ESTIMATION

3.1 Simultaneous System Estimation

Consider estimation of the structural model of Equations (9) and (10) based on Theorem 2. Define S to be the vector of elements of Y and X, and the elements of Z that are not already contained in X, if any.

Let μ=E(Z), and let θ denote the set of parameters {γ1, γ2, β1, β2, μ}. Define the vector-valued functions:

Q1(θ, S) = X(Y1 − X′β1 − Y2γ1)
Q2(θ, S) = X(Y2 − X′β2 − Y1γ2)
Q3(θ, S) = Z − μ
Q4(θ, S) = (Z − μ)(Y1 − X′β1 − Y2γ1)(Y2 − X′β2 − Y1γ2)

Define Q(θ, S) to be the vector obtained by stacking the above four vectors into one long vector.

Corollary 3. Assume Equations (9) and (10) hold. Define θ, S, and Q(θ, S) as above. Let Assumptions A1, A2, A3, and A4 hold. Let Θ be the set of all values θ might take on, and let θ0 denote the true value of θ. Then, the only value of θ∈Θ that satisfies E[Q(θ, S)]=0 is θ=θ0.

A simple variant of Corollary 3 is that if E1ϵ2)=0, then μ can be dropped from θ, with Q 3 dropped from Q, and the Z−μ term in Q 4 replaced with just Z.

Given Corollary 3, GMM estimation of the model of Equations (9) and (10) is completely straightforward. With a sample of n observations S1, …, Sn, the standard Hansen (1982) GMM estimator is

θ̂ = argmin_{θ∈Θ} [n⁻¹Σᵢ Q(θ, Sᵢ)]′ Ωn [n⁻¹Σᵢ Q(θ, Sᵢ)]    (12)

for some sequence of positive definite matrices Ωn. If the observations Si are independently and identically distributed and if Ωn is a consistent estimator of Ω0⁻¹, where Ω0=E[Q(θ0, S)Q(θ0, S)′], then the resulting estimator is efficient GMM with

√n(θ̂ − θ0) →d N(0, [E(∂Q(θ0, S)/∂θ′)′ Ω0⁻¹ E(∂Q(θ0, S)/∂θ′)]⁻¹).    (13)

More generally, with dependent data, standard time-series versions of GMM would be directly applicable. Alternative moment-based estimators with possibly better small-sample properties, such as generalized empirical likelihood, could be used instead of GMM (see, e.g., Newey and Smith 2004). Also, if these moment conditions are weak (as might occur if the errors are close to homoscedastic), then alternative limiting distribution theory based on weak instruments, such as Staiger and Stock (1997), would be immediately applicable. See Stock, Wright, and Yogo (2002) for a survey of such estimators.
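The stacked moment vector is simple to code. The sketch below simulates a hypothetical simultaneous system (invented coefficients, a common factor U, and idiosyncratic errors whose variances move with Z in opposite directions) and evaluates the sample analog of E[Q(θ, S)] at the true parameters and at a wrong value, which is the object a GMM routine would minimize in quadratic form:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Hypothetical simultaneous system (9)-(10) with a common factor U and
# idiosyncratic errors whose variances move with Z in opposite directions.
X = np.column_stack([np.ones(n), rng.uniform(1, 2, n)])
Z = X[:, 1]
U = rng.normal(size=n)
eps1 = U + Z * rng.normal(size=n)
eps2 = U + (3 - Z) * rng.normal(size=n)
g1, g2 = 0.5, -0.5
b1 = np.array([1.0, 1.0]); b2 = np.array([-1.0, 2.0])
# Solve the two structural equations simultaneously for Y1 and Y2.
Y1 = (X @ b1 + g1 * (X @ b2) + eps1 + g1 * eps2) / (1 - g1 * g2)
Y2 = (X @ b2 + g2 * (X @ b1) + eps2 + g2 * eps1) / (1 - g1 * g2)

def Qbar(g1_, g2_, b1_, b2_, mu_):
    """Sample analog of E[Q(theta, S)]: stacked moments Q1..Q4."""
    e1 = Y1 - X @ b1_ - Y2 * g1_
    e2 = Y2 - X @ b2_ - Y1 * g2_
    q1 = (X * e1[:, None]).mean(axis=0)            # E[X eps1] = 0
    q2 = (X * e2[:, None]).mean(axis=0)            # E[X eps2] = 0
    q3 = (Z - mu_).mean()                          # E[Z - mu] = 0
    q4 = ((Z - mu_) * e1 * e2).mean()              # cov(Z, eps1*eps2) = 0
    return np.concatenate([q1, q2, [q3], [q4]])

m_true = Qbar(g1, g2, b1, b2, Z.mean())   # near zero at the true parameters
m_bad = Qbar(0.0, 0.0, b1, b2, Z.mean())  # clearly nonzero at wrong gammas
```

A full estimator would pass a quadratic form in Qbar to a numerical optimizer with a weight matrix as in (12); the point here is only that the moments vanish at θ0 and not elsewhere.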

The standard regularity conditions for the large-sample properties of GMM impose compactness of Θ. When γ20≠0, this must be reconciled with Assumption A4 and with Lemma 1. For example, in the supply-and-demand model, we might define Θ so that the product of the first two elements of every θ∈Θ is finite, nonpositive, and excludes an open neighborhood of –1. This last constraint could be relaxed, as discussed after Lemma 1.

If one wished to normalize the second equation so that the coefficient of Y 1 equaled 1, as might be more natural in a supply-and-demand system, then the same GMM estimator could be used just by replacing Y 2X′β2Y 1γ2 in the Q 2 and Q 4 functions with Y 1X′β2Y 2γ2, redefining β2 and γ2 accordingly.

Based on the proof of Theorem 2, a numerically simpler but possibly less efficient estimator would be the following. First, let Ŵj be the vector of residuals from linearly regressing Yj on X. Next, let the sample covariances of Ŵ1², Ŵ2², and Ŵ1Ŵ2 with Zh be computed, where Zh is the hth element of the vector Z. Assume Z has a total of K elements. Based on Equation (27), which follows from cov(Z, ϵ1ϵ2)=0 with ϵ1 = W1 − γ1W2 and ϵ2 = W2 − γ2W1, estimate γ1 and γ2 by

(γ̂1, γ̂2) = argmin_{(γ1, γ2)∈Γ} Σ_{h=1}^{K} [ĉov(Zh, Ŵ1Ŵ2)(1 + γ1γ2) − ĉov(Zh, Ŵ1²)γ2 − ĉov(Zh, Ŵ2²)γ1]²,

where Γ is a compact set satisfying Assumption A4. The above estimator for γ1 and γ2 is numerically equivalent to an ordinary nonlinear least squares regression over K observations, where K is the number of elements of Z. Finally, β1 and β2 may be estimated by linearly regressing Y1 − Y2γ̂1 and Y2 − Y1γ̂2 on X, respectively. The consistency of this procedure follows from the consistency of each step, which in turn is based on the steps of the identification proof of Theorem 2 and the consistency of regressions and sample covariances.
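This stepwise procedure can be sketched end to end. The simulation below uses a hypothetical system (invented coefficients γ1=0.5, γ2=−0.5 and a factor error structure) and recovers the γ's by minimizing, over a grid standing in for the nonlinear least squares step, the squared residuals of the moment condition cov(Z, ϵ1ϵ2)=0 written in terms of the reduced-form residuals (ϵ1 = W1 − γ1W2, ϵ2 = W2 − γ2W1):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Hypothetical simultaneous system: gamma1 = 0.5, gamma2 = -0.5, common factor U,
# idiosyncratic errors heteroscedastic in two different elements of X.
X = np.column_stack([np.ones(n), rng.uniform(1, 2, n), rng.uniform(1, 2, n)])
Z = X[:, 1:]                                   # K = 2, with Z taken from X
U = rng.normal(size=n)
eps1 = U + X[:, 1] * rng.normal(size=n)
eps2 = U + X[:, 2] * rng.normal(size=n)
g1, g2 = 0.5, -0.5
b1 = np.array([1.0, 1.0, 0.0]); b2 = np.array([-1.0, 0.0, 2.0])
Y1 = (X @ b1 + g1 * (X @ b2) + eps1 + g1 * eps2) / (1 - g1 * g2)
Y2 = (X @ b2 + g2 * (X @ b1) + eps2 + g2 * eps1) / (1 - g1 * g2)

# Step 1: reduced-form residuals from regressing each Yj on X.
W1 = Y1 - X @ np.linalg.lstsq(X, Y1, rcond=None)[0]
W2 = Y2 - X @ np.linalg.lstsq(X, Y2, rcond=None)[0]

# Step 2: sample covariances of each Zh with W1^2, W2^2, and W1*W2.
Zc = Z - Z.mean(axis=0)
c11 = Zc.T @ (W1**2 - (W1**2).mean()) / n
c22 = Zc.T @ (W2**2 - (W2**2).mean()) / n
c12 = Zc.T @ (W1 * W2 - (W1 * W2).mean()) / n

# Step 3: cov(Z, eps1*eps2) = 0 implies, for each h,
# c12[h]*(1 + g1*g2) - c11[h]*g2 - c22[h]*g1 = 0; fit by grid search over Gamma.
grid = np.linspace(-0.95, 0.95, 191)           # Gamma excludes |gamma| >= 1
A, B = np.meshgrid(grid, grid, indexing="ij")
obj = np.zeros_like(A)
for h in range(Z.shape[1]):
    obj += (c12[h] * (1 + A * B) - c11[h] * B - c22[h] * A) ** 2
i, j = np.unravel_index(np.argmin(obj), obj.shape)
g1_hat, g2_hat = grid[i], grid[j]
```

Restricting the grid to (−1, 1)² implements a Γ satisfying Assumption A4: the equivalent swapped representation (γ2⁻¹, γ1⁻¹) = (−2, 2) lies outside it, so the search settles on the intended root.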

In practice, this simple procedure might be useful for generating consistent starting values for efficient GMM.

3.2 Triangular System Estimation

The GMM estimator used for the fully simultaneous system can be applied to the triangular system of Theorem 1 by setting γ2=0. Define S and μ as before, and now let θ={γ1, β1, β2, μ} and

Q1(θ, S) = X(Y1−X′β1−Y2γ1),
Q2(θ, S) = X(Y2−X′β2),
Q3(θ, S) = Z−μ,
Q4(θ, S) = (Z−μ)(Y1−X′β1−Y2γ1)(Y2−X′β2).

Let Q(θ, S) be the vector obtained by stacking the above four vectors into one long vector, and we immediately obtain Corollary 4.

Corollary 4. Assume Equations (6) and (7) hold. Define θ, S, and Q(θ, S) as above. Let Assumptions A1 and A2 hold with cov(Z, W2²)≠0. Let Θ be the set of all values θ might take on, and let θ0 denote the true value of θ. Then, the only value of θ∈Θ that satisfies E[Q(θ, S)]=0 is θ=θ0.

The GMM estimator (12) and limiting distribution (13) then follow immediately.

Based on Theorem 1, a simpler estimator of the triangular system of Equations (6) and (7) is as follows. With γ20=0, β20 can be estimated by linearly regressing Y2 on X, giving β̂2=(Σi XiXi′)−1 Σi XiY2i. Then, letting ϵ̂2i=Y2i−Xi′β̂2 be the residuals from this regression, β10 and γ10 can be estimated by an ordinary linear two-stage least squares regression of Y1 on Y2 and X, using X and (Zi−Z̄)ϵ̂2i as instruments, where Z̄ is the sample mean of Z. The limiting distribution for β̂2 is standard ordinary least squares. The distribution for γ̂1 and β̂1 is basically that of ordinary two-stage least squares, except that account must be taken of the estimation error in the constructed instruments (Zi−Z̄)ϵ̂2i. Using the standard theory of two-step estimators (see, e.g., Newey and McFadden 1994), with independent, identically distributed observations, the resulting estimator is asymptotically normal with a variance that incorporates the influence function associated with β̂2.
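The two-step procedure above is straightforward to implement. The following minimal sketch in Python with numpy regresses Y2 on X, forms the generated instruments (Z−Z̄)ϵ̂2, and runs linear two-stage least squares; names are illustrative and standard errors (which must account for the generated instruments, as noted) are omitted.

```python
import numpy as np

def hetero_iv_tsls(y1, y2, X, Z):
    """Sketch of the simple triangular-system estimator described above:
    OLS of Y2 on X, then 2SLS of Y1 on (Y2, X) using X and the
    generated instruments (Z - Zbar) * e2_hat.  Illustrative only."""
    # Step 1: OLS of Y2 on X gives beta2_hat and residuals e2_hat
    beta2 = np.linalg.lstsq(X, y2, rcond=None)[0]
    e2 = y2 - X @ beta2
    # Step 2: generated instruments (Z - mean(Z)) * e2_hat
    iv = (Z - Z.mean(axis=0)) * e2[:, None]
    # Step 3: 2SLS of Y1 on (Y2, X) with instrument matrix (iv, X)
    R = np.column_stack([y2, X])          # structural regressors
    V = np.column_stack([iv, X])          # instruments
    Rhat = V @ np.linalg.lstsq(V, R, rcond=None)[0]   # first-stage fits
    coef = np.linalg.lstsq(Rhat, y1, rcond=None)[0]
    return coef[0], coef[1:], beta2       # gamma1_hat, beta1_hat, beta2_hat
```

When Z is a scalar this is exactly identified and numerically identical to GMM, as noted below; with a vector Z it provides convenient starting values for efficient GMM.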

While numerically simpler, since no numerical searching is required, this two-stage least squares estimator could be less efficient than GMM. It will be numerically identical to GMM when the parameters are exactly identified rather than overidentified, that is, when Z is a scalar. More generally, this two-stage least squares estimator could be used to generate consistent starting values for efficient GMM estimation.

3.3 Extension: Additional Endogenous Regressors

We consider two cases here: additional endogenous regressors for which we have ordinary outside instruments, and additional endogenous regressors to be identified using heteroscedasticity.

In the triangular system, the estimator can be described as a linear two-stage least squares regression of Y1 on X and on Y2, using X and an estimate of [Z−E(Z)]ϵ2 as instruments. Suppose now that, in addition to Y2, one or more elements of X are also endogenous. Suppose for now that we also have a set of ordinary instruments P (so P includes all the exogenous elements of X and enough additional outside instruments that P has at least the same number of elements as X). It then follows that estimation could be done by a linear two-stage least squares regression of Y1 on X and on Y2, using P and an estimate of [Z−E(Z)]ϵ2 as instruments. Note, however, that it will now be necessary to also estimate the Y2 equation by two-stage least squares; that is, we must first regress Y2 on X by two-stage least squares using instruments P to obtain the estimated coefficients β̂2, before constructing the residuals ϵ̂2=Y2−X′β̂2. Then, as before, the estimate of [Z−E(Z)]ϵ2 is (Z−Z̄)ϵ̂2. Alternatively, the GMM estimator now has Q1(θ, S) and Q2(θ, S) given by Q1(θ, S)=P(Y1−X′β1−Y2γ1) and Q2(θ, S)=P(Y2−X′β2), while Q3(θ, S) and Q4(θ, S) are the same as before.

Similar logic extends to the case where we have more than one endogenous regressor to be identified from heteroscedasticity. For example, suppose we have the model (written here with illustrative coefficient notation):

Y1 = X′β1 + Y2γ1 + Y3δ1 + ϵ1,  Y2 = X′β2 + ϵ2,  Y3 = X′β3 + ϵ3.

So now we have two endogenous regressors, Y2 and Y3, with no available outside instruments or exclusions. If our assumptions hold both for ϵ2 and for ϵ3 in place of ϵ2, then the model for Y1 can be estimated by two-stage least squares, using X and estimates of both [Z−E(Z)]ϵ2 and [Z−E(Z)]ϵ3 as instruments.
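A minimal sketch of this two-endogenous-regressor extension, again in Python with numpy and with illustrative names, constructs one set of generated instruments per endogenous regressor and runs a single two-stage least squares regression.

```python
import numpy as np

def tsls_two_endogenous(y1, y2, y3, X, Z):
    """Sketch of the extension above: 2SLS of Y1 on (Y2, Y3, X) using X
    plus the generated instruments (Z - Zbar)*e2_hat and (Z - Zbar)*e3_hat,
    where e_j_hat are OLS residuals from regressing Y_j on X."""
    resid = lambda y: y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    e2, e3 = resid(y2), resid(y3)
    Zc = Z - Z.mean(axis=0)
    V = np.column_stack([Zc * e2[:, None], Zc * e3[:, None], X])  # instruments
    R = np.column_stack([y2, y3, X])                              # regressors
    Rhat = V @ np.linalg.lstsq(V, R, rcond=None)[0]               # first stage
    return np.linalg.lstsq(Rhat, y1, rcond=None)[0]  # (gamma2, gamma3, beta1...)
```

Identification requires the two generated instrument sets not to be collinear, which in practice means ϵ2 and ϵ3 must display distinct patterns of heteroscedasticity with respect to Z.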

4. ENGEL CURVE ESTIMATES

An Engel curve for food is empirically estimated, where total expenditures may be mismeasured. Total expenditures are subject to potentially large measurement errors, due in part to infrequently purchased items (see, e.g., Meghir and Robin 1992). The data consist of the same set of demographically homogeneous households that were used to analyze Engel curves in Banks, Blundell, and Lewbel (1997). These are all households in the United Kingdom Family Expenditure Survey 1980–1982 composed of two married adults without children, living in the Southeast (including London). The dependent variable Y1 is the food budget share and the possibly mismeasured regressor Y2 is log real total expenditures. The other regressors X are a constant, age, spouse's age, squared ages, seasonal dummies, and dummies for spouse working, gas central heating, ownership of a washing machine, one car, and two cars. There are 854 observations.

The model is Y1=X′β1+Y2γ1+ϵ1. This is the Working (1943) and Leser (1963) functional form for Engel curves. Nonparametric and parametric regression analyses of these data show that this functional form fits food (though not other) budget shares quite well (see, e.g., Banks, Blundell, and Lewbel 1997, figure 1A).

Table 1 summarizes the empirical results. Ordinary least squares, which does not account for mismeasurement, yields the estimated log total expenditure coefficient reported in Table 1. Ordinary two-stage least squares, using log total income as an instrument, substantially reduces the estimated coefficient. This is model TSLS 1, or equivalently GMM 1, in Table 1. TSLS 1 and GMM 1 are exactly identified and so are numerically equivalent.

Table 1 Engel Curve Estimates

If we did not observe income for use as an instrument, we might instead apply the GMM estimator based on Corollary 4, using the moments cov(Z, ϵ1ϵ2)=0. As discussed in the Introduction, with classical measurement error, we may let Z equal all the elements of X except the constant. The result is model GMM 2 in Table 1. The resulting estimate is relatively close to the estimate based on the external instrument log income, as would be expected if income is a valid instrument and if this article's methodology for identification and estimation without external instruments is also valid. The standard errors in GMM 2 are a good bit higher than those of GMM 1, suggesting that not having an external instrument hurts efficiency.

The estimates based on Corollary 4 are overidentified, so the GMM 2 estimates differ numerically from the two-stage least squares version of this estimator, reported as TSLS 2, which uses the constructed instruments of Equation (14). The GMM 2 estimates are closer than TSLS 2 to the income-instrument-based estimates GMM 1 and have smaller standard errors, which shows that the increased asymptotic efficiency of GMM is valuable here. A Hansen (1982) test fails to reject the overidentifying moments in this model at the 5% level, though the p-value of 6.5% is close to rejecting.

Table 1 also reports estimates obtained using both moments based on the external instrument, log income, and on cov(Z, ϵ1ϵ2)=0. The results in TSLS 3 and GMM 3 are very similar to TSLS 1 and GMM 1, which just use the external instrument. This is consistent with validity of both sets of identifying moments, but with the outside instrument being much stronger or more informative, as expected. The Hansen test also fails to reject this joint set of overidentifying moments, with a p-value of 12.5%.

To keep the analysis simple, possible mismeasurement of the food budget share arising from mismeasurement of total expenditures, as in Lewbel (1996), has been ignored. This is not an uncommon assumption; Hausman, Newey, and Powell (1995), for example, estimate Engel curves assuming that budget shares are not mismeasured while log total expenditures suffer classical measurement error (though with the complication of a polynomial functional form). However, as a check, the Engel curves were reestimated in the form of quantities of food regressed on levels of total expenditures. The results were more favorable than those reported in Table 1. In particular, the ordinary least squares estimate of the coefficient of total expenditures was 0.124, the two-stage least squares estimate using income as an instrument was 0.172, and the two-stage least squares estimate using this article's moments was 0.174, nearly identical to the estimate based on the outside instrument.

One may question the validity of the assumptions for applying Theorem 1 in this application. Also, although income is commonly used as an outside instrument for total expenditures, it could still have flaws as an instrument (e.g., it is possible for reported consumption and income to have common sources of measurement errors). In particular, the estimates show a reversal of the usual attenuation direction of measurement error bias, which suggests some violation of the assumptions of the classical measurement error model, for example, it is possible that the measurement error could be negatively correlated with instruments or with other covariates.

Still, it is encouraging that this article's methodology for obtaining estimates without external instruments yields estimates that are close to (though not as statistically significant as) estimates that are obtained by using an ordinary external instrument, and the resulting overidentifying moments are not statistically rejected.

In practice, this article's estimators will be most useful for applications where external instruments are either weak or unavailable. The reason for applying it here in the Engel curve context, where a strong external instrument exists, is to verify that the method works in real data, in the sense that this article's estimator, applied without using the external instrument, produces estimates that are very close to those that were obtained when using the outside instrument. The fact that the method is seen to work in this context where the results can be checked should be encouraging for other applications where alternative strong instruments are not available.

5. SET IDENTIFICATION: RELAXING IDENTIFYING ASSUMPTIONS

This article's methodology is based on three assumptions—namely, regressors X uncorrelated with errors ϵ, heteroscedastic errors ϵ, and cov(Z, ϵ1ϵ2)=0. As shown earlier, this last assumption arises from classical measurement error and omitted factor models, but one may still question whether it holds exactly in practice. Theorem 3 shows that one can still identify sets, specifically interval bounds, for the model parameters when this assumption is violated by assuming this covariance is small rather than zero. Small here means that this covariance is small relative to the heteroscedasticity in ϵ2; specifically, Theorem 3 assumes that the correlation between Z and ϵ1ϵ2 is smaller (in magnitude) than the correlation between Z and ϵ2 2.

For convenience, Theorem 3 is stated using a scalar Z, but given a vector Z, one could exploit the fact that Theorem 3 would then hold for any linear combination of the elements of Z, and one could choose the linear combination that minimizes the estimated size of the identified set.

Define Wj by Equation (11) for j=1, 2. Given a random scalar Z and a scalar constant τ, define Γ1 as the set of all values of γ1 that lie in the closed interval bounded by the two roots (if they are real) of the quadratic equation

[cov(Z, W1W2)−γ1cov(Z, W2²)]² var(W2²) = τ² cov(Z, W2²)² var(W1W2−γ1W2²).   (15)

Also, define B1 as the set of all values of β1=E(XX′)−1E[X(Y1−Y2γ1)] for each γ1∈Γ1.

Theorem 3. Let Assumption A1 hold for the model of Equations (6) and (7). Assume E(Xϵ1)=0, E(Xϵ2)=0, and for some observed random scalar Z and some nonnegative constant τ<1, |corr(Z, ϵ1ϵ2)| ⩽ τ|corr(Z, ϵ2²)|. Then the structural parameters γ10 and β10 are set identified by γ10∈Γ1 and β10∈B1, and β20 is point identified by β20=E(XX′)−1E(XY2).

Note that an implication of Theorem 3 is that Equation (15) has real roots whenever |corr(Z, ϵ1ϵ2)| < |corr(Z, ϵ2²)|, and τ is defined as an upper bound on the ratio of these two correlations. The smaller the value of τ is, the smaller will be the identified sets Γ1 and B1, and hence the tighter will be the bounds on γ10 and β10 given by Theorem 3. One can readily verify that the sets Γ1 and B1 collapse to points, corresponding to Theorem 1, when τ=0.

An obvious way to construct estimates based on Theorem 3 is to substitute Wj=Yj−X′E(XX′)−1E(XYj) into Equation (15), replace all the expectations in the result with sample averages, and then solve for the two roots of the resulting quadratic equation given τ. These roots will then be consistent estimates of the boundary of the interval that brackets γ10.
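This plug-in construction can be sketched directly. The code below substitutes OLS residuals for W1 and W2, expands the bound |corr(Z, ϵ1ϵ2)| ⩽ τ|corr(Z, ϵ2²)| into a quadratic in γ1 (the expansion used here is a reconstruction of that algebra), and returns the two roots; all names are illustrative.

```python
import numpy as np

def gamma1_interval(y1, y2, X, Z, tau):
    """Interval estimate for gamma1 under |corr(Z, e1*e2)| <= tau*|corr(Z, e2^2)|.
    Substitutes OLS residuals W_j for the errors and solves the implied
    quadratic in gamma1; an illustrative reconstruction, not the article's code."""
    W1 = y1 - X @ np.linalg.lstsq(X, y1, rcond=None)[0]
    W2 = y2 - X @ np.linalg.lstsq(X, y2, rcond=None)[0]
    p, q = W1 * W2, W2 ** 2
    cv = lambda a, b: float(np.mean((a - a.mean()) * (b - b.mean())))
    A, B = cv(Z, p), cv(Z, q)
    # Expand (A - g*B)^2 var(q) <= tau^2 B^2 var(p - g*q) in powers of g:
    a = B * B * cv(q, q) * (1.0 - tau ** 2)
    b = -2.0 * A * B * cv(q, q) + 2.0 * tau ** 2 * B * B * cv(p, q)
    c = A * A * cv(q, q) - tau ** 2 * B * B * cv(p, p)
    r = np.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return tuple(sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)]))
```

At τ=0 the interval collapses to the point estimate cov(Z, W1W2)/cov(Z, W2²); for τ>0 that point always lies strictly inside the interval.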

To illustrate the size of the bounds implied by Theorem 3, consider a design in which X, U, S1, and S2 are independent standard normal scalars, Z=X, and all of the model coefficients equal 1, so in particular the true value is γ10=1. A supplemental appendix to this article includes a Monte Carlo analysis of the estimator using this design. It can be shown by tedious but straightforward algebra that, for this design, Equation (15) reduces to a simple quadratic in γ1, Equation (18). Evaluating the roots of Equation (18) for various values of τ shows that the identified region Γ1 for γ10 is quite narrow unless τ is very close to its upper bound of 1. The identified region collapses to the true value γ10=1 when τ=0. For τ=0.1, the identified interval based on Equation (18) is [0.995, 1.005], and for τ=0.5, the identified interval is [0.973, 1.023]. Even for the loose bound on cov(Z, ϵ1ϵ2) given by τ=0.9, the identified interval is still the rather narrow range [0.892, 1.084].

6. NONLINEAR MODEL EXTENSIONS

This section considers extending the model to allow for nonlinear functions of X. Details regarding regularity conditions and limiting distributions for associated estimators are not provided, because they are immediate applications of existing estimators once the required identifying moments are established.

6.1 Semiparametric Identification

Consider the model

Y1 = g1(X) + Y2γ10 + ϵ1,   (19)
Y2 = g2(X) + Y1γ20 + ϵ2,   (20)

where the functions gj(X) are unknown. In this simultaneous system, each equation is partly linear, as in Robinson (1988).

Assumption B1. Y=(Y1, Y2)′, where Y1 and Y2 are random variables. For some random vector X, the functions E(Y∣X) and E(YY′∣X) are finite and identified from data.

Given a sample of observations of Y and X, the conditional expectations in Assumption B1 could be estimated by nonparametric regressions, and so would be identified. These conditional expectations are the reduced form of the underlying structural model.

Assumption B2. E(ϵ1∣X)=0, E(ϵ2∣X)=0, and for some random vector Z, cov(Z, ϵ1ϵ2)=0.

As before, the elements of Z can all be elements of X also, so no outside instruments are required. No exclusion assumptions are imposed, so all of the same regressors X that appear in g1 can also appear in g2, and vice versa. If ϵj=Uαj+Vj, where U, V1, and V2 are mutually uncorrelated (conditioning on Z), then cov(Z, ϵ1ϵ2)=0 holds if Z is uncorrelated with U².

Assumption B3. Define Wj=Yj−E(Yj∣X) for j=1, 2. The matrix ΦW, defined as the matrix with columns given by the vectors cov(Z, W1²) and cov(Z, W2²), has rank 2.

Assumption B3 is analogous to Assumption A3, but employs a different definition of Wj . These definitions will coincide if the conditional expectation of Y given X is linear in X. Lemma 1 continues to hold with this new definition of Wj and hence of Φ W , and more generally heteroscedasticity of W 1 and W 2 implies heteroscedasticity of ϵ.

Theorem 4. Let Equations (19) and (20) hold. If Assumptions B1 and B2 hold, cov(Z, ϵ2²)≠0, and γ20=0, then the structural parameter γ10, the functions g1(X) and g2(X), and the variance of ϵ are identified. If Assumptions B1, B2, B3, and A4 hold, then the structural parameters γ10 and γ20, the functions g1(X) and g2(X), and the variance of ϵ are identified.

An immediate corollary of Theorem 4 is that the partly linear simultaneous system

Y1 = h1(X1) + X2′β10 + Y2γ10 + ϵ1,   (21)
Y2 = h2(X1) + X2′β20 + Y1γ20 + ϵ2,   (22)

where X=(X1, X2), will also be identified, since gj(X)=hj(X1)+X2′βj0 is identified.

6.2 Nonlinear Model Estimation

Consider the model

Y1 = G1(X, β0) + Y2γ10 + ϵ1,
Y2 = G2(X, β0) + Y1γ20 + ϵ2,

where the functions Gj(X, β0) are known and the parameter vector β0, which could include γ1 and γ2, is unknown. This generalizes Equations (3) and (4) by allowing nonlinear functions of X. Letting gj(X)=Gj(X, β0), Theorem 4 provides sufficient conditions for identification of this model, assuming that β0 is identified given identification of the functions gj(X)=Gj(X, β0). The immediate analog to Corollary 3 is then that β0, γ10, γ20, and μ0 can be estimated from the moment conditions

E[Y1−G1(X, β0)−Y2γ10 ∣ X] = 0,
E[Y2−G2(X, β0)−Y1γ20 ∣ X] = 0,
E(Z−μ0) = 0,
E[(Z−μ0)(Y1−G1(X, β0)−Y2γ10)(Y2−G2(X, β0)−Y1γ20)] = 0.

For efficient estimation in this case, where some of the moments are conditional, see, for example, Chamberlain (1987), Newey (1993), and Kitamura, Tripathi, and Ahn (2003). Ordinary GMM can be used for estimation by replacing the first two conditional moments above with the unconditional moments

E[ζ(X)(Y1−G1(X, β0)−Y2γ10)] = 0,
E[ζ(X)(Y2−G2(X, β0)−Y1γ20)] = 0

for some chosen vector valued function ζ(X); asymptotic efficiency may be obtained by using an estimated optimal ζ(X); see, for example, Newey (1993) for details.

As in the linear model, some of these moments may be weak, which would suggest the use of weak instrument limiting distributions in the GMM estimation. See Stock, Wright, and Yogo (2002) for a survey of applicable weak moment procedures.

6.3 Semiparametric Estimation

Consider estimation of the partly linear system of Equations (19) and (20), where the functions gj(X) are not parameterized. We now have identification based on the moments

E[Y1−g1(X)−Y2γ10 ∣ X] = 0,
E[Y2−g2(X)−Y1γ20 ∣ X] = 0,
cov[Z, (Y1−g1(X)−Y2γ10)(Y2−g2(X)−Y1γ20)] = 0.

These are conditional moments containing unknown parameters and unknown functions, and so general estimators for these types of models may be applied. Examples include Ai and Chen (2003), Otsu (2003), and Newey and Powell (2003).

Alternatively, the following estimation procedure could be used, analogous to the numerically simple estimator for linear simultaneous models described earlier. Assume we have n independent, identically distributed observations. Let Ĥj be a uniformly consistent estimator of Hj(X)=E(Yj∣X), for example, a kernel or local polynomial nonparametric regression of Yj on X. Now, as defined by Assumption B3, Wj=Yj−Hj(X), so let Ŵji=Yji−Ĥj(Xi) for each observation i. Next, let ĉh(γ1, γ2) be the sample covariance of (Ŵ1−γ1Ŵ2)(Ŵ2−γ2Ŵ1) with Zh, where Zh is the hth element of the vector Z, and assume Z has a total of K elements. Based on Equation (27), estimate γ1 and γ2 by

(γ̂1, γ̂2) = arg min over (γ1, γ2)∈Γ of Σh ĉh(γ1, γ2)²,

where Γ is a compact set satisfying Assumption A4. The above estimator for γ1 and γ2 is numerically equivalent to an ordinary nonlinear least squares regression over K observations of data, where K is the number of elements of Z. In a triangular system, that is, with γ2=0, this step reduces to a linear regression for estimating γ1. Finally, estimates of the functions g1(X) and g2(X) are obtained by nonparametrically regressing Y1−Y2γ̂1 and Y2−Y1γ̂2 on X, respectively. The consistency of this procedure follows from the consistency of each step, which in turn is based on the steps of the identification proof of Theorem 4.
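A minimal sketch of this plug-in procedure for the triangular case (γ2=0) is given below, using a Nadaraya-Watson kernel regression as the nonparametric first stage; the kernel, the fixed bandwidth, and all names are illustrative choices rather than the article's.

```python
import numpy as np

def kernel_reg(x, y, h):
    """Nadaraya-Watson regression of y on scalar x, evaluated at the
    sample points (Gaussian kernel, fixed bandwidth h, chunked loop
    to limit memory use)."""
    out = np.empty_like(y)
    for i in range(0, len(x), 500):
        d = (x[i:i + 500, None] - x[None, :]) / h
        K = np.exp(-0.5 * d ** 2)
        out[i:i + 500] = (K @ y) / K.sum(axis=1)
    return out

def semiparametric_gamma1(y1, y2, x, z, h=0.3):
    """Sketch of the plug-in estimator above for the triangular case
    (gamma2 = 0): nonparametric first stage, then gamma1 from the
    sample covariance ratio.  Illustrative only."""
    W1 = y1 - kernel_reg(x, y1, h)     # W1_hat = Y1 - H1_hat(X)
    W2 = y2 - kernel_reg(x, y2, h)     # W2_hat = Y2 - H2_hat(X)
    zc = z - z.mean()
    p, q = W1 * W2, W2 ** 2
    a = np.mean(zc * (p - p.mean()))   # sample cov(Z, W1*W2)
    b = np.mean(zc * (q - q.mean()))   # sample cov(Z, W2^2)
    return a / b
```

With a scalar Z, the covariance-matching step collapses to the single ratio computed here, consistent with the observation above that the triangular case reduces to a linear regression; the final step (nonparametric regressions of Y1−Y2γ̂1 and Y2−Y1γ̂2 on X) would reuse `kernel_reg`.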

This estimator of γ10 and γ20 is an example of a semiparametric estimator with nonparametric plug-ins (see, e.g., section 8 of Newey and McFadden 1994). Unlike Ai and Chen (2003), this numerically simple procedure might not yield efficient estimates of γ10 and γ20. However, assuming that γ̂1 and γ̂2 converge at a faster rate than nonparametric regressions, the limiting distributions of the estimates of the functions g1(X) and g2(X) will be the same as for ordinary nonparametric regressions of Y1−Y2γ10 and Y2−Y1γ20 on X, respectively.

Further extension to estimation of the partly linear system of Equations (21) and (22) is immediate. For this model, the Assumption B2 moments E(ϵ1∣X)=0, E(ϵ2∣X)=0, and cov(Z, ϵ1ϵ2)=0 hold with ϵ1=Y1−h1(X1)−X2′β10−Y2γ10 and ϵ2=Y2−h2(X1)−X2′β20−Y1γ20, and could again be consistently estimated by the above described procedure, replacing the nonparametric regression steps with partly linear nonparametric regression estimators, such as that of Robinson (1988), or by directly applying an estimator such as that of Ai and Chen (2003) to these moments.

7. CONCLUSIONS

This article describes a new method of obtaining identification in mismeasured regressor models, triangular systems, simultaneous equation systems, and some partly linear semiparametric systems. The identification comes from observing a vector of variables Z (which can equal or be a subset of the vector of model regressors X) that are uncorrelated with the covariance of heteroscedastic errors. The existence of such a Z is shown to be a feature of many models in which error correlations are due to an unobserved common factor, including mismeasured regressor models. Associated two-stage least squares and GMM estimators are provided.

The proposed estimators appear to work well in both a small Monte Carlo study (provided as a supplemental appendix to this article) and in an empirical application. Citing working paper versions of the present article, some articles by other researchers listed earlier include empirical applications of the proposed estimators and find them to work well in practice.

Unlike ordinary instruments, identification is obtained even when all the elements of Z are also regressors in every model equation. However, Z shares many of the convenient features of instruments in ordinary two-stage least squares models. As with ordinary instrument selection, given a set of possible choices for Z, the estimators remain consistent if only a subset of the available choices is used, so variables that one is unsure about can be safely excluded from the Z vector, with the only loss being efficiency. Similarly, as with ordinary instruments, if some variable satisfies the conditions to be an element of Z but is only observed with classical measurement error, then this mismeasured variable can still be used as an element of Z. If Z has more than two elements (or more than one element in a triangular system), then the model parameters are overidentified, and standard tests of overidentifying restrictions, such as Hansen's (1982) test, can be applied.

The identification here is based on higher moments and so is likely to give noisier, less reliable estimates than identification based on standard exclusion restrictions, but may be useful in applications where traditional instruments are weak or nonexistent. This article's moments based on cov(Z, ϵ1ϵ2)=0 can be used along with traditional instruments to increase efficiency and provide testable overidentifying restrictions.

This article also shows that bounds on estimated parameters can be obtained when the identifying assumption cov(Z, ϵ1ϵ2)=0 does not hold, provided that this covariance is not too large relative to the heteroscedasticity in the errors. In a numerical example, these bounds appear to be quite narrow.

The identification scheme in the article requires the endogenous regressors to appear additively in the model. A good direction for future research would be searching for ways to extend the identification method to allow for including the endogenous regressors nonlinearly. Perhaps it would be possible to replace linearity in endogenous regressors with local linearity, applying this article's methods and assumptions to a kernel weighted locally linear representation of the model.

It would also be worth considering whether additional moments for identification could be obtained by allowing for more general dependence between Z and ϵ2 2 and corresponding zero higher moments. One simple example is to let the assumptions of Theorems 1 and 2 hold using ϖ(Z) in place of Z for different functions ϖ, such as higher moments of Z, thereby providing additional instruments for estimation.
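As a small illustration of this last idea, the sketch below augments a scalar Z with the transformation ϖ(Z)=Z² to obtain an additional generated instrument in a simulated triangular model; the data-generating process and all names are illustrative, not from the article.

```python
import numpy as np

np.random.seed(6)
n = 100000
x = np.random.randn(n)
X = np.column_stack([np.ones(n), x])
U, s1, s2 = np.random.randn(3, n)
# Triangular model with a common factor U and heteroscedastic e2
y2 = X @ np.array([1.0, 1.0]) + U + np.exp(0.5 * x) * s2
y1 = X @ np.array([1.0, 1.0]) + 0.7 * y2 + U + s1

# Original Z = x, augmented with the transformation w(Z) = Z^2
Z = np.column_stack([x, x ** 2])
e2h = y2 - X @ np.linalg.lstsq(X, y2, rcond=None)[0]   # first-step residuals
V = np.column_stack([(Z - Z.mean(axis=0)) * e2h[:, None], X])  # instruments
R = np.column_stack([y2, X])                                   # regressors
Rhat = V @ np.linalg.lstsq(V, R, rcond=None)[0]                # first stage
gamma1_hat = np.linalg.lstsq(Rhat, y1, rcond=None)[0][0]       # 2SLS estimate
```

With the extra instrument the model becomes overidentified, so a Hansen (1982) test of the additional moment is available as a specification check.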

APPENDIX

Proof of Theorem 1. Define Wj by Equation (11) for j=1, 2. These Wj are identified by construction. Using the assumptions, substituting Equations (6) and (7) for Y1 and Y2 in the definitions of W1 and W2 shows that W1=ϵ1+ϵ2γ10 and W2=ϵ2, so cov(Z, ϵ1ϵ2)=0 is equivalent to cov[Z, (W1−γ10W2)W2]=0. Solving for γ10 shows that γ10 is identified by γ10=cov(Z, W1W2)/cov(Z, W2²). Given identification of γ10, the coefficients β10 and β20 are identified by β10=E(XX′)−1E[X(Y1−Y2γ10)] and β20=E(XX′)−1E(XY2), which follow from E(Xϵj)=0. Also, ϵ is identified by ϵ1=Y1−X′β10−Y2γ10 and ϵ2=Y2−X′β20. Finally, ΨZX has rank equal to its number of columns, which makes Ψ′ZXΨΨZX nonsingular, and Equation (8) then follows.

Proof of Theorem 2. Substituting Equations (9) and (10) for Y1 and Y2 in the definitions of W1 and W2 shows that

W1=(ϵ1+ϵ2γ10)/(1−γ10γ20),   (A.1)
W2=(ϵ2+ϵ1γ20)/(1−γ10γ20),   (A.2)

and solving these equations for ϵ yields

ϵ1=W1−γ10W2 and ϵ2=W2−γ20W1.   (A.3)

Note that γ10γ20≠1 by Assumption A4. Using Equation (A.3), the condition cov(Z, ϵ1ϵ2)=0 is equivalent to

cov[Z, (W1−γ10W2)(W2−γ20W1)]=0.   (A.4)

Now 1+γ10γ20≠0, since otherwise it would follow from Equation (A.3) that the rank of ΦW is less than 2. Define λ1=γ20/(1+γ10γ20) and λ2=γ10/(1+γ10γ20), and λ=(λ1, λ2)′; then expanding Equation (A.4) gives

cov(Z, W1W2)=ΦWλ,   (A.5)

so λ is identified by λ=(Φ′WΦW)−1Φ′W cov(Z, W1W2), and Φ′WΦW is not singular because ΦW is rank 2. Solving Equation (28) for γ10 gives

λ1γ10²−γ10+λ2=0.

The above quadratic in γ10 has at most two roots, and for each root, the corresponding value for γ20 is given by γ20=γ10λ1/λ2. Let (γ*1, γ*2) denote one of these solutions. It can be seen from the definitions of λ1 and λ2 that the other solution must be (γ*2−1, γ*1−1), since that yields the same values for λ1 and λ2. One of these solutions must be (γ10, γ20), and by Assumption A4, the other solution is not an element of Γ, so (γ10, γ20) is identified. Note that the conditions required for the quadratic to have real rather than complex roots are automatically satisfied, because (γ10, γ20) is real.

Given identification of γ10 and γ20, the coefficients β10 and β20 are identified by β10=E(XX′)−1E[X(Y1−Y2γ10)] and β20=E(XX′)−1E[X(Y2−Y1γ20)], which follow from E(Xϵj)=0. Finally, ϵ is now identified by ϵ1=Y1−X′β10−Y2γ10 and ϵ2=Y2−X′β20−Y1γ20.

Proof of Lemma 1. Equations (A.1) and (A.2) in the proof of Theorem 2 were derived using only Assumptions A1 and A2. Evaluating cov(Z, Wj²) using Equations (A.1) and (A.2) and the assumption that cov(Z, ϵ1ϵ2)=0 gives, for each element Zk of Z,

cov(Zk, W1²)=[cov(Zk, ϵ1²)+γ10²cov(Zk, ϵ2²)]/(1−γ10γ20)²,
cov(Zk, W2²)=[γ20²cov(Zk, ϵ1²)+cov(Zk, ϵ2²)]/(1−γ10γ20)²,

so ΦW is rank 2 if and only if Φϵ is rank 2 and the matrix relating the two above is nonsingular, which requires |γ10γ20|≠1.

Proof of Corollary 1. Using Equation (A.2) and following the same steps as the proof of Theorem 2, the condition E(Zϵ1ϵ2)=0 yields an analogous linear system in λ instead of Equation (A.5). This identifies λ, and the rest of the proof is the same.

Proof of Corollary 2. β20 and γ20, and hence ϵ2, are identified from the usual moments that permit two-stage least squares estimation. Each Wj is identified as in Theorem 1, and by Equation (A.1), cov(Z, ϵ1ϵ2)=0 implies cov[Z, (W1−γ10W2)ϵ2]=0, which when solved for γ10 gives γ10=cov(Z, W1ϵ2)/cov(Z, W2ϵ2), and cov(Z, W2ϵ2)=cov(Z, ϵ2²)≠0, so γ10 is identified. The rest of the proof is the same as the end of the proof of Theorem 2.

Proof of Corollaries 3 and 4. By Equations (9) and (10), Q 1=Xϵ1, Q 2=Xϵ2 and Q 4=(Z−μ)ϵ1ϵ2, and E(Q 3)=0 makes μ=E(Z), so E(Q)=0 is equivalent to E(Xϵ1)=0, E(Xϵ2)=0, and cov(Z, ϵ1ϵ2)=0. It then follows from Theorem 2, or from Theorem 1 when γ20=0, that the only θ∈Θ that satisfies E[Q(θ, S)]=0 is θ=θ0.

Proof of Theorem 3. First observe that if cov(Z, ϵ2²)=0, then this fact along with the other assumptions would imply that the conditions of Theorem 1 hold, giving point identification, which is a special case of the statement of Theorem 3. So, for the remainder of the proof, assume the case in which cov(Z, ϵ2²)≠0. Note this means also that var(ϵ2²)≠0 and var(Z)≠0, because var(ϵ2²)=0 or var(Z)=0 would imply cov(Z, ϵ2²)=0. These inequalities will ensure that the denominators in the fractions given below are nonzero.

By the definition of τ,

corr(Z, ϵ1ϵ2)² ⩽ τ² corr(Z, ϵ2²)².

Now by Assumption A1 and Equation (11), ϵ1=W1−W2γ10 and W2=ϵ2, so this is equivalent to

cov(Z, W1W2−γ10W2²)²/var(W1W2−γ10W2²) ⩽ τ² cov(Z, W2²)²/var(W2²),

and moving all the terms to the left gives

cov(Z, W1W2−γ10W2²)² var(W2²) − τ² cov(Z, W2²)² var(W1W2−γ10W2²) ⩽ 0.

For 0⩽τ<1, the coefficient of γ10² in this expression is positive, so the inequality holds for all γ1 that lie between the roots of the corresponding equality, given by Equation (15).

Proof of Theorem 4. As in Theorem 1, substituting Equations (19) and (20) for Y1 and Y2 in the Assumption B3 definitions of W1 and W2 shows that Equations (A.1) and (A.2) hold in this model. Identification of γ10 and γ20 then follows exactly as in the proofs of Theorems 1 and 2. Given identification of γ10 and γ20, the functions g1(X) and g2(X) are identified by g1(X)=E(Y1∣X)−E(Y2∣X)γ10 and g2(X)=E(Y2∣X)−E(Y1∣X)γ20, both of which follow from E(ϵj∣X)=0. Finally, ϵ is now identified by ϵ1=Y1−g1(X)−Y2γ10 and ϵ2=Y2−g2(X)−Y1γ20.

SUPPLEMENTAL MATERIALS

Appendix: The supplemental online appendix to this paper contains a Monte Carlo analysis of the proposed estimators.


ACKNOWLEDGMENTS

I thank Roberto Rigobon, Frank Vella, Todd Prono, Susanne Schennach, Jerry Hausman, Raffaella Giacomini, Tiemen Woutersen, Christina Gathmann, Jim Heckman, and anonymous referees for helpful comments. Any errors are my own.

    REFERENCES

  • Ai, C. and Chen, X. 2003. “Efficient Estimation of Models With Conditional Moment Restrictions Containing Unknown Functions,”. Econometrica, 71: 17951844.  [Crossref], [Web of Science ®][Google Scholar]
  • Banks, J., Blundell, R. and Lewbel, A. 1997. “Quadratic Engel Curves and Consumer Demand,”. Review of Economics and Statistics, 79: 527539.  [Crossref], [Web of Science ®][Google Scholar]
  • Breusch, T. and Pagan, A. 1979. “A Simple Test for Heteroscedasticity and Random Coefficient Variation,”. Econometrica, 47: 12871294.  [Crossref], [Web of Science ®][Google Scholar]
  • Card, D. 1995. “Geographic Variations in College Proximity to Estimate the Returns to Schooling,”. In Aspects of Labor Market Behavior: Essays in Honor of John Vanderkamp, Edited by: Christofides, L. N., Grand, E. K. and Swidinsky, R. 201222. Toronto: University of Toronto Press.  [Google Scholar]
  • Carneiro, P. and Heckman, J. 2002. “The Evidence on Credit Constraints in Post Secondary Schooling,”. Economic Journal, 112: 705734.  [Crossref], [Web of Science ®][Google Scholar]
  • Chamberlain, G. 1987. “Asymptotic Efficiency in Estimation With Conditional Moment Restrictions,”. Journal of Econometrics, 34: 305334.  [Crossref], [Web of Science ®][Google Scholar]
  • Cragg, J. 1997. “Using Higher Moments to Estimate the Simple Errors-in-Variables Model,”. Rand Journal of Economics, 28: S71S91.  [Crossref], [Web of Science ®][Google Scholar]
  • Cragg, J. and Donald, S. 1997. “Inferring the Rank of a Matrix,”. Journal of Econometrics, 76: 223250.  [Crossref], [Web of Science ®][Google Scholar]
  • Dagenais, M. G. and Dagenais, D. 1997. “Higher Moment Estimators for Linear Regression Models with Errors in the Variables,”. Journal of Econometrics, 76: 193222. L. [Crossref], [Web of Science ®][Google Scholar]
  • Emran, M. S. and Hou, Z. 2008. “Access to Markets and Household Consumption: Evidence from Rural China,”, unpublished manuscript, George Washington University..  [Google Scholar]
  • Erickson, T. and Whited, T. 2002. “Two-Step GMM Estimation of the Errors-in-Variables Model Using High Order Moments,”. Econometric Theory, 18: 776799.  [Crossref], [Web of Science ®][Google Scholar]
  • Feenstra, B. 1994. “New Product Varieties and the Measurement of International Prices,”. American Economic Review, 84: 157177.  [Web of Science ®][Google Scholar]
  • Fuller, W. A. 1987. Measurement Error Models, New York: Wiley.
  • Giambona, E. and Schwienbacher, A. 2007. "Debt Capacity of Tangible Assets: What Is Collateralizable in the Debt Market?" unpublished manuscript, University of Amsterdam.
  • Hansen, L. 1982. "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50: 1029–1054.
  • Hausman, J. 1983. "Specification and Estimation of Simultaneous Models," in Handbook of Econometrics, Vol. 1, edited by Z. Griliches and M. Intriligator, 391–448. Amsterdam: North-Holland.
  • Hausman, J. A., Newey, W. K. and Powell, J. L. 1995. "Nonlinear Errors in Variables: Estimation of Some Engel Curves," Journal of Econometrics, 65: 205–253.
  • Heckman, J. 1974. "Shadow Prices, Market Wages, and Labor Supply," Econometrica, 42: 679–693.
  • Heckman, J. 1979. "Sample Selection Bias as a Specification Error," Econometrica, 47: 153–161.
  • Heckman, J. and Vytlacil, E. 1998. "Instrumental Variables Methods for the Correlated Random Coefficient Model," Journal of Human Resources, 33: 974–987.
  • Hogan, V. and Rigobon, R. 2003. "Using Unobserved Supply Shocks to Estimate the Returns to Education," unpublished manuscript.
  • Hsiao, C. 1983. "Identification," in Handbook of Econometrics, Vol. 1, edited by Z. Griliches and M. Intriligator, 223–283. Amsterdam: North-Holland.
  • Joreskog, K. G. and Sorbom, D. 1984. Advances in Factor Analysis and Structural Equation Models, Lanham, MD: Rowman and Littlefield.
  • King, M., Sentana, E. and Wadhwani, S. 1994. "Volatility and Links Between National Stock Markets," Econometrica, 62: 901–933.
  • Kitamura, Y., Tripathi, G. and Ahn, H. 2003. "Empirical Likelihood Based Inference in Conditional Moment Restriction Models," unpublished manuscript.
  • Klein, R. and Vella, F. 2010. "Estimating a Class of Triangular Simultaneous Equations Models without Exclusion Restrictions," Journal of Econometrics, 154: 154–164.
  • Leamer, E. 1981. "Is It a Demand Curve or Is It a Supply Curve? Partial Identification Through Inequality Constraints," Review of Economics and Statistics, 63: 319–327.
  • Leser, C. E. V. 1963. "Forms of Engel Functions," Econometrica, 31: 694–703.
  • Lewbel, A. 1996. "Demand Estimation With Expenditure Measurement Errors on the Left and Right Hand Side," Review of Economics and Statistics, 78: 718–725.
  • Lewbel, A. 1997. "Constructing Instruments for Regressions With Measurement Error When No Additional Data Are Available, With an Application to Patents and R&D," Econometrica, 65: 1201–1213.
  • Meghir, C. and Robin, J.-M. 1992. "Frequency of Purchase and the Estimation of Demand Systems," Journal of Econometrics, 53: 53–86.
  • Newey, W. 1993. "Efficient Estimation of Models With Conditional Moment Restrictions," in Handbook of Statistics, Vol. 11, edited by G. S. Maddala, C. R. Rao and H. D. Vinod, 419–454. Amsterdam: North-Holland.
  • Newey, W. K. and McFadden, D. 1994. "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. 4, edited by R. F. Engle and D. L. McFadden, 2111–2245. Amsterdam: Elsevier.
  • Newey, W. and Powell, J. 2003. "Instrumental Variables Estimation of Nonparametric Models," Econometrica, 71: 1557–1569.
  • Newey, W. and Smith, R. J. 2004. "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators," Econometrica, 72: 219–255.
  • Otsu, T. 2003. "Penalized Empirical Likelihood Estimation of Conditional Moment Restriction Models With Unknown Functions," unpublished manuscript, University of Wisconsin.
  • Prono, T. 2008. "GARCH-Based Identification and Estimation of Triangular Systems," Working Paper No. QAU08-4, Federal Reserve Bank of Boston.
  • Rashad, I. and Markowitz, S. 2007. "Incentives in Obesity and Health Insurance," NBER Working Paper No. W13113, NBER.
  • Rigobon, R. 2002. "The Curse of Non Investment Grade Countries," Journal of Development Economics, 69: 423–449.
  • Rigobon, R. 2003. "Identification Through Heteroscedasticity," Review of Economics and Statistics, 85: 777–792.
  • Roehrig, C. S. 1988. "Conditions for Identification in Nonparametric and Parametric Models," Econometrica, 56: 433–447.
  • Rummery, S., Vella, F. and Verbeek, M. 1999. "Estimating the Returns to Education for Australian Youth via Rank-Order Instrumental Variables," Labour Economics, 6: 491–507.
  • Sabia, J. J. 2007. "The Effect of Body Weight on Adolescent Academic Performance," Southern Economic Journal, 73: 871–900.
  • Sentana, E. 1992. "Identification of Multivariate Conditionally Heteroscedastic Factor Models," FMG Discussion Paper No. 139, LSE.
  • Sentana, E. and Fiorentini, G. 2001. "Identification, Estimation, and Testing of Conditional Heteroscedastic Factor Models," Journal of Econometrics, 102: 143–164.
  • Staiger, D. and Stock, J. H. 1997. "Instrumental Variable Regression With Weak Instruments," Econometrica, 65: 557–586.
  • Stock, J. H., Wright, J. H. and Yogo, M. 2002. "A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments," Journal of Business and Economic Statistics, 20: 518–529.
  • Working, H. 1943. "Statistical Laws of Family Expenditure," Journal of the American Statistical Association, 38: 43–56.
  • Wright, P. 1928. The Tariff on Animal and Vegetable Oils, New York: Macmillan.