A Ridge-Regularized Jackknifed Anderson-Rubin Test

Abstract We consider hypothesis testing in instrumental variable regression models with few included exogenous covariates but many instruments—possibly more than the number of observations. We show that a ridge-regularized version of the jackknifed Anderson and Rubin (henceforth AR) test controls asymptotic size in the presence of heteroskedasticity, and when the instruments may be arbitrarily weak. Asymptotic size control is established under weaker assumptions than those imposed for recently proposed jackknifed AR tests in the literature. Furthermore, ridge-regularization extends the scope of jackknifed AR tests to situations in which there are more instruments than observations. Monte Carlo simulations indicate that our method has favorable finite-sample size and power properties compared to recently proposed alternative approaches in the literature. An empirical application on the elasticity of substitution between immigrants and natives in the United States illustrates the usefulness of the proposed method for practitioners.


Introduction
Instrumental variables (IVs) are commonly employed in economics and related fields to estimate causal effects from observational data. Inference with (very) many IVs that are potentially weak has recently received increased attention for at least two reasons. First, when identification is weak, researchers may attempt to obtain more precise inference by using a large number of IVs to capture the limited exogenous variation in the endogenous covariates. Second, recent econometric approaches and theoretical results make a number of IVs that approaches or even exceeds the number of observations more common. For instance, the so-called granular IV approach proposed by Gabaix and Koijen (2020) leads to a number of IVs equal to the size of the cross-section of the panel considered. This number can approach, and potentially outstrip, the number of available observations. Another example is the so-called saturation approach to the identification of (weighted) local average treatment effects as in Blandhol et al. (2022), which involves considering as IVs a sufficient number of interactions between the original IVs and the covariates. The number of IVs generated in this way can easily outstrip the number of observations available. However, there is no guarantee that a vast number of IVs will be jointly informative about the causal effects of interest. Hence, it is important to have methods of inference that remain reliable when using (very) many weak IVs. The present paper contributes to the literature on the development of such methods.
We propose a ridge-regularised jackknifed Anderson-Rubin (RJAR) test to construct confidence sets for the coefficients of endogenous variables in weakly-identified and heteroskedastic IV models when the number of IVs is large. Jackknife-based methods have recently been used in this context because they are applicable in an asymptotic framework where the number of IVs diverges with the number of observations. However, by relying on existing central limit theorems developed for standard projection matrices, these methods require that the number of IVs be less than the number of observations, and they often perform poorly when the number of IVs is close to (but still less than) the number of observations. Recently proposed regularisation approaches for inference under many IVs require strong identification or a sparse relationship between the endogenous variables and the IVs to work well. By combining jackknifing with ridge regularisation, we provide a test that remains valid under heteroskedasticity, arbitrarily weak identification, and more IVs than observations, all while achieving good power both when the relationship between the endogenous variables and the IVs is sparse and when it is dense.
In the context of inference that is fully robust to weak identification with an increasing number of IVs, it is helpful to distinguish three different asymptotic regimes. The first, 'moderately many' IVs, regime allows the number of IVs, k_n, to grow with the sample size, n, but still requires it to be asymptotically negligible relative to the sample size.
The second, 'many' IVs, regime allows the number of IVs to be of the same order of magnitude as the number of observations. Anatolyev and Gospodinov (2011) provide an Anderson and Rubin (1949, henceforth AR) test that remains valid for k_n/n → λ, 0 ≤ λ < 1, as n → ∞, provided that the error terms are homoskedastic and a strong balanced-design assumption on the IVs is satisfied. Bun, Farbmacher, and Poldermans (2020) provide analogous results for the GMM version of the AR statistic under the assumption of independently and identically distributed (i.i.d.) data. These results were recently extended in the form of a jackknifed AR statistic by Crudu, Mellace, and Sándor (2020, henceforth CMS) and Mikusheva and Sun (2022, henceforth MS) to the case where the errors are allowed to display arbitrary heteroskedasticity, and the only assumption on the IVs is that the diagonal entries of the projection matrix of the IVs are bounded away from unity from above.
The third, 'very many' IVs, regime allows the number of IVs to grow with n, and further allows the number of IVs to be greater than the number of observations. Belloni et al. (2012, henceforth BCCH) propose a Sup Score test that remains valid under mild conditions and allows the number of IVs to increase exponentially with the sample size. Carrasco and Tchuente (2016) propose a ridge-regularised AR statistic that allows for more IVs than observations under the assumption of i.i.d. data. Kapetanios, Khalaf, and Marcellino (2015) extend Bai and Ng (2010) by proposing weak-identification robust factor-based statistics that in principle allow for the number of IVs to be larger than the number of observations, provided that the factor structure of the IVs is sufficiently strong.
Using the notation of the model introduced in the next section, Table 1 provides a schematic overview of the main assumptions and results in the literature on inference that is robust to many weak IVs.
Our test provides a twofold extension of the existing literature. First, it allows for valid inference under many IVs while further weakening the assumptions of the similar tests proposed by CMS and MS. This is made possible by deriving the limiting behaviour of the RJAR statistic from the bottom up, without relying on the existing results in Chao et al. (2012) or Hansen and Kozbur (2014). Second, the test allows for more IVs than observations. The only other approach currently available in the literature that is robust to heteroskedastic error terms and more IVs than observations is the Sup Score test of BCCH. Simulations show that the RJAR test has power comparable to the Sup Score test of BCCH whenever the signal in the first stage is sparse (i.e., only a few of the IVs are informative), and substantially more power when the signal in the first stage is dense (i.e., not sparse). Finally, we provide a comparison of the most recently proposed approaches to conducting inference in possibly heteroskedastic linear IV models when the number of IVs is not negligible with respect to the sample size. Using extensive simulation evidence and an empirical application based on Card (2009), we compare these existing 'state-of-the-art' methods in a controlled and comparable setting.
Notation. I_p denotes the p × p identity matrix. The entry (i, j) of an m × p matrix A is denoted A_ij. ||A||_F := √(tr(A′A)) denotes the Frobenius norm of A, and tr(B) := Σ_{i=1}^m B_ii for any m × m matrix B with entries B_ij for 1 ≤ i, j ≤ m. The remaining notation follows standard conventions.

Organisation of the paper. Section 2 specifies the linear IV model considered throughout. Section 3 introduces the RJAR test and provides the main asymptotic results. Section 4 provides simulation evidence on the size and power of our RJAR test and compares it with the tests proposed by BCCH, CMS and MS. Section 5 considers an empirical application based on Card (2009). Section 6 concludes. All proofs are given in the Appendix.

Model
We consider the heteroskedastic linear IV model given by

y = Xβ + ε, (1a)
X = ZΠ + V, (1b)

where y is an n × 1 vector containing the dependent variable, X is an n × g matrix containing the endogenous variables, β is a g × 1 coefficient vector, ε is an n × 1 vector containing the structural error terms, Z is an n × k_n matrix containing the IVs, Π is a k_n × g coefficient matrix, and V is an n × g matrix containing the first-stage errors. k_n can diverge with n, but g is fixed. Also, let y_i, X_i, ε_i, Z_i, and V_i denote the i-th row of y, X, ε, Z, and V, respectively. As in BCCH, CMS and MS, we treat Z as fixed (nonstochastic), and assume that any exogenous covariates have been partialled out (see the discussion of Assumption 3 below). We exclusively consider methods that allow for arbitrarily weak identification, i.e., methods that control asymptotic size irrespective of the value of Π.
Inference is conducted on the coefficient vector β by testing hypotheses of the following type for a prespecified β_0 ∈ ℝ^g:

H_0: β = β_0. (2)

For a given non-randomised test of asymptotic size α ∈ (0, 1), a confidence set of asymptotic coverage 1 − α can be constructed as the collection of those β_0 for which H_0 in Equation (2) is not rejected. For convenience, define e(β_0) := (e_1(β_0), …, e_n(β_0))′ with e_i(β_0) := y_i − X_i′β_0, which we refer to as the structural error under the null hypothesis.
It should be noted that, so long as the error term remains additive, the linearity of the structural equation (1a) and the first-stage equation (1b) can be relaxed to allow for any (known) real-valued function without affecting the asymptotic properties of the RJAR test.

The RJAR test
This section introduces our proposed RJAR test, derives its large sample properties under the null hypothesis in Equation (2), and discusses its relationship to the most closely related tests in the literature: the jackknifed AR tests of CMS and MS, and the Sup Score test of BCCH.

Definition of the RJAR test
The original AR test of the null hypothesis in Equation (2) can be thought of as a test that the IVs are exogenous, using the implied structural errors in Equation (3).
More specifically, the AR test is a Wald test of the significance of the IVs in the auxiliary regression of the structural errors under the null, e_i(β_0), on the IVs, Z_i. When the number of IVs k_n grows with n, i.e., when there are many (potentially weak) IVs, the χ² approximation of the original AR statistic breaks down. The recent papers by CMS and MS, which we briefly review in the next subsection, propose jackknifed versions of the AR test that remain valid when k_n grows with n, but k_n < n.
Our proposed method combines jackknifing with ridge regularisation of the auxiliary regression of e_i(β_0) on Z_i.
We first standardise the IVs in-sample (after partialling out any covariates) as in BCCH (p. 2393), so that n⁻¹ Σ_{i=1}^n Z_il² = 1 for each l = 1, …, k_n. The RJAR test for the testing problem in Equation (2) is then based on the following statistic,

where r_n := rank(Z) (assumed to be positive without loss of generality), and P^{γ_n} := Z(Z′Z + γ_n I_{k_n})⁻¹Z′ is the ridge-regularised projection matrix for γ_n ≥ 0 if r_n = k_n and γ_n > 0 if r_n < k_n. Here γ_n is a (sequence of) regularisation parameter(s) whose choice we discuss next.
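As an illustration, the standardisation and the ridge-regularised projection matrix can be computed in a few lines. This is a sketch in our notation, not the authors' code; the demeaning step stands in for partialling out a constant, which is an assumption here.

```python
import numpy as np

def standardise(Z):
    """In-sample standardisation of the IVs: after demeaning (a stand-in for
    partialling out a constant), each column is rescaled so that
    n^{-1} * sum_i Z_il^2 = 1, as in BCCH."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    return Zc / np.sqrt((Zc ** 2).sum(axis=0) / n)

def ridge_projection(Z, gamma):
    """P^{gamma} = Z (Z'Z + gamma I_k)^{-1} Z', well-defined even when the
    number of IVs k exceeds the number of observations n, provided gamma > 0."""
    k = Z.shape[1]
    return Z @ np.linalg.solve(Z.T @ Z + gamma * np.eye(k), Z.T)

rng = np.random.default_rng(0)
Z = standardise(rng.normal(size=(50, 80)))  # k_n = 80 > n = 50
P = ridge_projection(Z, gamma=5.0)
eigs = np.linalg.eigvalsh(P)                # eigenvalues lie in [0, 1)
```

Note that with γ > 0 the matrix Z′Z + γI is invertible even when k_n > n, which is what makes the construction feasible in the 'very many' IVs regime.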
We set the penalty parameter to γ*_n, the maximal element of arg max_{γ_n ∈ Γ_n} Σ_{i=1}^n Σ_{j≠i} (P^{γ_n}_{ij})², out of conservativeness, i.e., to make Assumption 3 below as plausible as possible given the IVs. We take the maximal element of this set because the maximiser is not necessarily unique without imposing additional assumptions on the singular values and left-singular vectors of the IVs (although in practice we only found unique maximisers). Taking the maximum of the maximisers also keeps the smallest eigenvalue of the ridge-regularised Gram matrix, Z′Z + γ_n I_{k_n}, as far away from zero as possible when r_n < k_n.
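A grid-search sketch of this choice of penalty parameter; the grid standing in for Γ_n below is an illustrative assumption, as the paper does not prescribe its construction here.

```python
import numpy as np

def offdiag_sq_sum(P):
    """sum_{i != j} (P_ij)^2 = ||P||_F^2 - sum_i (P_ii)^2."""
    return (P ** 2).sum() - (np.diag(P) ** 2).sum()

def choose_gamma(Z, grid):
    """Return the largest maximiser over the grid of sum_{i!=j} (P^{gamma}_ij)^2,
    mirroring the conservative choice of gamma*_n described above."""
    k = Z.shape[1]
    vals = np.array(
        [offdiag_sq_sum(Z @ np.linalg.solve(Z.T @ Z + g * np.eye(k), Z.T))
         for g in grid]
    )
    maximisers = np.flatnonzero(np.isclose(vals, vals.max()))
    return grid[maximisers[-1]]  # maximal element of the arg max

rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 60))  # r_n < k_n, so gamma_n > 0 is required
gamma_star = choose_gamma(Z, np.linspace(1.0, 200.0, 40))
```

Taking the last index of the (near-)maximisers implements the "maximum of the maximisers" rule in the text.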
We now have all the ingredients to define our new test.

Asymptotic properties of the RJAR test
We make the following assumptions to derive the limiting distribution of RJAR_{γ*_n}(β_0) under the null hypothesis in Equation (2).
Assumption 1. {ε_i}_{i∈ℕ} is a sequence of independent random variables satisfying

Assumption 1 is a mild condition on the structural error terms, and allows for conditional heteroskedasticity. It is the same as in CMS. It is slightly less restrictive than the one in MS (who require finite sixth moments of the structural error terms), and slightly more restrictive than the one in BCCH (who require finite third moments).
Assumption 2 is a weak technical assumption. It implies that both k_n and n diverge. It also allows the sum of the number of IVs and the number of exogenous covariates to be larger than the number of observations, provided that the number of exogenous covariates that have been partialled out is sufficiently small (so that the rank of the matrix of partialled IVs continues to diverge). This assumption is weaker than the restriction on the dimensionality in MS and CMS, who require r_n = k_n, k_n < n for each n ∈ ℕ, and k_n → ∞ as n → ∞. BCCH prove asymptotic size control of their Sup Score test under the assumption that log k_n = o(n^{1/3}).
Assumption 3 is a high-level assumption on the number of IVs and their correlation structure. Assumption 3 implies that γ*_n satisfies

When k_n < n (and there are no exogenous covariates, as in CMS and MS), Assumption 3 is weaker than the balanced-design assumption in CMS and MS, which requires max_{1≤i≤n} P_ii ≤ 1 − δ for some 0 < δ < 1. The balanced-design assumption implies Assumption 3.
Proposition 1. Suppose (as in CMS and MS) that rank(P) = k_n and that there exists a 0 < δ < 1 such that max_{1≤i≤n} P_ii ≤ 1 − δ. Then Assumption 3 holds.

It is difficult to give more primitive conditions for Assumption 3 that remain both easily verifiable and sufficiently general when k_n > n. Figure 1 shows that, for the Gaussian data-generating process (DGP) design used in the simulations in Section 4, the ratio of Σ_{i=1}^n Σ_{j≠i} (P^{γ_n}_{ij})² to r_n is constant, lending plausibility to Assumption 3 in this context. Figure 1 also shows that γ*_n grows with the number of observations, which implies that our penalty parameter may grow with the sample size (in contrast to the one proposed in Carrasco and Tchuente (2016)). In practice, and similarly to the heuristics recommended in CMS, MS, and Hansen and Kozbur (2014), we recommend that in a given application practitioners check the implied minimum value of c in Assumption 3. If very small values are observed, then Assumption 3 may be questionable. We are now in a position to derive the asymptotic distribution of the RJAR test under the null hypothesis.
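In the same spirit as the check behind Figure 1, one can track the ratio of Σ_{i≠j}(P^{γ}_{ij})² to r_n across sample sizes for a Gaussian design. This is a sketch: the fixed γ and the 2:1 ratio of IVs to observations are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def offdiag_ratio(n, k, gamma, seed):
    """Ratio of sum_{i != j} (P^{gamma}_ij)^2 to r_n = rank(Z) for one draw."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, k))
    P = Z @ np.linalg.solve(Z.T @ Z + gamma * np.eye(k), Z.T)
    off = (P ** 2).sum() - (np.diag(P) ** 2).sum()
    return off / np.linalg.matrix_rank(Z)

# Assumption 3 is plausible when this ratio stays bounded away from zero as n grows.
ratios = [offdiag_ratio(n, 2 * n, gamma=50.0, seed=6) for n in (50, 100, 150)]
```

A roughly stable, strictly positive sequence of ratios is the pattern that supports the assumption in a given design.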
Notice that we did not impose any assumption on the coefficient matrix Π of the instruments in the first-stage regression in Equation (1b). Therefore, the RJAR test is robust to arbitrarily weak identification.

Closest alternatives in the literature
The RJAR test is similar to the jackknifed AR tests proposed in CMS and MS. The distinguishing feature is the use of the ridge-regularised projection matrix P^{γ_n}, which makes the RJAR test applicable also when r_n < k_n, unlike the aforementioned two tests, which use the standard projection matrix P and cannot be computed when k_n > n.
CMS assume r_n = k_n < n, and propose the jackknifed AR statistic given by

MS also assume that r_n = k_n < n, and propose a different jackknifed AR statistic that can be obtained from RJAR_{γ_n}(β_0) in Equation (5) by setting γ_n = 0 and replacing Φ̂_{γ_n}(β_0) with Φ̂_{MS}(β_0), where M = I_n − P and M_i is the i-th row of M. The reason why the unregularised jackknifed AR test in MS uses Φ̂_{MS}(β_0), instead of the variance estimator given in Equation (6) evaluated at γ_n = 0, is that, according to their Theorem 4, the former yields higher power than the latter. It follows that the unregularised version of our RJAR test, which arises when γ*_n = 0 in Equation (7), will be dominated by the jackknifed AR test of MS in terms of power, and practitioners may prefer the latter in those cases. We note, however, that the asymptotic size control and superior power of the jackknifed AR test of MS are proven under the balanced-design assumption given in Equation (9), which is strictly stronger than our Assumption 3. This can have implications for the finite-sample performance of the jackknifed AR test of MS compared to the RJAR test, which we investigate in Section 4.
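Schematically, these jackknifed statistics share a numerator that deletes the own-observation terms from a quadratic form in the null-restricted errors. A sketch of that common building block follows; the variance estimators in the denominators, which differ across CMS, MS, and the RJAR test, are omitted here.

```python
import numpy as np

def jackknife_numerator(P, e):
    """sum_{i != j} P_ij e_i e_j: deleting the i = j terms removes the
    heteroskedasticity-induced bias E[e_i^2] P_ii from e'Pe (schematic)."""
    Q = P - np.diag(np.diag(P))  # zero out the diagonal of the projection matrix
    return e @ Q @ e

rng = np.random.default_rng(2)
n, k = 120, 30
Z = rng.normal(size=(n, k))
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # ordinary projection matrix (k < n here)
e = rng.normal(size=n)
num = jackknife_numerator(P, e)
# brute-force double sum for comparison
brute = sum(P[i, j] * e[i] * e[j] for i in range(n) for j in range(n) if i != j)
```

The matrix form and the explicit double sum agree, which is the identity the jackknifed quadratic form rests on.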
When k_n is larger than n, the AR tests in CMS and MS are not applicable, since Z′Z does not have full rank. In this case, the Sup Score test of BCCH remains valid.
BCCH first standardise the IVs as in Equation (4), and then propose the Sup Score statistic given by

and α ∈ (0, 1). They show that comparing their Sup Score statistic to this critical value yields a test of the null hypothesis in Equation (2) that has asymptotic size less than or equal to α. Being a supremum-norm test suggests that the BCCH Sup Score test will work well with a sparse first stage (i.e., where only a few elements of Π are nonzero), but may have lower power than the RJAR test when the first stage is dense. This is verified in the simulations in Section 4.
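A sketch of a self-normalised sup-score-type statistic in the spirit of BCCH, with critical value c·Φ⁻¹(1 − α/(2k_n)) and c = 1.1. The exact scaling below follows our reading of the test and should be taken as an assumption, not as BCCH's verbatim formula.

```python
import numpy as np
from statistics import NormalDist

def sup_score(Z, e):
    """max over IVs l of |sum_i Z_il e_i| / sqrt(sum_i Z_il^2 e_i^2):
    a studentised supremum-norm statistic (sketch of our reading of BCCH)."""
    num = np.abs(Z.T @ e)
    den = np.sqrt((Z ** 2).T @ (e ** 2))
    return (num / den).max()

def sup_score_cv(k, alpha=0.05, c=1.1):
    """Critical value c * Phi^{-1}(1 - alpha / (2k)), with c = 1.1 as recommended."""
    return c * NormalDist().inv_cdf(1 - alpha / (2 * k))

rng = np.random.default_rng(3)
n, k = 100, 300                 # more IVs than observations is allowed
Z = rng.normal(size=(n, k))
e = rng.normal(size=n)          # null-restricted errors under a true null
stat, cv = sup_score(Z, e), sup_score_cv(k)
reject = bool(stat > cv)
```

The 1/(2k) tail correction is what lets the number of IVs grow (even exponentially) while size is still controlled.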

Simulations
We now investigate the size and power properties of the RJAR test and compare them to those of the tests proposed in BCCH, CMS and MS. We take our simulation setup from Hansen and Kozbur (2014), who in turn take theirs from BCCH. The DGP is given by

The IVs are independent Gaussian with mean 0, Var[Z_il] = 0.3, and Corr[Z_il, Z_im] = 0.5^{|l−m|}. The first-stage coefficient vector π equals κ multiplied by a scalar, where κ is a vector of zeros and ones that varies with the type of DGP considered (sparse or dense, as modelled below); the scalar ensures that, for a given concentration parameter, μ², the following relationship is satisfied:

To illustrate how the sparsity structure of the first stage in Equation (11b) can affect the size and power of the studied tests, we consider both a sparse first stage and a dense first stage. Sparsity in the first stage is modelled by setting κ = [ι′_5, 0′_{k_n−5}]′, where ι_q is a q × 1 vector of ones and 0_q is a q × 1 vector of zeros. Density in the first stage is modelled by setting κ = [ι′_{0.4k_n}, 0′_{0.6k_n}]′.
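A sketch of this design can be written down directly. The error correlation between the two stages and the exact normalisation of the concentration parameter are illustrative assumptions here (homoskedastic Gaussian errors are used for brevity); only the IV covariance and the sparse/dense κ patterns are taken from the text above.

```python
import numpy as np

def simulate(n=250, k=90, mu2=30.0, sparse=True, beta=1.0, seed=0):
    """Gaussian design with Var[Z_il] = 0.3 and Corr[Z_il, Z_im] = 0.5^{|l-m|};
    pi is a scaled zeros-and-ones vector targeting the concentration
    parameter mu2 (one common normalisation; an assumption here)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(k)
    Sigma = 0.3 * 0.5 ** np.abs(idx[:, None] - idx[None, :])
    Z = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
    kappa = np.zeros(k)
    kappa[: 5 if sparse else int(0.4 * k)] = 1.0  # sparse vs dense first stage
    pi = np.sqrt(mu2 / (n * kappa @ Sigma @ kappa)) * kappa
    # correlated structural and first-stage errors (illustrative rho = 0.6)
    eps, v = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
    X = Z @ pi + v
    y = beta * X + eps
    return y, X, Z

y, X, Z = simulate()
```

Switching `sparse=False` flips the first stage from five active IVs to the dense 40% pattern while holding μ² fixed.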
We consider k_n = 30, 90, 190. Throughout, γ− in Assumption 3 is set equal to 1. For the case of 30 IVs, the RJAR test does not impose any regularisation (γ*_n = 0). For the case of 90 IVs, γ*_n = 12.048; ridge regularisation increases Σ_{i=1}^n Σ_{j≠i} (P^{γ_n}_{ij})² relative to the unregularised case, which shows that even in the case where r_n < n, ridge regularisation can make Assumption 3 strictly more plausible. For the case of 190 IVs, γ*_n = 109.187. The variance estimator of MS occasionally yields a negative value; these cases are conservatively interpreted as a failure to reject the null hypothesis. As recommended by BCCH, c_BCCH = 1.1. The number of Monte Carlo replications is 10,000.

Size
Figure 2 shows the simulation results for the sparse first stage for nominal test sizes from 0.01 to 0.99, that is, the rejection frequency under H_0: β_0 = 1. As far as the illustration of the tests' size properties is concerned, the dense first stage yields virtually the same results and is hence omitted. Since all tests are robust to weak IVs, the rejection frequencies of the tests are not affected by the value of the concentration parameter.
For the case of 30 IVs, the AR tests of CMS and MS and the RJAR test have correct size, while the BCCH Sup Score test is undersized.
For the case of 90 IVs, the AR test of CMS appears to control size for common small nominal test sizes (e.g., 0.05 or 0.1). The AR test of MS appears to be generally oversized; for example, at nominal level 0.05, the rejection frequency of the test is 0.189. The BCCH Sup Score test continues to be undersized, while the RJAR test continues to have correct size irrespective of the value of μ².
For the case of 190 IVs, only the BCCH Sup Score test and the RJAR test are feasible.
As before, the BCCH Sup Score test is undersized, while the RJAR test has correct size.

Power
Figures 3-8 show the power of the tests when the number of IVs and the sparsity pattern of the first stage are varied. It is still the case that H_0: β_0 = 1. For the case of 30 IVs (Figures 3 and 4), the AR test of CMS has similar power to the RJAR test, while the AR test of MS is slightly more powerful than the RJAR test. The BCCH Sup Score test is less powerful than all other tests. For the case of 90 sparse IVs (Figure 5), the RJAR test is slightly more powerful than the BCCH Sup Score test. The AR test of MS fails to control size, while the AR test of CMS exhibits power properties substantially worse than those of the BCCH Sup Score test and the RJAR test. For the case of 90 dense IVs (Figure 6), the AR tests of CMS and MS exhibit the same low power and failure to control the test size as in the case of 90 sparse IVs. In this dense setting, the RJAR test is substantially more powerful than the BCCH Sup Score test. For the case of 190 sparse and dense IVs (Figures 7 and 8), the RJAR test is more powerful than the BCCH Sup Score test. Thus, for all the DGPs considered here, the RJAR test is as powerful as existing methods whenever these are applicable, and sometimes much more powerful.

Empirical Application
We consider an empirical application based on Card (2009). The coefficient of interest is β_s in the following model:

y_is = β_s X_is + W_i′δ_s + ε_is,

where y_is is the difference between residual log wages for immigrant and native men in skill group s in city i, X_is is the log ratio of immigrant to native hours worked in skill group s of both men and women in city i, W_i is a vector of city-level controls with coefficient vector δ_s, and ε_is is the structural error. In the context of the production function specified in Card (2009, Section I), β_s can be interpreted as the (negative) inverse elasticity of substitution between immigrants and natives in the US in their respective skill group. As in Card (2009), we consider two skill groups s = h, c (high school or college equivalent) separately. Card (2009) raises the concern that unobserved factors in a city may lead to both higher wages and higher employment levels of immigrants relative to natives, causing X_is to be endogenous. Card (2009) proposes to use the ratio of the number of immigrants from country l in city i to the total number of immigrants from foreign country l in the US as an IV. The rationale for these IVs is that existing immigrant enclaves are likely to attract additional immigrant labour through social and cultural channels unrelated to labour market outcomes. We consider two sets of IVs. First, we consider the original setup of Card (2009), using as IVs the k_n = 38 different countries of origin of the immigrants. Second, motivated by the saturation approach of Blandhol et al.
(2022), we consider the setup where these 38 original IVs are interacted with the q = 9 available controls (including a constant). This yields k_n = 342 IVs. In both cases, the number of observations (i.e., the number of cities) is n = 124. We construct (weak-identification robust) confidence sets for β_s by inverting the AR tests of CMS and MS, the Sup Score test, and the RJAR test. Thus, the 95% confidence set for any test is obtained as the collection of β_{s,0} for which that test does not reject the null at the 5% level of significance. As in the simulation exercise in Section 4, γ− = 1 and c_BCCH = 1.1. A grid of 100 values for β_{s,0} is used for s = h, c. Data are taken from a single cross-section, as made available by Goldsmith-Pinkham, Sorkin, and Swift (2020).
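The inversion step itself is generic. The following sketch uses a toy normal-approximation t-test as a stand-in p-value, which is purely illustrative; in the application, any of the weak-identification robust tests above is plugged in instead.

```python
import numpy as np
from statistics import NormalDist

def invert_test(pvalue, grid, alpha=0.05):
    """The (1 - alpha) confidence set is the set of grid points beta_0
    at which the test fails to reject H_0: beta = beta_0."""
    return np.array([b0 for b0 in grid if pvalue(b0) > alpha])

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)  # true coefficient is 2.0

def pvalue(b0):
    """Two-sided normal-approximation p-value of a toy t-test of beta = b0."""
    bhat = (x @ y) / (x @ x)
    resid = y - b0 * x
    se = resid.std(ddof=1) / np.sqrt(x @ x)
    return 2.0 * (1.0 - NormalDist().cdf(abs((bhat - b0) / se)))

cs = invert_test(pvalue, np.linspace(1.0, 3.0, 201))
```

The resulting set collects the non-rejected grid points; with a robust test substituted for `pvalue`, the set need not be an interval.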
Figure 9 shows the confidence sets when k_n = 38 for high-school workers and college workers. We find that γ*_n = 0, implying that no regularisation is needed. The 95% confidence sets for each test are given by all the points below the grey horizontal line. This is in line with our simulations in Section 4, where we found that regularisation was not needed to maximise the sum in Assumption 3 when k_n/n ≈ 0.3. The confidence sets for both skill groups broadly confirm the results in Card (2009). We find that the confidence set for high-school workers is smallest for the jackknifed AR statistic of CMS, whereas the BCCH Sup Score test yields the smallest confidence set for the application to college workers. Based on the power results in Section 4, this suggests a very sparse first stage for college workers, i.e., a few nationalities being highly predictive of inflows of immigrant labour. In line with the simulation evidence on power for k_n/n ≈ 0.3 in Section 4, the differences in confidence sets across the four tests considered appear to be reasonably small.

Conclusion
We contributed to the literature on (very) many IVs in the cross-sectional linear IV model by proposing a new, ridge-regularised jackknifed AR test. Our test compares favourably with existing methods in the literature both theoretically, by allowing for high-dimensional IVs and weakening a common assumption on the IVs' projection matrix, and practically, by having correct asymptotic size and displaying favourable power properties even when the number of IVs approaches or exceeds the number of observations.

Appendix A Proofs
Throughout this appendix, C > 0 denotes a universal constant that can change across lines. CSHNW refers to Chao et al. (2012), from which we also borrow our summation conventions. Since all the proofs in this appendix are under the null hypothesis, we write ε_i and Φ̂_{γ_n} for e_i(β_0) and Φ̂_{γ_n}(β_0), respectively. The following singular value decomposition of the n × k_n matrix of instruments Z of rank r_n will be used frequently (see, e.g., Lütkepohl (1996, p. 60)): Z = USV′, where S is the n × k_n matrix whose upper-left r_n × r_n block is the diagonal matrix D containing the singular values d_1, …, d_{r_n} of Z. Hence, one can write P^{γ_n} = UD̃U′, where D̃ = S(S′S + γ_n I_{k_n})⁻¹S′ is the diagonal n × n matrix with diagonal entries D̃_ll = d_l²/(d_l² + γ_n) ≤ 1 for l = 1, …, r_n, and zero otherwise. Note that the diagonal entries of D̃ are also the eigenvalues of P^{γ_n}.
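A quick numerical check of this eigenvalue identity; the dimensions below are arbitrary and chosen only so that r_n < k_n.

```python
import numpy as np

# The nonzero eigenvalues of P^{gamma} = Z (Z'Z + gamma I)^{-1} Z' should equal
# d_l^2 / (d_l^2 + gamma), where d_l are the singular values of Z.
rng = np.random.default_rng(5)
n, k, gamma = 30, 50, 2.5  # r_n = 30 < k_n = 50, so gamma > 0 is needed
Z = rng.normal(size=(n, k))
P = Z @ np.linalg.solve(Z.T @ Z + gamma * np.eye(k), Z.T)
d = np.linalg.svd(Z, compute_uv=False)       # the r_n singular values of Z
expected = np.sort(d ** 2 / (d ** 2 + gamma))
actual = np.sort(np.linalg.eigvalsh(P))
```

Since n < k here, P has exactly r_n = n eigenvalues, all strictly between zero and one, matching the formula above.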

A.1 Lemmas
The following lemma collects some properties of P^{γ_n}. Recall that Z has rank r_n.
By Equation (A.1), one has the expression that verifies (i), since 0 ≤ D̃_ll ≤ 1; the last inequality follows from the largest diagonal entry of P^{γ_n} being bounded from above by its largest eigenvalue, which is no greater than one. Next, Σ_i (P^{γ_n}_{hi})² = ((P^{γ_n})²)_{hh} ≤ P^{γ_n}_{hh} ≤ 1, such that (ii) and (v) follow. Furthermore, by (v) and (iv), (vi) follows; the penultimate inequality is a consequence of (i) and the last of (iv).
Lemma 2. Fix n ≥ 4. For all γ_n ≥ 0 if r_n = k_n, and γ_n > 0 if r_n < k_n, one has Σ_{i<j<l<m} P^{γ_n}_{il} P^{γ_n}_{jl} P^{γ_n}_{im} P^{γ_n}_{jm} ≤ C r_n.
the dictionary {i : m, j : j, l : i, m : l}. The last equality follows from the symmetry of P^{γ_n}.
Using the expression for (L_2′L_2)_{ab} in Equation (A.5), (A.8) follows. The third equality follows since the product within the summations is non-zero only when both i > j > l and i > m > l. The fourth equality follows from the symmetry of P^{γ_n}. The sixth equality follows by replacing the indices in the second summation according to the dictionary {i : m, j : l, l : i, m : j}. The seventh equality follows by replacing the indices in the third summation according to the dictionary {i : m, j : (A.9) The third equality follows by replacing the indices in the summation according to the dictionary {i : j, j : i, l : m, m : l}. The fourth equality follows since the product within the summation is non-zero only when both i > j, m and l > j, m. The fifth equality follows from replacing the indices in the last four summations in the previous line according to the dictionaries {i : m, j : j, l : l, m : i}, {i : l, j : j, l : m, m : i}, {i : m, j : i, l : l, m : j}, {i : l, j : i, l : m, m : j}, respectively. The sixth equality follows from replacing the indices in the second, third, fourth and fifth summations in the previous line according to the dictionaries {i : l, j : i, l : j}, {i : j, j : i, l : l}, {i : l, j : i, m : j}, {i : l, j : j, m : i}, respectively. The last inequality follows from Lemma 1 (vi) and Equation (A.10). The third equality follows from Lemma 1 (iv) and Equation (A.2). Hence, the maximum is not attained for arbitrarily large γ_n. Since P^{γ_n} is a nondiagonal matrix by assumption, Σ_{i=1}^n Σ_{j≠i} (P^{γ_n}_{ij})² is strictly positive. This leaves a compact set over which the non-zero Σ_{i=1}^n Σ_{j≠i} (P^{γ_n}_{ij})² is maximised, such that a maximiser exists.
Lemma 4. Under Assumptions 1, 2 and the null hypothesis in Equation (2),

Figure 10 shows the confidence sets when k_n = 342 for high-school workers and college workers, respectively. Since r_n = n − q = 124 − 9 = 115 < 342 = k_n, the jackknifed AR statistics of CMS and MS are not applicable. We find that γ*_n = 5.299.

Table 1: Weak-identification robust inference with many IVs: schematic comparison of the main assumptions and results in the literature.
The last equality follows by the symmetry of P^{γ_n}. Take {u_i} to be a sequence of i.i.d. mean-zero and unit variance random variables.