Estimations of treatment effects based on covariate adjusted nonparametric methods

Abstract Nonparametric tests are commonly used for two-sample comparisons in clinical studies. However, the estimation of the treatment effects associated with these tests may not be obvious, especially under covariate adjustment. In this article, we evaluated the effect of covariate adjustment on estimating treatment effects based on the Wilcoxon rank sum test, the van Elteren test, the aligned rank test, and the Jaeckel, Hettmansperger-McKean test through Monte Carlo simulations, using mean square error and coverage probability as criteria. Based on the simulations, the commonly used ANCOVA-based approach does not estimate the treatment effect well when the covariate imbalance is severe. The aligned rank test seems to perform well across most scenarios.


Introduction
It is well known that in fully randomized trials or experiments the assumptions for statistical comparison are met and the observed treatment difference is an unbiased estimate of the true treatment difference for many statistical procedures. However, these properties are not guaranteed for small, poorly designed, or poorly conducted trials. When there is covariate imbalance, the treatment groups are not comparable even before the start of the trial, and the estimated treatment effect is biased. In this situation, it is necessary to adjust for covariate effects. Adjusting for covariates not only reduces the impact of covariate imbalance but also improves the precision of the estimated treatment effect: when a covariate is associated with the primary response measure, adjusting for it generally improves precision.
Matching, stratification, and regression are commonly used approaches to adjust for covariate effects. Matching is commonly used in observational studies: treated subjects are matched with untreated subjects that share similar covariate values. However, it may exclude unmatched subjects. Stratification is the subclassification of subjects into mutually exclusive categories based on the values of the covariates. The overall treatment effect is estimated as a weighted average of the within-stratum treatment effects. Stratification becomes problematic when many covariates need to be adjusted. Regression or ANCOVA approaches adjust for covariate effects by including the covariates as regressors. After adjusting for the covariates, the within-treatment variance is reduced and confounding is eliminated. Regression or ANCOVA approaches rely on assumptions that include a linear relationship between covariates and response, independent and identically normally distributed residuals, and the same covariate coefficients across groups. Under certain scenarios, these assumptions may be violated. Moreover, the selection of covariates in the regression model is subjective, and overfitting will bias the estimate.
Although covariate adjustment has been widely applied in clinical studies, few studies have evaluated covariate adjusting approaches combined with both parametric and nonparametric approaches for estimating the treatment effect when covariates are imbalanced. This study aims to fill that gap.

ABOUT THE AUTHORS
Jiabu Ye is a senior statistician at AstraZeneca Pharmaceuticals. He is the study statistician for multiple ongoing immuno-oncology phase III studies. His research interest is in clinical trial related topics.
Dejian Lai is a professor of biostatistics and data science at The University of Texas Health Science Center at Houston School of Public Health. His research areas include nonparametric statistical methods, spatio-temporal models, and their applications in public health.

PUBLIC INTEREST STATEMENT
Many factors may influence the interpretation of results from clinical trials. Statistical methods for designing and analyzing clinical trials are under continual investigation. In this article, we performed simulation studies on many nonparametric statistical methods with covariate adjustments.

Treatment effect of two sample comparison adjusting for covariates
Before evaluating the estimation of the treatment effect by different approaches, it is necessary to test whether the treatment effect is statistically significant (Ye & Lai, 2019). Many tests can evaluate the significance of the treatment effect. Some of these tests remain valid after adjusting for covariates that are not fully balanced, whereas others may become invalid. The power and type I error of several nonparametric tests were investigated previously (Ye & Lai, 2019). In the current article, we evaluate the treatment effect after adjusting for unbalanced covariates.
Here, the response variable Y can be a measurement of change from baseline to endpoint. For simplicity, we denote this measure of change as Y.

Mean difference (TT)
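The first statistic for estimating the treatment effect is the mean difference between groups, also known as the average treatment effect,
$$\hat{\beta}_{Trt} = \bar{Y}_{Trt} - \bar{Y}_{ctrl},$$
where $\bar{Y}_{Trt}$ and $\bar{Y}_{ctrl}$ are the sample means of the response in the treatment group and the control group. This mean difference statistic is associated with the t test.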

Hodges Lehmann estimate (WRS)
The Hodges-Lehmann estimate of the treatment effect for two samples is the median of all pairwise differences between the responses in the two groups (Hodges & Lehmann, 1963):
$$\hat{\beta}_{Trt} = \mathrm{median}\{Y_{Trt:i} - Y_{ctrl:j} : i = 1, \ldots, m;\ j = 1, \ldots, n\},$$
where $Y_{Trt:i}$ is a response in the treatment group and $Y_{ctrl:j}$ is a response in the control group, and m and n denote the numbers of subjects in the treatment group and the control group, respectively. The above Hodges-Lehmann estimator $\hat{\beta}_{Trt}$ is associated with the Wilcoxon rank sum test (Wilcoxon, 1945).

2.3. Jaeckel's rank estimation of regression parameter, regress treatment variable only (JHM(n))
Jaeckel proposed a rank-regression-based approach to estimate regression parameters (Jaeckel, 1972). The basic idea is to find a set of estimates $\hat{\beta}$ for the regression parameters by minimizing a dispersion function of the residuals,
$$D_J(e) = \sum_{i=1}^{N} a(R(e_i))\, e_i,$$
where $e_i$ is the i-th residual, $R(e_i)$ is its rank among all N residuals, and $a(\cdot)$ is a non-decreasing rank score function. Generally, the minimizer of the dispersion function has no closed form, and a numerical approach is required to obtain the estimate. In this approach, we consider only the treatment variable as the regressor and ignore the other covariate. The estimate of the treatment effect $\hat{\beta}_{Trt}$ is the minimizer of the following dispersion function:
$$\hat{\beta}_{Trt} = \arg\min_{\beta_{Trt}} D_J(Y_i - Trt_i\,\beta_{Trt}).$$
This point estimate is associated with the Jaeckel, Hettmansperger-McKean (JHM) test (Hettmansperger & McKean, 1978), in which the null hypothesis is that $\beta_{Trt}$ equals 0 and there is no remaining covariate in the linear rank regression model.
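As a minimal sketch of the two estimators described above, the Python code below computes the two-sample Hodges-Lehmann estimate and minimizes a Jaeckel-type dispersion for the treatment-only model. The function names, the centered Wilcoxon scores a(r) = r - (N + 1)/2, and the search over pairwise differences are illustrative assumptions rather than the authors' implementation. The search relies on the dispersion being convex and piecewise linear in the treatment coefficient, with kinks only at the pairwise differences.

```python
from statistics import median


def hodges_lehmann(trt, ctrl):
    """Median of all m*n pairwise differences Y_Trt:i - Y_ctrl:j."""
    return median(t - c for t in trt for c in ctrl)


def jaeckel_dispersion(residuals):
    """Sum of a(R(e_i)) * e_i with assumed Wilcoxon scores a(r) = r - (N + 1) / 2."""
    n = len(residuals)
    ordered = sorted(residuals)  # position in the sorted list gives the rank
    return sum((rank - (n + 1) / 2) * e for rank, e in enumerate(ordered, start=1))


def jaeckel_treatment_only(trt, ctrl):
    """Minimize D_J(Y_i - Trt_i * beta) over beta by evaluating every kink."""
    # The dispersion is piecewise linear in beta; its slope can change only
    # where a shifted treatment residual crosses a control residual.
    candidates = [t - c for t in trt for c in ctrl]

    def dispersion(beta):
        return jaeckel_dispersion([t - beta for t in trt] + list(ctrl))

    return min(candidates, key=dispersion)


# Hypothetical toy data, not taken from the article.
trt = [5.1, 6.3, 7.2]
ctrl = [4.0, 4.8, 5.5]
hl = hodges_lehmann(trt, ctrl)
rank_est = jaeckel_treatment_only(trt, ctrl)
print(hl, rank_est)
```

With Wilcoxon scores and an odd number of pairwise differences, the two estimates coincide (both are approximately 1.5 for the toy data). With an even number of differences the dispersion is flat between the two middle differences, and this sketch returns one of them rather than their midpoint.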
2.4. Jaeckel's rank estimation of regression parameter, treatment variable, and the covariate (JHM(x))
In this version of Jaeckel's rank estimation, we treat both the treatment variable and the covariate as regressors. The estimate of the treatment effect $\hat{\beta}'_{Trt}$ is obtained by minimizing the corresponding dispersion function:
$$(\hat{\beta}'_{Trt}, \hat{\beta}_X) = \arg\min_{\beta'_{Trt},\,\beta_X} D_J(Y_i - Trt_i\,\beta'_{Trt} - X_i\,\beta_X).$$
In the associated JHM test, the null hypothesis is that $\beta'_{Trt}$ equals 0 and there is one remaining covariate X in the linear rank regression model.

Adjusting covariate effect based on Jaeckel's rank estimation and Hodges Lehmann estimate of treatment effect (JHM(x)-WRS)
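In this approach, we first adjust for the covariate effect for each individual subject, $Y^{adj}_i = Y_i - X_i\hat{\beta}_X$, where $\hat{\beta}_X$ is obtained by minimizing the corresponding dispersion function $D_J(Y_i - Trt_i\,\beta'_{Trt} - X_i\,\beta_X)$. Then the Hodges-Lehmann estimator is applied to the adjusted responses to estimate the treatment effect:
$$\hat{\beta}_{Trt} = \mathrm{median}\{Y^{adj}_{Trt:i} - Y^{adj}_{ctrl:j}\}.$$

In the aligned rank approach AR(x5), the continuous covariate is stratified based on its quintiles. The covariate effect is adjusted by alignment within each stratum, and the k-th stratum effect is estimated by the median of the Walsh averages,
$$\hat{\delta}_k = \mathrm{median}\{(Y_i + Y_j)/2 : 1 \le i \le j \le n\},$$
where n is the number of subjects in the k-th stratum. Each response is aligned as $Y^{align}_i = Y_i - \hat{\delta}_k$, where $Y_i$ is the response of a subject in the k-th stratum. The treatment effect is defined as
$$\hat{\beta}_{Trt} = \mathrm{median}\{Y^{align}_{Trt;i} - Y^{align}_{Ctrl;j}\},$$
where $Y^{align}_{Trt;i}$ is the aligned response in the treatment group and $Y^{align}_{Ctrl;j}$ is the aligned response in the control group.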
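The alignment procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the strata are assigned by covariate rank (so ties in the covariate are broken arbitrarily), and the stratum effect is taken as the median of Walsh averages over pairs with i <= j.

```python
from statistics import median


def walsh_median(values):
    """One-sample Hodges-Lehmann estimate: median of Walsh averages (i <= j)."""
    n = len(values)
    return median((values[i] + values[j]) / 2 for i in range(n) for j in range(i, n))


def hodges_lehmann(trt, ctrl):
    """Median of all pairwise treatment-minus-control differences."""
    return median(t - c for t in trt for c in ctrl)


def quintile_strata(x):
    """Assign each subject to one of five strata by the rank of its covariate (assumed rule)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    strata = [0] * len(x)
    for pos, i in enumerate(order):
        strata[i] = pos * 5 // len(x)
    return strata


def aligned_rank_estimate(y, trt, strata):
    """Align responses by stratum effect, then apply the two-sample HL estimator."""
    delta = {
        k: walsh_median([y[i] for i in range(len(y)) if strata[i] == k])
        for k in set(strata)
    }
    aligned = [y[i] - delta[strata[i]] for i in range(len(y))]
    trt_aligned = [a for a, t in zip(aligned, trt) if t == 1]
    ctrl_aligned = [a for a, t in zip(aligned, trt) if t == 0]
    return hodges_lehmann(trt_aligned, ctrl_aligned)


# Hypothetical toy data: two strata that differ by a shift of 10.
y = [1, 2, 3, 4, 11, 12, 13, 14]
trt = [0, 0, 1, 1, 0, 0, 1, 1]
strata = [0, 0, 0, 0, 1, 1, 1, 1]
estimate = aligned_rank_estimate(y, trt, strata)
print(estimate)
```

Passing the strata explicitly keeps the same function usable for a binary covariate, while `quintile_strata` covers the continuous case.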
2.8. Adjusting covariate effect based on treatment matched quintile stratification of the covariate and align for strata effect; Hodges Lehmann estimate of treatment effect (AR(MS,x5))
The continuous covariate is stratified based on its quintiles conditional on the treatment assignment. Therefore, the numbers of subjects from the two treatment groups within each stratum are equal, whereas in AR(x5) they are not equal. The subsequent procedure is similar to AR(x5). The k-th stratum effect $\hat{\delta}'_k$ is estimated by the one-sample Hodges-Lehmann estimator, and each response is aligned for the stratum effect, $Y^{align'}_i = Y_i - \hat{\delta}'_k$, where $Y_i$ is the response of a subject in the k-th stratum. The treatment effect is defined as
$$\hat{\beta}_{Trt} = \mathrm{median}\{Y^{align'}_{Trt;i} - Y^{align'}_{Ctrl;j}\},$$
where $Y^{align'}_{Trt;i}$ is the aligned response in the treatment group and $Y^{align'}_{Ctrl;j}$ is the aligned response in the control group (Hodges & Lehmann, 1962, 1963).

2.9. Adjusting covariate effect based on quintile stratification of the covariate and align for strata effect; Hodges Lehmann estimate of within strata treatment effect; inverse variance weighted overall treatment effect (Aligned-WS(HL,iv,x5))
The first step is again to stratify the continuous covariate based on its quintiles. Then the response variable is adjusted by alignment. In the k-th stratum, the stratum effect is estimated by the median of the Walsh averages,
$$\hat{\delta}_k = \mathrm{median}\{(Y_i + Y_j)/2 : 1 \le i \le j \le n\},$$
where n is the number of subjects in the k-th stratum and $Y_i$ is the response of a subject in the k-th stratum. Unlike the previous approaches, which adjust for the covariate effect only through the estimated stratum effects, the treatment effect within each stratum is estimated by the Hodges-Lehmann estimator. The treatment effect within the k-th stratum is defined as
$$\hat{\beta}_{Trt,k} = \mathrm{median}\{Y_{kTrt:i} - Y_{kctrl:j} : i = 1, \ldots, m;\ j = 1, \ldots, n\},$$
where $Y_{kTrt:i}$ is a response in the treatment group in the k-th stratum and $Y_{kctrl:j}$ is a response in the control group in the k-th stratum, and m and n here denote the numbers of subjects in the treatment group and the control group in the k-th stratum, respectively.
The overall effect is estimated by the weighted average of the within-stratum effects.

2.10. Adjusting covariate effect based on quintile stratification of the covariate and align for strata effect; within strata treatment effect by mean difference; inverse variance weighted overall treatment effect (Aligned-WS(mean,iv,x5))
The first step is to stratify the continuous covariate based on its quintiles. As in Aligned-WS(HL,iv,x5), the response variable is adjusted by alignment. The average treatment effect is estimated by the mean difference within each stratum. For the k-th stratum, the treatment effect is defined as
$$\hat{\beta}_{Trt,k} = \bar{Y}_{kTrt} - \bar{Y}_{kctrl}.$$
The overall effect is estimated by the weighted average of the within-stratum effects,
$$\hat{\beta}_{Trt} = \frac{\sum_k w_k\,\hat{\beta}_{Trt,k}}{\sum_k w_k}, \qquad w_k = \left(\frac{\widehat{var}^2_{ktrt}}{m} + \frac{\widehat{var}^2_{kctrl}}{n}\right)^{-1},$$
where $\widehat{var}^2_{ktrt}$ and $\widehat{var}^2_{kctrl}$ are the sample variances within stratum k for each treatment group, and m and n are the corresponding numbers of subjects in each treatment group.
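The inverse variance weighting of within-stratum mean differences can be illustrated with a short Python sketch. The function name and the input layout (one pair of treatment and control response lists per stratum) are assumptions made for this example, and the variance of each within-stratum difference is taken as the sum of the two sample variances divided by their group sizes.

```python
from statistics import mean, variance


def ivw_mean_difference(strata_data):
    """Inverse-variance weighted average of within-stratum mean differences.

    strata_data: list of (trt_values, ctrl_values), one pair per stratum
    (hypothetical layout chosen for this sketch).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for trt, ctrl in strata_data:
        effect = mean(trt) - mean(ctrl)
        # Estimated variance of the mean difference within the stratum.
        var_effect = variance(trt) / len(trt) + variance(ctrl) / len(ctrl)
        weight = 1.0 / var_effect
        weighted_sum += weight * effect
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical toy data: the noisier second stratum receives less weight.
strata_data = [
    ([3, 5], [1, 3]),  # effect 2, variance of the difference 2
    ([4, 8], [1, 5]),  # effect 3, variance of the difference 8
]
overall = ivw_mean_difference(strata_data)
print(overall)
```

In the toy data the weights are 0.5 and 0.125, so the overall estimate (about 2.2) sits closer to the effect in the less variable stratum.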
2.11. Adjusting covariate effect based on treatment matched quintile stratification of the covariate and align for strata effect; Hodges Lehmann estimate of within strata treatment effect; inverse variance weighted overall treatment effect (Aligned-WMS(HL,iv,x5))
This approach is similar to Aligned-WS(HL,iv,x5); the only difference is that the continuous covariate is stratified based on its quintiles conditional on the treatment assignment. Therefore, the numbers of subjects from the two treatment groups within each stratum are equal.

2.12. Adjusting covariate effect based on treatment matched quintile stratification of the covariate and align for strata effect; within strata treatment effect by mean difference; inverse variance weighted overall treatment effect (Aligned-WMS(mean,iv,x5))
This approach is similar to Aligned-WS(mean,iv,x5); the only difference is that the continuous covariate is stratified based on its quintiles conditional on the treatment assignment. There are equal numbers of subjects from both treatment groups within each stratum.
2.13. Adjusting covariate effect based on quintile stratification of the covariate; Hodges Lehmann estimate of within strata treatment effect; inverse variance weighted overall treatment effect (WS(HL,iv,x5))
The first step is again to stratify the continuous covariate based on its quintiles. The treatment effect within each stratum is estimated by the Hodges-Lehmann estimator. The treatment effect within the k-th stratum is defined as
$$\hat{\beta}_{Trt,k} = \mathrm{median}\{Y_{kTrt:i} - Y_{kctrl:j} : i = 1, \ldots, m;\ j = 1, \ldots, n\},$$
where $Y_{kTrt:i}$ is a response in the treatment group in the k-th stratum and $Y_{kctrl:j}$ is a response in the control group in the k-th stratum, and m and n denote the numbers of subjects in the treatment group and the control group in the k-th stratum, respectively.
The overall effect is estimated by the inverse variance weighted average of the within-stratum effects. The variance of each within-stratum estimate is based on the nonparametric two-sided confidence interval of the ordered pairwise differences in the Hodges-Lehmann estimator (Byun et al., 2013).
2.14. Adjusting covariate effect based on quintile stratification of the covariate; within strata treatment effect by mean difference; inverse variance weighted overall treatment effect (WS(mean,iv,x5))
The first step is to stratify the continuous covariate based on its quintiles. The average treatment effect is estimated by the mean difference within each stratum. For the k-th stratum, the treatment effect is defined as
$$\hat{\beta}_{Trt,k} = \bar{Y}_{kTrt} - \bar{Y}_{kctrl}.$$
Similar to Aligned-WS(mean,iv,x5), the overall effect is estimated by the weighted average of the within-stratum effects,
$$\hat{\beta}_{Trt} = \frac{\sum_k w_k\,\hat{\beta}_{Trt,k}}{\sum_k w_k}, \qquad w_k = \left(\frac{\widehat{var}^2_{ktrt}}{m} + \frac{\widehat{var}^2_{kctrl}}{n}\right)^{-1},$$
where $\widehat{var}^2_{ktrt}$ and $\widehat{var}^2_{kctrl}$ are the sample variances within stratum k for each treatment group, and m and n are the corresponding numbers of subjects in each treatment group.
2.15. Adjusting covariate effect based on treatment matched quintile stratification of the covariate; Hodges Lehmann estimate of within strata treatment effect; inverse variance weighted overall treatment effect (WMS(HL,iv,x5))
This approach is similar to WS(HL,iv,x5); the only difference is that the continuous covariate is stratified based on its quintiles conditional on the treatment assignment. Therefore, the numbers of subjects from the two treatment groups within each stratum are equal.

2.16. Adjusting covariate effect based on treatment matched quintile stratification of the covariate; within strata treatment effect by mean difference; inverse variance weighted overall treatment effect (WMS(mean,iv,x5))
This approach is similar to WS(mean,iv,x5); the only difference is that the continuous covariate is stratified based on its quintiles conditional on the treatment assignment. There are equal numbers of subjects from both treatment groups within each stratum.
2.17. Adjusting covariate effect based on alignment of strata effect of binary covariate; Hodges Lehmann estimate of treatment effect (AR)
This approach is for a binary covariate. The binary covariate is treated as the stratum variable. Unlike the quintile-based strata, the sample sizes of the strata can be uneven. The covariate effect is adjusted by alignment within each stratum, and the stratum effect is estimated by the Walsh average as in AR(x5). The aligned responses are computed similarly, and the treatment effect is defined as
$$\hat{\beta}_{Trt} = \mathrm{median}\{Y^{align}_{Trt;i} - Y^{align}_{Ctrl;j}\},$$
where $Y^{align}_{Trt;i}$ is the aligned response in the treatment group and $Y^{align}_{Ctrl;j}$ is the aligned response in the control group.
2.18. Adjusting covariate effect based on alignment of strata effect of binary covariate; Hodges Lehmann estimate of within strata treatment effect; inverse variance weighted overall treatment effect (Aligned-WS(HL,iv))
This approach is for one binary covariate in addition to the treatment variable. It is similar to Aligned-WS(HL,iv,x5); the only difference is that the binary variable is treated as the stratum variable instead of the quintile stratification of a continuous variable.

2.19. Adjusting covariate effect based on alignment of strata effect of binary covariate; within strata treatment effect by mean difference; inverse variance weighted overall treatment effect (Aligned-WS(mean,iv))
This approach is for one binary covariate in addition to the treatment variable. It is similar to Aligned-WS(mean,iv,x5); the only difference is that the binary variable is treated as the stratum variable instead of the quintile stratification of a continuous variable.

2.20. Hodges Lehmann estimate of within strata treatment effect; inverse variance weighted overall treatment effect (WS(HL,iv))
This approach is for one binary covariate in addition to the treatment variable. It is similar to WS(HL,iv,x5); the only difference is that the binary variable is treated as the stratum variable instead of the quintile stratification of a continuous variable.

2.21. Within strata treatment effect by mean difference; inverse variance weighted overall treatment effect (WS(mean,iv))
This approach is for one binary covariate in addition to the treatment variable. It is similar to WS(mean,iv,x5); the only difference is that the binary variable is treated as the stratum variable instead of the quintile stratification of a continuous variable.

Simulation settings
The motivation for this simulation study is to evaluate the covariate adjusting approaches based on their empirical mean square error (MSE) and coverage probability. The MSE is defined as
$$MSE = E\left[(\hat{\beta}_{Trt} - \beta_{Trt})^2\right],$$
where $\beta_{Trt}$ is the true treatment effect parameter and $\hat{\beta}_{Trt}$ is its estimate by a given approach; empirically, the expectation is replaced by the average over the simulation iterations. The coverage probability is defined as the proportion of iterations in which $\beta_{Trt}$ lies within the calculated 95% confidence interval for the parameter. Only single-covariate scenarios are evaluated in this study.

Setting 1: One normal covariate and linear regression
In the first simulation, one normal covariate is included for adjustment. The simulation proceeds through the following steps:
(1) Simulate a population of 10,000 subjects with outcome Y, treatment variable Trt, and covariate X. Trt follows a Bernoulli(0.5) distribution and the covariate X follows a standard normal distribution.
(2) Both scenarios with outliers and without outliers are simulated.
(3) The treatment effect coefficient is determined so that, under the alternative hypothesis, the power is close to 0.8 for most tests when the covariate X is fully balanced. $b_2$ is determined so that the correlation coefficient between Y and X is 0.3 for the scenario with a moderate covariate effect.
(4) Create a new indicator variable indX so that $indX_i = 1$ if $X_i \ge \mathrm{median}(X)$, and $indX_i = 0$ otherwise.
(5) Sample 200 subjects from the population, 100 subjects per arm. In the control arm, $\Pr(indX_i = 1 \mid Trt = 0)$ is set to 0.5, so there is a 50% chance that the covariate X is greater than the true population median. In the treatment arm, $\Pr(indX_i = 1 \mid Trt = 1)$ ranges from 0.5 to 0.95 in increments of 0.05, giving 10 scenarios. In the first scenario, the treatment arm has a 50% chance that X is greater than the true population median, so the covariate X is fully balanced between the two arms. In the second scenario, this chance is 55%, so the covariate X is slightly imbalanced and the treatment arm has more large values of X than the control arm. Each subsequent scenario increases the probability by a further 5%. In the 10th scenario, the treatment arm has a 95% chance that X is greater than the true population median, and the covariate X is extremely imbalanced.
(6) Under each scenario, the baseline covariate t-test is computed for each iteration. For each covariate adjusting approach, the empirical MSE and coverage probability of the treatment effect estimate are computed.
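The imbalanced sampling in step (5) and the two evaluation metrics in step (6) can be sketched as follows. This is a simplified illustration under explicit assumptions: it draws covariate values only, samples with replacement, omits the outcome model (whose coefficients are not reproduced here), and all function names and seeds are hypothetical.

```python
import random
from statistics import median


def simulate_population(size=10_000, seed=0):
    """Standard normal covariate X for the whole population."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def sample_arm(x_population, prob_high, n=100, rng=None):
    """Draw n covariate values; each is >= the population median with probability prob_high."""
    rng = rng or random.Random()
    cut = median(x_population)
    high = [x for x in x_population if x >= cut]  # indX = 1
    low = [x for x in x_population if x < cut]    # indX = 0
    return [
        rng.choice(high) if rng.random() < prob_high else rng.choice(low)
        for _ in range(n)
    ]


def empirical_mse(estimates, true_effect):
    """Average squared error of the estimates over simulation iterations."""
    return sum((b - true_effect) ** 2 for b in estimates) / len(estimates)


def coverage_probability(intervals, true_effect):
    """Share of (lower, upper) intervals that contain the true effect."""
    return sum(lo <= true_effect <= hi for lo, hi in intervals) / len(intervals)


# Covariate values for the most imbalanced scenario ('0.95 : 0.5').
population = simulate_population()
rng = random.Random(1)
control_x = sample_arm(population, prob_high=0.5, rng=rng)
treatment_x = sample_arm(population, prob_high=0.95, rng=rng)
```

In a full simulation, each iteration would generate outcomes for the sampled subjects, apply an estimator, and store the estimate and its 95% confidence interval before calling `empirical_mse` and `coverage_probability`.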

Setting 2: One binary covariate and linear regression
In the second simulation, one binary covariate is included for adjustment. The simulation is similar to Setting 1 with the following modifications:
(1) In Step 1, the covariate X follows a Bernoulli(0.5) distribution.
(4) In Step 6, the baseline covariate chi-square test is computed for each iteration.

Setting 3: One normal covariate and exponential regression
In the third simulation, one normal covariate is included for adjusting. The simulation is similar to Simulation 1 with difference in the true model. The true model is

Simulation results
The empirical MSE of the above approaches is plotted in Figure 1 and the coverage probability is plotted in Figure 2. In the subfigure labels, the first digit indicates MSE (1) or coverage probability (2), the second digit indicates the simulation setting, and the third digit indicates the case of covariate effect: (1) no covariate effect with no outliers; (2) moderate covariate effect with no outliers; (3) no covariate effect with outliers; and (4) moderate covariate effect with outliers. In each figure, the labels on the x-axis represent the scenarios of covariate imbalance. Starting from the origin of the x-axis, '0.5 : 0.5' represents the scenario in which the covariate X has a 50% chance of being greater than the true population median in both the control arm and the treatment arm. This is the fully balanced scenario. '0.55 : 0.5' represents the scenario in which the covariate X has a 55% chance of being greater than the true population median in the treatment arm and a 50% chance in the control arm. The other labels are read in the same way. '0.95 : 0.5' represents the scenario in which the covariate X has a 95% chance of being greater than the true population median in the treatment arm and a 50% chance in the control arm. This is the most extreme scenario of covariate imbalance. From left to right along the x-axis in each figure, the covariate imbalance becomes more extreme. Of the two sub-plots of each figure, the first plot (a) includes all the tests and the second plot (b) shows selected tests on a smaller scale.

Results from setting 1 (normal covariate)
When there is no covariate effect and the covariates are fully balanced, the approaches that estimate the overall treatment effect from weighted within-stratum treatment effects based on the nonparametric Hodges-Lehmann estimator have slightly larger empirical MSE than the other approaches (Figure 1.1.1a). These approaches include Aligned-WS(HL,iv,x5), Aligned-WMS(HL,iv,x5), WS(HL,iv,x5), and WMS(HL,iv,x5). As covariate imbalance gets more severe, the empirical MSE of the approaches not involving covariate adjustment (TT, WRS, and JHM(n)) does not increase. The empirical MSE of the approaches that estimate the overall treatment effect from weighted treatment effects with matching remains at the level of the fully balanced scenario. These approaches include Aligned-WMS(HL,iv,x5), Aligned-WMS(mean,iv,x5), WMS(HL,iv,x5), and WMS(mean,iv,x5). The empirical MSE of AR(MS,x5) does not increase as covariate imbalance gets severe, since it adjusts for the covariate effect based on matched subjects within each stratum.

Results from setting 2 (Binary covariate)
When there is no covariate effect and the covariates are fully balanced, the approaches involving mean differences (TT, Aligned-WS(mean,iv), and WS(mean,iv)) have the smallest empirical MSE. As covariate imbalance gets severe, the empirical MSE of the approaches not involving covariate adjustment (TT, WRS, and JHM(n)) does not increase (Figure 1.2.1a, Figure 1.2.1b). The empirical MSE of the approaches that estimate the overall treatment effect from weighted within-stratum treatment effects based on the nonparametric Hodges-Lehmann estimator (Aligned-WS(HL,iv) and WS(HL,iv)) increases as covariate imbalance gets severe. The empirical MSE of the other approaches increases slightly as the covariate gets imbalanced (Figure 1.2.1a and Figure 1.2.1b).
When there is a true covariate effect, the empirical MSE of the approaches that do not adjust for the covariate (TT, WRS, JHM(n)) increases dramatically as covariate imbalance gets severe (Figure 1.2.2a). In this setting, $X \sim \mathrm{Bern}(0.5)$ and $e \sim N(0, 0.063)$.
Among the approaches adjusting for the covariate effect, the empirical MSE of the approaches that estimate the overall treatment effect from weighted within-stratum treatment effects based on the nonparametric Hodges-Lehmann estimator (Aligned-WS(HL,iv) and WS(HL,iv)) increases as covariate imbalance gets severe. The other approaches have similar empirical MSE. As the covariate gets severely imbalanced, AR has the lowest MSE (Figure 1.2.2b).
In the presence of outliers with no covariate effect, the approaches involving mean differences (TT, Aligned-WS(mean,iv), and WS(mean,iv)) have the smallest empirical MSE when the covariates are balanced. As in the scenario without outliers, the empirical MSE of Aligned-WS(HL,iv) and WS(HL,iv) increases as covariate imbalance gets severe (Figure 1.2.3a), and the empirical MSE of the approaches not involving covariate adjustment (TT, WRS, and JHM(n)) does not increase. The empirical MSE of the other approaches increases slightly (Figure 1.2.3b).
In the presence of 20% outliers with a moderate covariate effect, the situation for MSE is similar to that without outliers. The empirical MSE of the approaches that do not adjust for the covariate (TT, WRS, JHM(n)) increases dramatically as covariate imbalance gets severe (Figure 1.2.4a). Among the rest, the empirical MSE of Aligned-WS(HL,iv) and WS(HL,iv) increases the most as the covariate gets imbalanced. AR has the lowest MSE across the scenarios of covariate imbalance (Figure 1.2.4b).
In setting 3, in the presence of 20% outliers with a moderate covariate effect, the empirical MSE of the approaches that involve matching within stratification and of the approaches that do not adjust for the covariate effect increases dramatically as covariate imbalance gets severe (Figure 1.3.4a). Among the remaining approaches, AR(x5) maintains its empirical MSE as covariate imbalance gets severe (Figure 1.3.4a, Figure 1.3.4b).

Results from setting 1 (normal covariate)
When there is no covariate effect and the covariates are fully balanced, the coverage probabilities of the approaches that estimate the overall treatment effect from weighted within-stratum treatment effects based on the nonparametric Hodges-Lehmann estimator (Aligned-WS(HL,iv,x5), Aligned-WMS(HL,iv,x5), WS(HL,iv,x5), and WMS(HL,iv,x5)) are 6% lower than the nominal level (Figure 2.1.1a). The coverage probabilities of Aligned-WMS(mean,iv,x5) and WMS(mean,iv,x5) are slightly (1%) below 95%. The coverage probabilities for JHM(x), JHM(x)-WRS, ANCOVA(x)-WRS, and AR(x5) also decrease as covariate imbalance gets more severe. Aligned-WS(HL,iv,x5) and WS(HL,iv,x5) decrease to less than 80%, while the coverage probability of the other approaches is still above 90% (Figure 2.1.1a).
In the presence of 20% outliers with no covariate effect, the approaches involving mean differences (TT, Aligned-WS(mean,iv,x5), Aligned-WMS(mean,iv,x5), WS(mean,iv,x5), and WMS(mean,iv,x5)) have the largest empirical MSE compared to the other approaches when the covariates are fully balanced. The coverage probability of Aligned-WS(mean,iv,x5), Aligned-WMS(mean,iv,x5), WS(mean,iv,x5), and WMS(mean,iv,x5) is about 94%, which is very similar to the scenarios without outliers. The coverage probability of Aligned-WS(HL,iv,x5), Aligned-WMS(HL,iv,x5), WS(HL,iv,x5), and WMS(HL,iv,x5) is 98%-99%, indicating that these approaches are very conservative in the presence of outliers (Figure 2.1.3a, Figure 2.1.3b).
In the presence of outliers, the coverage probabilities of the different approaches behave similarly, except for Aligned-WS(HL,iv,x5) and WS(HL,iv,x5). These two approaches, which estimate the overall treatment effect from weighted within-stratum treatment effects based on the nonparametric Hodges-Lehmann estimator, are very conservative, with coverage probability close to 1. This suggests that using the Hodges-Lehmann estimator to estimate the treatment effect within each stratum may be conservative in the presence of outliers if the inverse variance is used as the weight (Figure 2.1.4a, Figure 2.1.4b).

Results from setting 2 (binary covariate)
When there is no covariate effect and the covariates are fully balanced, the coverage probability of the estimates involving mean differences (TT, Aligned-WS(mean,iv), and WS(mean,iv)) stays around 95% (Figure 2.2.1a, Figure 2.2.1b). The coverage probability of the weighted within-stratum treatment effects based on the nonparametric Hodges-Lehmann estimator (Aligned-WS(HL,iv) and WS(HL,iv)) is close to 1 (Figure 2.2.1a). The coverage probability for JHM(x) remains at 95% as covariate imbalance gets severe. For the remaining approaches, the coverage probability decreases slightly as covariate imbalance gets severe.
When there is a true moderate covariate effect, the coverage probability of the approaches that do not adjust for the covariate (TT, WRS, JHM(n)) decreases dramatically as the covariate gets more imbalanced. In the presence of outliers with no covariate effect, the coverage probability of TT, WRS, and JHM(n) stays around 95% as covariate imbalance gets severe. The other approaches (AR, ANCOVA(x)-WRS, JHM(x)-WRS, WS(mean,iv), and Aligned-WS(mean,iv)) show a slight decrease as the covariate gets imbalanced (Figure 2.3.3a, Figure 2.3.3b).
In the presence of 20% outliers with a moderate covariate effect, the coverage probability is similar to that without outliers. The coverage probability for the approaches that do not adjust for the covariate (TT, WRS, JHM(n)) decreases dramatically (Figure 2.2.4a). The coverage probability of Aligned-WS(HL,iv) and WS(HL,iv) remains at 1 as covariate imbalance gets severe. The coverage probability of JHM(x) stays at 95% as the covariate gets imbalanced. The coverage probability of AR, Aligned-WS(mean,iv), and WS(mean,iv) decreases slightly as covariate imbalance gets severe, while the coverage probability of ANCOVA(x)-WRS and JHM(x)-WRS decreases slightly more when the covariates are extremely imbalanced (Figure 2).

Results from setting 3 (exponential regression)
When there is no covariate effect and the covariate is fully balanced, the coverage probability for the approaches that estimate the overall treatment effect from weighted within-stratum mean differences (Aligned-WS(mean,iv,x5), Aligned-WMS(mean,iv,x5), WS(mean,iv,x5), and WMS(mean,iv,x5)) is about 92% across the scenarios (Figure 2.3.1a, Figure 2.3.1b).
When there is a moderate covariate effect, the coverage probability for the approaches that involve matching within stratification (AR(MS,x5), Aligned-WMS(HL,iv,x5), Aligned-WMS(mean,iv,x5), WMS(HL,iv,x5), and WMS(mean,iv,x5)) and for the approaches that do not adjust for covariates (TT, WRS, and JHM(n)) decreases dramatically (Figure 2). In the presence of 20% outliers without a covariate effect, the empirical coverage probability behaves similarly to that without outliers, although the empirical coverage probability of Aligned-WS(mean,iv,x5), Aligned-WMS(mean,iv,x5), WS(mean,iv,x5), and WMS(mean,iv,x5) is slightly smaller (Figure 2). In the presence of 20% outliers with a covariate effect, the empirical coverage probability of the approaches that involve matching within stratification and of the approaches that do not adjust for the covariate effect decreases dramatically as covariate imbalance gets severe. Among the remaining approaches, AR(x5) maintains its empirical coverage probability as covariate imbalance gets severe and is the only approach with coverage probability close to 95%.

Concluding remarks
Covariate adjustment is necessary for estimating the treatment effect when there is covariate imbalance. The approaches without adjustment for the covariate effect have large MSE and low coverage probability in the presence of covariate imbalance.
Some, but not all, covariate adjustment approaches improve the estimation of the treatment effect in terms of empirical MSE and coverage probability. Approaches based on matching within stratification fail to adjust for the covariate effect when covariate imbalance exists. Matching introduces extra bias when the covariate is imbalanced.
The approaches that estimate the overall treatment effect from weighted within-stratum treatment effects based on the nonparametric Hodges-Lehmann estimator do not work well either. In the presence of outliers or covariate imbalance, these approaches are conservative, and their coverage probability is close to 1 for most scenarios.
The ANCOVA-based approach does not estimate the treatment effect well when the covariate imbalance is severe. It is likely that the basic assumptions of ANCOVA are violated in these extreme cases. Its coverage probability tends to be much lower and its empirical MSE larger than those of the other covariate adjusting approaches.
Rank-regression-based approaches work well when the linearity assumption holds, but they do not work well under some nonlinear scenarios.
Among these approaches, aligning for the covariate stratum effect and then estimating the treatment effect with the Hodges-Lehmann approach seems to perform well across all the scenarios in the simulation.