Discussion on ‘A review of distributed statistical inference’

We congratulate the authors on an impressive team effort to comprehensively review various statistical estimation and inference methods in distributed frameworks. This paper is an excellent resource for anyone wishing to understand why distributed inference is important in the era of big data, what the challenges of conducting distributed inference instead of centralized inference are, and how statisticians propose solutions to overcome these challenges.
First, we notice that this paper focuses mainly on distributed estimation, and we would like to point out several other works on distributed inference. For smooth loss functions, Jordan et al. (2018) established asymptotic normality for their multi-round distributed estimator, which yields two communication-efficient approaches to constructing confidence regions using a sandwiched covariance matrix. For non-smooth loss functions, Chen et al. (2021) similarly proposed a sandwich-type confidence interval based on the asymptotic normality of their distributed estimator. More generic inference approaches, such as the bootstrap, have also been studied in the massive-data setting, including the distributed framework. The authors reviewed the Bag of Little Bootstraps (BLB) method proposed by Kleiner et al. (2014), which repeatedly resamples and refits the model on each local machine and finally aggregates the bootstrap statistics. Considering the huge computational cost of BLB, Sengupta et al. (2016) proposed the Subsampled Double Bootstrap (SDB) method, which has higher computational efficiency but requires a large number of local machines to maintain statistical accuracy.
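To make the BLB recipe concrete, the following is a minimal sketch in Python with NumPy: each of a few subsets plays the role of a local machine, draws multinomial weights summing to the full sample size n, refits a weighted estimator, and the per-subset standard errors are averaged at the end. The function names and the choice b = n^0.6 are ours for illustration only, not taken from Kleiner et al. (2014).

```python
import numpy as np

def bag_of_little_bootstraps(data, estimator, n_subsets=5, subset_size=None,
                             n_boot=100, seed=0):
    """Illustrative BLB sketch: estimate the standard error of `estimator`.

    Each "local machine" holds one subset of size b. It repeatedly draws
    multinomial weights summing to the full sample size n, refits the
    weighted estimator, and reports a bootstrap standard error; the central
    machine averages the per-subset standard errors.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    b = subset_size or int(n ** 0.6)     # a common choice: b = n^gamma
    ses = []
    for _ in range(n_subsets):
        subset = rng.choice(data, size=b, replace=False)
        stats = []
        for _ in range(n_boot):
            # resample n points from the b-point subset via multinomial counts
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(subset, counts))
        ses.append(np.std(stats, ddof=1))
    return float(np.mean(ses))

def weighted_mean(x, w):
    """Weighted estimator refit on each bootstrap draw (here, a mean)."""
    return float(np.sum(w * x) / np.sum(w))
```

For the sample mean of n i.i.d. observations, the BLB standard error should track the usual sigma/sqrt(n) despite each machine only ever touching b points, which is what makes the scheme attractive in a distributed setting.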
In addition to distributed samples, the dimensionality can also become large in the big data era, and in this case researchers may be more interested in simultaneous inference on multiple parameters. In the centralized setting, the bootstrap is one of the solutions to simultaneous inference problems (Zhang & Cheng, 2017). In a distributed framework where the dimensionality grows, Yu et al. (2020) proposed distributed bootstrap methods for simultaneous inference, which not only are efficient in terms of both communication and computation, but also allow a flexible number of local machines. The idea of their first method, k-grad, is to gather gradient vectors from all of the K local machines and conduct a multiplier bootstrap, which requires a large K. Building on k-grad, they developed a second method, n+k-1-grad, which additionally uses the n gradient vectors from the central machine so that it also allows a small K. The trade-off between communication efficiency and statistical accuracy, as mentioned by the authors, was shown through theoretical and numerical studies. Moreover, their theory characterizes a sufficient number of communication rounds that guarantees the optimal statistical accuracy and efficiency, which also provides a practical guide on how to determine the number of communication rounds. Interestingly, this sufficient number of communication rounds grows only logarithmically in the number of local machines.

When the dimensionality is higher than the local sample size, or even higher than the total sample size, simultaneous inference becomes of even more interest. Yu et al. (2021) extended k-grad and n+k-1-grad to the high-dimensional domain, using the de-biased Lasso for high-dimensional estimation and the nodewise Lasso for approximating the inverse Hessian matrix (Van de Geer et al., 2014).
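As a rough illustration of the multiplier-bootstrap idea behind k-grad, the sketch below centers the K local gradient vectors, perturbs them with independent Gaussian multipliers, and takes the (1 - alpha) quantile of the resulting max-norm statistic as a simultaneous critical value. This is a schematic under simplifying assumptions (a known inverse Hessian, a single round, low dimension), not the authors' algorithm, and all names are hypothetical.

```python
import numpy as np

def k_grad_quantile(local_grads, hessian_inv, n_local,
                    n_boot=500, alpha=0.05, seed=0):
    """Hypothetical sketch of a k-grad-style multiplier bootstrap.

    local_grads: (K, p) array, the average gradient from each of K machines.
    Returns the bootstrap (1 - alpha) quantile of the max-norm statistic,
    which can be used as a critical value for simultaneous confidence
    intervals over all p coordinates.
    """
    rng = np.random.default_rng(seed)
    K, p = local_grads.shape
    centered = local_grads - local_grads.mean(axis=0)   # center across machines
    max_stats = []
    for _ in range(n_boot):
        eps = rng.standard_normal(K)                    # Gaussian multipliers
        # bootstrap proxy for the max-norm of the rescaled estimation error
        boot = hessian_inv @ (np.sqrt(n_local) * (centered.T @ eps) / np.sqrt(K))
        max_stats.append(np.max(np.abs(boot)))
    return float(np.quantile(max_stats, 1 - alpha))
```

Only the K gradient vectors cross the network, which is why the scheme is communication-efficient, and why a variant that also exploits the n per-observation gradients on the central machine (as n+k-1-grad does) can compensate when K is small.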
Under the sparsity assumption, they similarly established a sufficient number of communication rounds that guarantees the optimal statistical accuracy and efficiency, which grows logarithmically in both the number of local machines and the sparsity levels of the true parameter and the inverse population Hessian matrix. Given that these methods depend on hyper-parameters for regularization, they also proposed a communication-efficient cross-validation approach to tuning the hyper-parameters.
Second, we want to point out a direction for nonparametric inference in distributed frameworks. The existing nonparametric works are all one-shot methods, which are expected to be limited by an upper bound on the number of local machines. It would be interesting to see how to bypass this limitation by developing a distributed estimator that allows multiple rounds of communication, as in the parametric works discussed above.
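For context, a one-shot nonparametric estimator of the kind referred to here can be sketched as follows: each machine computes a local Nadaraya-Watson fit once, and the central machine simply averages the K local estimates in a single communication round. This toy sketch is purely our own illustration; it also hints at why the number of machines is constrained, since each local fit must itself be consistent, which bounds K relative to the total sample size.

```python
import numpy as np

def local_nw(x_loc, y_loc, x0, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x_loc - x0) / h) ** 2)
    return float(np.sum(w * y_loc) / np.sum(w))

def one_shot_regression(x, y, x0, n_machines=10, h=0.05):
    """One-shot divide-and-conquer: each machine fits locally exactly once,
    and the central machine averages the K estimates (one communication round).
    """
    xs = np.array_split(x, n_machines)
    ys = np.array_split(y, n_machines)
    return float(np.mean([local_nw(xi, yi, x0, h) for xi, yi in zip(xs, ys)]))
```

Averaging reduces the variance by a factor of K, but the bias of each local fit is not averaged away, which is the usual source of the upper bound on K in one-shot nonparametric theory.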

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by NSF [Grant Number 2134209].