Recursive versus nonrecursive Richardson algorithms: systematic overview, unified frameworks and application to electric grid power quality monitoring

Sufficiently accurate, fast and computationally efficient solution of the system of linear equations is required in many estimation problems. Richardson iteration is one of the main solvers for linear equations, which provides optimization possibilities for time critical and accuracy critical applications. Convergence rate improvement and reduction of the computational complexity of the Richardson iteration are the most important problems in the area. The introduction of Newton–Schulz iterations is the efficient way for convergence rate improvement and the paper starts with systematic overview of the high-order Newton–Schulz matrix inversion algorithms. In addition, the unified framework for recursive computationally efficient convergence accelerators and error models for a number of combinations of Richardson and Newton–Schulz iterations is developed. A new nonrecursive parameter estimation concept is introduced and compared in this paper with recursive estimation. Recursive and nonrecursive Richardson algorithms together with the standard LU decomposition method were applied to the electric grid power quality monitoring problem. The algorithms were tested for the detection of the sag and swell signatures in the voltage and current signals on real data in three-phase power system. Nonrecursive Richardson algorithms which save close to half of the computational time compared to LU decomposition method were recommended for power quality monitoring applications.


Restricted linear system of equations and Richardson iteration
The solution of the system of linear equations with respect to θ * with SPD (Symmetric and Positive Definite) (and ill-conditioned in many cases) matrix A is required in many application areas such as control, system identification, signal processing, statistics as well as in many big data applications, machine learning and in many others, [1][2][3][4].
The iterative Richardson method for solving (1) is often preferable due to the optimization possibility (achieved via a proper choice of the number of iterations), robustness, less processor time and memory space compared to direct methods. The trade-off between parameter estimation accuracy and computational burden allows different optimization of Richardson algorithms for approximate computing in time and accuracy critical applications. Convergence rate improvement and reduction of the computational complexity of the Richardson iteration are the most important problems in the area.
The convergence rate of the Richardson iteration can be improved via the introduction of fast iterative matrix inversion algorithms, [5][6][7][8] such as high order Newton-Schulz algorithms, [9][10][11][12][13][14][15][16][17][18][19]. Iterative algorithms for calculations of the inverses and different generalized inverses are widely used in cryptography, control systems, signal and image processing and in many other areas. Notice that practical applications of the iterative matrix inversion methods are hampered by the lack of general unified description. Mini-tutorial with general unified description of the iterative matrix inversion algorithms with convergence analysis is presented in this paper.
Convergence rate improvement is usually associated with additional computational burden. In order to reduce the computational complexity the following algorithms should be developed: (a) the recursive procedures associated with combinations of Newton-Schulz and Richardson iterations and (b) nonrecursive forms of the Richardson algorithm.

Nonrecursive Richardson algorithms
The Richardson iteration can be presented in a nonrecursive form, where the estimate is the function of initial estimate and initial power series and all the quantities are calculated once only, [5]. The Richardson algorithm written in a nonrecursive form can be implemented on parallel computational units using sequential matrix vector multiplications only. Accuracy critical applications require a relatively large number of iterations of the recursive algorithm for the ill-conditioned case. This results in a large number of matrix vector multiplications in a nonrecursive counterpart. The approach is computationally efficient for large-scale systems and less efficient for small and medium size matrices. The algorithm should be selected according to the properties of the system in each application.
Factorizations of the power series, see for example [8,15,20] in combination with matrix vector multiplications can be applied for the improvement of the computational efficiency of the solution taking into account the size of the matrix A.

Application: electric grid power quality monitoring
Power quality disturbances such as sag and swell events may cause significant damages and problems to critical equipment connected to the network. Fast detection with possible mitigation of these events is one the most important problems in the area, [21][22][23][24]. A number of approaches such as peak voltage detection, root mean square method, Fourier transformation and many others have been developed for sag and swell detection. Voltage peak detection is not robust in the presence of harmonics and the detection methods, which use the windowing technique can be slow for a sufficiently large window size, [25]. The approach presented in this paper allows to use short enough windows for the estimation of the frequency contents of oscillating signals and rapid detection of the disturbances. The estimation involves solution of (1) with positive definite and ill-conditioned (due to a small window size) information matrix A by using Richardson algorithms with improved convergence rate.
The algorithms developed in this paper have been tested for the detection of the sag and swell signatures of real failure event in voltage and current signals in threephase power systems, [26]. Nonrecursive Richardson algorithms which can be optimized via a proper choice of the number of steps for approximate computing save close to half of the computational time compared to the LU decomposition method which does not allow optimization of the performance. Detection performance improvement associated with computational savings of the nonrecursive Richardson method is an essential contribution to time critical power quality monitoring applications.

Structure of the paper and main contributions
This paper starts with short systematic overview (presented as a mini tutorial) of iterative matrix inversion algorithms with unified convergence analysis in Section 2. In addition to unified description of known algorithms, a new high order iteration with memory based convergence accelerator is introduced. A new unified framework for recursive computationally efficient accelerators for the Richardson algorithm presented in the form of Newton-Schulz iterations, see Section 3 is the first main contribution of the paper. New nonrecursive estimators (the second main contribution of the paper) are introduced in Section 4 and compared to recursive estimators in Section 5. Grid power quality monitoring application is considered in Section 6, where nonrecursive and recursive estimators and the standard LU decomposition method are compared. The paper ends with brief conclusions and future outlook in Section 7.

Overview of iterative matrix inversion algorithms: mini tutorial
A brief overview of iterative matrix inversion algorithms with convergence analysis is presented in this Section. The overview is presented in the systematic way (from simple algorithms to more sophisticated ones) in the chronological sequence with convergence analysis and starts with the second order Newton-Schulz algorithms in Section 2.1 followed by the description of high order Newton-Schulz algorithms in Section 2.2. Computationally efficient Durand iterations are described in Section 2. 3. High-order Newton-Schulz algorithms with improved convergence rate are presented in Section 2.4, where the combination of high order Newton-Schulz and Durand algorithms is also discussed. Finally, a new high-order Newton-Schulz iteration with memory based convergence accelerator is presented in Section 2. 4.4. Iterative algorithms for calculations of the inverses and different generalized inverses are widely used in cryptography, control systems, signal and image processing and in many other areas.

Second order Newton-Schulz iterations
Consider the following iterations, [9][10][11][12]: where G k is the estimate of the inverse of the matrix A and I is the identity matrix, k = 1, 2 . . . . The evaluation of the error in the first step F 1 yields: which gives the error model: Algorithm (2) which is also known as the Newton-Schulz-Hotelling matrix iteration, [10] estimates the inverse of the matrix A via minimization of the estimation error F k provided that the spectral radius of initial matrix is less than one, where G 0 is the preconditioner (initial guess of A −1 ). The preconditioner has especially a simple form for SPD matrices which can be split as where · ∞ is the maximum row sum matrix norm, ε > 0, [3].
The proof of convergence is presented in [9][10][11][12], where the algorithm (2) is written in the following form: The order of the iterative algorithm can be increased for convergence rate improvement by the introduction of the power series in the algorithm (2).

High-order Newton-Schulz iterations
High-order algorithms (which are also known as hyperpower iterative algorithms, which are widely used for calculation of generalized inverses) can be presented in the following form for n = 2, 3, . . . , [13][14][15]: or in the form presented in [6,14]: The error F k in (5) can be written in the following form: Notice that algorithm (5) is written in following form in [13][14][15] (and it is widely used in this form): is better aligned with the Richardson iteration and therefore is considered further in this paper.

Durand iterations
The computationally efficient matrix inversion method described by Durand, [16,17] can be derived from the relation associated with the initial error in the Newton-Schulz approach, The error models (10), (11) are valid for the algorithm (9). Higher order Durand iterations are discussed in [4,17].
Notice that the convergence rate of Newton-Schulz iterations is significantly higher than the convergence rate of Durand iterations. However, Durand iteration (9) requires only one matrix product in each step.
Notice also that the convergence performance can be enhanced by the introduction of the initial power series as preconditioning G 0 (which is calculated only once) in Newton-Schulz and Durand iterations [4,18]. The order of initial power series can be chosen sufficiently high, which essentially improves the convergence rate. Convergence rate can also be improved via combination of two Newton-Schulz iterations, which also cover Durand iterations as special case, see next Section 2.4.

Iterative method with improved convergence rate
The improved iterative method can be written in the following form, [19]: where both Z k and G k are the estimates of the matrix inverse and n = 2, 3, . . . is the order. Multiplication of (13) by A together with evaluation of the power series (12) yields: and algorithm (12) and (13) has the following error model, [8]: Introduction of the power series for F k−1 improves further the convergence rate.

Combination of two high order Newton-Schulz iterations
Two Newton-Schulz iterations can be combined as follows, [8]: where (4). Notice that Durand iteration (9) can be derived from (16)-(19) with n = 1. The error model (24) can be evaluated as follows: The error model (21) is obtained via explicit evaluation of (22). The algorithm (16)-(21) has a faster convergence due to the high order error F n k−1 in the error model (20) compared to algorithm (12) and (13) which has the error model (15) with the first order error F k−1 .
The algorithm (16)- (19) has two independent computational parts, where the first part is associated with the calculations of k , L k and n k , where k = n k−1 , and the term n−1 j=0 F j k−1 G k−1 is calculated in the second part. This algorithm is suitable for parallel implementation.

Nested Newton-Schulz iterations
The following new algorithm with nested Newton-Schulz loops and improved convergence rate extends the framework described above as follows: where . . is the number of Newton-Schulz loops.

High order Newton-Schulz iteration with memory based convergence accelerator
The high order iterative method with memory based convergence accelerator, where the estimate in the step k is calculated using the estimates from the steps k−1 and k−2 and the error model accumulates multiplicatively inversion errors calculated in the previous steps can be written in the following form: P k = P k−1 F n k−1 , where G 1 = (I + F n 0 ) n−1 j=0 F j 0 G 0 and G 0 satisfies (4). The error model (31) is obtained by multiplication of (29) by A which results in the error model with subsequent evaluation of P k−1 using (30). The algorithm (28)-(31) is new high-order method with a memory based convergence accelerator, which has especially simple form for n = 1, and a similar algorithm with memory can be found in [27].
Interestingly enough that the algorithm (28)-(31) represents fast power series expansion. The evaluation of one step of the algorithm (29) for k = 2 by taking into account that G 1 = 2n−1 j=0 F j 0 G 0 and F 1 = F 2n 0 is presented as follows: and results in the error model (32) which verifies the model (31). The expansion can be continued for the next steps for complete verification of the error model. The evaluation of the power series shows the idea of the algorithm. The power series n−1 j=0 F j k−2 G k−2 is used for truncation of the series (I + F n k−1 ) n−1 j=0 F j k−1 G k−1 so that the resulting series multiplied by P k−1 extends the power series coming from the previous step G k−1 .

Richardson iteration: recursive accelerators and error models
Consider the following general form, [5] of the Richardson iteration (33): and error models (34), (35), where θ k is the estimate of unknown parameters θ * defined in (1) and ω k is associated with the expansion rate of the power series, and V k is the convergence accelerator.
Specifying the parameters ω k and k j=1 ω j the family of estimation algorithms with error models (34) and (35) which allow individual and comparative performance assessment is obtained, see Table 1. The table represents the systematic view (implementation ready tool-kit) for convergence rate improvement of the Richardson iteration with Newton-Schulz type recursive computationally efficient accelerators.
The first item in the table is the combination of Richardson and Newton-Schulz iterations described in [6,7], whereas the second item is recursive realization of the combination described in [6]. The third and the fourth items are described in [4,8]. The table can be further extended by integration of the matrix inversion algorithms presented in Section 2 in the Richardson iteration.
Moreover, a stepwise evaluation of the algorithm with the highest convergence rate (the third item in the table), which shows the idea of fast power series expansion for the convergence accelerator and validation of the error model (34) is presented below. Table 1 The evaluation of the convergence accelerator V k , starts with the evaluation of the power series L 1 and G 1 for the first step k = 1:

Stepwise evaluation of the convergence accelerator V k , (3) in
and the convergence accelerator V k is evaluated as follows: The expansion principle showed in Equations (38)-(40) is valid in all the subsequent steps k = 2, 3, . . .. Another way of parameter estimation is based on non-recursive form of the Richardson algorithm and factorization of the power series, described in the next section.

Nonrecursive Richardson estimators
Algorithm (33)-(35) can be presented in the following non-recursive form: where the parameter vector θ k is a function of the initial state θ 0 and power series { Algorithm (41) can be implemented using sequential matrix vector multiplications only, [5] with power series factorization of the polynomial with respect to (G 0 A) described for example in [8].
Accuracy critical applications require a relatively large number of iterations for the ill-conditioned case and in finite digit calculations. Accuracy requirements and requirements imposed on the convergence rate result in a high order of the polynomial which involves a large number of matrix vector multiplications. Taking into account the size of the matrix A and the number of iterations to reach a certain accuracy a suitable method for factorization of the power series F j 0 }G 0 can be found in order to reduce the large number of matrix vector multiplications.
The procedure of the application of factorization based estimators can be described as follows: (1) The final step k * is determined using the error model (35), pre-specified parameter estimation accuracy and the functions k j=1 ω j presented in Table 1.
F j 0 }G 0 which corresponds to the final step k * (determined in the previous step) is factorized using the factorization tool-kit, [8] taking into account the size of matrix A and providing the best combination of matrix by matrix and matrix vector multiplications.
(3) The final parameter estimate θ k * which guarantees pre-specified accuracy is calculated with (41) and the power series factorized in the previous step.
The procedure provides nonrecursive estimation of the parameter vector, which is more computationally efficient than recursive estimators for relatively small number of iterations.

Comparison of recursive and nonrecursive estimators
The recursive estimator presented as item 2 in Table 1 for the second order, n = 2 is chosen for comparison. The implementation of the algorithm requires 2 mmm (matrix by matrix multiplications) in each step and matrix vector multiplications (mvm).
The nonrecursive counterpart has the form (41) [8] which requires two matrix products only and matrix vector multiplications.
The estimation of the parameter vector θ * for the system (1) with ill-conditioned SPD information matrix A associated with the system with harmonic regressor and different number of frequencies is chosen for simulations. Several steps are considered for both recursive and nonrecursive algorithms providing exactly the same vector θ k in both cases. The ratio of two computational times for the nonrecursive algorithm (in numerator) and the recursive algorithm (in denominator) multiplied by one hundred per cent is plotted as a function of the step number and the size of the matrix A (the number of frequencies for the system with harmonic regressor multiplied by two). Simulation results are presented in Figure 1. Nonrecursive algorithm is faster than the recursive one (time ratio is less than one hundred percent) for relatively small number of iterations.
Accuracy critical applications require a relatively large number of iterations for the ill-conditioned case. For the same convergence rate of both algorithms the large number of steps results in high-order polyno- F j 0 in the nonrecursive algorithm. For example, for k = 20 and n = 2 the order is 4194300. The implementation of Richardson iteration for a relatively large number of steps requires very large number of matrix vector multiplications and the recursive algorithm is preferable in this case for small and medium size matrices, see Figure 1.
The advantage of the nonrecursive algorithm relative to the recursive algorithm is pronounced for the large-scale systems. However, nonrecursive estimators may have more significant error accumulation in finite digit calculations. In other words, the algorithms have distinct regions of applicability and can be applied for different cases. Further comparison is performed in the Section 6 for power system application. 6. Grid power quality monitoring 6. 1

. Estimation of the frequency content of the power system signals
Suppose that the measured signal of the power system (voltage or current) y k can be presented in the following form: where ϑ is the vector of unknown parameters and ϕ k is the harmonic regressor presented in the following form: where q 0 is the fundamental frequency of network (for example q 0 = 50 Hertz or q 0 = 60 Hertz), m is the number of harmonics, and ξ k is a zero mean white Gaussian noise, k = 1, 2, . . . is the step number.
The model of the signal (42) with adjustable parameters θ * k is presented in the following form: The signal y k is approximated by the modelŷ k in the least squares sense in each step k of moving window of a size s. The estimation algorithm is based on minimization of the following error E k ; for a fixed step k, where k ≥ s.
The least squares solution for the estimation of the parameter vector θ * k can be written as follows: where the matrix A k is symmetric and positive-definite information matrix, and the parameter vector θ * k , which satisfy (46) should be calculated in each step.
In power quality monitoring applications, the number of harmonics in voltage and current signals (higher harmonics are more pronounced in the current signals) is not large in three-phase systems and the size of the information matrix A k is relatively small. Notice that the information matrix may change the size when new harmonics appear in the signals. Notice also that Equation (46) is exactly Equation (1) and the methods described above can be applied in this case.

Fault detection on real data
Power quality monitoring and detection of sag and swell events are time critical applications since the faults should be detected as soon as possible and appropriate action should be taken for power system protection. Monitoring is performed via the estimation of the frequency content of the voltage and current signals.
Real data obtained from the DOE/EPRI National Database of Power System Events, [26] are used for power quality monitoring. The event (Event ID 2827) which resulted in voltage and current sag and swell was considered. The event is associated with the fault on the lead cable caused by circuit breaker to trip on substation. Measured voltages and the currents which correspond to sag and swell in phases 'a', 'b' and 'c' are plotted with green, red and blue lines respectively in Figures 2(a,b).
It is assumed that the voltage and current signals can be described by the model (42) with harmonic regressor (43) and fundamental frequency of 60 Hz with four higher harmonics and unknown parameter vector ϑ.
The model of the signals is given by (44) with the vector of the parameters θ * k which estimates unknown vector ϑ. The estimation problem is reduced to the restricted linear system of Equation (46) which should be solved w.r.t. θ * k in each step. The 10 × 10 matrix A k is the SPD matrix in each step (which is verified by simulations). The preconditioner described in Section 2.1 can be easily calculated for this matrix in each step. Measured signals are well described by the model (42) with the small measurement noise ξ k , which allows to choose short enough window (s = 40) for rapid detection. The choice of the window size represents the trade-off between the quality of the detection signal and detection time. Larger window sizes (s ≥ 54) [22] imply that matrix A is strictly diagonally dominant for which better preconditioning, which reduces convergence time, is available, [6].
Amplitudes of the first harmonic of the measured signals recovered by the nonrecursive parameter estimation algorithm described in Section 5 are used for detection, see Figures 2(c,d). The figure shows that the sag and swell events are more pronounced on the current signal and detectable on both voltage and current signal.
Three algorithms (1) recursive estimator (2) nonrecursive estimator (both described in Section 5), and (3) the LU decomposition method are applied for parameter calculation of θ * k in (46). The mean values of the computational times required for the estimation of the parameter vector are calculated for three algorithms. The computational time for the first two algorithms depends on the number of steps k * and the minimal number of steps (four steps) which provides sufficient quality for the detection of the sag and swell events was chosen. Notice that the standard LU decomposition method (and parameter calculation method based on LU decomposition) does not have any parameters to optimize and provides worse parameter estimation accuracy than four steps of the first two methods. Simulation results show that the nonrecursive algorithm provides the best result and saves in average 59% of the computational time compared to the recursive algorithm and 43% compared to the standard LU decomposition method.

Discussion
The most significant computational burden associated with the implementation of the detection algorithm is associated with calculation of the parameters θ * k in Equation (46) where the matrix A is a wellconditioned matrix for small enough window sizes. Therefore, the parameter vector θ * k can be rapidly calculated using Richardson algorithms described above with a small number of steps. The nonrecursive algorithm provides the best computational performance. The result is in agreement with the results presented in Section 5, which show the advantages of the nonrecursive algorithm for a small number of steps and small size matrices, see Figure 1. The events are associated with the fault on the lead cable caused by circuit breaker to trip on substation. The data are obtained from the DOE/EPRI National Database of Power System Events, [26], event ID is 2827. Amplitudes of the first harmonic (with the colours corresponding to the signals in Figures 2(a,b)) which show delectability of the sag and swell events are plotted in Figures 2(c,d).
Notice that recursive Richardson iteration (in comparison with nonrecursive counterpart) is preferable for accuracy critical signal processing applications in noisy environment and in finite digit calculations. For example, the recursive Richardson iteration is preferable for the frequency estimation applications in microgrids (which is both accuracy and time critical application) where the frequency should be estimated with very high accuracy [21,28].

Conclusions and future challenges
The family of Richardson algorithms was extended in this paper with (a) new unified framework for recursive convergence accelerators and error models and (b) new nonrecursive algorithms. The paper showed that the Richardson approach represents general optimizable tool which is applicable to many practical problems.
Recursive and nonrecursive Richardson algorithms together with the standard LU decomposition method were considered in practical problem of the detection of the sag and swell signatures in voltage and current signals in three-phase power systems. Real data obtained from the DOE/EPRI National Database of Power System Events were used for a comparison of the algorithms. New nonrecursive Richardson algorithms which save close to half of the computational time compared to the LU decomposition method were recommended for power quality monitoring applications.
Additional significant distortions of voltage and current signals are expected in the future electricity networks due to higher penetration level of renewable energy sources, power electronics, non-linear controllable loads and many others. This will result in challenging detection/estimation problems associated with numerical and computational methods for large-scale systems, where algorithms developed in this paper can be applied.

Disclosure statement
No potential conflict of interest was reported by the author.