GMRES-based numerical simulation and parallel implementation of multicomponent multiphase flow in porous media

Abstract This article considers the numerical simulation of multicomponent multiphase flow in porous media. The resulting system of nonlinear equations is linearized by the Newton-Raphson method and solved with the iterative generalized minimal residual (GMRES) algorithm. To achieve better convergence, we applied the ILU(0) preconditioner to the GMRES algorithm. As a result, we used a fully implicit scheme, called the Newton-ILU0-GMRES algorithm, to solve the problem of interest. Based on the obtained sequential algorithm, we implemented a parallel algorithm using Message Passing Interface (MPI) technology. Additionally, we compared the parallel program of the presented algorithm with a parallel program using the ready-made Portable, Extensible Toolkit for Scientific Computation (PETSc) library. We tested the developed MPI parallel algorithm on the MVS-10P supercomputer of the Interdepartmental Supercomputer Center of the Russian Academy of Sciences.


ABOUT THE AUTHOR
Danil Lebedev has a PhD in computer science and is an acting associate professor in the Department of Computer Science of Al-Farabi Kazakh National University. The authors of the article are employees of the Scientific Research Institute of Mathematics and Mechanics, operating in Al-Farabi Kazakh National University, and are engaged in problems of modelling the flow of liquids and distributed data processing.

PUBLIC INTEREST STATEMENT
Reducing the cost of oil production is an urgent task for every oil company. For this purpose, companies use models of the production process to identify optimal conditions. The modelling involves multiphase liquid flow in porous media, and taking into account the multicomponent nature of the hydrocarbon phases allows one to obtain more accurate calculation results. Increasing the accuracy of these calculations leads to an increase in the runtime. This problem is solved by the development of parallel algorithms that run on supercomputers. In this article, the authors consider the development of a parallel algorithm to solve such oil-production problems.

Introduction
The development of high-performance computing (HPCwire: Global News on High Performance Computing, 2020) enables the solving of major scientific and applied problems in various fields. For example, supercomputers can be loaded with weather and climate forecasting, acoustics, hydrodynamics, pharmaceutical manufacturing, and biological research tasks (Akhmed-Zaki et al., 2017; Farrukh, 2018; Yeshmukhametov et al., 2017).
Using supercomputers can significantly accelerate the solution of problems by numerical methods. One such task is forecasting oil and gas production in specific oil and gas fields. Modelling multicomponent multiphase fluid (oil and gas) flow in porous media (in oil reservoirs) is a relevant and, at the same time, complex problem of hydrodynamic simulation. To solve such problems, various methods and schemes are used (Aceto et al., 2006; Ahmed, 2006; Borisov et al., 2013; Chen, 2006; Chen et al., 2006; Edwards et al., 2018; Javidi & Ahmad, 2013; Imankulov et al., 2018; Zheng & Yin, 2014), some of which are iterative methods for solving linear systems (Lacroix et al., 2003; Mittal & Al-Kurdi, 2002; Vabishchevich & Vasilyeva, 2011; Wang et al., 2013).
In this paper, the equation system of multicomponent multiphase flow in porous media is linearized by the Newton-Raphson method, and at each iteration the resulting linear system is solved by the generalized minimal residual method (GMRES) (Saad & Schultz, 1986) with the ILU(0) preconditioner. The method, called Newton-ILU(0)-GMRES, combines the Newton-Raphson method and the GMRES algorithm with ILU(0) preconditioning into a reliable and efficient method. A parallel implementation of the considered problem is realized with the MPI programming interface, which is the main tool in the development of parallel programs for supercomputers. In the future, a project to develop an active knowledge management system to solve the multicomponent fluid flow problem is planned (Akhmed-Zaki et al., 2019). The result of this project will be a system that automatically chooses a solution method for practical problems, one option being the method described in this article.

Mathematical model and numerical method
The one-dimensional problem of compositional fluid flow in a porous medium is considered. Compositional flow involves multiple components and three phases, and there is mass transfer between the vapour and liquid phases. The model has the following assumptions:
• the flow process is isothermal,
• the components form at most three phases (e.g., vapour, liquid, and water),
• there is no mass interchange between the water and the hydrocarbon (vapour, liquid) phases,
• the diffusion/dispersion effect is negligible.
We describe a simplified compositional model that has been widely used in the petroleum industry, since the general compositional model is very difficult to solve.
At point A, water is injected; due to the pressure difference, it moves through the porous medium to point B, from which liquid/vapour is produced, as shown in Figure 1. The distribution of water, vapour, and liquid in the porous medium should be calculated.
Let us consider a mathematical model of the multicomponent flow of three-phase fluid in a porous medium. We write the law of conservation of mass for each component (Chen, 2006; Chen et al., 2006):

$$\frac{\partial (\phi \xi_w S_w)}{\partial t} + \nabla\cdot(\xi_w u_w) = q_w, \tag{1}$$

$$\frac{\partial}{\partial t}\big[\phi(x_{mo}\xi_o S_o + x_{mg}\xi_g S_g)\big] + \nabla\cdot(x_{mo}\xi_o u_o + x_{mg}\xi_g u_g) = q_m, \quad m = 1,\dots,N_c, \tag{2}$$

$$u_\alpha = -\frac{k\,k_{r\alpha}}{\mu_\alpha}\,\nabla p_\alpha, \quad \alpha = w, o, g, \tag{3}$$

$$S_w + S_o + S_g = 1, \tag{4}$$

$$p_{cow} = p_o - p_w, \qquad p_{cgo} = p_g - p_o, \tag{5}$$

$$f_{mo} = f_{mg}, \qquad f_{m\alpha} = x_{m\alpha}\,\varphi_{m\alpha}\,p_\alpha, \quad \alpha = o, g, \tag{6}$$

where $S_w$, $S_o$, and $S_g$ are the saturations of the water, oil, and gas phases, respectively; $x_{m\alpha}$ is the mole fraction of component m in phase α; $\xi_\alpha$ is the molar density of phase α; $\phi$ and $k$ are the porosity and permeability, respectively, of the rock; $\mu_\alpha$ is the viscosity; $p_\alpha$ is the phase pressure; $p_{cow}$ and $p_{cgo}$ are the capillary pressures; $k_{r\alpha}$ is the relative phase permeability; $u_\alpha$ is the velocity of phase α; $q_w$ and $q_m$ are the molar velocities of the water flow and the m-th component, respectively; $f_{mo}$ and $f_{mg}$ are the fugacity functions; and $\varphi_{m\alpha}$ is the fugacity coefficient of component m in phase α.
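The fugacity functions provide the phase-equilibrium closure between the hydrocarbon phases. In K-value form (our summary of the standard relations from the cited references, with $K_m$ denoting the equilibrium ratio of component m), it can be written as:

```latex
% Phase equilibrium between the liquid (o) and vapour (g) hydrocarbon phases:
% equal fugacities for each component m, which in K-value form gives the
% equilibrium ratio K_m as a ratio of fugacity coefficients.
\begin{aligned}
  f_{mo} &= f_{mg}, \\
  K_m &= \frac{x_{mg}}{x_{mo}} = \frac{\varphi_{mo}}{\varphi_{mg}}.
\end{aligned}
```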
The distribution of chemical components in the hydrocarbon phases is described by the K-value method (Pedersen & Christensen, 2008). The thermodynamic behaviour of fluids under reservoir conditions is described by the Peng-Robinson equation of state (Peng & Robinson, 1976).

The system of nonlinear Equations (1)-(6) is linearized by the Newton-Raphson method. In the (l+1)-th layer of the Newton iterative process, the values of the unknowns are updated by the following law:

$$J(u^l)\,\delta u = -F(u^l), \qquad u^{l+1} = u^l + \delta u, \tag{7}$$

where $F$ is the vector of residuals of the discretized Equations (1)-(6), $J$ is its Jacobian matrix, $u$ is the vector of unknowns, and $\delta u$ is the Newton update.

Figure 1. Layout diagram.
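The Newton layer update can be sketched as follows. This is a generic illustration in Python on a hypothetical two-equation residual, not the discretized system (1)-(6); in the paper, the linear solve inside the loop is performed by ILU(0)-preconditioned GMRES rather than a dense direct solver.

```python
import numpy as np

def newton(F, J, u0, tol=1e-10, max_iter=50):
    """Newton-Raphson: solve J(u^l) du = -F(u^l), then set u^{l+1} = u^l + du."""
    u = u0.astype(float).copy()
    for _ in range(max_iter):
        Fu = F(u)
        if np.linalg.norm(Fu) < tol:
            break
        du = np.linalg.solve(J(u), -Fu)   # in the paper, this solve uses ILU(0)-GMRES
        u += du
    return u

# Hypothetical 2x2 residual: x^2 + y^2 - 4 = 0 and x - y = 0, root x = y = sqrt(2)
F = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
root = newton(F, J, np.array([1.0, 1.0]))
```

Each Newton layer thus requires one Jacobian assembly and one linear solve, which is why the efficiency of the inner linear solver dominates the total runtime.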
The resulting system of linear Equations (7) is reduced to the following form, as shown by Chen et al. (2006):

$$Ax = b. \tag{8}$$

There are several well-known methods for solving systems of algebraic equations of this type. Direct methods, such as the Gauss method or LU decomposition, are not suitable for systems with sparse matrices, since there is the possibility of overflow (Higham, 2011). The conjugate gradient method (CG) is designed for systems with symmetric matrices, and the biconjugate gradient method (BiCG) has slow convergence (Van der Vorst, 2003). In this work, system (8) was solved by the iterative GMRES method, which is a widely used Krylov subspace method (Saad, 2003).
To reduce the number of iterations in the computation, preconditioning was used as an explicit or implicit modification of the system of linear equations that simplifies its solution. One of the widely used methods for constructing a preconditioner is incomplete LU (ILU) decomposition of the original matrix A (Mittal & Al-Kurdi, 2003). We chose ILU(0) (Chow & Saad, 1997) as the preconditioner for the GMRES algorithm, obtained by decomposing matrix A with a fine-grained ILU factorization algorithm. As a result, the given problem is solved numerically by the Newton-ILU0-GMRES algorithm as a fully implicit scheme, with the linear system (8) solved at each Newton iteration.
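To illustrate the idea, here is a minimal dense-storage ILU(0) sketch in Python (our own illustration, not the fine-grained parallel factorization cited above): the factors L and U are restricted to the nonzero pattern of A, so no fill-in is created, and applying the preconditioner amounts to two triangular solves.

```python
import numpy as np

def ilu0(A):
    """ILU(0): incomplete LU whose factors keep the nonzero pattern of A."""
    n = A.shape[0]
    nz = A != 0                      # fixed sparsity pattern of the original matrix
    LU = A.astype(float).copy()      # L (unit lower diagonal implied) and U share storage
    for i in range(1, n):
        for k in range(i):
            if nz[i, k]:
                LU[i, k] /= LU[k, k]
                for j in range(k + 1, n):
                    if nz[i, j] and nz[k, j]:    # update only existing nonzeros
                        LU[i, j] -= LU[i, k] * LU[k, j]
    return LU

def apply_ilu0(LU, r):
    """Solve P w = r with P = L U: forward substitution, then backward substitution."""
    n = len(r)
    w = r.astype(float).copy()
    for i in range(n):                           # L y = r (unit diagonal)
        w[i] -= LU[i, :i] @ w[:i]
    for i in range(n - 1, -1, -1):               # U w = y
        w[i] = (w[i] - LU[i, i + 1:] @ w[i + 1:]) / LU[i, i]
    return w
```

For a tridiagonal matrix, LU factorization creates no fill-in, so ILU(0) coincides with the exact factorization; for the block-sparse Jacobians arising from (7), it is only an approximation of A used as the preconditioner P.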

Parallel algorithm
To implement a parallel version of the GMRES algorithm, we need to determine the parts of the calculations that can serve as the basis of parallelization.
For instance, the GMRES algorithm consists of the following steps:
(a) Initialization. Choose $x_0$, compute $r = b - Ax_0$, solve $Pw = r$, and set $\beta = \|w\|_2$ and $v_1 = w/\beta$.
(b) Arnoldi iteration. Iterate m times: j = 1, 2, ..., m. The specific Arnoldi algorithm is as follows: compute $w = P^{-1}Av_j$; set $h_{i,j} = (w, v_i)$ and $w = w - h_{i,j}v_i$ for i = 1, ..., j; set $h_{j+1,j} = \|w\|_2$ and $v_{j+1} = w/h_{j+1,j}$. Compute the approximate solution $x_m = x_0 + Q_{n,m}y_m$. Here, $y_m$ minimizes $\|\beta e_1 - H_m y\|_2$; $Q_{n,k}$ is an $n \times k$ matrix whose columns are the orthonormal vectors $v_1, v_2, \dots, v_k$; and $H_m$ is the $(m+1) \times m$ Hessenberg matrix with entries $h_{i,j}$.
(c) Determination. Compute $r_m = b - Ax_m$ and stop if the condition $\|r_m\|_2 \le \varepsilon$ is met; otherwise, let $x_0 = x_m$, go back to (a), and recalculate. The convergence condition by which the method stops is given by an arbitrary number ε.
In this GMRES algorithm, there are common operations, such as matrix-matrix multiplication, matrix-vector multiplication, vector-vector multiplication, and vector norm calculation, in steps (a), (b), and (c). These operations can be implemented in parallel with data decomposition. In steps (a) and (b) of the GMRES algorithm, the ILU(0) factorization is used as the preconditioner P.
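The steps above can be sketched as a restarted, left-preconditioned GMRES(m) in Python. This is our sequential illustration, not the authors' MPI code, and it uses a hypothetical diagonally dominant test matrix with a Jacobi stand-in preconditioner in place of ILU(0).

```python
import numpy as np

def gmres(A, b, M_solve, x0=None, m=30, tol=1e-8, max_restarts=100):
    """Restarted, left-preconditioned GMRES(m); M_solve(r) applies P^{-1} to r."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for _ in range(max_restarts):
        r = M_solve(b - A @ x)                   # step (a): preconditioned residual
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        Q = np.zeros((n, m + 1))                 # orthonormal Krylov basis v_1..v_{m+1}
        H = np.zeros((m + 1, m))                 # Hessenberg matrix of h_{i,j}
        Q[:, 0] = r / beta
        k = m
        for j in range(m):                       # step (b): Arnoldi iteration
            w = M_solve(A @ Q[:, j])
            for i in range(j + 1):               # modified Gram-Schmidt orthogonalization
                H[i, j] = Q[:, i] @ w
                w -= H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:              # happy breakdown: exact solution reached
                k = j + 1
                break
            Q[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)  # min ||beta*e1 - H y||_2
        x += Q[:, :k] @ y                        # approximate solution x_m = x_0 + Q y
    return x

# Usage with a hypothetical diagonally dominant matrix and a Jacobi stand-in
# preconditioner (the paper uses ILU(0) instead):
rng = np.random.default_rng(0)
n = 50
A = rng.random((n, n)) + n * np.eye(n)
b = A @ np.ones(n)
x = gmres(A, b, M_solve=lambda r: r / np.diag(A))
```

The matrix-vector products and inner products in this sketch are exactly the operations that are distributed across processes in the MPI version via row-wise data decomposition.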

Existing libraries
The GMRES algorithm can be implemented using some well-known libraries, such as Intel MKL (Intel® Math Kernel Library Developer Reference, 2020), MATLAB (MathWorks, 2020), and the Portable, Extensible Toolkit for Scientific Computation (PETSc) (Abhyankar et al., 2018). In this work, the PETSc library was chosen for comparison with our Newton-ILU0-GMRES MPI parallel implementation, since this tool implements parallel versions of different methods for solving linear and nonlinear equations.
During the review, it was revealed that the ILU(0) preconditioner could not be used in a parallel program with PETSc; by default, the block Jacobi preconditioner is used instead.

Verification of the convergence of the algorithm with preconditioning
The testing of the program was carried out on matrices corresponding to the cases in which the number of grid points varied from 500 to 8000. The parameters of the matrices are shown in Table 1.
As this table shows, the size of the matrix being solved is 6 times larger than the number of grid points. For example, the size of matrix #5, which corresponds to the case of 8000 points in the difference grid, is 48000 × 48000 because the six Equations (7) are written for each point.
Problem (8) was solved by the Newton-GMRES algorithm without a preconditioner and with the ILU(0) preconditioner. The graph of the change in the solution residuals with the execution of a certain number of iterations for various cases is shown in Figure 2.
As seen in Figure 2, for all the matrices tested, the Newton-ILU(0)-GMRES algorithm converges in less than one-third of the number of iterations required by the Newton-GMRES algorithm. Based on this result, further tests were conducted only on the Newton-ILU(0)-GMRES algorithm.

Comparison of the Newton-ILU(0)-GMRES algorithm with the PETSc library
To compare our parallel program implementing the Newton-ILU(0)-GMRES algorithm with the program using the PETSc library, tests were carried out on a personal 8-core computer, since it was not possible to install third-party libraries on the supercomputer. The results are shown in Table 2.
The tests were carried out on a 3000 × 3000 matrix, which corresponds to the case in which the number of nodes of the difference scheme for (1)-(6) was 500. As we can see from Table 2, in parallel calculations, the implementation of the Newton-ILU(0)-GMRES algorithm runs faster than the program using PETSc. The comparison is not under identical conditions, since different preconditioners were used in the two parallel implementations. Nevertheless, the implementation of the presented Newton-ILU(0)-GMRES algorithm is faster than the PETSc version on the same personal computer.

Test results of the Newton-ILU(0)-GMRES parallel algorithm
Test runs for the considered problem (8) were made on the MVS-10P supercomputer of the Interdepartmental Supercomputer Center of the Russian Academy of Sciences, each node of which has two Xeon X5450 processors and 8 GB of RAM. The test results are shown in Table 3.
As we can see from the test results, for the smallest task, the parallel program running on 8 processors achieved the shortest runtime. As the size of the task increases, the parallel application achieves the shortest calculation time on 64 processors, but this time is not much different from the runtimes on 8, 128, and 256 processors. The reason is that with an increase in the number of processors, the computation time on each processor decreases, while the time spent on communications increases.
It can be seen that the runtimes on 64 and 128 processors are approximately the same. This finding is explained by the fact that on 64 processors, the communication time is less than the computation time, while on 128 processors, the computation time is less than the communication time, but in total, the time consumed is almost the same.
For matrix #1 on 256 and 512 processors, the parallel program is slower than the sequential program. With an increase in the size of the matrix, for instance, for matrices #2-#5, the runtime of the parallel program decreases compared to the sequential program. This regularity makes it possible to predict that with an increase in the size of the matrix, on large numbers of processors, the speedup also increases. Figure 3 shows the parallel program speedup. It can be concluded that it makes sense to run the parallel algorithm on a large number of processors only with large matrices that cannot fit in the memory of a single node.

Conclusion
This research examines the parallel implementation of the Newton-ILU0-GMRES algorithm for a multicomponent three-phase flow problem. For the considered problem, the parallel implementation of the presented method was compared with a parallel program using the PETSc library on an 8-core personal computer. The results showed that the presented Newton-ILU(0)-GMRES algorithm achieves faster convergence than the Newton-GMRES algorithm and has a shorter runtime than the parallel program with PETSc. Tests were also carried out with different task sizes and numbers of processes on a supercomputer. The study showed that on a large number of processes, the MPI program performs better on large-scale tasks. In future research, we plan to consider other variants of the method to improve the efficiency of the parallel algorithm. In particular, we plan to consider other options instead of the Arnoldi iteration and to develop a fragmented algorithm (Akhmed-Zaki et al., 2019) that will allow users not to spend time on the implementation of communications.