A parallel fictitious domain method for the interface-resolved simulation of particle-laden flows and its application to the turbulent channel flow

ABSTRACT A parallel direct-forcing (DF) fictitious domain (FD) method for the simulation of particulate flows is reported in this paper. The parallel computing strategies for the solution of the flow fields, and particularly of the distributed Lagrange multiplier, are presented, and the high efficiency of the parallel code is demonstrated. The new code is then applied to study the effects of particle density (or particle inertia) on the turbulent channel flow. The results show that, as particle inertia increases, the large-scale vortices are weakened more severely and the flow friction drag first increases and then decreases.


Introduction
Particle-laden flows are ubiquitous in nature and in industrial settings such as fluidized-bed reactors and slurry transportation. The two-fluid model and the discrete particle model (DPM) based on the point-particle approximation are two traditional methods commonly used to treat particle-laden flows in engineering applications, whereas the interface-resolved simulation method has been developed over the past two decades to advance the mechanistic understanding of multiphase flows (Balachandar & Eaton, 2010; Tryggvason, 2010). In the interface-resolved simulation method, the no-slip boundary condition on the particle-fluid interface is enforced and the hydrodynamic force on the particles is determined via the solution of the flow fields outside the particle boundaries. Unlike the two-fluid and discrete particle models, the interface-resolved method directly computes the hydrodynamic force on the particles, and is therefore also called a direct numerical simulation method for particle-laden flows (Hu, Patankar, & Zhu, 2001).
There is a range of interface-resolved solvers, including the arbitrary Lagrangian-Eulerian method (Hu et al., 2001), the lattice Boltzmann method (Ladd, 1994), the fictitious domain (FD) method (Glowinski, Pan, Hesla, & Joseph, 1999) and the immersed boundary method (Uhlmann, 2005). The FD method for particulate flows was proposed by Glowinski et al. (1999); the key idea in this method is that the interior domains of the particles are filled with the same fluid as the surroundings, and a distributed Lagrange multiplier (i.e., a pseudo body-force) is introduced to enforce the interior (fictitious) fluid to satisfy the constraint of rigid-body motion. Yu and colleagues have made new implementations of the FD method and conducted many successful applications with a serial code (Shao, Wu, & Yu, 2012; Shao, Yu, & Sun, 2008; Yu, Phan-Thien, & Tanner, 2004; Yu & Shao, 2007, 2010; Yu, Shao, & Wachs, 2006). The primary aim of the present study is to develop an efficient parallel code for the FD method. We note that parallelizations of immersed boundary methods and of FD methods for immersed objects have been reported previously (Blasco, Calzada, & Marin, 2009; Borazjani, Ge, Le, & Sotiropoulos, 2013; Clausen, Reasor, & Aidun, 2010; Tai, Zhao, & Liew, 2005; Uhlmann, 2003; S. Wang, He, & Zhang, 2013; Z. Wang, Fan, & Luo, 2008; Yildirim, Lin, Mathur, & Murthy, 2013). Nevertheless, the time and spatial discretization schemes of our FD method differ from those reported and, in particular, our key parallelization strategy for the particle-related problem (i.e., the introduction of main and subordinate particle lists) has not been described in previous studies. In addition, our new parallel code is used to investigate the effects of particle inertia on the turbulent channel flow. The rest of the paper is organized as follows. The parallelization algorithms of our FD method are presented in section 2.
The scalability of the parallel code is tested in section 3. In section 4, the new code is applied to the simulation of particle-laden turbulent channel flows with different particle-to-fluid density ratios. Concluding remarks are given in the final section.

Formulation of the FD method
In the FD method, the interior of each solid particle is filled with the fluid, and a pseudo body-force is introduced over the particle inner domain to enforce the fictitious fluid to satisfy the rigid-body motion constraint. For simplicity of description, only one particle is considered. Let ρ_s, V_p, J, U, and ω_s represent the particle density, volume, moment of inertia tensor, translational velocity, and angular velocity, respectively. Fluid viscosity and density are denoted by μ and ρ_f, respectively. P(t) represents the solid domain, and Ω the entire domain comprising both the interior and exterior of the body. The governing equations can be non-dimensionalized by introducing the following scales: L_c for length, U_c for velocity, L_c/U_c for time, ρ_f U_c² for pressure, and ρ_f U_c²/L_c for the pseudo body-force. The dimensionless governing equations in strong form for the FD method are (Yu & Shao, 2007):
$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} = -\nabla p + \frac{\nabla^{2}\mathbf{u}}{Re} + \boldsymbol{\lambda} \quad \text{in } \Omega, \qquad (1)$$
$$\nabla\cdot\mathbf{u} = 0 \quad \text{in } \Omega, \qquad (2)$$
$$\mathbf{u} = \mathbf{U} + \boldsymbol{\omega}_{s}\times\mathbf{r} \quad \text{in } P(t), \qquad (3)$$
$$(\rho_{r}-1)V_{p}^{*}\left(\frac{d\mathbf{U}}{dt} - Fr\frac{\mathbf{g}}{g}\right) = -\int_{P}\boldsymbol{\lambda}\,d\mathbf{x}, \qquad (4)$$
$$(\rho_{r}-1)\frac{d(\mathbf{J}^{*}\cdot\boldsymbol{\omega}_{s})}{dt} = -\int_{P}\mathbf{r}\times\boldsymbol{\lambda}\,d\mathbf{x}. \qquad (5)$$
In the above equations, u represents the fluid velocity, p the fluid pressure, λ the pseudo body-force (Lagrange multiplier) defined in the solid particle domain, r the position vector with respect to the particle mass center, ρ_r the solid-fluid density ratio defined by ρ_r = ρ_s/ρ_f, g the gravitational acceleration, Re the Reynolds number defined by Re = ρ_f U_c L_c/μ, Fr the Froude number defined by Fr = gL_c/U_c², V_p* the dimensionless particle volume defined by V_p* = V_p/L_c³, and J* the dimensionless moment of inertia tensor defined by J* = J/(ρ_s L_c⁵).

Discretization schemes
A fractional-step time integration scheme is used to decouple the combined system (1-5) into two subproblems.
The fluid sub-problem (6-7) is a Navier-Stokes problem. The projection scheme is used to further decompose Equations (6-7) into three sub-problems:
(a) Helmholtz equation for the velocity:
$$\frac{\mathbf{u}^{\#}-\mathbf{u}^{n}}{\Delta t}-\frac{\nabla^{2}\mathbf{u}^{\#}}{2Re}=-\nabla p^{n}+\frac{\nabla^{2}\mathbf{u}^{n}}{2Re}-\frac{3(\mathbf{u}\cdot\nabla\mathbf{u})^{n}-(\mathbf{u}\cdot\nabla\mathbf{u})^{n-1}}{2}+\boldsymbol{\lambda}^{n} \qquad (8)$$
(b) Poisson equation for the pressure:
$$\nabla^{2}\phi=\frac{\nabla\cdot\mathbf{u}^{\#}}{\Delta t} \qquad (9)$$
(c) Correction of velocity and pressure:
$$\mathbf{u}^{*}=\mathbf{u}^{\#}-\Delta t\,\nabla\phi,\qquad p^{n+1}=p^{n}+\phi \qquad (10)$$
The finite-difference scheme on half-staggered grids is employed for the spatial discretization. All spatial derivatives are discretized with the second-order central difference scheme.
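The projection step can be illustrated with a minimal sketch. The fragment below is a 2D, fully periodic, collocated-grid reduction written for clarity, not the actual solver: the paper's code is 3D on half-staggered grids and uses a multi-grid Poisson solver, for which a plain Jacobi iteration stands in here. All function names are ours.

```python
import numpy as np

def project(u_star, v_star, dx, dt, n_iter=2000):
    """One pressure-projection step on a 2D periodic grid (illustrative only)."""
    # Divergence of the intermediate velocity (second-order central, periodic).
    div = ((np.roll(u_star, -1, 1) - np.roll(u_star, 1, 1)) +
           (np.roll(v_star, -1, 0) - np.roll(v_star, 1, 0))) / (2 * dx)
    # Solve  laplacian(phi) = div/dt  with plain Jacobi iterations
    # (a stand-in for the multi-grid solver of the actual code).
    phi = np.zeros_like(u_star)
    rhs = div / dt
    for _ in range(n_iter):
        phi = 0.25 * (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
                      np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - dx * dx * rhs)
    # Correct the velocity:  u = u* - dt * grad(phi).
    u = u_star - dt * (np.roll(phi, -1, 1) - np.roll(phi, 1, 1)) / (2 * dx)
    v = v_star - dt * (np.roll(phi, -1, 0) - np.roll(phi, 1, 0)) / (2 * dx)
    return u, v, phi
```

After the correction, the discrete divergence of the velocity field is driven toward zero, which is the purpose of the pressure sub-problem.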
Particle sub-problem for U^{n+1}, ω_s^{n+1}, λ^{n+1} and u^{n+1}:
$$\rho_{r}V_{p}^{*}\frac{\mathbf{U}^{n+1}}{\Delta t}=(\rho_{r}-1)V_{p}^{*}\left(\frac{\mathbf{U}^{n}}{\Delta t}+Fr\frac{\mathbf{g}}{g}\right)+\int_{P}\left(\frac{\mathbf{u}^{*}}{\Delta t}-\boldsymbol{\lambda}^{n}\right)d\mathbf{x} \qquad (11)$$
$$\rho_{r}\frac{\mathbf{J}^{*}\cdot\boldsymbol{\omega}_{s}^{n+1}}{\Delta t}=(\rho_{r}-1)\frac{\mathbf{J}^{*}\cdot\boldsymbol{\omega}_{s}^{n}}{\Delta t}+\int_{P}\mathbf{r}\times\left(\frac{\mathbf{u}^{*}}{\Delta t}-\boldsymbol{\lambda}^{n}\right)d\mathbf{x} \qquad (12)$$
In Equations (11-12), all the right-hand-side terms are known quantities, so U^{n+1} and ω_s^{n+1} are obtained explicitly. Then, λ^{n+1}, defined at the Lagrangian nodes, is determined as follows:
$$\boldsymbol{\lambda}^{n+1}=\boldsymbol{\lambda}^{n}+\frac{\mathbf{U}^{n+1}+\boldsymbol{\omega}_{s}^{n+1}\times\mathbf{r}-\mathbf{u}^{*}}{\Delta t} \qquad (13)$$
Finally, the fluid velocities u^{n+1} at the Eulerian nodes are corrected:
$$\mathbf{u}^{n+1}=\mathbf{u}^{*}+\Delta t\left(\boldsymbol{\lambda}^{n+1}-\boldsymbol{\lambda}^{n}\right)\quad\text{in }P(t) \qquad (14)$$
In the above equations, the quantities with the superscript 'n' denote their values at time nΔt, i.e., at the nth time-level, whereas u* and u# represent the fluid velocities at the fractional time-steps from the nth to the (n+1)th time-level.
We choose the tri-linear function as the discrete delta function to transfer the quantities between the Lagrangian and Eulerian nodes (Yu & Shao, 2007). Only spherical particles are considered in the present study. The position of the particle mass center is determined from the kinematic equation dX/dt = U. The reader is referred to Yu and Shao (2007) for more details on our FD method.
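The tri-linear transfer between the two sets of nodes can be sketched as follows. This is an illustrative NumPy fragment on a uniform collocated grid with nodes at integer multiples of the spacing h (the actual code works on half-staggered grids); the function names are ours.

```python
import numpy as np

def trilinear_weights(xp, h):
    """Indices and weights of the 8 Eulerian nodes surrounding the point xp."""
    i = np.floor(xp / h).astype(int)          # lower-corner index (i, j, k)
    f = xp / h - i                            # fractional position in the cell
    weights = []
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((1 - f[0]) if di == 0 else f[0]) * \
                    ((1 - f[1]) if dj == 0 else f[1]) * \
                    ((1 - f[2]) if dk == 0 else f[2])
                weights.append(((i[0] + di, i[1] + dj, i[2] + dk), w))
    return weights

def interpolate(field, xp, h):
    """Eulerian -> Lagrangian: e.g. the fluid velocity u* at a Lagrangian node."""
    return sum(w * field[idx] for idx, w in trilinear_weights(xp, h))

def spread(field, xp, value, h):
    """Lagrangian -> Eulerian: e.g. distribute the pseudo body-force to the grid."""
    for idx, w in trilinear_weights(xp, h):
        field[idx] += w * value
```

The weights form a partition of unity, so a linear field is interpolated exactly and the spread operation conserves the total of the distributed quantity.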

Parallel algorithms
Since the computational domain is rectangular, or can be extended to be rectangular with the FD technique, it is natural to adopt domain decomposition as the parallel strategy: each process handles one sub-domain. The Message Passing Interface (MPI) is used to transfer data between sub-domains.

Parallelization strategy for the fluid sub-problem
In the fluid sub-problem, the main tasks are to solve the velocity Helmholtz equation (8) and the pressure Poisson equation (9). In our serial code, the Helmholtz equation is discretized into a tri-diagonal algebraic system by using the alternating direction implicit (ADI) scheme, and the Poisson equation for the pressure is solved with a combination of a fast cosine transformation (FCT) and a tri-diagonal system solver (Yu & Shao, 2007). For our parallel code, we employ the multi-grid iterative method to solve these two equations, considering that it is more convenient to parallelize the multi-grid solver than the combination of the FCT and the tri-diagonal system solver. The multi-grid algorithm mainly includes (Wesseling, 1992): (1) Gauss-Seidel relaxation with red-black ordering for the smoothing of residuals; (2) full weighting of residuals for restriction from fine to coarse grids, and linear interpolation of the corrections for prolongation from coarse to fine grids.
Note that the fluid velocity is defined on the grid points, whereas the pressure is defined at the cell centers; therefore, the restriction and interpolation between fine and coarse grids differ for the fluid velocity and the pressure (Wesseling, 1992). The reader is referred to Wesseling (1992) for a detailed description of the multi-grid method. For parallelization, the data of each sub-domain are sent to its 26 neighboring sub-domains in point, line, and face forms, respectively, depending on the demand from the neighboring sub-domains.
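The red-black smoother at the heart of the multi-grid solver can be sketched as follows. This is a 2D Dirichlet-boundary reduction for illustration (the paper's solver is 3D); the point of the colouring is that all nodes of one colour can be updated simultaneously, so a process only needs the opposite-colour halo values from its neighbours.

```python
import numpy as np

def gauss_seidel_rb(phi, rhs, h, sweeps=1):
    """Red-black Gauss-Seidel smoothing for laplacian(phi) = rhs on a 2D grid
    with homogeneous Dirichlet boundaries (boundary values held fixed)."""
    i, j = np.meshgrid(np.arange(1, phi.shape[0] - 1),
                       np.arange(1, phi.shape[1] - 1), indexing='ij')
    for _ in range(sweeps):
        for colour in (0, 1):                 # 0 = red, 1 = black
            mask = ((i + j) % 2 == colour)
            ii, jj = i[mask], j[mask]
            # The 5-point stencil only touches the opposite colour, so the
            # whole colour class can be updated at once.
            phi[ii, jj] = 0.25 * (phi[ii + 1, jj] + phi[ii - 1, jj] +
                                  phi[ii, jj + 1] + phi[ii, jj - 1] -
                                  h * h * rhs[ii, jj])
    return phi
```

Repeated sweeps drive the residual of the discrete Poisson equation to zero; in the multi-grid setting only a few sweeps are used per level to damp the high-frequency error components.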

Parallelization strategy for the particle sub-problem
Since the pseudo body-force λ is defined on the Lagrangian nodes and the fluid velocity u on the Eulerian nodes, mutual interpolations between these two types of nodes are required. In Equations (8) and (14), we need to transfer (or distribute) the pseudo body-force λ from the Lagrangian nodes to the Eulerian nodes, and in Equations (11-13) we need to transfer the fluid velocity u* from the Eulerian nodes to the Lagrangian nodes. In the present work, we choose the tri-linear function to transfer the quantities between the Lagrangian and Eulerian nodes. When a particle crosses the boundary of a sub-domain (e.g., particle B in Figures 1 and 2), its Lagrangian points are located in different sub-domains. Therefore, data communication between sub-domains is needed for the mutual interpolation between the Lagrangian and Eulerian points.
The key to our algorithm is to establish two particle lists for each sub-domain: a main list (MP_LIST) and a subordinate list (SP_LIST). The main list is used to update the particle data in the simulation, and the subordinate list is used for the data communication in the mutual interpolations. The MP_LIST contains the data of the particles whose central positions are located in the sub-domain considered, and the SP_LIST stores the information of the particles whose central positions are not located in the current sub-domain but whose pseudo body-forces influence (or are affected by) the velocities in the sub-domain through the mutual interpolations. For sub-domain 5 in Figure 2, as an example, particles A and B belong to MP_LIST because their central positions are located in sub-domain 5. Particles C, D, E, and F belong to SP_LIST. Note that particle F belongs to SP_LIST because we assume that the gap distance between the particle and the boundary of sub-domain 5 is smaller than the size of one mesh (h), and thus its pseudo body-force affects (or is affected by) the velocity in sub-domain 5 (or on the boundary of sub-domain 5) via the interpolation, although no part of the particle domain overlaps with sub-domain 5. The critical gap distance determining whether particle F belongs to SP_LIST depends on the type of interpolation function (or discrete delta function), and is equal to the mesh size h for the tri-linear function we employ. Particle F is detected by the processor of sub-domain 1, according to the particle position, as a member of its own MP_LIST and of the SP_LIST of sub-domain 5; sub-domain 5 finds this particle by receiving the information from sub-domain 1. For both MP_LIST and SP_LIST, we establish a new data structure to store the information of the particles, including the particle sequence number, position, translational and angular velocities, pseudo body-force, and the fluid velocity defined on the Lagrangian nodes.
For a particle in the SP_LIST of a sub-domain, the fluid velocities on the Lagrangian nodes are computed from the interpolation of the velocities on the Eulerian points in the local sub-domain, and thus only the values at some of the Lagrangian points are obtained (e.g., the yellow Lagrangian points of particle B in sub-domain 6 in Figure 2). For particle B in Figure 2, these fluid velocities at the yellow Lagrangian points in the SP_LIST of sub-domain 6 are exactly those needed by the MP_LIST of sub-domain 5 for a complete set of data. In order to transfer the data in the SP_LIST of one sub-domain to the MP_LIST of a neighboring sub-domain, we establish an index list for the SP_LIST, whose contents include the sequence number of the sub-domain in which the particle center is located (i.e., identifying the MP_LIST), the index (sequence number) of the particle in the identified MP_LIST, and the flags of the Lagrangian nodes (1 if the fluid velocity at the Lagrangian node is computed in this sub-domain, and 0 otherwise).
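The MP_LIST/SP_LIST classification described above can be sketched as a simple geometric test. The field layout of the `Particle` record below is our illustrative guess, not the actual data structure of the code; the classification rule (centre inside the box for MP_LIST, surface gap to the box below one mesh size h for SP_LIST) follows the text.

```python
import numpy as np

class Particle:
    """Minimal sketch of a particle record (field layout is illustrative)."""
    def __init__(self, pid, centre, radius):
        self.pid = pid                         # global sequence number
        self.centre = np.asarray(centre, dtype=float)
        self.radius = radius
        self.U = np.zeros(3)                   # translational velocity
        self.omega = np.zeros(3)               # angular velocity

def classify(particles, lo, hi, h):
    """Split particles into MP_LIST / SP_LIST for the box [lo, hi).
    MP_LIST: centre inside the box.  SP_LIST: centre outside, but the
    surface comes within one mesh size h of the box, so the particle
    couples to this box through the tri-linear interpolation."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    mp, sp = [], []
    for p in particles:
        if np.all(p.centre >= lo) and np.all(p.centre < hi):
            mp.append(p)
        else:
            # distance from the centre to the box, minus the radius = surface gap
            d = np.linalg.norm(np.maximum(lo - p.centre, 0) +
                               np.maximum(p.centre - hi, 0))
            if d - p.radius < h:
                sp.append(p)
    return mp, sp
```

In the parallel code each process would run this test only against its 26 neighbouring sub-domains rather than over all particles.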
The full algorithm of our parallel FD method is as follows:
(1) Code initialization, including the construction of the MP_LIST for each sub-domain with the new data structure.
(2) Establish the SP_LIST (as well as the index list) by exchanging the particle data in the MP_LIST among the sub-domains. For a particle in the MP_LIST of one sub-domain, the particle data are sent to a neighboring sub-domain if the particle is judged to belong to the SP_LIST of that sub-domain from the distance between the particle surface and the boundary of the neighboring sub-domain.
(3) Distribute the pseudo body-force λ^n from the Lagrangian nodes to the Eulerian nodes, with data communication for the particles in the SP_LIST.
(4) Solve the fluid sub-problem, Equations (8-10), to obtain u*.
(5) Interpolate the fluid velocity u* from the Eulerian nodes to the Lagrangian nodes; for the particles in the SP_LIST, send the partially interpolated values to the MP_LIST of the corresponding sub-domain via the index list.
(6) Compute the translational velocity U^{n+1}, the angular velocity ω_s^{n+1} and the pseudo body-force λ^{n+1} of the particles in the MP_LIST for each sub-domain from Equations (11-13).
(7) Correct the fluid velocity u^{n+1} from Equation (14). The manipulation is the same as in step (3).
(8) Deal with the collision of the particles with the collision model described below, if needed; otherwise, directly update the particle positions in the MP_LIST with the velocities obtained.
(9) Update the MP_LIST. When a particle's central position moves from one sub-domain to another, the particle is added to the MP_LIST of the new sub-domain and deleted from the MP_LIST of the old sub-domain.
(10) Delete the SP_LIST and the index list, and go back to step (2) for the next time-step.
A flowchart for our parallel FD method, corresponding to the above algorithms, is presented in Figure 3.

Parallel implementation strategy for collision model
Next, we describe our parallel algorithm for particle collision. The soft-sphere collision model is chosen, and the artificial repulsive force is given as
$$\mathbf{F} = \begin{cases} F_{0}\left(\dfrac{d_{c}-d}{d_{c}}\right)^{2}\mathbf{n}, & d \le d_{c},\\[4pt] 0, & d > d_{c}, \end{cases}$$
where F_0 is a constant, d is the particle inter-distance, d_c is a cut-off distance, and n is a unit vector along the particle connector. The procedure for our collision model is as follows:
(8a) Establish a particle list SP_LIST_C for each sub-domain by gathering the particle data in the MP_LIST of the neighboring sub-domains.
(8b) Seek the particle pairs satisfying the collision criterion by considering the relative positions (or possibly velocities for the hard-sphere model) of the particles in the MP_LIST with respect to the other particles in the MP_LIST (same sub-domain), in the SP_LIST_C (neighboring sub-domains), and the wall boundaries.
In addition, we use a small time-step (normally one tenth of that for the solution of the flow fields) for the collision model in order to improve the stability of the explicit scheme for the repulsive force (Glowinski et al., 1999).
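A short-range repulsive force of this soft-sphere type can be sketched as follows. The quadratic dependence on the surface gap follows the family of repulsive-force models used with fictitious domain methods (Glowinski et al., 1999); the exact coefficients and the gap-based cut-off below are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def repulsive_force(x_i, x_j, r_i, r_j, F0, gap_c):
    """Repulsive force on particle i from particle j (illustrative sketch).
    gap_c is the cut-off distance applied to the surface gap."""
    dvec = np.asarray(x_i, float) - np.asarray(x_j, float)
    d = np.linalg.norm(dvec)                  # centre-to-centre distance
    gap = d - (r_i + r_j)                     # surface-to-surface gap
    if gap >= gap_c:
        return np.zeros(3)                    # outside the cut-off: no force
    n = dvec / d                              # unit vector along the connector
    return F0 * ((gap_c - gap) / gap_c) ** 2 * n
```

The force vanishes smoothly at the cut-off and grows quadratically as the gap closes, which is why the explicit update of this force is integrated with a smaller time-step than the flow solver.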

Scalability of the parallel code
We choose two cases of particle-laden flows with grid numbers of 512³ and 1024³ to test the parallel efficiency of our FD code. The particle numbers are 1024, 10,240, and 102,400, corresponding to volume fractions of roughly 0.1%, 1%, and 10%, respectively. One particle diameter covers 6.4 grids for the 512³ grid case and 12.8 grids for the 1024³ grid case. The tests were performed on the Yellowstone supercomputer at the National Center for Atmospheric Research. We ran each case for several time-steps, repeated a few times, and then calculated the mean computational (wall-clock) time per time-step. Particle collision and data input/output (IO) were not considered, since we focused on the parallel efficiency of our FD scheme. The particles were initially distributed randomly and homogeneously over the domain. Figure 4 shows the computational time for one time-step as a function of the core (or processor) number. In Figure 4(a), we observe that for the 512³ case the computational time is roughly halved when the core number is doubled, for core numbers up to 1024, which corresponds to a grid number per core of 64 × 64 × 32. In other words, the parallel efficiency of our FD code is almost 100% when the grid number per core is not smaller than 64 × 64 × 32; below this, the efficiency drops due to the increased data-communication time relative to the computation time. In addition, Figure 4 shows that the effect of the particle number on the computational time is insignificant for the three particle numbers tested; the additional computational time caused by the presence of the particles is around 10% of the total for the case of 102,400 particles. Figure 4(b) shows that for the 1024³ grid resolution, the speed-up of our parallel code remains largely linear as the core number is increased up to 8192 (64 × 64 × 32 grids per core).
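The quantities read from Figure 4 are standard strong-scaling measures, which can be computed as follows. The timing numbers in the test are made up for illustration; only the definitions of speed-up and efficiency are taken from the text.

```python
def strong_scaling(cores, times):
    """Speed-up and parallel efficiency relative to the smallest core count.
    cores: list of core counts; times: wall-clock time per step for each."""
    c0, t0 = cores[0], times[0]
    speedup = [t0 / t for t in times]                      # S(c) = t0 / t(c)
    efficiency = [s * c0 / c for s, c in zip(speedup, cores)]  # E = S / (c/c0)
    return speedup, efficiency
```

An efficiency near 1 at a given core count corresponds to the "halved time when the core number is doubled" behaviour reported for the 512³ case up to 1024 cores.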

Application example
As mentioned earlier, our parallel FD method differs from our serial FD method only in the solvers for the pressure Poisson equation and the velocity Helmholtz equation. Some tests (such as particle sedimentation and particle inertial migration) show that both versions of our FD method produce almost the same results. Since our serial FD code has been validated in previous studies (Yu & Shao, 2007, 2010), the validations of our new parallel code are not shown here. We apply the new code to the investigation of the particle inertial effects on the turbulent channel flow. Our method is characterized by using direct numerical simulation for both the particle-laden and turbulent flows. Such methods have been applied to the simulation of particle-laden isotropic turbulent flows (Gao, Li, & Wang, 2013; Lucci, Ferrante, & Elghobashi, 2010; Ten Cate, Derksen, Portela, & Van den Akker, 2004) and turbulent channel flows (Do-Quang, Amberg, Brethouwer, & Johansson, 2014; Kidanemariam, Chan-Braun, Doychev, & Uhlmann, 2013; Pan & Banerjee, 1997; Shao et al., 2012; Uhlmann, 2008), but not to the effects of the particle inertia (i.e., the particle-fluid density ratio) on the turbulent channel flow. In the following, we introduce the numerical model for the problem of interest, validate the accuracy for the single-phase case, conduct the mesh-convergence test for the particle-laden case, and finally report the main results. Figure 5 shows the geometrical model of our particle-laden channel flow. Periodic boundary conditions are imposed in the streamwise and spanwise directions, and a no-slip wall boundary condition in the transverse direction. Let x, y and z represent the streamwise, transverse and spanwise directions, respectively, a the particle radius, and H half of the channel height in the transverse direction. We choose H and the wall friction velocity u_τ as the characteristic length and velocity for the non-dimensionalization.
The wall friction velocity u_τ is defined from the wall shear stress τ_w and the fluid density ρ_f: u_τ = (τ_w/ρ_f)^{1/2}. The friction Reynolds number is defined by Re_τ = ρ_f u_τ H/μ. Because of the periodic boundary condition in the streamwise direction, a constant pressure gradient ∇p_e is needed to overcome the wall friction drag and sustain the flow. The value of the constant pressure gradient is ∇p_e = −τ_w/H, corresponding to the dimensionless value ∇p_e/(ρ_f u_τ²/H) = −1. The friction Reynolds number Re_τ is 180. To study the effects of the particle inertia on the turbulent channel flow, particle-fluid density ratios of ρ_r = 1.0, 10.42 and 104.2 are considered, and the gravity effect is ignored. The particle radius is a/H = 0.05. The size of our computational domain is 8H × 2H × 4H. A uniform grid is employed with 512 × 128 × 256 grid points. The particle volume fraction is 0.84%, corresponding to a particle number of N_p = 1024. The dimensionless time-step is 0.0001 for ρ_r = 104.2 and 0.0002 for the other cases.

Validation for the single-phase case
In Figure 6, our results for the mean and root-mean-square (RMS) velocities of the particle-free turbulent channel flow are compared to those of Hoyas and Jimenez (2008), who employed a pseudo-spectral method. It can be seen that our results for the case of L = 8H (i.e., the computational domain size of 8H × 2H × 4H) agree well with those of Hoyas and Jimenez. For the low friction Reynolds number of 180 considered, the sizes of 4H × 2H × 2H and 2H × 2H × 2H appear too small to produce size-independent results. This is the reason why we chose the computational domain size of 8H × 2H × 4H.

Mesh-convergence test for the particle-laden case
For turbulent particle-laden channel flows, no benchmark data are available to validate the accuracy and, to our knowledge, no mesh-convergence tests have been performed, presumably due to the high computational cost. For our simulation case of a/H = 0.05, there are only 3.2 meshes per particle radius, and one may question whether such a mesh resolution is high enough to ensure acceptable accuracy. With the parallel code, it is now possible for us to conduct a mesh-convergence test. The parameters for the test case are a/H = 0.05 and ρ_r = 104.2. We chose this density ratio because the RMS velocities for ρ_r = 104.2 deviate significantly from those for the particle-free case. For the mesh h = a/3.2, the grid number is 512 × 128 × 256 and the time-step is 0.0001, whereas for the mesh h = a/6.4, the grid number is 1024 × 256 × 512 and the time-step is 0.00005, as required by the stability condition under mesh refinement. The RMS velocities for the two meshes are plotted in Figure 7. We observe good agreement between the two results; the maximum relative error, at the peaks of the streamwise RMS velocities, is around 3% (2.35 vs 2.28) in Figure 7. Figure 8 shows the mean velocity profiles for all cases studied. One can see that the flow flux first decreases and then increases as the density ratio increases. Since the pressure gradient is fixed, a reduction in flow flux means an increase in flow resistance. Figure 9 shows the RMS velocities in the three directions. Generally, the presence of the particles enhances the RMS velocities near the wall (very close to the wall for the streamwise component, not clearly visible in Figure 9) and attenuates the RMS velocities in the region away from the wall. The suppression of the RMS velocities in the bulk region becomes more pronounced for larger density ratios. Figure 10 shows the vortex structures for the single-phase case and for the density ratios of 10.42 and 104.2.
The presence of particles weakens the large-scale vortices, and the effect is enhanced as the density ratio increases. A significant suppression of the large-scale vortices is found at ρ r = 104.2, which is clearly responsible for the considerable attenuation of the RMS velocities at ρ r = 104.2 in Figure 9. The presence of particles causes additional viscous dissipation (Lucci et al., 2010), which has dual effects on the flow drag. On one hand, more viscous dissipation means higher viscosity of the suspension and thereby larger flow drag. On the other hand, more viscous dissipation leads to suppression of the large-scale quasi-streamwise vortices which are primarily responsible for the drag-enhancement of turbulence with respect to laminar flow, and thereby lower flow resistance. The competition of these two effects gives rise to the results observed earlier: the flow drag first increases and then decreases with increasing density ratio.

Conclusions
We have presented a parallel direct-forcing fictitious domain (DF/FD) method for the simulation of particulate flows and demonstrated its high parallel efficiency. The new code has been applied to studying the effects of particle inertia on the turbulent channel flow. The results show that, as particle inertia increases, the large-scale vortices are weakened more severely and the flow friction drag first increases and then decreases. We conjecture that the competition between the dual effects of particle-induced viscous dissipation on the flow drag is responsible for the variation of the flow drag with particle inertia. Our results on the particle inertial effects in the present study are preliminary, and we plan to conduct a more extensive study in the future.