Fast reactive flow simulations using analytical Jacobian and dynamic load balancing in OpenFOAM

Detailed chemistry-based computational fluid dynamics (CFD) simulations are computationally expensive due to the solution of the underlying chemical kinetics system of ordinary differential equations (ODEs). Here, we introduce a novel open-source library aiming at speeding up such reactive flow simulations using OpenFOAM, an open-source C++ software for CFD. First, our dynamic load balancing model DLBFoam (Tekg\"ul et al., 2021) is utilized to mitigate the computational imbalance due to chemistry solution in multiprocessor reactive flow simulations. Then, the individual (cell-based) chemistry solutions are optimized by implementing an analytical Jacobian formulation using the open-source library pyJac, and by increasing the efficiency of the ODE solvers by utilizing the linear algebra package LAPACK. We demonstrate the speed-up capabilities of this new library on various combustion problems. These test problems include a 2D turbulent reacting shear layer and 3D stratified combustion to highlight the favorable scaling aspects of the library on ignition/flame front initiation setups for dual-fuel combustion. Furthermore, two fundamental 3D demonstrations are provided on non-premixed and partially premixed flames, namely the ECN Spray A and the Sandia flame D experimental configurations. The novel model offers up to two orders of magnitude speed-up for most of the investigated cases. The openly shared code along with the test case setups represent a radically new enabler for reactive flow simulations in the OpenFOAM framework.


Introduction
There is an urgent need to develop effective and accurate engineering so- In this work, we proceed in a similar track and focus on the search for the bottlenecks in reacting flow simulations within OpenFOAM with the aim of significant computational speed-up.
One of the most fundamental approaches for reactive flow modeling is to use CFD with the direct integration of chemical kinetics, referred to as finite-rate chemistry [2]. In this approach, each computational cell is treated as an individual chemistry problem with pressure (p), and a thermochemical state vector (Φ) comprising temperature (T ) and species mass fractions (Y k ).
The rate of change of the thermochemical state vector, ∂Φ/∂t = f (Φ, t), within a computational cell forms stiff system of ordinary differential equations (ODEs) that requires a specific class of algorithms for temporal integration.
In reactive CFD applications, chemistry evaluation comprises the most computationally demanding part of the simulation. With the growing complexity of chemical kinetic mechanisms, the cost of solving the chemistry problem often exceeds the cost of fluid flow solution by up to two orders of magnitude [3]. Nevertheless, high-fidelity reactive flow simulations can be still feasible through efficient usage of the provided computational resources, in addition to cell-based optimization of the chemistry solution [4]. In gen-eral, considerable efforts have been made in the literature to improve the computational performance of CFD simulations involving turbulence [5][6][7][8][9], shock waves [10], gas rarefaction [11], geometrical parameterization [12], or chemical reactions [13][14][15][16] through model reduction, machine learning, or efficient parallelization.
In finite-rate chemistry simulations, the high computational cost associated with the chemistry solution originates mainly from two factors. First, the cost of solving the system of ODEs depends on the system stiffness which is, to some extent, influenced by the local thermochemical state. For example, an oxidation process comprising different radicals at various chemical timescales may constitute a greater computational challenge compared to a local state of burnt mixture. Second, the cost of computing the rate of change of the thermochemical state vector is directly proportional to the chemical mechanism size [17,18], which can reach up to thousands of species and even more reactions [19,20].
For the chemistry ODE problem, implicit solvers involving Jacobian evaluation are usually preferred over explicit ones for stability and accuracy reasons [21]. The evaluation of the Jacobian, J = ∂f /∂Φ, is a computationally demanding process within an ODE solution routine. ODE solvers often employ finite differencing methods for its evaluation, which may be an expensive operation depending on the chemical mechanism size. Lu et al. [17] reported that the computational cost of numerical evaluation of Jacobian scales quadratically with the number of reactions. It has been shown that using an analytically computed Jacobian provides high performance gain during the ODE system solution [17,22,23]. There are various implementations of the analytical Jacobian matrix evaluation in the literature [23][24][25]. Recently, Niemeyer et al. [26] introduced an open-source library pyJac, which generates C subroutines for analytical Jacobian evaluation. They demonstrated its accuracy and out-performance over the available analytical Jacobian generators in the literature.
While the cost of integrating a single chemistry ODE problem poses a performance challenge in reactive CFD simulations, there is a secondary issue which is often left unaddressed. As previously mentioned, the computational cost of solving a chemistry ODE depends on the local stiffness of Φ. The difference in computational complexity throughout the geometrical domain poses a load imbalance issue for multiprocessor applications, where one process becomes the computational bottleneck and causes performance issues. There have been several studies in the recent literature tackling this load imbalance issue via utilizing dynamic load balancing algorithms, often using Message-Passing Interface (MPI) routines. Antonelli et al. [27] developed an MPI-based model introducing a cell distribution based load balancing algorithm. Following this work, Shi et al. [28] and Kodavasal et al. [29] both introduced stiffness-detection based load balancing algorithms and employed them in reactive CFD simulations. Zirwes et al. developed an MPI-based dynamic load balancing algorithm for chemistry problem distribution for OpenFOAM [30]. More recently, Muela et al. [31] presented a dynamic load balancing model, which also utilizes a stiffness detection algorithm that chooses the optimal ODE integration method for each computational cell. These methods introduced computational speed-up around 3 to 5, depending on the application. We have recently also introduced an open-source dynamic load balancing model called DLBFoam for OpenFOAM, providing speed-up up to a factor of 10 [4].
In this work, we introduce a novel chemistry model that provides speedup in reactive flow simulations in OpenFOAM by targeting the two major issues in reactive CFD simulations addressed above. We further extend our dynamic load balancing model DLBFoam, and focus on the optimization of the cell-wise chemistry solution by introducing two new features. First, we utilize an analytical Jacobian approach using the pyJac library. Second, we make use of the standard linear algebra library (LAPACK) [32]  The paper is outlined as follows. First, implementation details of the developed model are presented in Section 2. Then, in Section 3 we discuss the application of our model to different combustion simulations, represented by the aforementioned test cases. Finally, conclusions are provided in Section 4.

Implementation
In this section, implementation details of our model are presented. We note that an earlier in-house version of this implementation was carried out by Kahila et al. [33], details of which can be found in his thesis work [34].
This section briefly discusses the DLBFoam model introduced by Tekgül et al. [4] as an improvement to our in-house code, and further describes the efforts towards optimizing the solution of the chemistry ODE problem by utilizing an analytical Jacobian formulation (pyJac) together with efficient linear algebraic routines (LAPACK).

Finite rate chemistry
In reactive CFD applications, we numerically solve the governing equations for mass, momentum, energy, and species transport. The total production or consumption rate of chemical species, and the heat release rate are represented as source terms in the species transport equations and the energy equation, respectively. Due to the vast difference between the chemistry and flow time scales, commonly an operator splitting procedure is utilized to separate the calculation of source terms from the flow solution. These source terms define the change of the local thermochemical state which results from solving a stiff, nonlinear initial value problem described as: where N sp is the number of species in the chemical mechanism. The ODE system in Equation (1) is integrated in each computational cell over the CFD time step ∆t CFD in smaller chemical time steps ∆t ODE ≤ ∆t CFD (see Part C for more information). The source terms are then evaluated and used in species transport and energy equations. Further details on this modeling approach can be found in the work of Imren and Haworth [35].

Dynamic load balancing: DLBFoam
Next, the previously developed DLBFoam library is briefly introduced here as essential background information. As mentioned earlier, the computational cost of solving the stiff nonlinear ODE may depend on the local thermochemical conditions. In multiprocessor reactive CFD simulations, the process with the highest computational load may take longer to compute than the rest, hence creating a bottleneck within the given CFD time step.

Recently, we introduced an open-source model called DLBFoam [4] for
OpenFOAM, aiming to mitigate this imbalance issue in multiprocessor reactive CFD simulations via dynamic load balancing. DLBFoam uses MPI routines to redistribute the chemistry computational load evenly between processes during the simulation. Figure

ODE solver optimization and analytical Jacobian
The new development in the present paper introduces a coupling of 1) the analytical Jacobian formulation introduced in pyJac library, 2) standard linear algebra routines from LAPACK library for the ODE solution, and 3) the DLBFoam library previously introduced by Tekgül et al. [4]. Within a CFD time step, chemistry is treated as an independent, stiff system of ODEs.
Implicit and semi-implicit algorithms for solving this system of stiff ODEs are usually preferred to ensure the solution stability [21].
The solution to the nonlinear initial value problem Equation (1), can be obtained using a variant of Newton iteration with the following discretized which is then linearized by neglecting the higher-order terms. Here, ∆t ODE is a subinterval of the full integration interval ∆t CFD , which is commonly determined by the underlying ODE solver using a combination of theoretical and heuristic relations. The direct solution of the previous equation for Φ n+1 requires the inverse of J , which is commonly avoided through matrix factorization (i.e., LU decomposition) and back substitution techniques.
Commonly in reactive CFD codes, the Jacobian of the system of ODEs representing chemistry is implemented via finite differencing to be used in ODE solver integration. However, using such an approximation introduces a negative effect on the ODE solver accuracy and performance [26]. Furthermore, calculating the Jacobian with finite differencing is a computationally expensive procedure [17,26]. Instead, utilizing a fully algebraic Jacobian is more efficient and it improves the ODE solver accuracy and performance. Lu et al. [17] reported that using an analytical Jacobian reduces its evaluation time from square of the number of species to a linear dependence. OpenFOAM features a fully algebraic Jacobian implementation to speed up the solution of chemical kinetics in its recent releases [36]. However, we have found that the implemented algebraic Jacobian fails to deliver a fast solution to chemistry ODE integration when high ODE convergence tolerances are utilized. Instead, we utilize the python-based open-source software pyJac, introduced by Niemeyer et al. [26]. The pyJac package generates C subroutines for evaluation of an analytical Jacobian for a chemical mechanism. They reported that pyJac performs 3 to 7.5 times faster than the first-order finite differencing approach. In our proposed model, OpenFOAM's Jacobian calculation subroutines are replaced with the subroutines generated by pyJac.
The fully algebraic Jacobian utilization via pyJac speeds up the Jaco- In this study, LAPACK functions were linked to a semi-implicit extrapolationbased Euler method, denoted as Seulex [21,35]. Seulex requires the specification of relative and absolute tolerances of the solution along with an initial estimate of ∆t ODE . As the solution progresses, the solver improves the estimate for ∆t ODE at the current thermochemical state using theoretical relations and heuristics. The previous available value is then used as an initial value for the following integration interval. Furthermore, a ∆t ODE is recursively split into a number of partitions and each partition is then solved using a low-order method. Finally, the high-order solution over the whole interval ∆t ODE is recursively gathered from solutions of the smaller partitions.
If the solution does not satisfy the tolerance criteria, partitioning continues until it is satisfied. A more detailed explanation of the Seulex algorithm can be found in the work by Imren and Haworth [35].
As noted earlier, each computational cell has its own independent ODE problem with a thermochemical state vector comprising only intensive mix-ture properties. Furthermore, the ODE solver guarantees that if the solution has converged, the error will lie within user-specified tolerance regardless of the width of the integration interval ∆t CFD . However, the error related to the operator splitting technique needs to be taken care of. The assumption behind the operator splitting is that the chemistry timescales are orders of magnitude smaller than the flow timescales or the CFD solver time step ∆t CFD . Nevertheless, the time step ∆t CFD is to be chosen so that the species and temperature transport effects over a time step are accounted for and the thermochemical state vector is updated correspondingly. We note that in this study we do not consider any chemical source term closures for the filtered or averaged equations. Such turbulence-chemistry interaction (TCI) models could significantly depend on the mesh parameters and the CFD time   Utilization of pyJac together with LAPACK allows using much tighter ODE tolerances.

2D reacting shear layer
The first test case presents a simple, temporally evolving 2D reacting shear layer which is computationally affordable even on a personal computer. The presented test case is fundamentally related to the ignition and flame initiation of DF combustion in compression-ignition engines such as the reactivity controlled compression ignition (RCCI) engine [33,43]. We demonstrate the computational speed-up and scaling effects of DLBFoam+pyJac with respect to the number of cores.
The schematic of the present numerical setup is illustrated in Figure 4a.
The setup contains a high reactivity n-dodecane jet stream that mixes with the surrounding oxidizer consisting of premixed methane, oxidizer and exhaust gas recirculation (EGR). The simulation domain is a square box with a side length D = 1.5 mm. The number of grid points in both directions is 300 which is based on the pre-estimated laminar flame thickness for n-dodecanemethane premixed flame at the corresponding most reactive mixture fraction, p = 60 bar, and T reactants = 800 K (δ f ≈ 50 µm) to resolve the flame by 10 grid points. A hyperbolic tangent function is used to generate the shear layer between the n-dodecane and oxidizer streams. The n-dodecane jet is initially set to 700 K whereas the methane/air mixture is set to 900 K at a constant pressure of 60 bar. The initial conditions corresponding to the ambient premixed mixture are similar to our previous DF spray studies [33,[43][44][45][46][47] as reported in Table 2. The n-dodecane jet moves initially with a relative velocity of 10 m/s to the methane/air stream and develops a Kelvin-Helmholtz instability. A skeletal chemical mechanism (54 species and 269 reactions) developed by Yao et al. [38] for n-dodecane combustion is used. The performance of this mechanism in DF context has been already demonstrated in our earlier studies [33,44,46].     Next, we report the computational speed-up and the parallel efficiency provided by our model for different processor counts, as depicted in Figure 6.

3D stratified combustion
After providing speed-up benchmarks along with scaling tests in the previous section, we next present another academic test case in this section, i.e. 3D stratified combustion (Figure 7). This case can not be even tested for 10 iterations using Standard within a reasonable computational time. Hence, the purpose here is mainly to show that it is possible to investigate 3D ignition/combustion problems using DLBFoam+pyJac. This DF test case has close relevance to combustion mode design in modern engines such as RCCI. This test case is conceptually similar to the previous 2D setup by Karimkashi et al. [48]. As shown in the schematic Figure 4b, the computational domain is a 3D cube with periodic boundary conditions and a side length D = 1 mm.
The grid size is 4 µm with 256 grid points along each direction, to capture the flame with at least 10 grid points within the laminar flame thickness (δ f ≈ 40 µm). The initial pressure is set at 60 bar and the initial temperature is homogeneous within the entire computational domain at 800 K, which is relevant to the estimated temperature at the most reactive mixture fraction for DF methane/n-dodecane based on 0D simulations [44,46,47,49].
Here, pure n-dodecane is initially constrained in a blob in the middle of the domain with a diameter d = D/3. The ambient gas consists of premixed methane, oxidizer and EGR at φ CH4 = 0.5, similar to the previous test case, with the composition reported in Table 2. As in the previous test case, Yao mechanism is used.
Turbulence is initialized using Taylor-Green-Vortex (TGV) structure, which is generated in a separate non-reactive 3D simulation with an initial velocity U = 50 m/s and reference length L ref = D/2π. We let the initialized TGV to evolve in time until the gradient of total kinetic energy reaches its peak and then, velocity and pressure fields are mapped to the reactive case as initial fields. The estimated Reynolds number at the start of reacting simulation is Re ≈ 1000. In this case study, the interactive roles of convection, diffusion and reaction are investigated. First, the initial n-dodecane is mixed with the ambient gas at early simulation time instances, long before the secondstage ignition (τ 2 ). Here, τ 2 ≈ 0.28 ms is defined as the first time instance when T > 1500 K. At τ 2 , the stratified mixture ignites at the most reactive mixture fractions. The ignition front propagates and finally forms laminar premixed flame fronts, which complete the combustion in the standard deflagration mode. The present observations in 3D are in close agreement with the observations made in our previous study in 2D [48]. Figure 7 depicts 2D cut-planes for temporal evolution of temperature (top row), n-dodecane (second row), methane (third row), and hydroxyl radical OH (bottom row) wherein time is normalized by τ 2 . It is observed that a stratified mixture of n-dodecane and methane/oxidizer is formed before any significant temperature rise in the system (t = 0.7τ 2 ). Closer to the ignition time, hot spots emerge (t = 0.9τ 2 ) and deflagrative-like fronts develop at a moderate rate with various wrinkled surfaces (t = 1.1τ 2 ). Slightly after the ignition time, the fronts evolve (t = 1.3τ 2 ) as clearly observed from the OH cutplane. Finally, the fronts merge (t = 1.8τ 2 ) and combustion is almost complete. We note that at t = 1.8τ 2 , only small fractions of n-dodecane and methane are left in the system, which are convected from other iso-surfaces to the displayed cutplane in Figure 7. In this case study, a speed-up gain by a factor of 2 to 3 orders of magnitude is observed with DLBFoam+pyJac compared to Standard. Therefore, it would not be possible to benchmark this problem using Standard for 100 iterations.
In particular, we evaluate the speed-up of our model against Standard by restarting the simulation from the first-stage ignition time (0.2 ms) and using the same number of processors (1152 processors) and identical toler-ances. While Standard simulation proceeds only 7 iterations in ∼ 76000 seconds of clock time, the respective time is only about 200 seconds with DLBFoam+pyJac; i.e. speed-up by a factor of ≈ 400 with DLBFoam+pyJac compared to Standard is achieved. As a final note, the total clock time for simulating this case using DLBFoam+pyJac with 1152 processors for 0.5 ms simulation time is ≈ 13 hours. Hence, DLBFoam+pyJac is a key enabler to study such 3D reactive flow configurations with OpenFOAM.

ECN Spray A
At this point, it is clear that DLBFoam+pyJac offers great advantages over Standard. In particular, DLBFoam+pyJac was shown to scale almost linearly with the number of processors in an academic case and enable 3D reactive simulations with a relatively large chemical kinetic mechanism and tight solver tolerances. Next, we move on to a more realistic 3D configuration i.e. the Engine Combustion Network Spray A test case with experimental validation data. Spray A represents an igniting non-premixed diffusion flame under engine-relevant conditions. Hence, it is an optimal test case for reactive CFD large-eddy simulation (LES) code validation. with our previous spray combustion works [27, 44-47, 50, 51], while the nobreakup model approach is used for droplet atomization which is further discussed by Gadalla et al. [51]. The volume-rendered spray flame shown in  After that, we demonstrate in Figure 10 the gained speed-up using DLBFoam+pyJac

Sandia flame D
The ECN Spray A case demonstrated excellent performance for LES of non-premixed spray flame using DLBFoam+pyJac. The last test case is the Sandia flame D [53] representing a piloted partially premixed methane-air flame. Sandia flame D is a well-known configuration from the TNF workshop flame series, which is often studied in the literature, e.g. [54][55][56][57]. Therefore, it is considered as a useful benchmark for reactive CFD LES code validation for gas burner flame investigations.  is overpredicted compared with the measured data. Overall, mass fraction prediction of short lived species like OH is a challenging task due to the nonlinear evolution of the species [55][56][57][58] and it is out of the scope of the present study. In the radial direction, the reactions are mostly occurring in the shear layer between the main jet and co-flow (r/D = 1), which is evident in the peaks of the mass fraction profiles. The radial profiles agree well with the experimental data on the rich side. However, mixing is overpredicted in the lean region (r/D > 1). Close to the nozzle and away from the central axis, the reaction zone may still be considered relatively laminar [54].
Therefore, the overpredicted profiles in the lean region could be explained by the utilized unity Lewis number assumption. Moreover, the temperature and species profiles also follow the same trend as the mixture fraction in the flame on the rich and lean sides.
Next, a speed-up test is carried out for a stabilized flame for 100 iterations with a constant time step of 1 × 10 −6 s. The results are presented in Figure 12. The DLBFoam provides speed-up by a factor of 4.52, which is significantly lower than with ECN Spray A. This can be explained by a relatively low number of mapped cells due to the location of the refinement zone in the core of the jet. Optimized ODE solver routines pyJac and LAPACK bring additional speed-up by a factor of 2.99. Here, the lower speed-up is a consequence of the use of the reduced mechanism, which as discussed earlier significantly reduces computational cost of the chemistry problem. However, the combined effect of the DLBFoam+pyJac is significant and brings about 1 order of magnitude speed-up to this case.  Finally, the source code and the case setup files used in this work are openly available [41,59] project. We gratefully thank Dr. Heikki Kahila (Wärtsilä Finland Oy) and Dr. Petteri Peltonen (VTT Oy) for their precious comments and technical advice throughout the implementation process.

Appendix A. Supplementary material
Library source code is openly available in a public GitHub repository [59].
Test case setups are openly available in a separate public GitHub repository [41].