The efficiency of Krylov iterative methods for the solution of linear systems can be greatly enhanced through preconditioning. For problems in which the user cannot define a more effective, problem-specific preconditioner, ARKode provides two internal preconditioner modules that may be used by ARKStep: a banded preconditioner for serial and threaded problems (ARKBANDPRE) and a band-block-diagonal preconditioner for parallel problems (ARKBBDPRE).
This preconditioner provides a band matrix preconditioner for use with iterative SUNLINSOL modules through the ARKLS linear solver interface, in a serial or threaded setting. It requires that the problem be set up using either the NVECTOR_SERIAL, NVECTOR_OPENMP or NVECTOR_PTHREADS module, due to data access patterns. It also currently requires that the problem involve an identity mass matrix, i.e. \(M = I\).
This module uses difference quotients of the ODE right-hand
side function \(f_I\) to generate a band matrix of bandwidth
ml + mu + 1
, where the number of super-diagonals (mu
, the
upper half-bandwidth) and sub-diagonals (ml
, the lower
half-bandwidth) are specified by the user. This band matrix is used
to to form a preconditioner the Krylov linear solver. Although this
matrix is intended to approximate the Jacobian
\(J = \frac{\partial f_I}{\partial y}\), it may be a very crude
approximation, since the true Jacobian may not be banded, or its true
bandwidth may be larger than ml + mu + 1
. However, as long as the
banded approximation generated for the preconditioner is sufficiently
accurate, it may speed convergence of the Krylov iteration.
In order to use the ARKBANDPRE module, the user need not define
any additional functions. In addition to the header files required
for the integration of the ODE problem (see the section
Access to library and header files), to use the ARKBANDPRE module, the user’s
program must include the header file arkode_bandpre.h
which
declares the needed function prototypes. The following is a summary
of the usage of this module. Steps that are unchanged from the
skeleton program presented in A skeleton of the user’s main program are
italicized.
Initialize multi-threaded environment (if appropriate)
Set problem dimensions
Set vector of initial values
Create ARKStep object
Specify integration tolerances
Create iterative linear solver object
When creating the iterative linear solver object, specify the type
of preconditioning (PREC_LEFT
or PREC_RIGHT
) to use.
Set linear solver optional inputs
Attach linear solver module
Initialize the ARKBANDPRE preconditioner module
Specify the upper and lower half-bandwidths (
mu
andml
, respectively) and call
ier = ARKBandPrecInit(arkode_mem, N, mu, ml);
to allocate memory and initialize the internal preconditioner data.
Set optional inputs
Note that the user should not call
ARKStepSetPreconditioner()
as it will overwrite the
preconditioner setup and solve functions.
Create nonlinear solver object
Attach nonlinear solver module
Set nonlinear solver optional inputs
Specify rootfinding problem
Advance solution in time
Get optional outputs
Additional optional outputs associated with ARKBANDPRE are
available by way of the two routines described below,
ARKBandPrecGetWorkSpace()
and
ARKBandPrecGetNumRhsEvals()
.
Deallocate memory for solution vector
Free solver memory
Free linear solver memory
The ARKBANDPRE preconditioner module is initialized and attached by calling the following function:
ARKBandPrecInit
(void* arkode_mem, sunindextype N, sunindextype mu, sunindextype ml)¶Initializes the ARKBANDPRE preconditioner and allocates required (internal) memory for it.
NULL
NULL
Notes: The banded approximate Jacobian will have nonzero elements only in locations \((i,j)\) with ml \(\le j-i \le\) mu.
The following two optional output functions are available for use with the ARKBANDPRE module:
ARKBandPrecGetWorkSpace
(void* arkode_mem, long int* lenrwLS, long int* leniwLS)¶Returns the sizes of the ARKBANDPRE real and integer workspaces.
realtype
values in the
ARKBANDPRE workspace.NULL
NULL
NULL
Notes: The workspace requirements reported by this routine
correspond only to memory allocated within the ARKBANDPRE module
(the banded matrix approximation, banded SUNLinearSolver
object, and temporary vectors).
The workspaces referred to here exist in addition to those given by
the corresponding function ARKStepGetLSWorkspace()
.
ARKBandPrecGetNumRhsEvals
(void* arkode_mem, long int* nfevalsBP)¶Returns the number of calls made to the user-supplied right-hand side function \(f_I\) for constructing the finite-difference banded Jacobian approximation used within the preconditioner setup function.
NULL
NULL
NULL
Notes: The counter nfevalsBP is distinct from the counter
nfevalsLS returned by the corresponding function
ARKStepGetNumLSRhsEvals()
and also from nfi_evals returned by
ARKStepGetNumRhsEvals()
. The total number of right-hand
side function evaluations is the sum of all three of these
counters, plus the nfe_evals counter for \(f_E\) calls
returned by ARKStepGetNumRhsEvals()
.
A principal reason for using a parallel ODE solver (such as ARKode) lies in the solution of partial differential equations (PDEs). Moreover, Krylov iterative methods are used on many such problems due to the nature of the underlying linear system of equations that needs to solved at each time step. For many PDEs, the linear algebraic system is large, sparse and structured. However, if a Krylov iterative method is to be effective in this setting, then a nontrivial preconditioner is required. Otherwise, the rate of convergence of the Krylov iterative method is usually slow, and degrades as the PDE mesh is refined. Typically, an effective preconditioner must be problem-specific.
However, we have developed one type of preconditioner that treats a rather broad class of PDE-based problems. It has been successfully used with CVODE for several realistic, large-scale problems [HT1998]. It is included in a software module within the ARKode package, and is accessible within the ARKStep time stepping module. This preconditioning module works with the parallel vector module NVECTOR_PARALLEL and is usable with any of the Krylov iterative linear solvers through the ARKLS interface. It generates a preconditioner that is a block-diagonal matrix with each block being a band matrix. The blocks need not have the same number of super- and sub-diagonals and these numbers may vary from block to block. This Band-Block-Diagonal Preconditioner module is called ARKBBDPRE.
One way to envision these preconditioners is to think of the computational PDE domain as being subdivided into \(Q\) non-overlapping subdomains, where each subdomain is assigned to one of the \(Q\) MPI tasks used to solve the ODE system. The basic idea is to isolate the preconditioning so that it is local to each process, and also to use a (possibly cheaper) approximate right-hand side function for construction of this preconditioning matrix. This requires the definition of a new function \(g(t,y) \approx f_I(t,y)\) that will be used to construct the BBD preconditioner matrix. At present, we assume that the ODE be written in explicit form as
where \(f_I\) corresponds to the ODE components to be treated implicitly, i.e. this preconditioning module does not support problems with non-identity mass matrices. The user may set \(g = f_I\), if no less expensive approximation is desired.
Corresponding to the domain decomposition, there is a decomposition of the solution vector \(y\) into \(Q\) disjoint blocks \(y_q\), and a decomposition of \(g\) into blocks \(g_q\). The block \(g_q\) depends both on \(y_p\) and on components of blocks \(y_{q'}\) associated with neighboring subdomains (so-called ghost-cell data). If we let \(\bar{y}_q\) denote \(y_q\) augmented with those other components on which \(g_q\) depends, then we have
and each of the blocks \(g_q(t,\bar{y}_q)\) is decoupled from one another.
The preconditioner associated with this decomposition has the form
where
and where \(J_q\) is a difference quotient approximation to \(\frac{\partial g_q}{\partial \bar{y}_q}\). This matrix is taken to be banded, with upper and lower half-bandwidths mudq and mldq defined as the number of non-zero diagonals above and below the main diagonal, respectively. The difference quotient approximation is computed using mudq + mldq + 2 evaluations of \(g_m\), but only a matrix of bandwidth mukeep + mlkeep + 1 is retained. Neither pair of parameters need be the true half-bandwidths of the Jacobian of the local block of \(g\), if smaller values provide a more efficient preconditioner. The solution of the complete linear system
reduces to solving each of the distinct equations
and this is done by banded LU factorization of \(P_q\) followed by a banded backsolve.
Similar block-diagonal preconditioners could be considered with different treatments of the blocks \(P_q\). For example, incomplete LU factorization or an iterative method could be used instead of banded LU factorization.
The ARKBBDPRE module calls two user-provided functions to construct
\(P\): a required function gloc (of type ARKLocalFn()
)
which approximates the right-hand side function \(g(t,y) \approx
f_I(t,y)\) and which is computed locally, and an optional function
cfn (of type ARKCommFn()
) which performs all inter-process
communication necessary to evaluate the approximate right-hand side
\(g\). These are in addition to the user-supplied right-hand side
function \(f_I\). Both functions take as input the same pointer
user_data that is passed by the user to
ARKStepSetUserData()
and that was passed to the user’s
function \(f_I\). The user is responsible for providing space
(presumably within user_data) for components of \(y\) that are
communicated between processes by cfn, and that are then used by
gloc, which should not do any communication.
(*ARKLocalFn)
(sunindextype Nlocal, realtype t, N_Vector y, N_Vector glocal, void* user_data)¶This gloc function computes \(g(t,y)\). It fills the vector glocal as a function of t and y.
ARKStepSetUserData()
.Return value:
An ARKLocalFn should return 0 if successful, a positive value if
a recoverable error occurred (in which case ARKStep will attempt to
correct), or a negative value if it failed unrecoverably (in which
case the integration is halted and ARKStepEvolve()
will return
ARK_LSETUP_FAIL).
Notes: This function should assume that all inter-process communication of data needed to calculate glocal has already been done, and that this data is accessible within user data.
The case where \(g\) is mathematically identical to \(f_I\) is allowed.
(*ARKCommFn)
(sunindextype Nlocal, realtype t, N_Vector y, void* user_data)¶This cfn function performs all inter-process communication necessary for the execution of the gloc function above, using the input vector y.
ARKStepSetUserData()
.Return value:
An ARKCommFn should return 0 if successful, a positive value if a
recoverable error occurred (in which case ARKStep will attempt to
correct), or a negative value if it failed unrecoverably (in which
case the integration is halted and ARKStepEvolve()
will return
ARK_LSETUP_FAIL).
Notes: The cfn function is expected to save communicated data in space defined within the data structure user_data.
Each call to the cfn function is preceded by a call to the
right-hand side function \(f_I\) with the same \((t,y)\)
arguments. Thus, cfn can omit any communication done by
\(f_I\) if relevant to the evaluation of glocal. If all
necessary communication was done in \(f_I\), then cfn =
NULL
can be passed in the call to ARKBBDPrecInit()
(see below).
In addition to the header files required for the integration of the
ODE problem (see the section Access to library and header files), to use the
ARKBBDPRE module, the user’s program must include the header file
arkode_bbdpre.h
which declares the needed function prototypes.
The following is a summary of the proper usage of this module. Steps that are unchanged from the skeleton program presented in A skeleton of the user’s main program are italicized.
Initialize MPI
Set problem dimensions
Set vector of initial values
Create ARKStep object
Specify integration tolerances
Create iterative linear solver object
When creating the iterative linear solver object, specify the type
of preconditioning (PREC_LEFT
or PREC_RIGHT
) to use.
Set linear solver optional inputs
Attach linear solver module
Initialize the ARKBBDPRE preconditioner module
Specify the upper and lower half-bandwidths for computation
mudq
and mldq
, the upper and lower half-bandwidths for
storage mukeep
and mlkeep
, and call
ier = ARKBBDPrecInit(arkode_mem, Nlocal, mudq, mldq, mukeep, mlkeep, dqrely, gloc, cfn);
to allocate memory and initialize the internal preconditioner
data. The last two arguments of ARKBBDPrecInit()
are the
two user-supplied functions of type ARKLocalFn()
and
ARKCommFn()
described above, respectively.
Set optional inputs
Note that the user should not call
ARKStepSetPreconditioner()
as it will overwrite the
preconditioner setup and solve functions.
Create nonlinear solver object
Attach nonlinear solver module
Set nonlinear solver optional inputs
Specify rootfinding problem
Advance solution in time
Get optional outputs
Additional optional outputs associated with ARKBBDPRE are
available through the routines
ARKBBDPrecGetWorkSpace()
and
ARKBBDPrecGetNumGfnEvals()
.
Deallocate memory for solution vector
Free solver memory
Free linear solver memory
Finalize MPI
The ARKBBDPRE preconditioner module is initialized (or re-initialized) and attached to the integrator by calling the following functions:
ARKBBDPrecInit
(void* arkode_mem, sunindextype Nlocal, sunindextype mudq, sunindextype mldq, sunindextype mukeep, sunindextype mlkeep, realtype dqrely, ARKLocalFn gloc, ARKCommFn cfn)¶Initializes and allocates (internal) memory for the ARKBBDPRE preconditioner.
ARKLocalFn()
)
which computes the approximation \(g(t,y) \approx f_I(t,y)\).ARKCommFn()
) which
performs all inter-process communication required for the
computation of \(g(t,y)\).NULL
NULL
Notes: If one of the half-bandwidths mudq or mldq to be used in the difference quotient calculation of the approximate Jacobian is negative or exceeds the value Nlocal-1, it is replaced by 0 or Nlocal-1 accordingly.
The half-bandwidths mudq and mldq need not be the true half-bandwidths of the Jacobian of the local block of \(g\) when smaller values may provide a greater efficiency.
Also, the half-bandwidths mukeep and mlkeep of the retained banded approximate Jacobian block may be even smaller than mudq and mldq, to reduce storage and computational costs further.
For all four half-bandwidths, the values need not be the same on every processor.
The ARKBBDPRE module also provides a re-initialization function to
allow solving a sequence of problems of the same size, with the same
linear solver choice, provided there is no change in Nlocal,
mukeep, or mlkeep. After solving one problem, and after
calling ARKStepReInit()
to re-initialize ARKStep for a
subsequent problem, a call to ARKBBDPrecReInit()
can be made
to change any of the following: the half-bandwidths mudq and
mldq used in the difference-quotient Jacobian approximations, the
relative increment dqrely, or one of the user-supplied functions
gloc and cfn. If there is a change in any of the linear solver
inputs, an additional call to the “Set” routines provided by the
SUNLINSOL module, and/or one or more of the corresponding
ARKStepSet***
functions, must also be made (in the proper order).
ARKBBDPrecReInit
(void* arkode_mem, sunindextype mudq, sunindextype mldq, realtype dqrely)¶Re-initializes the ARKBBDPRE preconditioner module.
NULL
NULL
NULL
Notes: If one of the half-bandwidths mudq or mldq is negative or exceeds the value Nlocal-1, it is replaced by 0 or Nlocal-1 accordingly.
The following two optional output functions are available for use with the ARKBBDPRE module:
ARKBBDPrecGetWorkSpace
(void* arkode_mem, long int* lenrwBBDP, long int* leniwBBDP)¶Returns the processor-local ARKBBDPRE real and integer workspace sizes.
realtype
values in the
ARKBBDPRE workspace.NULL
NULL
NULL
Notes: The workspace requirements reported by this routine
correspond only to memory allocated within the ARKBBDPRE module
(the banded matrix approximation, banded SUNLinearSolver
object, temporary vectors). These values are local to each process.
The workspaces referred to here exist in addition to those given by
the corresponding function ARKStepGetLSWorkSpace()
.
ARKBBDPrecGetNumGfnEvals
(void* arkode_mem, long int* ngevalsBBDP)¶Returns the number of calls made to the user-supplied
gloc function (of type ARKLocalFn()
) due to the finite
difference approximation of the Jacobian blocks used within the
preconditioner setup function.
NULL
NULL
NULL
In addition to the ngevalsBBDP gloc evaluations, the costs associated with ARKBBDPRE also include nlinsetups LU factorizations, nlinsetups calls to cfn, npsolves banded backsolve calls, and nfevalsLS right-hand side function evaluations, where nlinsetups is an optional ARKStep output and npsolves and nfevalsLS are linear solver optional outputs (see the table Linear solver interface optional output functions).