This chapter is concerned with using GPU acceleration and ARKODE for the solution of IVPs.
In this section, we introduce the SUNDIALS GPU programming model and highlight SUNDIALS GPU features. The model leverages the fact that all of the SUNDIALS packages interact with simulation data either through the shared vector, matrix, and solver APIs (see Vector Data Structures, Matrix Data Structures, Description of the SUNLinearSolver module, and Description of the SUNNonlinearSolver Module) or through user-supplied callback functions. Thus, under the model, the overall structure of the user’s calling program and the way users interact with the SUNDIALS packages are similar to using SUNDIALS in CPU-only environments.
Within the SUNDIALS GPU programming model, all control logic executes on the CPU, and all simulation data resides wherever the vector or matrix object dictates as long as SUNDIALS is in control of the program. That is, SUNDIALS will not migrate data (explicitly) from one memory space to another. Except in the most advanced use cases, it is safe to assume that data is kept resident in the GPU-device memory space.
The consequence of this is that, when control is passed from the user’s calling program to SUNDIALS, simulation data in vector or matrix objects must be up-to-date in the device memory space. Similarly, when control is passed from SUNDIALS to the user’s calling program, the user should assume that any simulation data in vector and matrix objects are up-to-date in the device memory space. To put it succinctly, it is the responsibility of the user’s calling program to manage data coherency between the CPU and GPU-device memory spaces unless unified virtual memory (UVM), also known as managed memory, is being utilized. Typically, the GPU-enabled SUNDIALS modules provide functions to copy data from the host to the device and vice versa, as well as support for unmanaged memory or UVM.
In practical terms, the way SUNDIALS handles distinct host and device memory spaces means that users need to ensure that the user-supplied functions, e.g., the right-hand side function, operate only on simulation data in the device memory space; otherwise, extra memory transfers will be required and performance will be poor. The exception to this rule is if some form of hybrid data partitioning (achievable with the NVECTOR_MANYVECTOR module) is utilized.
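
The coherency rules above can be illustrated with a small toy model. The sketch below is plain C and involves neither a GPU nor SUNDIALS: two ordinary buffers stand in for the host and device memory spaces, and every name in it (`ToyVector`, `toy_copy_to_device`, `toy_copy_from_device`, `toy_rhs`, `toy_euler_step`, `toy_demo`) is hypothetical and chosen only for illustration. With the NVECTOR_CUDA module, the roles of the two copy routines and of the device-pointer access are played by `N_VCopyToDevice_Cuda`, `N_VCopyFromDevice_Cuda`, and `N_VGetDeviceArrayPointer_Cuda`; the forward Euler step is merely a stand-in for whatever the integrator does while it is in control.

```c
/* Toy model of separate host and device memory spaces.
   Plain C only: no GPU and no SUNDIALS calls are involved. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a GPU-enabled vector: one buffer per space. */
typedef struct {
    size_t  length;
    double *host;    /* models the CPU (host) memory space   */
    double *device;  /* models the GPU (device) memory space */
} ToyVector;

ToyVector *toy_vector_new(size_t length)
{
    ToyVector *v = malloc(sizeof *v);
    assert(v != NULL);
    v->length = length;
    v->host   = calloc(length, sizeof *v->host);
    v->device = calloc(length, sizeof *v->device);
    assert(v->host != NULL && v->device != NULL);
    return v;
}

void toy_vector_free(ToyVector *v)
{
    free(v->host);
    free(v->device);
    free(v);
}

/* Explicit transfers: the only way data moves between the two spaces. */
void toy_copy_to_device(ToyVector *v)
{
    memcpy(v->device, v->host, v->length * sizeof *v->host);
}

void toy_copy_from_device(ToyVector *v)
{
    memcpy(v->host, v->device, v->length * sizeof *v->host);
}

/* User-supplied right-hand side for y' = -y. It extracts the device buffers
   and touches only those; on a real GPU this loop would be a kernel. */
int toy_rhs(double t, const ToyVector *y, ToyVector *ydot)
{
    const double *yd = y->device;
    double       *fd = ydot->device;
    size_t i;
    (void)t;  /* the model problem is autonomous */
    for (i = 0; i < y->length; ++i) fd[i] = -yd[i];
    return 0;
}

/* Stand-in for the library: one forward Euler step on device data only.
   It never reads, writes, or synchronizes the host buffers. */
void toy_euler_step(double t, double h, ToyVector *y, ToyVector *work)
{
    size_t i;
    toy_rhs(t, y, work);
    for (i = 0; i < y->length; ++i) y->device[i] += h * work->device[i];
}

/* Calling program: fill on the host, copy to the device, take one step,
   and optionally copy back. Returns the host copy of y[0]. */
double toy_demo(int copy_back)
{
    const size_t n = 4;
    ToyVector *y    = toy_vector_new(n);
    ToyVector *work = toy_vector_new(n);
    double result;
    size_t i;

    for (i = 0; i < n; ++i) y->host[i] = 1.0;  /* initial data set on host   */
    toy_copy_to_device(y);                     /* required before the solver */

    toy_euler_step(0.0, 0.5, y, work);         /* device: 1.0 + 0.5 * (-1.0) */

    if (copy_back) toy_copy_from_device(y);    /* caller restores coherency  */
    result = y->host[0];

    toy_vector_free(y);
    toy_vector_free(work);
    return result;
}
```

In this model, `toy_demo(0)` returns the stale host value 1.0, whereas `toy_demo(1)` returns the updated value 0.5. The difference is exactly the coherency responsibility that falls to the calling program when unmanaged memory is used.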
SUNDIALS provides many native shared features and modules that are GPU-enabled.
Currently, these are primarily limited to the NVIDIA CUDA platform [CUDA],
although support for other GPU computing platforms, such as AMD ROCm/HIP [ROCm]
and Intel oneAPI [oneAPI], is an area of active development. Table
List of SUNDIALS GPU-enabled Modules summarizes the shared SUNDIALS modules that are
GPU-enabled, what GPU programming environments they support, and what class of
memory they support (unmanaged or UVM). Users may also supply their own
GPU-enabled N_Vector, SUNMatrix, SUNLinearSolver, or SUNNonlinearSolver
implementation, and the capabilities will be leveraged since SUNDIALS operates
on data through these APIs.
In addition, SUNDIALS provides Tools for Memory Management to support applications which implement their own memory management or memory pooling.
Module | CUDA | ROCm/HIP | oneAPI | Unmanaged Memory | UVM |
---|---|---|---|---|---|
NVECTOR_CUDA | X | | | X | X |
NVECTOR_RAJA | X | | | X | X |
NVECTOR_OPENMPDEV | X | X2 | X2 | X | |
SUNMATRIX_CUSPARSE | X | | | X | X |
SUNLINSOL_CUSOLVERSP | X | | | X | X |
SUNLINSOL_SPGMR | X1 | X1 | X1 | X1 | X1 |
SUNLINSOL_SPFGMR | X1 | X1 | X1 | X1 | X1 |
SUNLINSOL_SPTFQMR | X1 | X1 | X1 | X1 | X1 |
SUNLINSOL_SPBCGS | X1 | X1 | X1 | X1 | X1 |
SUNLINSOL_PCG | X1 | X1 | X1 | X1 | X1 |
SUNNONLINSOL_NEWTON | X1 | X1 | X1 | X1 | X1 |
SUNNONLINSOL_FIXEDPOINT | X1 | X1 | X1 | X1 | X1 |
In addition, note that implicit UVM (i.e. malloc returning UVM) is not accounted for.
For any SUNDIALS package, the generalized steps a user needs to take to use GPU-accelerated SUNDIALS are:
Users should refer to Table List of SUNDIALS GPU-enabled Modules for a list of GPU-enabled native SUNDIALS modules.