Scientific simulation is worthless unless the results of those simulations can be analyzed and understood. Unfortunately, most classes on scientific computing focus almost exclusively on how to create the methods for performing the simulations, with little time (if any) dedicated to the analysis of those results. While one session of a workshop is insufficient to fully rectify this situation, we’ll try to get get you started down the right path, by first focusing on how to input simulation data into more interactive computing environments, and then how to postprocess and visualize that data. While there are many available interactive computing environments that can be used for these purposes, we’ll focus on two of the most popular options, Matlab and Python.

To perform this tutorial session, we’ll need to set up our environment to use Python 2.7.x,

```
$ module load python/2.7.8
$ module load MATLAB
```

Retrieve the set of files for this session either through
`clicking here` or by copying the
relevant files at the command line:

```
$ cp ~dreynolds/ManeFrame_tutorial/session8.tgz .
```

Unzip this file, enter the resulting directory, and build the
executable with `make`.

Run the executable at the command-line.

```
$ ./advection.exe
```

You should see a set of output, ending with lines similar to:

```
writing output file 24, step = 2399, t = 0.48
writing output file 25, step = 2499, t = 0.5
total runtime = 3.1000000000000000e-01
```

List the files in the subdirectory; you should see a new set of files
with the names `u_sol.###.txt`. These files contain solution data from
the simulation that you just ran, which models the propagation of an
acoustic wave over a periodic, two-dimensional surface, using a coarse
\(50 \times 50\) spatial grid.

In the following sections, we will work on importing these data files into either a Matlab or Python environment, and then performing some simple data analysis. For the remainder of this session, both Matlab and Python will be presented, though you may choose to specialize in only your preferred interactive environment.

Before we can understand how to load data into Matlab or Python, we
must understand how it was written from the program. Here is the C++
function used to output the two-dimensional data array `u`:

```
// Daniel R. Reynolds
// SMU HPC Workshop
// 20 May 2013
// Inclusions
#include <stdio.h>
#include <string.h>
#include "advection.h"
// Writes current solution to disk
int output(double *u, double t, int nx, int ny, int noutput) {
// set output file name
char outname[100];
sprintf(outname, "u_sol.%03i.txt", noutput);
// open output file
FILE *FID = fopen(outname,"w");
if (FID == NULL) {
fprintf(stderr, "output: error opening output file %s\n", outname);
return 1;
}
// output the solution values
for (int j=0; j<ny; j++)
for (int i=0; i<nx; i++)
fprintf(FID, "%.16e\n",u[idx(i,j,nx)]);
// write current solution time and close the data set
fprintf(FID, "%.16e\n", t);
fclose(FID);
// now output a metadata file, containing general run information
FID = fopen("u_sol_meta.txt","w");
fprintf(FID, "%i\n", nx);
fprintf(FID, "%i\n", ny);
fprintf(FID, "%i\n", noutput);
fclose(FID);
return 0;
} // end output
```

A few contextual notes about this code to better understand what is happening (we’ll discuss in greater detail during class):

`u`holds a two-dimensional array of size`nx`by`ny`, stored in a one-dimensional index space of length`nx*ny`. The mapping between the 2D physical space and 1D index space is handled by the`idx()`macro, defined in`advection.h`, of the form// simple macro to map a 2D index to a 1D address space #define idx(i,j,nx) ((j)*(nx)+(i))

This function is called once every output time; these outputs are indexed by the integer

`noutput`, and correspond to the solution at the physical time`t`.At each output time, this routine writes two files:

- The first file is the solution file (
`u_sol.###.txt`), that holds the 2D data array, printed as one long array with the \(x\) coordinate the faster index. In this same file, after`u`is stored, the physical time of the output,`t`is also stored. - The second file is a metadata file (
`u_sol_meta.txt`), that contains the problem size and the total number of outputs that have been written so far in the simulation.

- The first file is the solution file (

We will first build Matlab and Python functions that can read in the metadata file. First. let’s view the contents of the metadata file:

```
$ cat u_sol_meta.txt
50
50
25
```

Here the first “50” corresponds to `nx`, the second “50” corresponds
to `ny`, and the “25” corresponds to the total number of solutions
that have been output (i.e. the final value for `noutput`).

Due to this file’s simple structure, we we only need to read three
numbers in a single column and store them appropriately. The relevant
Matlab code is in the file `load_info.m`, and relies on the built-in
Matlab function `load`:

```
function [nx,ny,nt] = load_info()
% Usage: [nx,ny,nt] = load_info()
%
% Outputs: nx,ny are the grid size, and nt is the total number of
% time steps that have been output to disk.
%
% Daniel R. Reynolds
% SMU HPC Workshop
% 20 May 2013
% input general problem information
load u_sol_meta.txt; % reads values from disk, storing in a vector
nx = u_sol_meta(1); % unpack vector to name each output
ny = u_sol_meta(2);
nt = u_sol_meta(3);
return
% end of function
```

The corresponding Python code is in the file `load_info.py`, which
similarly relies on the built-in Numpy function `loadtxt`:

```
# Defines the function load_info().
#
# Daniel R. Reynolds
# SMU HPC Workshop
# 20 May 2013
# import requisite modules
import numpy as np
def load_info():
"""Returns the mesh size and total number of output times
from the input file 'u_sol_meta.txt'. Has calling syntax:
nx,ny,nt = load_info(). """
# reads integer values from disk, storing in a vector
data = np.loadtxt("u_sol_meta.txt", dtype=int)
return data # return entire vector
# end of file
```

In both of these scripts, the data in the file `u_sol_meta.txt` is
input and converted to a one-dimensional array of numbers. In the
Matlab code we name these and return each separately. In the Python
code we merely return the array, leaving unpacking and naming to the
calling routine.

Note

In the R package for interactive statistical data analysis, the
corresponding command to Matlab’s `load` and Python/Numpy’s
`loadtxt` is the R function `read.table`, e.g.

```
> read.table("u_sol_meta.txt")
V1
1 50
2 50
3 25
```

However, since I do not know how to use R all of the following examples will only be in Matlab or Python. Of course, if you are more familiar with R, you are welcome to attempt the remainder of this session with that instead of Matlab or Python.

Now that we’ve seen a simple approach for loading an array into Matlab
and Python, we can move on to functions for reading the larger
`u_sol.###.txt` files. As with the above functions, since the data
is output in a single (but very long) column of numbers, we may use
`load` or `loadtxt` to input the data. Once this data has been
read in, however, we will then split it into the solution component,
`u`, and the current time, `t`. Since `u` holds a
two-dimensional array, but is stored in a flattened one-dimensional
format, we can use `reshape` (the same command in both Matlab and
Python) to convert it from the one-dimensional to the two-dimensional
representation.

First, the Matlab code, `load_data_2d.m`:

```
function [t,u] = load_data_2d(tstep)
% Usage: [t,u] = load_data_2d(tstep)
%
% Input: tstep is an integer denoting which time step output to load
%
% Outputs: t is the physical time, and u is the 2D array containing
% the result at the requested time step
%
% Daniel R. Reynolds
% SMU HPC Workshop
% 20 May 2013
% input general problem information
[nx,ny,nt] = load_info();
% ensure that tstep is allowable
if (tstep < 0 || tstep > nt)
error('load_data_2d error: illegal tstep')
end
% set filename string and load as a long 1-dimensional array
infile = sprintf('u_sol.%03i.txt',tstep);
data = load(infile);
% separate data array from current time, and reshape data into 2D
u1D = data(1:end-1);
t = data(end);
u = reshape(u1D, [nx, ny]);
return
```

and here is the corresponding Python code, `load_data_2d.py`:

```
# Defines the function load_data_2d().
#
# Daniel R. Reynolds
# SMU HPC Workshop
# 20 May 2013
# import requisite modules
import numpy as np
from load_info import load_info
def load_data_2d(tstep):
"""Returns the solution over the mesh for a given time snapshot.
Has calling syntax:
t,u = load_data_2d(tstep)
Input: tstep is an integer denoting which time step output to load.
Outputs: t is the physical time, and u is the 2D array containing
the result at the requested time step."""
# load the parallelism information
nx,ny,nt = load_info()
# check that tstep is allowed
if (tstep < 0 or tstep > nt):
print 'load_data_2d error: illegal tstep!'
return
# determine data file name and load as a long 1-dimensional array
infile = 'u_sol.' + repr(tstep).zfill(3) + '.txt'
data = np.loadtxt(infile, dtype=np.double)
# separate data array from current time and reshape data into 2D
u1D = data[:len(data)-1]
t = data[-1];
u = np.reshape(u1D, (nx,ny), order='F')
return [t,u]
```

How these work:

- These routines take as input an integer,
`tstep`, that corresponds to the desired time step output file (the`###`in the file name). - They then call the corresponding
`load_info`function to find out the two-dimensional domain size and the total number of time steps written to disk, and perform a quick check to see whether`tstep`is an allowable time step index.*Matlab:*The function namespace for Matlab corresponds to all ”.m” files in the current folder, followed by all built-in functions. So as long as both of the scripts`load_info.m`and`load_data_2d.m`are in the same folder, the`load_data_2d`function can call the`load_info`function automatically.*Python:*Since Python protects the namespace by default, any non-built-in Python functions from other files must be loaded before they may be executed. As a result,`load_data_2d.py`must import the`load_info`function from the`load_data.py`file before it may be used (*note: the ”.py” extension for the ``load_data.py`` file is assumed, and should not be added to the “from” portion of the ``import`` command*).

- The routine then combines the time step index into a string that
represents the correct file name (e.g.
`u_sol.006.txt`), and calls the relevant`load`or`loadtxt`routine to input the data. - The routine then splits the data into the one-dimensional version of
`u`(called`u1D`) and`t`, before reshaping`u1D`into a two-dimensional version of the solution, before returning the values.

Note

In the Python version, we must specify that the data is ordered in “Fortran” style, i.e. that the first index is the fastest (as opposed to “C” style, where the first index is the slowest). Fortran ordering is the default in Matlab, whereas C ordering is the default in Python. This output was written in “Fortran” style, so we use that here.

These data input routines can be used by Matlab or Python scripts to first read in the data, before either performing analysis or plotting.

A few general comments on the above approach:

- By storing the values as raw text, these files are larger than
necessary. In this example the files are not too large (~58 KB
each), but in more realistic simulations it would be preferred to
store data in a more compressed format. Two approaches for this are
to:
- Zip each file after it is written to disk, through using library
routines (e.g.
`libz`,`libzip`,`libgzip`), and the uncompress them when reading. If the file is compressed with`gzip`, Numpy’s`loadtxt`routine will automatically unzip as it reads. - Write the data to disk in binary format.

- Zip each file after it is written to disk, through using library
routines (e.g.
- Performance-wise, it is best to write out data in the
order in which it is stored in memory during the simulation. In
this example, the data is stored with the
`x`index being the fastest, hence the “Fortran” ordering of the data file.

High-quality alternatives to such manual I/O approaches abound. Two popular I/O libraries in high-performance computing are HDF5 and netCDF. Both of these libraries have the following benefits over doing things manually:

- Natively output in binary format for smaller file sizes.
- Allow you to output descriptive information in addition to just the data (e.g. units of each field, version of the code).
- Allow you to output multiple items to the same file (e.g. density, momentum, energy).
- Support parallel computing, allowing many MPI tasks to write to the same file.
- Professional visualization utilities typically have readers built-in for these file types.
- Have data input utilities in both Matlab and Python:
- Matlab/HDF5:
`h5create`,`h5disp`,`h5info`,`h5read`,`h5readatt`,`h5write`,`h5writeatt`. All are built into Matlab (see this Matlab help page for information). - Matlab/netCDF: although not built into Matlab, there are contributed versions of netCDF readers on Matlab Central.
- Python/HDF5: the Python module
`h5py`contains a full Pythonic interface to the HDF5 data format (click here for more information on h5py). - Python/netCDF: the Python module
`netcdf4-python`contains interfaces to the majority of netCDF (click here for more information on netcdf4-python).

- Matlab/HDF5:
- Last but not least: someone else writes and debugs the code, allowing you to focus on your work instead of spending your time fiddling with I/O.

We will now use the above data input routines to do some
post-processing of these simulated results. For this example, we’ll
create surface plots of the field `u`, one for each time step, and
write them to the disk. Of course, once the data is available in our
preferred scripting environment (Matlab, Python, etc.), we can easily
perform additional data analysis, as will be included in the hands-on
exercise at the end of this session.

As we did earlier, we’ll first show the code and then go through the steps. You may focus on your preferred computing environment, since both scripts are functionally equivalent.

First the Matlab code, `plot_solution.m`:

```
% Plotting script for 2D acoustic wave propagation example
% simulation. This script inputs the file u_sol_meta.txt to determine
% simulation information (grid size and total number of time steps).
% It then calls load_data_2d() to read the solution data from each
% time step, plotting the results (and saving them to disk).
%
% Daniel R. Reynolds
% SMU HPC Workshop
% 20 May 2013
clear
% input general problem information
[nx,ny,nt] = load_info();
% loop over time steps
for tstep = 0:nt
% load time step data
[t,u] = load_data_2d(tstep);
% plot current solution (and save to disk)
xvals = linspace(0,1,nx);
yvals = linspace(0,1,ny);
h = surf(yvals,xvals,u);
shading flat
view([50 44])
axis([0, 1, 0, 1, -1, 1])
xlabel('x','FontSize',14), ylabel('y','FontSize',14)
title(sprintf('u(x,y) at t = %g, mesh = %ix%i',t,nx,ny),'FontSize',14)
pfile = sprintf('u_surf.%03i.png',tstep);
saveas(h,pfile);
%disp('pausing: hit enter to continue')
%pause
end
```

and then the Python code, `plot_solution.py`:

```
# Plotting script for 2D acoustic wave propagation example
# simulation. This script calls load_info() to determine
# simulation information (grid size and total number of time steps).
# It then calls load_data_2d() to read the solution data from each
# time step, plotting the results (and saving them to disk).
#
# Daniel R. Reynolds
# SMU HPC Workshop
# 20 May 2013
# import the requisite modules
from pylab import *
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
from load_info import load_info
from load_data_2d import load_data_2d
# input general problem information
nx,ny,nt = load_info()
# iterate over time steps
for tstep in range(nt+1):
# input solution at this time
t,u = load_data_2d(tstep)
# set string constants for output plots, current time, mesh size
pname = 'u_surf.' + repr(tstep).zfill(3) + '.png'
tstr = repr(round(t,4))
nxstr = repr(nx)
nystr = repr(ny)
# set x and y meshgrid objects
xspan = np.linspace(0.0, 1.0, nx)
yspan = np.linspace(0.0, 1.0, ny)
X,Y = np.meshgrid(xspan,yspan)
# plot current solution as a surface, and save to disk
fig = plt.figure(1)
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, u, rstride=1, cstride=1, cmap=cm.jet,
linewidth=0, antialiased=True, shade=True)
ax.set_xlabel('y')
ax.set_ylabel('x')
title('u(x,y) at t = ' + tstr + ', mesh = ' + nxstr + 'x' + nystr)
savefig(pname)
#ion()
#plt.show()
#ioff()
#raw_input('pausing: hit enter to continue')
plt.close()
# end of script
```

How these work:

- These first call
`load_info`to determine the simulation grid size and total number of time steps that have been output to disk. - These then loop over each time step, and:
- Call
`load_data_2d`to read the simulation time and solution array. - Create arrays for the \(x\) and \(y\) coordinates of each solution data point.
- Plot
`u`at that time step as a 2D surface plot, setting the plot labels and title appropriately. - Save the plot to disk in files of the form
`u_surf.###.png`. - (Commented out) Pause the loop until the user hits “enter”.

- Call

Run this code as usual, using either Matlab,

```
$ matlab -r plot_solution
```

or Python,

```
$ python ./plot_solution.py
```

You should then see a set of `.png` images in the directory:

```
$ ls
Makefile plot_solution.m u_sol.012.txt u_sol_meta.txt u_surf.013.png
advection.cpp plot_solution.py u_sol.013.txt u_surf.000.png u_surf.014.png
advection.exe u_sol.000.txt u_sol.014.txt u_surf.001.png u_surf.015.png
advection.h u_sol.001.txt u_sol.015.txt u_surf.002.png u_surf.016.png
density.txt u_sol.002.txt u_sol.016.txt u_surf.003.png u_surf.017.png
initialize.cpp u_sol.003.txt u_sol.017.txt u_surf.004.png u_surf.018.png
input.txt u_sol.004.txt u_sol.018.txt u_surf.005.png u_surf.019.png
load_data_2d.m u_sol.005.txt u_sol.019.txt u_surf.006.png u_surf.020.png
load_data_2d.py u_sol.006.txt u_sol.020.txt u_surf.007.png u_surf.021.png
load_data_2d.pyc u_sol.007.txt u_sol.021.txt u_surf.008.png u_surf.022.png
load_info.m u_sol.008.txt u_sol.022.txt u_surf.009.png u_surf.023.png
load_info.py u_sol.009.txt u_sol.023.txt u_surf.010.png u_surf.024.png
load_info.pyc u_sol.010.txt u_sol.024.txt u_surf.011.png u_surf.025.png
output.cpp u_sol.011.txt u_sol.025.txt u_surf.012.png
```

You can view these plots on ManeFrame with the command, e.g.

```
$ display u_surf.009.png
```

Alternately, you can open them all and cycle through them by right-clicking and selecting “Next”:

```
$ display u_surf.*.png
```

A few difficulties with using either Matlab or Python for data visualization include:

- Difficulty dealing with three-dimensional plotting: while slices and projections are simple, 3D data sets require much more interactive visualization, including isocontour surface plots, moving slices, rotating, etc.
- Difficulty dealing with data output from parallel simulations: you need to read in each processor’s data file and glue them together manually, and such in-core processing is impossible when the data sets grow too large.

As a result, there are a variety of high-quality visualization packages that are designed for interactive 3D visualization, as discussed below. None of these are installed on ManeFrame at present, though all are freely-available and open-source, so if you need/want one you should make a request to the ManeFrame system administrators.

Mayavi is a Python plotting package designed primarily for interactive 3D visualization. See:

VisIt is an open source visualization package being developed at Lawrence Livermore National Laboratory. It is designed for large-scale visualization problems (i.e. large data sets, rendered in parallel). VisIt has a GUI interface, as well as a Python interface for scripting. See:

In the set of files for this session, you will find one additional
file that you have not yet used, `density.txt`. This is a
snapshot of a three-dimensional cosmological density field at a
redshift of approximately \(z = 9\). Unlike the previous
example, this file contains only the data field itself, with no
auxiliary metadata. Like the previous example, this data is stored in
a single column, with \(x\) being the fastest index and \(z\)
the slowest. The three-dimensional grid is uniform in each direction,
(i.e. it has size \(N\times N\times N\)) so the total number of
lines in the file should equal \(N^3\).

Create a Matlab or Python script that accomplishes the following tasks:

- Determine the maximum density over the domain, and where it occurs.
- Determine the minimum density over the domain, and where it occurs.
- Determine the average density over the domain.
- Generate the following two-dimensional plots, and save each to disk:
- Slice through the center of the domain parallel to the \(xy\) plane.
- Slice through the center of the domain parallel to the \(xz\) plane.
- Slice through the center of the domain parallel to the \(yz\) plane.
- Plot a projection of the density onto the \(xy\) plane (i.e. add all entries in the \(z\) direction to collapse the 3D set to 2D).
- Plot a projection of the density onto the \(xz\) plane.
- Plot a projection of the density onto the \(yz\) plane.

*Hints*:

- If you plot the \(log\) of the density, you will get more
interesting pictures. In both Matlab and Python/Numpy, this is
easily computed using whole-array operations, e.g.
`logd = log(d)`. - Both Matlab and Python allow array slicing to extract a plane from
a 3D data set, e.g.
- Matlab:
`dslice = squeeze(d(:,:,2))`– here, the`squeeze`command may be used to eliminate the now-trivial 3rd dimension that has length 1. - Python/Numpy:
`dslice = d[:][:][2]`or even`dslice = d[:,:,2]`(both forms of syntax are equivalent for Numpy arrays).

- Matlab:
- Both Matlab and Python/Numpy have a
`sum`command that will add all values of a multi-dimensional array along a specified dimension. Read their documentation to see how this works (it will help with the average value and with the projection plots). - Both Matlab and Python/Numpy have
`max`and`min`commands that can be applied to array-valued data. Read their documentation to see how this works (it will help with the maximum and minimum values).