Example 3: Solving the heat equation in parallel

Introduction

This problem extends Example 1: Solving the heat equation in a few meaningful ways:

  • The exact solution and right-hand side are more challenging than in Example 1, whose solution was linear in time and quadratic in space.

  • We change the support points for the temporal discretization.

  • The linear algebra is now distributed over multiple MPI processes using Trilinos. We give a short overview of the important concepts, but for a good in-depth explanation of MPI-parallel simulations we highly recommend the deal.II resources step-40, which solves the Poisson equation (the stationary heat equation), and the overview page Parallel computing with multiple processors using distributed memory.

  • Calculation of the space-time \(L^2\)-error and subsequent convergence studies.

  • Providing a command line interface to set simulation parameters for use in script-based batch processing. This is used to run all the simulations for the convergence studies.

Support points for discontinuous elements

For continuous Lagrange finite elements the outermost DoFs are always placed on the element boundary. That way they can be shared with the DoFs of neighboring elements, which have to take the same values due to continuity.

For discontinuous elements, however, this is no longer necessary. In the previous examples we have used the default of DG_FiniteElement, which places the support points at the locations of the Gauss-Lobatto quadrature points. This quadrature rule is defined by requiring the first and last quadrature point to lie on the boundary of the interval, as in the continuous case.

Additionally, ideal.II offers the following three values for support_type

  • Gauss(-Legendre) with no DoFs on the element boundary

  • left Gauss-Radau with only the first DoF on the left boundary

  • right Gauss-Radau with only the last DoF on the right boundary

These may provide better approximations, as we can see in the results. However, they introduce some important differences that have to be taken into account (a construction sketch follows after this list):

  • The final value needed for the initial jump of the following slab might not be located at a temporal DoF, so we have to use extract_subvector_at_time_point instead of extracting the final subvector of the slab.

  • At least one of the values needed for the jump terms is no longer located at a DoF, so we get a larger sparsity pattern. This is handled internally by make_upwind_sparsity_pattern(), so you only need to be aware of it.

  • When using extract_subvector_at_time_dof for output, the results are no longer located at the temporal grid points. In the future this will be addressed by a space-time version of DataOutput.
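As a brief illustration, the support type is simply passed as the third argument to the constructor of the space-time element, as done in the constructor of the Step3 class below. A minimal sketch for a bilinear-in-space, dG(1)-in-time element with left Gauss-Radau support points:

using DGFE = idealii::spacetime::DG_FiniteElement<2>;

// Spatial element FE_Q of degree 1, temporal degree 1, and temporal
// support points placed at the left Gauss-Radau quadrature points.
DGFE fe(std::make_shared<dealii::FE_Q<2>>(1),
        1,
        DGFE::support_type::RadauLeft);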

Concepts around distributed computing

We start by partitioning the spatial mesh and distributing it to the MPI ranks/processes. The Triangulation in parallel::distributed provides this functionality. Each spatial element then belongs to exactly one rank, and each spatial DoF is owned by exactly one rank. As a result, a DoF may be owned by a different rank while still belonging to a locally owned element, in which case it is needed for assembly. Therefore, there are two sets of DoF indices:

  • The locally owned DoFs actually owned by the current rank

  • The locally relevant DoFs, which are all DoFs belonging to a locally owned element, i.e. the locally owned DoFs plus so-called ‘ghost’ DoFs.

These index sets will be used to initialize distributed vectors and matrices. Note that only vectors initialized with the smaller, locally owned set can be written into directly, to avoid overwriting a value on multiple MPI processes.

For the space-time index sets we take the spatial sets obtained from the DoFHandler and shift them by the local temporal DoF index times the total number of spatial DoFs. So for a spatial mesh with 8 DoFs of which the indices \((0,3,5,7)\) are locally owned and a temporal mesh with 3 DoFs, the space-time index set would be \((0,3,5,7,8,11,13,15,16,19,21,23)\). PETSc does not allow for the specification of such an index set, so only Trilinos is supported in ideal.II.
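To illustrate this construction, the following is a small sketch (not the actual ideal.II implementation; the function name is purely illustrative) of how such a shifted space-time index set could be assembled with dealii::IndexSet:

#include <deal.II/base/index_set.h>

// Build the space-time locally owned IndexSet by shifting the spatial set
// by the temporal DoF index times the total number of spatial DoFs.
dealii::IndexSet
shifted_spacetime_index_set(const dealii::IndexSet &space_owned,
                            const unsigned int      n_dofs_space,
                            const unsigned int      n_dofs_time)
{
  dealii::IndexSet spacetime_owned(n_dofs_space * n_dofs_time);
  for (unsigned int i = 0; i < n_dofs_time; ++i)
    for (const auto space_dof : space_owned)
      spacetime_owned.add_index(space_dof + i * n_dofs_space);
  spacetime_owned.compress();
  return spacetime_owned;
}

// For the example above: space_owned = {0,3,5,7}, n_dofs_space = 8 and
// n_dofs_time = 3 yield {0,3,5,7, 8,11,13,15, 16,19,21,23}.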

Since some vectors, like the initial value, are defined at a single time point, we need four index sets in total: two for the spatial mesh and two for the space-time mesh.

Apart from that, all changes from step-1 to step-3 regarding distributed computing are standard changes you would also make in a stationary problem. Some of these are:

  • Calling MPI_InitFinalize in main().

  • Knowing and saving the MPI Communicator.

  • Changing the triangulation and linear algebra objects to distributed versions.

  • Storing and using index sets during setup.

  • Communicating between different locally owned and locally relevant vectors (see the sketch after this list).

  • Compressing the matrix and vectors after assembly to communicate off-processor entries.

  • Using parallel output functions from dealii::DataOut.
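The bullet on communicating between locally owned and locally relevant vectors follows a pattern that appears several times in this program: write into a vector built on the locally owned index set, then assign it to a ghosted vector built on both index sets, which triggers the necessary MPI communication. A minimal sketch with illustrative variable names:

// Writable vector holding only the locally owned entries.
dealii::TrilinosWrappers::MPI::Vector owned(locally_owned_dofs, mpi_comm);

// Ghosted vector that additionally provides read access to ghost entries.
dealii::TrilinosWrappers::MPI::Vector relevant(locally_owned_dofs,
                                               locally_relevant_dofs,
                                               mpi_comm);

// ... fill 'owned', e.g. by interpolation or as the target of a solve ...

// The assignment communicates the off-processor ('ghost') entries.
relevant = owned;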

The commented program

include files

ideal.II includes

Here, we use the parallel distributed triangulation and linear algebra (provided by Trilinos) so we have to use the matching includes.

#include <ideal.II/distributed/fixed_tria.hh>

#include <ideal.II/lac/spacetime_trilinos_vector.hh>

#include <cmath>

All other ideal.II includes are known

#include <ideal.II/base/quadrature_lib.hh>
#include <ideal.II/base/time_iterator.hh>

#include <ideal.II/dofs/slab_dof_tools.hh>
#include <ideal.II/dofs/spacetime_dof_handler.hh>

#include <ideal.II/fe/fe_dg.hh>
#include <ideal.II/fe/spacetime_fe_values.hh>

#include <ideal.II/numerics/vector_tools.hh>

As we need the support type quite often we shorten this lengthy typename with an alias

using DGFE = idealii::spacetime::DG_FiniteElement<2>;

deal.II includes

Most includes are the same as before except for the distributed versions of triangulation and linear algebra objects. Additionally, we include the conditional output stream to suppress duplicate output on other processors.

#include <deal.II/base/conditional_ostream.h>
#include <deal.II/base/function.h>

#include <deal.II/distributed/tria.h>

#include <deal.II/fe/fe_q.h>

#include <deal.II/grid/grid_generator.h>

#include <deal.II/lac/dynamic_sparsity_pattern.h>
#include <deal.II/lac/full_matrix.h>
#include <deal.II/lac/sparsity_pattern.h>
#include <deal.II/lac/sparsity_tools.h>
#include <deal.II/lac/trilinos_solver.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>
#include <deal.II/lac/vector.h>

#include <deal.II/numerics/data_out.h>

Trilinos and C++ includes

To simplify handling of command line arguments in main()

#include <Teuchos_CommandLineProcessor.hpp>

#include <fstream>

Space-time functions

The manufactured exact solution \(u\) is described in [Hartmann1998]. The right-hand side function is derived by inserting \(u\) into the heat equation.
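Reading the formulas off the value() implementations below, the exact solution is a smooth bump whose center moves on a circle of radius \(1/4\) around the midpoint of the unit square,

\[ u(x,t) = \frac{1}{1 + 50\left((x_0 - m_0(t))^2 + (x_1 - m_1(t))^2\right)}, \qquad m_0(t) = \frac{1}{2} + \frac{1}{4}\cos(2\pi t), \quad m_1(t) = \frac{1}{2} + \frac{1}{4}\sin(2\pi t), \]

and the right-hand side is \(f = \partial_t u - \Delta u\), which RightHandSide::value evaluates term by term.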

class ExactSolution : public dealii::Function<2, double>
{
public:
  ExactSolution()
    : dealii::Function<2, double>()
  {}
  double
  value(const dealii::Point<2> &p, const unsigned int component = 0) const;
};

double
ExactSolution::value(const dealii::Point<2>             &p,
                     [[maybe_unused]] const unsigned int component) const
{
  const double t  = get_time();
  const double x0 = 0.5 + 0.25 * std::cos(2. * M_PI * t);
  const double x1 = 0.5 + 0.25 * std::sin(2. * M_PI * t);
  return 1. /
         (1. + 50. * ((p[0] - x0) * (p[0] - x0) + (p[1] - x1) * (p[1] - x1)));
}

class RightHandSide : public dealii::Function<2>
{
public:
  virtual double
  value(const dealii::Point<2> &p, const unsigned int component = 0);
};

double
RightHandSide::value(const dealii::Point<2>             &p,
                     [[maybe_unused]] const unsigned int component)
{
  double       t  = get_time();
  const double x0 = 0.5 + 0.25 * std::cos(2. * M_PI * t);
  const double x1 = 0.5 + 0.25 * std::sin(2. * M_PI * t);
  double       a  = 50.;
  const double divisor =
    1. + a * ((p[0] - x0) * (p[0] - x0) + (p[1] - x1) * (p[1] - x1));

  double dtu = -((a * (p[0] - x0) * M_PI * std::sin(2. * M_PI * t)) -
                 (a * (p[1] - x1) * M_PI * std::cos(2. * M_PI * t))) /
               (divisor * divisor);

  const double u_xx =
    -2. * a *
    (1. / (divisor * divisor) + 2. * a * (p[0] - x0) * (p[0] - x0) *
                                  (-2. / (divisor * divisor * divisor)));

  const double u_yy =
    -2. * a *
    (1. / (divisor * divisor) + 2. * a * (p[1] - x1) * (p[1] - x1) *
                                  (-2. / (divisor * divisor * divisor)));

  return dtu - (u_xx + u_yy);
}

The Step3 class

class Step3
{
public:
  Step3(unsigned int       spatial_degree,
        unsigned int       temporal_degree,
        unsigned int       M,
        unsigned int       n_ref_space,
        DGFE::support_type support_type,
        bool               write_vtu);
  void
  run();

private:
  void
  make_grid();
  void
  time_marching();
  void
  setup_system_on_slab();
  void
  assemble_system_on_slab();
  void
  solve_system_on_slab();
  void
  process_results_on_slab();
  void
  output_results_on_slab();

  MPI_Comm     mpi_comm;    // MPI Communicator
  unsigned int M;           // Number of temporal elements
  unsigned int n_ref_space; // number of spatial refinements
  bool         write_vtu;   // output results?

  dealii::ConditionalOStream pout; // To allow output only on MPI rank 0

The triangulation is now parallel distributed and the matrices and vectors are provided by Trilinos.

  idealii::spacetime::parallel::distributed::fixed::Triangulation<2>
                                     triangulation;
  idealii::spacetime::DoFHandler<2>  dof_handler;
  idealii::spacetime::TrilinosVector solution;

  idealii::spacetime::DG_FiniteElement<2> fe;

  dealii::SparsityPattern                            slab_sparsity_pattern;
  std::shared_ptr<dealii::AffineConstraints<double>> slab_constraints;
  dealii::TrilinosWrappers::SparseMatrix             slab_system_matrix;
  dealii::TrilinosWrappers::MPI::Vector              slab_system_rhs;
  dealii::TrilinosWrappers::MPI::Vector              slab_initial_value;
  unsigned int                                       slab;
  ExactSolution                                      exact_solution;
  double                                             L2_sqr_error;
  unsigned int                                       dofs_total;

For parallel communication we need some additional information. First, we need to know which DoF indices are owned by the current MPI rank. Then, we need to know which indices are relevant, i.e. the locally owned indices plus ghost entries that are needed for assembly but belong to a different rank. We need both of these for the complete slab and for the spatial grid at different points in the code. Finally, we sometimes need temporary vectors that the current rank is allowed to write into for communication between different ranks.

  dealii::IndexSet                      slab_locally_owned_dofs;
  dealii::IndexSet                      slab_locally_relevant_dofs;
  dealii::IndexSet                      space_locally_owned_dofs;
  dealii::IndexSet                      space_locally_relevant_dofs;
  dealii::TrilinosWrappers::MPI::Vector slab_owned_tmp;

  struct
  {
    idealii::slab::parallel::distributed::TriaIterator<2> tria;
    idealii::slab::DoFHandlerIterator<2>                  dof;
    idealii::slab::TrilinosVectorIterator                 solution;
  } slab_its;
};

Step3::Step3

The constructor is almost the same as for step-1 as we are solving the same PDE. The main differences are

  • A larger set of parameters to support a parameter study,

  • Initializing the MPI communicator, here to MPI_COMM_WORLD, i.e. all ranks,

  • Restricting conditional output to rank 0,

  • Allowing the temporal support type to change.

Step3::Step3(unsigned int       spatial_degree,
             unsigned int       temporal_degree,
             unsigned int       M,
             unsigned int       n_ref_space,
             DGFE::support_type support_type,
             bool               write_vtu)
  : mpi_comm(MPI_COMM_WORLD)
  , pout(std::cout, dealii::Utilities::MPI::this_mpi_process(mpi_comm) == 0)
  , triangulation()
  , dof_handler(&triangulation)
  , fe(std::make_shared<dealii::FE_Q<2>>(spatial_degree),
       temporal_degree,
       support_type)
  , M(M)
  , n_ref_space(n_ref_space)
  , write_vtu(write_vtu)
  , slab(0)
  , exact_solution()
  , L2_sqr_error(0.)
  , dofs_total(0)
{}

Step3::run

void
Step3::run() // same as before
{
  make_grid();
  time_marching();
}

Step3::make_grid

void
Step3::make_grid()
{
  auto space_tria = // Construct an MPI parallel triangulation
    std::make_shared<dealii::parallel::distributed::Triangulation<2>>(mpi_comm);
  dealii::GridGenerator::hyper_cube(*space_tria);
  triangulation.generate(space_tria, M);
  triangulation.refine_global(n_ref_space, 0);
  dof_handler.generate();
}

Step3::time_marching

We now want to assess the performance of the different discretizations, so we calculate the local contributions to the \(L^2\) error in process_results_on_slab(). Additionally, we only write output if write_vtu is true.

void
Step3::time_marching()
{
  idealii::TimeIteratorCollection<2> tic = idealii::TimeIteratorCollection<2>();

  solution.reinit(triangulation.M());

  slab_its.tria     = triangulation.begin();
  slab_its.dof      = dof_handler.begin();
  slab_its.solution = solution.begin();
  slab              = 0;
  tic.add_iterator(&slab_its.tria, &triangulation);
  tic.add_iterator(&slab_its.dof, &dof_handler);
  tic.add_iterator(&slab_its.solution, &solution);
  pout << "*******Starting time-stepping*********" << std::endl;
  for (; !tic.at_end(); tic.increment())
    {
      pout << "Starting time-step (" << slab_its.tria->startpoint() << ","
           << slab_its.tria->endpoint() << "]" << std::endl;

      setup_system_on_slab();
      assemble_system_on_slab();
      solve_system_on_slab();
      process_results_on_slab();
      if (write_vtu)
        output_results_on_slab();

For some support types the final DoF of the slab is no longer located at the final time point. Therefore, we have to calculate the value at that point as a linear combination of the DoF values on the final temporal element. This is done by the following extract function.

      idealii::slab::VectorTools::extract_subvector_at_time_point(
        *slab_its.dof,
        *slab_its.solution,
        slab_initial_value,
        slab_its.tria->endpoint());
      slab++;
    }
  double L2_error = std::sqrt(L2_sqr_error);
  pout << "Total number of space-time DoFs:\n\t" << dofs_total << std::endl;
  pout << "L2_error:\n\t" << L2_error << std::endl;
}

Step3::setup_system_on_slab

void
Step3::setup_system_on_slab()
{
  slab_its.dof->distribute_dofs(fe);
  pout << "Number of degrees of freedom: \n\t" << slab_its.dof->n_dofs_space()
       << " (space) * " << slab_its.dof->n_dofs_time()
       << " (time) = " << slab_its.dof->n_dofs_spacetime() << std::endl;

Tally the total number of space-time DoFs over all slabs.

  dofs_total += slab_its.dof->n_dofs_spacetime();

Extract the spatial DoF indices by accessing the spatial() component of the slab::DoFHandler.

  space_locally_owned_dofs = slab_its.dof->spatial()->locally_owned_dofs();
  dealii::DoFTools::extract_locally_relevant_dofs(*slab_its.dof->spatial(),
                                                  space_locally_relevant_dofs);

The slab DoF indices are extracted in a similar way, but by using functions provided in ideal.II.

  slab_locally_owned_dofs = slab_its.dof->locally_owned_dofs();
  slab_locally_relevant_dofs =
    idealii::slab::DoFTools::extract_locally_relevant_dofs(*slab_its.dof);

  slab_owned_tmp.reinit(slab_locally_owned_dofs, mpi_comm);

  if (slab == 0)
    {

On the first slab the initial value vector has to be initialized with the set of spatially relevant DoFs.

      slab_initial_value.reinit(space_locally_owned_dofs,
                                space_locally_relevant_dofs,
                                mpi_comm);

We only have write access to vectors built on locally owned DoFs. In contrast to slab_owned_tmp, we need this temporary vector only on the first slab, for interpolation of the initial value into the FE space.

      dealii::TrilinosWrappers::MPI::Vector tmp;
      tmp.reinit(space_locally_owned_dofs, mpi_comm);

Set the time of the exact solution to the initial time point (for a time-independent initial value function this is not necessary).

      exact_solution.set_time(0);

Interpolate the initial value into the FE space

      dealii::VectorTools::interpolate(*slab_its.dof->spatial(),
                                       exact_solution,
                                       tmp);

Communicate locally owned initial value to locally relevant vector

      slab_initial_value = tmp;
    }

For the interpolation of boundary values we pass the relevant index set.

  slab_constraints = std::make_shared<dealii::AffineConstraints<double>>();
  idealii::slab::VectorTools::interpolate_boundary_values(
    space_locally_relevant_dofs,
    *slab_its.dof,
    0,
    exact_solution,
    slab_constraints);
  slab_constraints->close();

  dealii::DynamicSparsityPattern dsp(slab_its.dof->n_dofs_spacetime());
  idealii::slab::DoFTools::make_upwind_sparsity_pattern(*slab_its.dof, dsp);

To save memory we distribute the sparsity pattern to only hold the locally needed set of space-time degrees of freedom.

  dealii::SparsityTools::distribute_sparsity_pattern(
    dsp, slab_locally_owned_dofs, mpi_comm, slab_locally_relevant_dofs);

  slab_system_matrix.reinit(slab_locally_owned_dofs,
                            slab_locally_owned_dofs,
                            dsp);

  slab_its.solution->reinit(slab_locally_owned_dofs,
                            slab_locally_relevant_dofs,
                            mpi_comm);
  slab_system_rhs.reinit(slab_locally_owned_dofs, mpi_comm);
}

Step3::assemble_system_on_slab

We can only calculate the contributions of cells owned by the current process. Otherwise, the assembly is the same as in step-1.

void
Step3::assemble_system_on_slab()
{
  idealii::spacetime::QGauss<2> quad(fe.spatial()->degree + 2,
                                     fe.temporal()->degree + 2);

  idealii::spacetime::FEValues<2> fe_values_spacetime(
    fe,
    quad,
    dealii::update_values | dealii::update_gradients |
      dealii::update_quadrature_points | dealii::update_JxW_values);

  idealii::spacetime::FEJumpValues<2> fe_jump_values_spacetime(
    fe,
    quad,
    dealii::update_values | dealii::update_gradients |
      dealii::update_quadrature_points | dealii::update_JxW_values);

  RightHandSide      right_hand_side;
  const unsigned int dofs_per_spacetime_cell = fe.dofs_per_cell;

  auto N = slab_its.tria->temporal()->n_global_active_cells();
  dealii::FullMatrix<double> cell_matrix(N * dofs_per_spacetime_cell,
                                         N * dofs_per_spacetime_cell);
  dealii::Vector<double>     cell_rhs(N * dofs_per_spacetime_cell);

  std::vector<dealii::types::global_dof_index> local_spacetime_dof_index(
    N * dofs_per_spacetime_cell);

  unsigned int n;
  unsigned int n_quad_spacetime = fe_values_spacetime.n_quadrature_points;
  unsigned int n_quad_space     = quad.spatial()->size();
  for (const auto &cell_space :
       slab_its.dof->spatial()->active_cell_iterators())
    {
      if (cell_space->is_locally_owned())
        {
          fe_values_spacetime.reinit_space(cell_space);
          fe_jump_values_spacetime.reinit_space(cell_space);
          std::vector<double> initial_values(
            fe_values_spacetime.spatial()->n_quadrature_points);
          fe_values_spacetime.spatial()->get_function_values(slab_initial_value,
                                                             initial_values);

          cell_matrix = 0;
          cell_rhs    = 0;
          for (const auto &cell_time :
               slab_its.dof->temporal()->active_cell_iterators())
            {
              n = cell_time->index();
              fe_values_spacetime.reinit_time(cell_time);
              fe_jump_values_spacetime.reinit_time(cell_time);
              fe_values_spacetime.get_local_dof_indices(
                local_spacetime_dof_index);

              for (unsigned int q = 0; q < n_quad_spacetime; ++q)
                {
                  right_hand_side.set_time(
                    fe_values_spacetime.time_quadrature_point(q));
                  const auto &x_q =
                    fe_values_spacetime.space_quadrature_point(q);
                  for (unsigned int i = 0; i < dofs_per_spacetime_cell; ++i)
                    {
                      for (unsigned int j = 0; j < dofs_per_spacetime_cell; ++j)
                        {

\((\partial_t u,v)\)

                          cell_matrix(i + n * dofs_per_spacetime_cell,
                                      j + n * dofs_per_spacetime_cell) +=
                            fe_values_spacetime.shape_value(i, q) *
                            fe_values_spacetime.shape_dt(j, q) *
                            fe_values_spacetime.JxW(q);

\((\nabla u, \nabla v)\)

                          cell_matrix(i + n * dofs_per_spacetime_cell,
                                      j + n * dofs_per_spacetime_cell) +=
                            fe_values_spacetime.shape_space_grad(i, q) *
                            fe_values_spacetime.shape_space_grad(j, q) *
                            fe_values_spacetime.JxW(q);

                        } // dofs j

                      cell_rhs(i + n * dofs_per_spacetime_cell) +=
                        fe_values_spacetime.shape_value(i, q) *
                        right_hand_side.value(x_q) * fe_values_spacetime.JxW(q);

                    } // dofs i

                } // quad

              for (unsigned int q = 0; q < n_quad_space; ++q)
                {
                  for (unsigned int i = 0; i < dofs_per_spacetime_cell; ++i)
                    {
                      for (unsigned int j = 0; j < dofs_per_spacetime_cell; ++j)
                        {

\((u^+, v^+)\)

                          cell_matrix(i + n * dofs_per_spacetime_cell,
                                      j + n * dofs_per_spacetime_cell) +=
                            fe_jump_values_spacetime.shape_value_plus(i, q) *
                            fe_jump_values_spacetime.shape_value_plus(j, q) *
                            fe_jump_values_spacetime.JxW(q);

\(-(u^-, v^+)\)

                          if (n > 0)
                            {
                              cell_matrix(i + n * dofs_per_spacetime_cell,
                                          j + (n - 1) *
                                                dofs_per_spacetime_cell) -=
                                fe_jump_values_spacetime.shape_value_plus(i,
                                                                          q) *
                                fe_jump_values_spacetime.shape_value_minus(j,
                                                                           q) *
                                fe_jump_values_spacetime.JxW(q);
                            }
                        } // dofs j

\(-(u_0,v^+)\)

                      if (n == 0)
                        {
                          cell_rhs(i) +=
                            fe_jump_values_spacetime.shape_value_plus(i, q) *
                            initial_values[q] // value of previous solution
                            * fe_jump_values_spacetime.JxW(q);
                        }
                    } // dofs i
                }

            } // cell time
          slab_constraints->distribute_local_to_global(
            cell_matrix,
            cell_rhs,
            local_spacetime_dof_index,
            slab_system_matrix,
            slab_system_rhs);
        }
    } // cell space

We need to communicate local contributions to other processors after assembly

  slab_system_matrix.compress(dealii::VectorOperation::add);
  slab_system_rhs.compress(dealii::VectorOperation::add);
}

Step3::solve_system_on_slab

void
Step3::solve_system_on_slab()
{

The interface needs a solver control, which is practically irrelevant here as we use a direct solver without a convergence criterion.

  dealii::SolverControl sc(10000, 1.0e-14, false, false);

Amesos_Klu is always installed with Trilinos Amesos. Note that both SuperLU_dist and MUMPS might offer better performance.

  dealii::TrilinosWrappers::SolverDirect::AdditionalData ad(false,
                                                            "Amesos_Klu");
  dealii::TrilinosWrappers::SolverDirect                 solver(sc, ad);

In this interface the factorization step is called initialize

  solver.initialize(slab_system_matrix);
  solver.solve(slab_owned_tmp, slab_system_rhs);
  slab_constraints->distribute(slab_owned_tmp);

The solver can only write into a vector built on the locally owned DoFs, so we need to communicate that result between the processors.

  *slab_its.solution = slab_owned_tmp;
}

Step3::process_results_on_slab

Here, we add the local contribution to \((u-u_{kh},u-u_{kh})_{L^2(Q)}=||u-u_{kh}||_{L^2(Q)}^2\).

void
Step3::process_results_on_slab()
{
  idealii::spacetime::QGauss<2> quad(fe.spatial()->degree + 2,
                                     fe.temporal()->degree + 2);
  L2_sqr_error +=
    idealii::slab::VectorTools::calculate_L2L2_squared_error_on_slab<2>(
      *slab_its.dof, *slab_its.solution, exact_solution, quad

    );
}

Step3::output_results_on_slab

void
Step3::output_results_on_slab()
{
  auto n_dofs = slab_its.dof->n_dofs_time();

To distinguish output between the different support types we append the name to the output filename

  std::string support_type;
  if (fe.type() == DGFE::support_type::Lobatto)
    support_type = "Lobatto";
  else if (fe.type() == DGFE::support_type::Legendre)
    support_type = "Legendre";
  else if (fe.type() == DGFE::support_type::RadauLeft)
    support_type = "RadauLeft";
  else
    support_type = "RadauRight";

  dealii::TrilinosWrappers::MPI::Vector tmp = *slab_its.solution;
  for (unsigned i = 0; i < n_dofs; i++)
    {
      dealii::DataOut<2> data_out;
      data_out.attach_dof_handler(*slab_its.dof->spatial());
      dealii::TrilinosWrappers::MPI::Vector local_solution;
      local_solution.reinit(space_locally_owned_dofs,
                            space_locally_relevant_dofs,
                            mpi_comm);

      idealii::slab::VectorTools::extract_subvector_at_time_dof(tmp,
                                                                local_solution,
                                                                i);
      data_out.add_data_vector(local_solution, "Solution");
      data_out.build_patches();
      std::ostringstream filename;
      filename << "solution_" << support_type << "_cG(" << fe.spatial()->degree
               << ")dG(" << fe.temporal()->degree << ")_t_" << slab * n_dofs + i
               << ".vtu";

Instead of the serial vtk output we use the parallel vtu write function.

      data_out.write_vtu_in_parallel(filename.str().c_str(), mpi_comm);
    }
}

The main function

We want to be able to do parameter studies without having to recompile the whole program. One option would be the ParameterHandler class of deal.II, but passing command line arguments makes the study easier to automate.

int
main(int argc, char *argv[])
{

With MPI we need to begin with an InitFinalize call.

  dealii::Utilities::MPI::MPI_InitFinalize mpi(argc, argv, 1);

Trilinos Teuchos offers a nice interface to specify and parse command line arguments. Single options are added using one of the various setOption methods.

  Teuchos::CommandLineProcessor clp;
  clp.setDocString("This example program demonstrates solving the heat "
                   "equation with Trilinos + MPI");
  bool write_vtu = true;
  clp.setOption("write-vtu",
                "no-vtu",
                &write_vtu,
                "Write results into vtu files?");

Finite element orders for cG(s)dG(r) element.

  int s = 1;
  int r = 0;
  clp.setOption("s", &s, "spatial FE degree");
  clp.setOption("r", &r, "temporal FE degree");

Number of temporal elements

  int M = 100;
  clp.setOption("M", &M, "Number of temporal elements");

Number of spatial refinements, resulting in \(h = 0.5^{n_{\text{ref}}}\)

  int n_ref_space = 6;
  clp.setOption("n-ref-space", &n_ref_space, "Number of spatial refinements");

Which temporal support type do we want to use?

  DGFE::support_type       st          = DGFE::support_type::Lobatto;
  const DGFE::support_type st_values[] = {DGFE::support_type::Lobatto,
                                          DGFE::support_type::Legendre,
                                          DGFE::support_type::RadauLeft,
                                          DGFE::support_type::RadauRight};
  const char *st_names[] = {"Lobatto", "Legendre", "RadauLeft", "RadauRight"};
  clp.setOption("support-type",
                &st,
                4,
                st_values,
                st_names,
                "Location of temporal FE support points");

  clp.throwExceptions(false);

  Teuchos::CommandLineProcessor::EParseCommandLineReturn parse_return =
    clp.parse(argc, argv);
  if (parse_return == Teuchos::CommandLineProcessor::PARSE_HELP_PRINTED)
    {
      return 0; // don't fail if the program was called with ``--help``.
    }
  if (parse_return != Teuchos::CommandLineProcessor::PARSE_SUCCESSFUL)
    {
      return 1; // Error!
    }

  Step3 problem(s, r, M, n_ref_space, st, write_vtu);
  problem.run();
}

Results

As in the previous steps, we start with a video of the solution:

The .vtu files used to produce this animation have again been obtained by running the program with default parameters.

Convergence studies

As \(u\) is nonlinear in both space and time, we can use it to study the convergence behaviour of the different discretizations, namely dG(0) to dG(2) with bilinear Q1 elements in space, as well as dG(2) with biquadratic elements. To get a better understanding we will look at uniform refinement in only time or only space, with a sufficiently fine mesh in the other dimension, as well as uniform space-time refinement.

To make the following results easier to reproduce we provide two scripts in the R language. Both assume that you have built the executable out-of-source by calling cmake -S. -Bbuild in the configure stage. We recommend setting the build type to Release with cmake -S. -Bbuild -DCMAKE_BUILD_TYPE=Release to get the best performance. Depending on your system the simulations might still run for a few hours. The first script, run_simulations.R, starts by defining some convenience functions:

  • simulationcall for constructing the calls to mpirun; if you want or need to use more or fewer cores for the simulations you can change num_mpi_cores,

  • runsim for running through a set of spatial and temporal refinement parameters and saving their respective results in .csv files,

  • *_refinement for constructing the parameter sets to pass to runsim depending on the type of refinement.

Finally, the script calls these refinement functions for the different discretization choices with Gauss-Lobatto support points, as well as k-refinement with all support types for at least piecewise linear temporal elements, in order to compare them.

h-refinement

All (spatial) h-refinement runs start at the same spatial refinement level (4), but have different temporal meshes such that the \(L^2\)-errors are similar enough to fit in the same plot.

As we can see from the following image, all refinements with bilinear elements start parallel to each other. This is of course expected as the spatial convergence order does not depend on the temporal discretization. However, we see that the curve for dG(0) elements stagnates. This is due to the fact that the overall \(L^2\)-error is composed of a spatial and a temporal part. Since we do not refine the temporal mesh, the temporal error dominates for dG(0) and the overall error converges to that part.

../_images/h_convergence_step-3.svg

k-refinement

For the (temporal) k-refinement, all runs start with 10 temporal elements. The spatial refinement is chosen in a way that all runs have the same number of spatial DoFs, i.e. 7 refinement steps for bilinear and 6 for biquadratic elements. We can again see that elements of the same order exhibit the same overall convergence behaviour. Here, due to the number of spatial DoFs being identical, the curves for dG(2) even overlap in the first few steps. In this figure we can see the dominance of the non-refined error part even more clearly, as the dG(1) and dG(2) curves for bilinear elements converge to the same value. As the dG(0) curve is still above that error, we don't yet see a stagnation. Additionally, we see that the curve for the biquadratic elements also stagnates, but at a lower value.

../_images/k_convergence_step-3.svg

kh-refinement

For the uniform (space-time) kh-refinement we again start at spatial refinement level 4 for all element combinations and use different initial temporal meshes to plot the curves close to each other. We can see three different convergence orders, and we see that the overall convergence is determined by the minimum of the spatial and temporal convergence orders, as the dG(2) time discretization with bilinear spatial elements produces about the same errors as cG(1)dG(1), but with \(3/2\) times the number of unknowns. More precisely, the relative difference between the cG(1)dG(1) and cG(1)dG(2) errors is around 1% on the same space-time mesh.

To conclude, we see that matching the temporal and spatial element orders is important for achieving the optimal convergence. Luckily, due to the construction of the tensor product elements it is also easy to match the orders.

../_images/kh_convergence_step-3.svg

Comparison of support types

Finally, we have run the k-refinement studies for the different choices of support points and compared the results to the Gauss-Lobatto runs. The tables below give the ratio between the respective support point results and the Lobatto results in percent, i.e. \(100\cdot L^2_{\text{other}}/L^2_{\text{Lobatto}}\). We can see that for an increasing number of temporal elements and for higher order discretizations, the choice of support points becomes less important for this particular problem. However, we see that the left Gauss-Radau rule consistently yields the lowest \(L^2\)-error, followed by Gauss-Legendre and the right Gauss-Radau rule. Keeping in mind that the left Gauss-Radau rule also has the second smallest sparsity pattern (after Gauss-Lobatto), this makes it clear that choosing a different support type than Gauss-Lobatto might be worthwhile depending on the problem.

cG(1)dG(1)

#DoFs      RadauLeft   Legendre   RadauRight
332820     96.62%      97.04%     98.52%
665640     96.86%      97.62%     99.03%
1331280    97.37%      97.98%     99.02%
2662560    97.87%      98.20%     98.77%

cG(1)dG(2)

#DoFs      RadauLeft   Legendre   RadauRight
499230     99.22%      99.30%     99.85%
998460     99.25%      99.31%     99.71%
1996920    99.79%      99.81%     99.91%
3993840    99.75%      99.79%     99.99%

cG(2)dG(2)

#DoFs      RadauLeft   Legendre   RadauRight
499230     99.22%      99.30%     99.86%
998460     99.23%      99.29%     99.72%
1996920    99.32%      99.42%     99.72%
3993840    99.75%      99.80%     99.92%

The plain program

#include <ideal.II/distributed/fixed_tria.hh>

#include <ideal.II/lac/spacetime_trilinos_vector.hh>

#include <cmath>

#include <ideal.II/base/quadrature_lib.hh>
#include <ideal.II/base/time_iterator.hh>

#include <ideal.II/dofs/slab_dof_tools.hh>
#include <ideal.II/dofs/spacetime_dof_handler.hh>

#include <ideal.II/fe/fe_dg.hh>
#include <ideal.II/fe/spacetime_fe_values.hh>

#include <ideal.II/numerics/vector_tools.hh>
using DGFE = idealii::spacetime::DG_FiniteElement<2>;


#include <deal.II/base/conditional_ostream.h>
#include <deal.II/base/function.h>

#include <deal.II/distributed/tria.h>

#include <deal.II/fe/fe_q.h>

#include <deal.II/grid/grid_generator.h>

#include <deal.II/lac/dynamic_sparsity_pattern.h>
#include <deal.II/lac/full_matrix.h>
#include <deal.II/lac/sparsity_pattern.h>
#include <deal.II/lac/sparsity_tools.h>
#include <deal.II/lac/trilinos_solver.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>
#include <deal.II/lac/vector.h>

#include <deal.II/numerics/data_out.h>

#include <Teuchos_CommandLineProcessor.hpp>

#include <fstream>


class ExactSolution : public dealii::Function<2, double>
{
public:
  ExactSolution()
    : dealii::Function<2, double>()
  {}
  double
  value(const dealii::Point<2> &p, const unsigned int component = 0) const;
};

double
ExactSolution::value(const dealii::Point<2>             &p,
                     [[maybe_unused]] const unsigned int component) const
{
  const double t  = get_time();
  const double x0 = 0.5 + 0.25 * std::cos(2. * M_PI * t);
  const double x1 = 0.5 + 0.25 * std::sin(2. * M_PI * t);
  return 1. /
         (1. + 50. * ((p[0] - x0) * (p[0] - x0) + (p[1] - x1) * (p[1] - x1)));
}

class RightHandSide : public dealii::Function<2>
{
public:
  virtual double
  value(const dealii::Point<2> &p, const unsigned int component = 0);
};

double
RightHandSide::value(const dealii::Point<2>             &p,
                     [[maybe_unused]] const unsigned int component)
{
  double       t  = get_time();
  const double x0 = 0.5 + 0.25 * std::cos(2. * M_PI * t);
  const double x1 = 0.5 + 0.25 * std::sin(2. * M_PI * t);
  double       a  = 50.;
  const double divisor =
    1. + a * ((p[0] - x0) * (p[0] - x0) + (p[1] - x1) * (p[1] - x1));

  double dtu = -((a * (p[0] - x0) * M_PI * std::sin(2. * M_PI * t)) -
                 (a * (p[1] - x1) * M_PI * std::cos(2. * M_PI * t))) /
               (divisor * divisor);

  const double u_xx =
    -2. * a *
    (1. / (divisor * divisor) + 2. * a * (p[0] - x0) * (p[0] - x0) *
                                  (-2. / (divisor * divisor * divisor)));

  const double u_yy =
    -2. * a *
    (1. / (divisor * divisor) + 2. * a * (p[1] - x1) * (p[1] - x1) *
                                  (-2. / (divisor * divisor * divisor)));

  return dtu - (u_xx + u_yy);
}

class Step3
{
public:
  Step3(unsigned int       spatial_degree,
        unsigned int       temporal_degree,
        unsigned int       M,
        unsigned int       n_ref_space,
        DGFE::support_type support_type,
        bool               write_vtu);
  void
  run();

private:
  void
  make_grid();
  void
  time_marching();
  void
  setup_system_on_slab();
  void
  assemble_system_on_slab();
  void
  solve_system_on_slab();
  void
  process_results_on_slab();
  void
  output_results_on_slab();

  MPI_Comm     mpi_comm;    // MPI Communicator
  unsigned int M;           // Number of temporal elements
  unsigned int n_ref_space; // number of spatial refinements
  bool         write_vtu;   // output results?

  dealii::ConditionalOStream pout; // To allow output only on MPI rank 0
  idealii::spacetime::parallel::distributed::fixed::Triangulation<2>
                                     triangulation;
  idealii::spacetime::DoFHandler<2>  dof_handler;
  idealii::spacetime::TrilinosVector solution;

  idealii::spacetime::DG_FiniteElement<2> fe;

  dealii::SparsityPattern                            slab_sparsity_pattern;
  std::shared_ptr<dealii::AffineConstraints<double>> slab_constraints;
  dealii::TrilinosWrappers::SparseMatrix             slab_system_matrix;
  dealii::TrilinosWrappers::MPI::Vector              slab_system_rhs;
  dealii::TrilinosWrappers::MPI::Vector              slab_initial_value;
  unsigned int                                       slab;
  ExactSolution                                      exact_solution;
  double                                             L2_sqr_error;
  unsigned int                                       dofs_total;

  dealii::IndexSet                      slab_locally_owned_dofs;
  dealii::IndexSet                      slab_locally_relevant_dofs;
  dealii::IndexSet                      space_locally_owned_dofs;
  dealii::IndexSet                      space_locally_relevant_dofs;
  dealii::TrilinosWrappers::MPI::Vector slab_owned_tmp;

  struct
  {
    idealii::slab::parallel::distributed::TriaIterator<2> tria;
    idealii::slab::DoFHandlerIterator<2>                  dof;
    idealii::slab::TrilinosVectorIterator                 solution;
  } slab_its;
};
Step3::Step3(unsigned int       spatial_degree,
             unsigned int       temporal_degree,
             unsigned int       M,
             unsigned int       n_ref_space,
             DGFE::support_type support_type,
             bool               write_vtu)
  : mpi_comm(MPI_COMM_WORLD)
  , pout(std::cout, dealii::Utilities::MPI::this_mpi_process(mpi_comm) == 0)
  , triangulation()
  , dof_handler(&triangulation)
  , fe(std::make_shared<dealii::FE_Q<2>>(spatial_degree),
       temporal_degree,
       support_type)
  , M(M)
  , n_ref_space(n_ref_space)
  , write_vtu(write_vtu)
  , slab(0)
  , exact_solution()
  , L2_sqr_error(0.)
  , dofs_total(0)
{}

void
Step3::run() // same as before
{
  make_grid();
  time_marching();
}

void
Step3::make_grid()
{
  auto space_tria = // Construct an MPI parallel triangulation
    std::make_shared<dealii::parallel::distributed::Triangulation<2>>(mpi_comm);
  dealii::GridGenerator::hyper_cube(*space_tria);
  triangulation.generate(space_tria, M);
  triangulation.refine_global(n_ref_space, 0);
  dof_handler.generate();
}


void
Step3::time_marching()
{
  idealii::TimeIteratorCollection<2> tic = idealii::TimeIteratorCollection<2>();

  solution.reinit(triangulation.M());

  slab_its.tria     = triangulation.begin();
  slab_its.dof      = dof_handler.begin();
  slab_its.solution = solution.begin();
  slab              = 0;
  tic.add_iterator(&slab_its.tria, &triangulation);
  tic.add_iterator(&slab_its.dof, &dof_handler);
  tic.add_iterator(&slab_its.solution, &solution);
  pout << "*******Starting time-stepping*********" << std::endl;
  for (; !tic.at_end(); tic.increment())
    {
      pout << "Starting time-step (" << slab_its.tria->startpoint() << ","
           << slab_its.tria->endpoint() << "]" << std::endl;

      setup_system_on_slab();
      assemble_system_on_slab();
      solve_system_on_slab();
      process_results_on_slab();
      if (write_vtu)
        output_results_on_slab();
      idealii::slab::VectorTools::extract_subvector_at_time_point(
        *slab_its.dof,
        *slab_its.solution,
        slab_initial_value,
        slab_its.tria->endpoint());
      slab++;
    }
  double L2_error = std::sqrt(L2_sqr_error);
  pout << "Total number of space-time DoFs:\n\t" << dofs_total << std::endl;
  pout << "L2_error:\n\t" << L2_error << std::endl;
}

void
Step3::setup_system_on_slab()
{
  slab_its.dof->distribute_dofs(fe);
  pout << "Number of degrees of freedom: \n\t" << slab_its.dof->n_dofs_space()
       << " (space) * " << slab_its.dof->n_dofs_time()
       << " (time) = " << slab_its.dof->n_dofs_spacetime() << std::endl;

  dofs_total += slab_its.dof->n_dofs_spacetime();
  space_locally_owned_dofs = slab_its.dof->spatial()->locally_owned_dofs();
  dealii::DoFTools::extract_locally_relevant_dofs(*slab_its.dof->spatial(),
                                                  space_locally_relevant_dofs);

  slab_locally_owned_dofs = slab_its.dof->locally_owned_dofs();
  slab_locally_relevant_dofs =
    idealii::slab::DoFTools::extract_locally_relevant_dofs(*slab_its.dof);

  slab_owned_tmp.reinit(slab_locally_owned_dofs, mpi_comm);

  if (slab == 0)
    {
      slab_initial_value.reinit(space_locally_owned_dofs,
                                space_locally_relevant_dofs,
                                mpi_comm);

      dealii::TrilinosWrappers::MPI::Vector tmp;
      tmp.reinit(space_locally_owned_dofs, mpi_comm);

      exact_solution.set_time(0);
      dealii::VectorTools::interpolate(*slab_its.dof->spatial(),
                                       exact_solution,
                                       tmp);
      slab_initial_value = tmp;
    }

  slab_constraints = std::make_shared<dealii::AffineConstraints<double>>();
  idealii::slab::VectorTools::interpolate_boundary_values(
    space_locally_relevant_dofs,
    *slab_its.dof,
    0,
    exact_solution,
    slab_constraints);
  slab_constraints->close();

  dealii::DynamicSparsityPattern dsp(slab_its.dof->n_dofs_spacetime());
  idealii::slab::DoFTools::make_upwind_sparsity_pattern(*slab_its.dof, dsp);

  dealii::SparsityTools::distribute_sparsity_pattern(
    dsp, slab_locally_owned_dofs, mpi_comm, slab_locally_relevant_dofs);

  slab_system_matrix.reinit(slab_locally_owned_dofs,
                            slab_locally_owned_dofs,
                            dsp);

  slab_its.solution->reinit(slab_locally_owned_dofs,
                            slab_locally_relevant_dofs,
                            mpi_comm);
  slab_system_rhs.reinit(slab_locally_owned_dofs, mpi_comm);
}

void
Step3::assemble_system_on_slab()
{
  idealii::spacetime::QGauss<2> quad(fe.spatial()->degree + 2,
                                     fe.temporal()->degree + 2);

  idealii::spacetime::FEValues<2> fe_values_spacetime(
    fe,
    quad,
    dealii::update_values | dealii::update_gradients |
      dealii::update_quadrature_points | dealii::update_JxW_values);

  idealii::spacetime::FEJumpValues<2> fe_jump_values_spacetime(
    fe,
    quad,
    dealii::update_values | dealii::update_gradients |
      dealii::update_quadrature_points | dealii::update_JxW_values);

  RightHandSide      right_hand_side;
  const unsigned int dofs_per_spacetime_cell = fe.dofs_per_cell;

  auto N = slab_its.tria->temporal()->n_global_active_cells();
  dealii::FullMatrix<double> cell_matrix(N * dofs_per_spacetime_cell,
                                         N * dofs_per_spacetime_cell);
  dealii::Vector<double>     cell_rhs(N * dofs_per_spacetime_cell);

  std::vector<dealii::types::global_dof_index> local_spacetime_dof_index(
    N * dofs_per_spacetime_cell);

  unsigned int n;
  unsigned int n_quad_spacetime = fe_values_spacetime.n_quadrature_points;
  unsigned int n_quad_space     = quad.spatial()->size();
  for (const auto &cell_space :
       slab_its.dof->spatial()->active_cell_iterators())
    {
      if (cell_space->is_locally_owned())
        {
          fe_values_spacetime.reinit_space(cell_space);
          fe_jump_values_spacetime.reinit_space(cell_space);
          std::vector<double> initial_values(
            fe_values_spacetime.spatial()->n_quadrature_points);
          fe_values_spacetime.spatial()->get_function_values(slab_initial_value,
                                                             initial_values);

          cell_matrix = 0;
          cell_rhs    = 0;
          for (const auto &cell_time :
               slab_its.dof->temporal()->active_cell_iterators())
            {
              n = cell_time->index();
              fe_values_spacetime.reinit_time(cell_time);
              fe_jump_values_spacetime.reinit_time(cell_time);
              fe_values_spacetime.get_local_dof_indices(
                local_spacetime_dof_index);

              for (unsigned int q = 0; q < n_quad_spacetime; ++q)
                {
                  right_hand_side.set_time(
                    fe_values_spacetime.time_quadrature_point(q));
                  const auto &x_q =
                    fe_values_spacetime.space_quadrature_point(q);
                  for (unsigned int i = 0; i < dofs_per_spacetime_cell; ++i)
                    {
                      for (unsigned int j = 0; j < dofs_per_spacetime_cell; ++j)
                        {
                          cell_matrix(i + n * dofs_per_spacetime_cell,
                                      j + n * dofs_per_spacetime_cell) +=
                            fe_values_spacetime.shape_value(i, q) *
                            fe_values_spacetime.shape_dt(j, q) *
                            fe_values_spacetime.JxW(q);

                          cell_matrix(i + n * dofs_per_spacetime_cell,
                                      j + n * dofs_per_spacetime_cell) +=
                            fe_values_spacetime.shape_space_grad(i, q) *
                            fe_values_spacetime.shape_space_grad(j, q) *
                            fe_values_spacetime.JxW(q);

                        } // dofs j

                      cell_rhs(i + n * dofs_per_spacetime_cell) +=
                        fe_values_spacetime.shape_value(i, q) *
                        right_hand_side.value(x_q) * fe_values_spacetime.JxW(q);

                    } // dofs i

                } // quad

              for (unsigned int q = 0; q < n_quad_space; ++q)
                {
                  for (unsigned int i = 0; i < dofs_per_spacetime_cell; ++i)
                    {
                      for (unsigned int j = 0; j < dofs_per_spacetime_cell; ++j)
                        {
                          cell_matrix(i + n * dofs_per_spacetime_cell,
                                      j + n * dofs_per_spacetime_cell) +=
                            fe_jump_values_spacetime.shape_value_plus(i, q) *
                            fe_jump_values_spacetime.shape_value_plus(j, q) *
                            fe_jump_values_spacetime.JxW(q);

                          if (n > 0)
                            {
                              cell_matrix(i + n * dofs_per_spacetime_cell,
                                          j + (n - 1) *
                                                dofs_per_spacetime_cell) -=
                                fe_jump_values_spacetime.shape_value_plus(i,
                                                                          q) *
                                fe_jump_values_spacetime.shape_value_minus(j,
                                                                           q) *
                                fe_jump_values_spacetime.JxW(q);
                            }
                        } // dofs j
                      if (n == 0)
                        {
                          cell_rhs(i) +=
                            fe_jump_values_spacetime.shape_value_plus(i, q) *
                            initial_values[q] // value of previous solution
                            * fe_jump_values_spacetime.JxW(q);
                        }
                    } // dofs i
                }

            } // cell time
          slab_constraints->distribute_local_to_global(
            cell_matrix,
            cell_rhs,
            local_spacetime_dof_index,
            slab_system_matrix,
            slab_system_rhs);
        }
    } // cell space

  slab_system_matrix.compress(dealii::VectorOperation::add);
  slab_system_rhs.compress(dealii::VectorOperation::add);
}

void
Step3::solve_system_on_slab()
{
  dealii::SolverControl sc(10000, 1.0e-14, false, false);
  dealii::TrilinosWrappers::SolverDirect::AdditionalData ad(false,
                                                            "Amesos_Klu");
  dealii::TrilinosWrappers::SolverDirect                 solver(sc, ad);
  solver.initialize(slab_system_matrix);
  solver.solve(slab_owned_tmp, slab_system_rhs);
  slab_constraints->distribute(slab_owned_tmp);
  *slab_its.solution = slab_owned_tmp;
}

void
Step3::process_results_on_slab()
{
  idealii::spacetime::QGauss<2> quad(fe.spatial()->degree + 2,
                                     fe.temporal()->degree + 2);
  L2_sqr_error +=
    idealii::slab::VectorTools::calculate_L2L2_squared_error_on_slab<2>(
      *slab_its.dof, *slab_its.solution, exact_solution, quad

    );
}
void
Step3::output_results_on_slab()
{
  auto n_dofs = slab_its.dof->n_dofs_time();

  std::string support_type;
  if (fe.type() == DGFE::support_type::Lobatto)
    support_type = "Lobatto";
  else if (fe.type() == DGFE::support_type::Legendre)
    support_type = "Legendre";
  else if (fe.type() == DGFE::support_type::RadauLeft)
    support_type = "RadauLeft";
  else
    support_type = "RadauRight";

  dealii::TrilinosWrappers::MPI::Vector tmp = *slab_its.solution;
  for (unsigned i = 0; i < n_dofs; i++)
    {
      dealii::DataOut<2> data_out;
      data_out.attach_dof_handler(*slab_its.dof->spatial());
      dealii::TrilinosWrappers::MPI::Vector local_solution;
      local_solution.reinit(space_locally_owned_dofs,
                            space_locally_relevant_dofs,
                            mpi_comm);

      idealii::slab::VectorTools::extract_subvector_at_time_dof(tmp,
                                                                local_solution,
                                                                i);
      data_out.add_data_vector(local_solution, "Solution");
      data_out.build_patches();
      std::ostringstream filename;
      filename << "solution_" << support_type << "_cG(" << fe.spatial()->degree
               << ")dG(" << fe.temporal()->degree << ")_t_" << slab * n_dofs + i
               << ".vtu";
      data_out.write_vtu_in_parallel(filename.str().c_str(), mpi_comm);
    }
}

int
main(int argc, char *argv[])
{
  dealii::Utilities::MPI::MPI_InitFinalize mpi(argc, argv, 1);

  Teuchos::CommandLineProcessor clp;
  clp.setDocString("This example program demonstrates solving the heat "
                   "equation with Trilinos + MPI");
  bool write_vtu = true;
  clp.setOption("write-vtu",
                "no-vtu",
                &write_vtu,
                "Write results into vtu files?");
  int s = 1;
  int r = 0;
  clp.setOption("s", &s, "spatial FE degree");
  clp.setOption("r", &r, "temporal FE degree");
  int M = 100;
  clp.setOption("M", &M, "Number of temporal elements");
  int n_ref_space = 6;
  clp.setOption("n-ref-space", &n_ref_space, "Number of spatial refinements");
  DGFE::support_type       st          = DGFE::support_type::Lobatto;
  const DGFE::support_type st_values[] = {DGFE::support_type::Lobatto,
                                          DGFE::support_type::Legendre,
                                          DGFE::support_type::RadauLeft,
                                          DGFE::support_type::RadauRight};
  const char *st_names[] = {"Lobatto", "Legendre", "RadauLeft", "RadauRight"};
  clp.setOption("support-type",
                &st,
                4,
                st_values,
                st_names,
                "Location of temporal FE support points");

  clp.throwExceptions(false);

  Teuchos::CommandLineProcessor::EParseCommandLineReturn parse_return =
    clp.parse(argc, argv);
  if (parse_return == Teuchos::CommandLineProcessor::PARSE_HELP_PRINTED)
    {
      return 0; // don't fail if the program was called with ``--help``.
    }
  if (parse_return != Teuchos::CommandLineProcessor::PARSE_SUCCESSFUL)
    {
      return 1; // Error!
    }

  Step3 problem(s, r, M, n_ref_space, st, write_vtu);
  problem.run();
}

References

[Hartmann1998]

Hartmann, R.: A posteriori Fehlerschätzung und adaptive Schrittweiten- und Ortsgittersteuerung bei Galerkin-Verfahren für die Wärmeleitungsgleichung. Diploma thesis, University of Heidelberg, 1998.