9.1 Built-In-Reducers#

Kokkos provides Reducers for the most common reduction types:

  • BAnd: Do a binary “and” reduction

  • BOr: Do a binary “or” reduction

  • LAnd: Do a logical “and” reduction

  • LOr: Do a logical “or” reduction

  • Max: Finding the maximum value

  • MaxLoc: Retrieve the maximum value as well as its associated index

  • Min: Finding the minimum value

  • MinLoc: Retrieve the minimum value as well as its associated index

  • MinMax: Finding the minimum and the maximum value

  • MinMaxLoc: Find both the maximum and minimum value as well as their associated indices

  • Prod: Computing the product of all input values

  • Sum: For simple Summations

These reducers work only for scalar data, i.e. you can’t have a runtime length array as the reduction type (for example finding the minimum values for each vector in a multi vector concurrently). Generally the Reducers are templated on the Scalar type for the reduction as well as an optional template parameter for the memory space of the result (more on that later). The MinLoc, MaxLoc and MinMaxLoc reducers are additionally templated on the index type.

The following is an example for doing a simple min-reduction, finding the minimal value in a discretization of a parable.

double min;

Kokkos::parallel_reduce( "MinReduce", N, KOKKOS_LAMBDA (const int& x, double& lmin) {
  double val = (1.0*x- 7.2) * (1.0*x- 7.2) + 3.5;
  if( val < lmin ) lmin = val; 
}, Kokkos::Min<double>(min));

printf("Min: %lf\n", min);

In this example the Min reducer was templated on the reducing type double and the variable to store the result was taken in by reference. Note that the reducer is only used to combine values from different threads. The per thread reduction is still performed explicitly. One could have used the reducer for that as well through a reducer instance:

double min;

Kokkos::Min<double> min_reducer(min);
Kokkos::parallel_reduce( "MinReduce", N, KOKKOS_LAMBDA (const int& x, double& lmin) {
  double val = (1.0*x- 7.2) * (1.0*x- 7.2) + 3.5;
  min_reducer.join(lmin, val); 
}, min_reducer);

printf("Min: %lf\n", min);

For the MinLoc, MaxLoc and MinMaxLoc reducers the reduction type is a complex scalar type which is accessible through a value_type typedef. MinLoc and MaxLoc have value types which contain a val and loc member to store the reduction value and the index respectively. Note that index (loc) can be a struct itself, for example to store a multidimensional index result (see later).

typedef Kokkos::MinLoc<double,int>::value_type minloc_type;
minloc_type minloc;

Kokkos::parallel_reduce( "MinLocReduce", N, KOKKOS_LAMBDA (const int& x, minloc_type& lminloc) {
  double val = (1.0*x- 7.2) * (1.0*x- 7.2) + 3.5;
  if( val < lminloc.val ) { lminloc.val = val; lminloc.loc = x; }
}, Kokkos::MinLoc<double,int>(minloc));

printf("Min: %lf at %i\n", minloc.val, minloc.loc);

Reducers can be used in nested reductions. This example also makes use of a 2D index type to find the minimum and maximum value of a matrix as well as their indices.

Kokkos::View<double**> A("A",N,M);
// fill A

// Create a variable for the result
typedef Kokkos::MinMaxLoc<double, Kokkos::pair<int,int>> reducer_type;
typedef reducer_type::value_type value_type;
value_type minmaxloc

typedef Kokkos::TeamPolicy<>::member_type team_type;

// Start a team parallel reduce
Kokkos::parallel_reduce( "MinLocReduce", Kokkos::TeamPolicy<>(N,AUTO), 
    KOKKOS_LAMBDA (const team_type& team, value_type& team_minmaxloc) {

  // Create a temporary to store the reduction value for the row
  value_type row_minmaxloc;
  int n = team.league_rank();

  // Run a nested parallel reduce with the team over the row
  Kokkos::parallel_reduce( Kokkos::TeamThreadRange(team, M), 
      [=] (const int& m, value_type& thread_minmaxloc) {
    double val = A(n,m);
    
    // Check whether this is a new minimum or maximum value
    if(val < thread_minmaxloc.min_val) {
      thread_minmaxloc.min_val = val;
      thread_minmaxloc.min_loc = Kokkos::pair<int,int>(n,m);
    }
    if(val > thread_minmaxloc.max_val) {
      thread_minmaxloc.max_val = val;
      thread_minmaxloc.max_loc = Kokkos::pair<int,int>(n,m);
    }

  }, reducer_type(row_minmaxloc));
  
  // One guy in the team should contribute to the whole
  // Note: for a min or max reduction it wouldn't hurt if 
  //       every team member did this
  Kokkos::single(Kokkos::PerTeam(team), [=] () {
    if( row_minmaxloc.min_val < team_minmaxloc.min_val ) {
      team_minmaxloc.min_val = row_minmaxloc.min_val;
      team_minmaxloc.min_loc = row_minmaxloc.min_loc;
    }
    if( row_minmaxloc.max_val > team_minmax.max_val ) {
      team_minmaxloc.max_val = row_minmaxloc.max_val;
      team_minmaxloc.max_loc = row_minmaxloc.max_loc;
    }
  }
}, reducer_type(minmaxloc));

printf("Min %lf at (%i, %i)\n",minmaxloc.min_val, minmaxloc.min_loc.first, minmaxloc.min_loc.second);
printf("Max %lf at (%i, %i)\n",minmaxloc.max_val, minmaxloc.max_loc.first, minmaxloc.max_loc.second);