Sort#

template<class DstViewType, class SrcViewType>
struct copy_functor { }

template<class DstViewType, class PermuteViewType, class SrcViewType>
struct copy_permute_functor { }

class BinSort {
    template<class DstViewType, class SrcViewType> struct copy_functor { }
    template<class DstViewType, class PermuteViewType, class SrcViewType> struct copy_permute_functor { }
    template<class ValuesViewType> void sort( ValuesViewType const & values, int values_range_begin, int values_range_end ) const { }
    template<class ValuesViewType> void sort( ValuesViewType const & values ) const { }
}

template<class KeyViewType> struct BinOp1D { }
template<class KeyViewType> struct BinOp3D { }

template<class ViewType> void sort( ViewType const & view ) { }
template<class ViewType> void sort( ViewType view, size_t const begin, size_t const end ) {  }

Sorting with nested policies (team- and thread-level)#

Parallel sort functions for use within TeamPolicy kernels. These perform sorting using team-level (TeamThreadRange) or thread-level (ThreadVectorRange) parallelism.

Header: <Kokkos_NestedSort.hpp>

Synopsis#

//namespace Kokkos::Experimental

template <class TeamMember, class ViewType>
KOKKOS_INLINE_FUNCTION void sort_team(const TeamMember& t, const ViewType& view);

template <class TeamMember, class ViewType, class Comparator>
KOKKOS_INLINE_FUNCTION void sort_team(const TeamMember& t, const ViewType& view, const Comparator& comp);

template <class TeamMember, class KeyViewType, class ValueViewType>
KOKKOS_INLINE_FUNCTION void sort_by_key_team(
    const TeamMember& t, const KeyViewType& keyView, const ValueViewType& valueView);

template <class TeamMember, class KeyViewType, class ValueViewType, class Comparator>
KOKKOS_INLINE_FUNCTION void sort_by_key_team(
    const TeamMember& t, const KeyViewType& keyView, const ValueViewType& valueView, const Comparator& comp);

template <class TeamMember, class ViewType>
KOKKOS_INLINE_FUNCTION void sort_thread(const TeamMember& t, const ViewType& view);

template <class TeamMember, class ViewType, class Comparator>
KOKKOS_INLINE_FUNCTION void sort_thread(const TeamMember& t, const ViewType& view);

template <class TeamMember, class ViewType, class Comparator>
KOKKOS_INLINE_FUNCTION void sort_thread(const TeamMember& t, const ViewType& view, const Comparator& comp);

template <class TeamMember, class KeyViewType, class ValueViewType>
KOKKOS_INLINE_FUNCTION void sort_by_key_thread(
    const TeamMember& t, const KeyViewType& keyView, const ValueViewType& valueView);

template <class TeamMember, class KeyViewType, class ValueViewType, class Comparator>
KOKKOS_INLINE_FUNCTION void sort_by_key_thread(
    const TeamMember& t, const KeyViewType& keyView, const ValueViewType& valueView, const Comparator& comp);

sort_team and sort_by_key_team internally use the entire team, so they may be called inside the top level of TeamPolicy lambdas and functors. sort_thread and sort_by_key_thread use the vector lanes of a thread, so they may be called within either TeamPolicy or TeamThreadRange loops.

The sort_by_key functions sort keyView, while simultaneously applying the same permutation to the elements of valueView. It is equivalent to sorting (key[i], value[i]) tuples according to key. An example of where this is commonly used is to sort the entries and values in each row of a CRS (compressed row sparse) matrix. These functions require that keyView.extent(0) == valueView.extent(0).

Versions taking a Comparator object will use it to order the keys. Comparator::operator() should be a const member function that accepts two keys a and b, and returns a bool that is true if and only if a goes before b in the sorted list. For versions not taking a Comparator object, keys are sorted into ascending order (according to operator<). For example, this comparator will sort a view of int in descending order:

struct IntComparator {
    KOKKOS_FUNCTION constexpr bool operator()(const int& a, const int& b) const {
        return a > b; //a precedes b if a is larger
    }
};

Additional Information#

  • All functions include a final barrier at their level of parallelism, so all elements of view / keyView / valueView may be accessed immediately after they return.

  • These functions can operate on views in both global and scratch memory spaces.

  • These functions use the bitonic sorting algorithm, which is not stable. This means if a key is repeated in the input, then the values corresponding to that key might be in any order after doing a sort by key.

Example#

#include <Kokkos_Core.hpp>
#include <Kokkos_NestedSort.hpp>
#include <Kokkos_Random.hpp>

int main(int argc, char* argv[]) {
    using ExecSpace = Kokkos::DefaultExecutionSpace;
    using TeamPol = Kokkos::TeamPolicy<ExecSpace>;
    using TeamMem = typename TeamPol::member_type;
    Kokkos::initialize(argc, argv);
    {
        int n = 10;
        Kokkos::Random_XorShift64_Pool<ExecSpace> rand_pool(13718);
        Kokkos::View<int**, ExecSpace> A("A", n, n);
        Kokkos::fill_random(A, rand_pool, 100);
        Kokkos::parallel_for(
            TeamPol(n, Kokkos::AUTO()),
            KOKKOS_LAMBDA(const TeamMem& t)
            {
                //Sort a row of A using the whole team.
                auto A_row_i = Kokkos::subview(A, t.league_rank(), Kokkos::ALL());
                Kokkos::Experimental::sort_team(t, A_row_i);
            });
        auto Ahost = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), A);
        std::cout << "A, with each row sorted:\n";
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < n; j++) {
                std::cout << Ahost(i, j) << ' ';
            }
            std::cout << '\n';
        }
        int vectorLen = TeamPol::vector_length_max();
        Kokkos::parallel_for(
            TeamPol(1, Kokkos::AUTO(), vectorLen),
            KOKKOS_LAMBDA(const TeamMem& t)
            {
                Kokkos::parallel_for(Kokkos::TeamThreadRange(t, n),
                    [=](int i)
                    {
                        //Now sort a column of A by using just this thread.
                        auto A_col_i = Kokkos::subview(A, Kokkos::ALL(), i);
                        Kokkos::Experimental::sort_thread(t, A_col_i);
                    });
            });
        Kokkos::deep_copy(Ahost, A);
        std::cout << "\nA, with each column sorted:\n";
        for(int i = 0; i < n; i++) {
            for(int j = 0; j < n; j++) {
                std::cout << Ahost(i, j) << ' ';
            }
            std::cout << '\n';
        }
    }
    Kokkos::finalize();
    return 0;
}

Sample output#

A, with each row sorted:
0 9 38 68 74 76 83 89 91 95
19 41 41 55 65 68 78 92 99 99
2 13 16 17 19 40 44 54 96 99
17 18 65 68 77 80 82 94 94 95
0 14 34 35 45 46 47 52 58 96
2 6 9 13 25 32 37 51 80 81
3 5 14 16 20 25 33 39 60 97
7 8 15 31 33 38 40 40 42 86
4 19 20 29 42 56 60 63 68 90
1 16 16 17 33 39 60 64 78 94

A, with each column sorted:
0 5 9 13 19 25 33 39 42 81
0 6 14 16 20 32 37 40 58 86
1 8 15 17 25 38 40 51 60 90
2 9 16 17 33 39 44 52 68 94
2 13 16 29 33 40 47 54 78 95
3 14 20 31 42 46 60 63 80 95
4 16 34 35 45 56 60 64 91 96
7 18 38 55 65 68 78 89 94 97
17 19 41 68 74 76 82 92 96 99
19 41 65 68 77 80 83 94 99 99