Back

sort

Overview

The sort function creates an ordered copy of a Serie based on various sorting criteria. It supports both ascending and descending order, custom comparator functions, sorting by a key function, and special handling for NaN values. The function is designed with immutability in mind, returning a new Serie instead of modifying the original.

Function Signatures

// Basic sort with order control
template <typename T>
Serie<T> sort(const Serie<T>& serie, 
                SortOrder order = SortOrder::ASCENDING,
                ExecutionPolicy exec = ExecutionPolicy::SEQ);

// Sort with custom comparator
template <typename T, typename Compare>
Serie<T> sort(const Serie<T>& serie, 
                Compare comp,
                ExecutionPolicy exec = ExecutionPolicy::SEQ);

// Sort by key function
template <typename T, typename KeyFunc>
Serie<T> sort_by(const Serie<T>& serie, 
                   KeyFunc key_func,
                   SortOrder order = SortOrder::ASCENDING,
                   ExecutionPolicy exec = ExecutionPolicy::SEQ);

// Sort with NaN handling
template <typename T>
Serie<T> sort_nan(const Serie<T>& serie, 
                    SortOrder order = SortOrder::ASCENDING,
                    bool nan_first = false,
                    ExecutionPolicy exec = ExecutionPolicy::SEQ);

// Pipeline binding functions
template <typename T>
auto bind_sort(SortOrder order = SortOrder::ASCENDING,
               ExecutionPolicy exec = ExecutionPolicy::SEQ);

template <typename Compare>
auto bind_sort_with(Compare comp, 
                    ExecutionPolicy exec = ExecutionPolicy::SEQ);

template <typename KeyFunc>
auto bind_sort_by(KeyFunc key_func, 
                  SortOrder order = SortOrder::ASCENDING,
                  ExecutionPolicy exec = ExecutionPolicy::SEQ);

Parameters

Parameter Type Description
serie const Serie<T>& The input Serie to sort.
order SortOrder The sort direction: ASCENDING or DESCENDING. Default is ASCENDING.
comp Compare Custom comparison function that should return true if the first element is less than the second.
key_func KeyFunc Function that extracts a key for comparison from each element.
nan_first bool If true, NaN values will be placed at the beginning; if false, at the end. Default is false.
exec ExecutionPolicy Execution policy for parallel algorithms (SEQ, PAR). Default is SEQ (sequential).

Return Value

All sort functions return a new Serie containing the same elements as the input Serie, but in sorted order. The returned Serie has the same type as the input Serie.

Example Usage

Basic Sorting
// Create a Serie with unsorted values
df::Serie<double> values{5.2, 1.7, 3.9, 2.1, 4.4};

// Sort in ascending order (default)
auto asc_values = df::sort(values);
// asc_values: {1.7, 2.1, 3.9, 4.4, 5.2}

// Sort in descending order
auto desc_values = df::sort(values, df::SortOrder::DESCENDING);
// desc_values: {5.2, 4.4, 3.9, 2.1, 1.7}

// Sort strings (lexicographically)
df::Serie<std::string> names{"Charlie", "Alice", "Bob", "Dave"};
auto sorted_names = df::sort(names);
// sorted_names: {"Alice", "Bob", "Charlie", "Dave"}
Custom Comparators
// Sort by absolute value
df::Serie<double> numbers{-5.0, 3.0, -1.0, 4.0, -2.0};
auto abs_sorted = df::sort(numbers, [](double a, double b) {
    return std::abs(a) < std::abs(b);
});
// abs_sorted: {-1.0, -2.0, 3.0, 4.0, -5.0}

// Case-insensitive string sorting
df::Serie<std::string> words{"Apple", "banana", "Cat", "dog"};
auto case_insensitive = df::sort(words, [](const std::string& a, const std::string& b) {
    std::string a_lower = a, b_lower = b;
    std::transform(a.begin(), a.end(), a_lower.begin(), ::tolower);
    std::transform(b.begin(), b.end(), b_lower.begin(), ::tolower);
    return a_lower < b_lower;
});
// case_insensitive: {"Apple", "banana", "Cat", "dog"}
Sorting Complex Types with Key Functions
// Define a Person struct
struct Person {
    std::string name;
    int age;
    double salary;
};

// Create a Serie of Person objects
df::Serie<Person> people{
    {"Alice", 32, 75000.0},
    {"Bob", 25, 65000.0},
    {"Charlie", 45, 85000.0},
    {"Dave", 32, 72000.0}
};

// Sort by age (ascending)
auto by_age = df::sort_by(people, [](const Person& p) { return p.age; });
// Result: Bob (25), Alice/Dave (32), Charlie (45)

// Sort by name (descending)
auto by_name = df::sort_by(people, 
                          [](const Person& p) { return p.name; },
                          df::SortOrder::DESCENDING);
// Result: Dave, Charlie, Bob, Alice

// Sort by salary (ascending)
auto by_salary = df::sort_by(people, [](const Person& p) { return p.salary; });
// Result: Bob, Dave, Alice, Charlie
Handling NaN Values
// Create a Serie with NaN values
df::Serie<double> data{5.0, std::nan(""), 3.0, std::nan(""), 1.0, 4.0};

// Sort with NaN values at the end (default)
auto sorted1 = df::sort_nan(data);
// sorted1: {1.0, 3.0, 4.0, 5.0, NaN, NaN}

// Sort with NaN values at the beginning
auto sorted2 = df::sort_nan(data, df::SortOrder::ASCENDING, true);
// sorted2: {NaN, NaN, 1.0, 3.0, 4.0, 5.0}

// Sort descending with NaN values at the end
auto sorted3 = df::sort_nan(data, df::SortOrder::DESCENDING);
// sorted3: {5.0, 4.0, 3.0, 1.0, NaN, NaN}
Pipeline Usage
// Create a Serie
df::Serie<double> values{5.2, 1.7, 3.9, 2.1, 4.4};

// Use in pipeline with bind_sort
auto sorted_squared = values
    | df::bind_map([](double x, size_t) { return x * x; })  // Square values
    | df::bind_sort<double>(df::SortOrder::ASCENDING);     // Sort squared values
// sorted_squared: {2.89, 4.41, 15.21, 19.36, 27.04}

// Use with custom comparator in pipeline
auto abs_sorted = df::Serie<double>{-5.0, 3.0, -1.0, 4.0, -2.0}
    | df::bind_sort_with([](double a, double b) {
        return std::abs(a) < std::abs(b);
      });
// abs_sorted: {-1.0, -2.0, 3.0, 4.0, -5.0}

// Complex pipeline with key function
auto result = people
    | df::bind_sort_by([](const Person& p) { return p.age; })  // Sort by age
    | df::bind_map([](const Person& p, size_t) {               // Extract names
        return p.name;
      });
// result: {"Bob", "Alice", "Dave", "Charlie"}
Parallel Sorting
// Create a large Serie
df::Serie<int> large_serie = df::random_uniform<int>(100000, 0, 1000000);

// Sort using parallel execution policy
auto sorted_parallel = df::sort(large_serie, df::SortOrder::ASCENDING, df::ExecutionPolicy::PAR);

// Parallel sort with custom comparator
auto parallel_custom = df::sort(large_serie, 
                               [](int a, int b) { return a % 10 < b % 10; },  // Sort by last digit
                               df::ExecutionPolicy::PAR);

// Parallel sort by key in pipeline
auto result = large_serie
    | df::bind_sort_by([](int x) { return x % 100; },  // Sort by last two digits
                      df::SortOrder::ASCENDING,
                      df::ExecutionPolicy::PAR);

Implementation Notes

  • All sort operations create a new Serie rather than modifying the input Serie in place.
  • The sort algorithm has an average case complexity of O(n log n), where n is the size of the Serie.
  • For small Series, parallel sorting may actually be slower due to overhead. Consider using sequential execution for small datasets.
  • NaN handling is particularly useful for floating-point data with missing values.
  • The sort implementation relies on std::sort internally and respects the C++ standard library guarantees.
  • The parallel execution policy is only effective when the library is compiled with parallel algorithms support.

Related Functions