Back
sort
Overview
The sort function creates an ordered copy of a Serie based on various sorting criteria. It supports both ascending and descending order, custom comparator functions, sorting by a key function, and special handling for NaN values. The function is designed with immutability in mind, returning a new Serie instead of modifying the original.
Function Signatures
// Basic sort with order control
template <typename T>
Serie<T> sort(const Serie<T>& serie,
SortOrder order = SortOrder::ASCENDING,
ExecutionPolicy exec = ExecutionPolicy::SEQ);
// Sort with custom comparator
template <typename T, typename Compare>
Serie<T> sort(const Serie<T>& serie,
Compare comp,
ExecutionPolicy exec = ExecutionPolicy::SEQ);
// Sort by key function
template <typename T, typename KeyFunc>
Serie<T> sort_by(const Serie<T>& serie,
KeyFunc key_func,
SortOrder order = SortOrder::ASCENDING,
ExecutionPolicy exec = ExecutionPolicy::SEQ);
// Sort with NaN handling
template <typename T>
Serie<T> sort_nan(const Serie<T>& serie,
SortOrder order = SortOrder::ASCENDING,
bool nan_first = false,
ExecutionPolicy exec = ExecutionPolicy::SEQ);
// Pipeline binding functions
template <typename T>
auto bind_sort(SortOrder order = SortOrder::ASCENDING,
ExecutionPolicy exec = ExecutionPolicy::SEQ);
template <typename Compare>
auto bind_sort_with(Compare comp,
ExecutionPolicy exec = ExecutionPolicy::SEQ);
template <typename KeyFunc>
auto bind_sort_by(KeyFunc key_func,
SortOrder order = SortOrder::ASCENDING,
ExecutionPolicy exec = ExecutionPolicy::SEQ);
Parameters
| Parameter | Type | Description |
|---|---|---|
| serie | const Serie<T>& | The input Serie to sort. |
| order | SortOrder | The sort direction: ASCENDING or DESCENDING. Default is ASCENDING. |
| comp | Compare | Custom comparison function that should return true if the first element is less than the second. |
| key_func | KeyFunc | Function that extracts a key for comparison from each element. |
| nan_first | bool | If true, NaN values will be placed at the beginning; if false, at the end. Default is false. |
| exec | ExecutionPolicy | Execution policy for parallel algorithms (SEQ, PAR). Default is SEQ (sequential). |
Return Value
All sort functions return a new Serie containing the same elements as the input Serie, but in sorted order. The returned Serie has the same type as the input Serie.
Example Usage
Basic Sorting
// Create a Serie with unsorted values
df::Serie<double> values{5.2, 1.7, 3.9, 2.1, 4.4};
// Sort in ascending order (default)
auto asc_values = df::sort(values);
// asc_values: {1.7, 2.1, 3.9, 4.4, 5.2}
// Sort in descending order
auto desc_values = df::sort(values, df::SortOrder::DESCENDING);
// desc_values: {5.2, 4.4, 3.9, 2.1, 1.7}
// Sort strings (lexicographically)
df::Serie<std::string> names{"Charlie", "Alice", "Bob", "Dave"};
auto sorted_names = df::sort(names);
// sorted_names: {"Alice", "Bob", "Charlie", "Dave"}
Custom Comparators
// Sort by absolute value
df::Serie<double> numbers{-5.0, 3.0, -1.0, 4.0, -2.0};
auto abs_sorted = df::sort(numbers, [](double a, double b) {
return std::abs(a) < std::abs(b);
});
// abs_sorted: {-1.0, -2.0, 3.0, 4.0, -5.0}
// Case-insensitive string sorting
df::Serie<std::string> words{"Apple", "banana", "Cat", "dog"};
auto case_insensitive = df::sort(words, [](const std::string& a, const std::string& b) {
std::string a_lower = a, b_lower = b;
std::transform(a.begin(), a.end(), a_lower.begin(), ::tolower);
std::transform(b.begin(), b.end(), b_lower.begin(), ::tolower);
return a_lower < b_lower;
});
// case_insensitive: {"Apple", "banana", "Cat", "dog"}
Sorting Complex Types with Key Functions
// Define a Person struct
struct Person {
std::string name;
int age;
double salary;
};
// Create a Serie of Person objects
df::Serie<Person> people{
{"Alice", 32, 75000.0},
{"Bob", 25, 65000.0},
{"Charlie", 45, 85000.0},
{"Dave", 32, 72000.0}
};
// Sort by age (ascending)
auto by_age = df::sort_by(people, [](const Person& p) { return p.age; });
// Result: Bob (25), Alice/Dave (32), Charlie (45)
// Sort by name (descending)
auto by_name = df::sort_by(people,
[](const Person& p) { return p.name; },
df::SortOrder::DESCENDING);
// Result: Dave, Charlie, Bob, Alice
// Sort by salary (ascending)
auto by_salary = df::sort_by(people, [](const Person& p) { return p.salary; });
// Result: Bob, Dave, Alice, Charlie
Handling NaN Values
// Create a Serie with NaN values
df::Serie<double> data{5.0, std::nan(""), 3.0, std::nan(""), 1.0, 4.0};
// Sort with NaN values at the end (default)
auto sorted1 = df::sort_nan(data);
// sorted1: {1.0, 3.0, 4.0, 5.0, NaN, NaN}
// Sort with NaN values at the beginning
auto sorted2 = df::sort_nan(data, df::SortOrder::ASCENDING, true);
// sorted2: {NaN, NaN, 1.0, 3.0, 4.0, 5.0}
// Sort descending with NaN values at the end
auto sorted3 = df::sort_nan(data, df::SortOrder::DESCENDING);
// sorted3: {5.0, 4.0, 3.0, 1.0, NaN, NaN}
Pipeline Usage
// Create a Serie
df::Serie<double> values{5.2, 1.7, 3.9, 2.1, 4.4};
// Use in pipeline with bind_sort
auto sorted_squared = values
| df::bind_map([](double x, size_t) { return x * x; }) // Square values
| df::bind_sort<double>(df::SortOrder::ASCENDING); // Sort squared values
// sorted_squared: {2.89, 4.41, 15.21, 19.36, 27.04}
// Use with custom comparator in pipeline
auto abs_sorted = df::Serie<double>{-5.0, 3.0, -1.0, 4.0, -2.0}
| df::bind_sort_with([](double a, double b) {
return std::abs(a) < std::abs(b);
});
// abs_sorted: {-1.0, -2.0, 3.0, 4.0, -5.0}
// Complex pipeline with key function
auto result = people
| df::bind_sort_by([](const Person& p) { return p.age; }) // Sort by age
| df::bind_map([](const Person& p, size_t) { // Extract names
return p.name;
});
// result: {"Bob", "Alice", "Dave", "Charlie"}
Parallel Sorting
// Create a large Serie
df::Serie<int> large_serie = df::random_uniform<int>(100000, 0, 1000000);
// Sort using parallel execution policy
auto sorted_parallel = df::sort(large_serie, df::SortOrder::ASCENDING, df::ExecutionPolicy::PAR);
// Parallel sort with custom comparator
auto parallel_custom = df::sort(large_serie,
[](int a, int b) { return a % 10 < b % 10; }, // Sort by last digit
df::ExecutionPolicy::PAR);
// Parallel sort by key in pipeline
auto result = large_serie
| df::bind_sort_by([](int x) { return x % 100; }, // Sort by last two digits
df::SortOrder::ASCENDING,
df::ExecutionPolicy::PAR);
Implementation Notes
- All sort operations create a new Serie rather than modifying the input Serie in place.
- The sort algorithm has an average case complexity of O(n log n), where n is the size of the Serie.
- For small Series, parallel sorting may actually be slower due to overhead. Consider using sequential execution for small datasets.
- NaN handling is particularly useful for floating-point data with missing values.
- The sort implementation relies on std::sort internally and respects the C++ standard library guarantees.
- The parallel execution policy is only effective when the library is compiled with parallel algorithms support.