unzip
Overview
The unzip function splits a Serie of tuples into multiple individual Series, with each Serie containing the values from a specific component of the tuples. This is the reverse operation of the zip function and is useful for decomposing combined data back into separate Series for individual processing.
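As a rough illustration of the inverse relationship with zip, the sketch below round-trips two columns through hand-written two-component helpers. It uses std::vector as a stand-in for Serie, and zip2_sketch, unzip2_sketch, and round_trip_demo are hypothetical names chosen for this example; none of this is the library's own code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical helper: pair up two equally sized columns into rows.
template <typename A, typename B>
std::vector<std::tuple<A, B>> zip2_sketch(const std::vector<A>& a,
                                          const std::vector<B>& b) {
    assert(a.size() == b.size());
    std::vector<std::tuple<A, B>> rows;
    rows.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        rows.emplace_back(a[i], b[i]);
    }
    return rows;
}

// Hypothetical helper: split rows back into one column per component.
template <typename A, typename B>
std::tuple<std::vector<A>, std::vector<B>> unzip2_sketch(
    const std::vector<std::tuple<A, B>>& rows) {
    std::vector<A> first;
    std::vector<B> second;
    first.reserve(rows.size());
    second.reserve(rows.size());
    for (const auto& [a, b] : rows) {
        first.push_back(a);
        second.push_back(b);
    }
    return std::make_tuple(std::move(first), std::move(second));
}

// Round trip: unzipping the zipped rows gives back the original columns.
inline bool round_trip_demo() {
    const std::vector<std::string> names{"Alice", "Bob", "Charlie"};
    const std::vector<int> ages{25, 32, 28};
    auto [names_back, ages_back] = unzip2_sketch(zip2_sketch(names, ages));
    return names_back == names && ages_back == ages;
}
```

The two-component restriction only keeps the sketch short; the documented unzip accepts tuples with any number of components.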
Function Signatures
// Main unzip function
template <typename Tuple>
auto unzip(const Serie<Tuple> &serie);
Parameters
| Parameter | Type | Description |
|---|---|---|
| serie | const Serie<Tuple>& | A Serie of tuples to be split into separate Series. The Tuple type is typically std::tuple<T1, T2, ...> where T1, T2, etc. are the types of the tuple components. |
Return Value
A std::tuple of Series, where each Serie contains the values from one component of the input tuples. For a Serie of std::tuple<T1, T2, ...>, the return type will be std::tuple<Serie<T1>, Serie<T2>, ...>.
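The type mapping described above can be written down as a small compile-time trait. This is only a sketch: SerieSketch is a hypothetical stand-in for df::Serie so the snippet compiles on its own, and unzip_result / unzip_result_t are illustrative names, not part of the library.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for df::Serie so the sketch is self-contained.
template <typename T>
struct SerieSketch {
    std::vector<T> values;
};

// Primary template: only defined for a Serie whose element type is a tuple.
template <typename SerieOfTuples>
struct unzip_result;

// Maps SerieSketch<std::tuple<T1, T2, ...>> to
// std::tuple<SerieSketch<T1>, SerieSketch<T2>, ...>.
template <typename... Ts>
struct unzip_result<SerieSketch<std::tuple<Ts...>>> {
    using type = std::tuple<SerieSketch<Ts>...>;
};

template <typename SerieOfTuples>
using unzip_result_t = typename unzip_result<SerieOfTuples>::type;

// Two components in, two Series out, in the same order.
static_assert(std::is_same_v<
    unzip_result_t<SerieSketch<std::tuple<std::string, int>>>,
    std::tuple<SerieSketch<std::string>, SerieSketch<int>>>);

// Four components of mixed types.
static_assert(std::is_same_v<
    unzip_result_t<SerieSketch<std::tuple<double, double, double, bool>>>,
    std::tuple<SerieSketch<double>, SerieSketch<double>,
               SerieSketch<double>, SerieSketch<bool>>>);
```

Because the result is an ordinary std::tuple, both std::get<I> and structured bindings can be used to pull out the individual Series.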
Example Usage
Basic Unzipping of a Serie of Tuples
// Create a Serie of tuples (person data: name, age)
using PersonTuple = std::tuple<std::string, int>;
df::Serie<PersonTuple> people{
    {"Alice", 25},
    {"Bob", 32},
    {"Charlie", 28},
    {"Diana", 41}
};

// Unzip into separate Series
auto unzipped = df::unzip(people);

// Access individual Series from the tuple
auto names = std::get<0>(unzipped); // Serie<std::string> containing names
auto ages = std::get<1>(unzipped);  // Serie<int> containing ages

// Now we can work with each Serie separately
std::cout << "Names: ";
names.forEach([](const std::string& name, size_t) {
    std::cout << name << " ";
});
// Output: Names: Alice Bob Charlie Diana

std::cout << "\nAges: ";
ages.forEach([](int age, size_t) {
    std::cout << age << " ";
});
// Output: Ages: 25 32 28 41
Unzipping Results of map Operations
// Create Series for original data
df::Serie<double> values{10.5, 20.3, 15.7, 8.2, 30.0};

// Map to create derived data as tuples
auto calculated = values.map([](double value, size_t) {
    return std::make_tuple(
        value,            // Original value
        value * 2,        // Doubled value
        std::sqrt(value), // Square root
        value > 15.0      // Threshold check
    );
});
// calculated is a Serie<std::tuple<double, double, double, bool>>

// Unzip to separate individual results
auto [originals, doubled, roots, above_threshold] = df::unzip(calculated);

// Now we can work with each derived Serie
auto sum_doubled = doubled.reduce([](double acc, double val, size_t) {
    return acc + val;
}, 0.0);
// sum_doubled = 169.4

auto valid_roots = df::zip(roots, above_threshold)
    .filter([](const auto& tuple, size_t) {
        auto [_, is_valid] = tuple;
        return is_valid;
    })
    .map([](const auto& tuple, size_t) {
        auto [root_value, _] = tuple;
        return root_value;
    });
// valid_roots contains the square roots of values > 15.0
Processing Time Series Data
// Create a Serie of time-value pairs (simulating recorded data)
using TimeValuePair = std::tuple<double, double>;
df::Serie<TimeValuePair> recordings{
    {0.0, 20.5}, // time (s), temperature (°C)
    {1.0, 21.0},
    {2.0, 21.5},
    {3.0, 21.3},
    {4.0, 20.8},
    {5.0, 20.6}
};

// Unzip into separate time and value Series
auto [times, temperatures] = df::unzip(recordings);

// Calculate statistics on just the temperature values
double avg_temp = temperatures.reduce([](double acc, double temp, size_t idx, const auto& serie) {
    return acc + temp / serie.size();
}, 0.0);
// avg_temp ≈ 20.95

double max_temp = temperatures.reduce([](double max_so_far, double temp, size_t) {
    return std::max(max_so_far, temp);
}, std::numeric_limits<double>::lowest());
// max_temp = 21.5

// Find timestamps where temperature exceeded a threshold
auto high_temp_times = df::zip(times, temperatures)
    .filter([](const auto& tuple, size_t) {
        auto [_, temp] = tuple;
        return temp > 21.0;
    })
    .map([](const auto& tuple, size_t) {
        auto [time, _] = tuple;
        return time;
    });
// high_temp_times = {2.0, 3.0}
Combining zip and unzip Operations
// Create three coordinate Series for 3D points
df::Serie<double> x{1.0, 2.0, 3.0, 4.0, 5.0};
df::Serie<double> y{1.5, 2.5, 3.5, 4.5, 5.5};
df::Serie<double> z{2.0, 3.0, 4.0, 5.0, 6.0};

// Zip coordinates into 3D points
auto points_3d = df::zip(x, y, z);
// points_3d is a Serie<std::tuple<double, double, double>>

// Project 3D points to 2D by dropping the z-coordinate
auto points_2d = points_3d.map([](const auto& point, size_t) {
    auto [x, y, _] = point; // Ignore z-coordinate
    return std::make_tuple(x, y);
});
// points_2d is a Serie<std::tuple<double, double>>

// Unzip 2D points back to separate coordinate Series
auto [new_x, new_y] = df::unzip(points_2d);

// Compute distances from origin in 2D
auto distances_2d = df::zip(new_x, new_y).map([](const auto& point, size_t) {
    auto [x, y] = point;
    return std::sqrt(x*x + y*y);
});
// distances_2d contains the Euclidean distances of the 2D points from origin

// Transform and regroup data
auto transformed_points = df::zip(x, y, z).map([](const auto& point, size_t) {
    auto [x, y, z] = point;
    // Scale x, rotate y and z
    double new_x = x * 2.0;
    double new_y = y * std::cos(0.5) - z * std::sin(0.5);
    double new_z = y * std::sin(0.5) + z * std::cos(0.5);
    return std::make_tuple(new_x, new_y, new_z);
});

// Extract new coordinates
auto [scaled_x, rotated_y, rotated_z] = df::unzip(transformed_points);
Working with Structures Using Derived Tuples
// Define a data structure
struct Measurement {
    double timestamp;
    double temperature;
    double pressure;
    double humidity;
    bool is_valid;
};

// Create a Serie of measurements
df::Serie<Measurement> measurements{
    {0.0, 25.3, 1013.2, 65.4, true},
    {1.0, 25.7, 1012.8, 66.0, true},
    {2.0, 26.1, 1012.5, 67.2, true},
    {3.0, 26.5, 1012.0, 68.1, false}, // Invalid measurement
    {4.0, 26.8, 1011.6, 68.5, true}
};

// Convert structures to tuples for unzipping
auto measurement_tuples = measurements.map([](const Measurement& m, size_t) {
    return std::make_tuple(m.timestamp, m.temperature, m.pressure, m.humidity, m.is_valid);
});

// Unzip into separate Series
auto [timestamps, temperatures, pressures, humidities, validity_flags] = df::unzip(measurement_tuples);

// Filter out invalid measurements while preserving data relationships
auto valid_data = df::zip(timestamps, temperatures, pressures, humidities, validity_flags)
    .filter([](const auto& tuple, size_t) {
        return std::get<4>(tuple); // Check validity flag
    })
    .map([](const auto& tuple, size_t) {
        auto [t, temp, press, humid, _] = tuple;
        return std::make_tuple(t, temp, press, humid); // Exclude validity flag
    });

// Unzip the valid measurements back into separate Series
auto [valid_times, valid_temps, valid_pressures, valid_humidities] = df::unzip(valid_data);

// Process each measurement type independently
auto temp_derivatives = calculateDerivatives(valid_times, valid_temps);
auto pressure_derivatives = calculateDerivatives(valid_times, valid_pressures);
auto humidity_derivatives = calculateDerivatives(valid_times, valid_humidities);

// Where calculateDerivatives is a function like the following
// (it must be declared before the calls above):
template <typename T>
df::Serie<double> calculateDerivatives(const df::Serie<double>& times, const df::Serie<T>& values) {
    return df::zip(times, values).map([](auto tuple, size_t idx, const auto& series) {
        if (idx == 0) return 0.0; // No derivative for first point
        auto [current_time, current_value] = tuple;
        auto [prev_time, prev_value] = series[idx-1];
        return (current_value - prev_value) / (current_time - prev_time);
    });
}
Implementation Notes
- The unzip function creates new Series without modifying the original Serie of tuples.
- The number of Series returned by unzip matches the number of components in each tuple.
- All tuples in the input Serie must have the same structure (same number and types of elements).
- The unzip function is the inverse of the zip function and is often used to separate combined data.
- When using the unzip function's return value, structured bindings (a C++17 feature) are recommended for clearer code.
- The tuple returned by unzip is a value, not a reference. Each Serie in the tuple is independent.
- The implementation uses template metaprogramming to handle tuples of any size efficiently.
- There is no bound version (bind_unzip) since unzip is typically used as a direct function call.
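To make the notes on template metaprogramming and value semantics more concrete, here is a minimal sketch of how a variadic unzip could be written with std::index_sequence and C++17 fold expressions. It is an assumption about one possible approach, not the library's actual implementation: Column is a std::vector stand-in for Serie, and unzip_variadic_sketch, sketch_detail::unzip_impl, and independence_demo are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for df::Serie: a plain std::vector.
template <typename T>
using Column = std::vector<T>;

namespace sketch_detail {
// Is... enumerates the tuple indices 0..N-1 at compile time, so each fold
// expression below expands to one statement per tuple component.
template <typename... Ts, std::size_t... Is>
std::tuple<Column<Ts>...> unzip_impl(const Column<std::tuple<Ts...>>& rows,
                                     std::index_sequence<Is...>) {
    std::tuple<Column<Ts>...> columns;
    (std::get<Is>(columns).reserve(rows.size()), ...);
    for (const auto& row : rows) {
        (std::get<Is>(columns).push_back(std::get<Is>(row)), ...);
    }
    return columns;
}
}  // namespace sketch_detail

// Hypothetical variadic unzip: one output column per tuple component.
template <typename... Ts>
std::tuple<Column<Ts>...> unzip_variadic_sketch(const Column<std::tuple<Ts...>>& rows) {
    return sketch_detail::unzip_impl(rows, std::index_sequence_for<Ts...>{});
}

// The returned columns are copies: edits on either side do not leak across.
inline void independence_demo() {
    Column<std::tuple<double, double>> rows{{0.0, 20.5}, {1.0, 21.0}, {2.0, 21.5}};
    auto [times, temps] = unzip_variadic_sketch(rows);

    temps[0] = -1.0;              // Edit a returned column
    std::get<0>(rows[1]) = 99.0;  // Edit the source rows

    assert(std::get<1>(rows[0]) == 20.5);  // Source untouched by the column edit
    assert(times[1] == 1.0);               // Column untouched by the source edit
    assert(times.size() == rows.size() && temps.size() == rows.size());
}
```

In this sketch the number of output columns is fixed at compile time by the tuple's arity, which is also why every tuple in the input must share the same structure.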