unzip

Overview

The unzip function splits a Serie of tuples into multiple individual Series, with each Serie containing the values from a specific component of the tuples. This is the reverse operation of the zip function and is useful for decomposing combined data back into separate Series for individual processing.

Function Signatures


// Main unzip function
template <typename Tuple>
auto unzip(const Serie<Tuple>& serie);

Parameters

Parameter | Type | Description
serie | const Serie<Tuple>& | A Serie of tuples to be split into separate Series. The Tuple type is typically std::tuple<T1, T2, ...>, where T1, T2, etc. are the types of the tuple components.

Return Value

A std::tuple of Series, where each Serie contains the values from one component of the input tuples. For a Serie of std::tuple<T1, T2, ...>, the return type will be std::tuple<Serie<T1>, Serie<T2>, ...>.
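The mapping from the element type to the return type can be written out as a small type trait. This is only a sketch: `SerieLike` (an alias for `std::vector`) stands in for `df::Serie` so the snippet compiles without the library, and `unzip_result` is a hypothetical helper name, not part of the library.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// Stand-in for df::Serie so the sketch is self-contained (assumption).
template <typename T>
using SerieLike = std::vector<T>;

// Hypothetical trait: maps std::tuple<Ts...> to std::tuple<SerieLike<Ts>...>.
template <typename Tuple>
struct unzip_result;

template <typename... Ts>
struct unzip_result<std::tuple<Ts...>> {
    using type = std::tuple<SerieLike<Ts>...>;
};

template <typename Tuple>
using unzip_result_t = typename unzip_result<Tuple>::type;

// A container of (name, age) tuples maps to (container of names, container of ages).
static_assert(std::is_same_v<
    unzip_result_t<std::tuple<std::string, int>>,
    std::tuple<SerieLike<std::string>, SerieLike<int>>>);
```

The result tuple has as many components as the input tuple type, in the same order.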

Example Usage

Basic Unzipping of a Serie of Tuples

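// Needs <iostream>, <string>, and <tuple>, plus the library headers that declare df::Serie and df::unzip
// Create a Serie of tuples (person data: name, age)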
using PersonTuple = std::tuple<std::string, int>;
df::Serie<PersonTuple> people{
    {"Alice", 25},
    {"Bob", 32},
    {"Charlie", 28},
    {"Diana", 41}
};

// Unzip into separate Series
auto unzipped = df::unzip(people);

// Access individual Series from the tuple
auto names = std::get<0>(unzipped);  // Serie<std::string> containing names
auto ages = std::get<1>(unzipped);   // Serie<int> containing ages

// Now we can work with each Serie separately
std::cout << "Names: ";
names.forEach([](const std::string& name, size_t) {
    std::cout << name << " ";
});
// Output: Names: Alice Bob Charlie Diana

std::cout << "\nAges: ";
ages.forEach([](int age, size_t) {
    std::cout << age << " ";
});
// Output: Ages: 25 32 28 41

Unzipping Results of map Operations

// Needs <cmath> (std::sqrt) and <tuple>
// Create a Serie holding the original data
df::Serie<double> values{10.5, 20.3, 15.7, 8.2, 30.0};

// Map to create derived data as tuples
auto calculated = values.map([](double value, size_t) {
    return std::make_tuple(
        value,                  // Original value
        value * 2,              // Doubled value
        std::sqrt(value),       // Square root
        value > 15.0            // Threshold check
    );
});
// calculated is a Serie<std::tuple<double, double, double, bool>>

// Unzip to separate individual results
auto [originals, doubled, roots, above_threshold] = df::unzip(calculated);

// Now we can work with each derived Serie
auto sum_doubled = doubled.reduce([](double acc, double val, size_t) {
    return acc + val;
}, 0.0);
// sum_doubled = 169.4

auto valid_roots = df::zip(roots, above_threshold)
    .filter([](const auto& tuple, size_t) {
        auto [_, is_valid] = tuple;
        return is_valid;
    })
    .map([](const auto& tuple, size_t) {
        auto [root_value, _] = tuple;
        return root_value;
    });
// valid_roots contains the square roots of values > 15.0

Processing Time Series Data

// Needs <algorithm> (std::max), <limits>, and <tuple>
// Create a Serie of time-value pairs (simulating recorded data)
using TimeValuePair = std::tuple<double, double>;
df::Serie<TimeValuePair> recordings{
    {0.0, 20.5},   // time (s), temperature (°C)
    {1.0, 21.0},
    {2.0, 21.5},
    {3.0, 21.3},
    {4.0, 20.8},
    {5.0, 20.6}
};

// Unzip into separate time and value Series
auto [times, temperatures] = df::unzip(recordings);

// Calculate statistics on just the temperature values
double avg_temp = temperatures.reduce([](double acc, double temp, size_t idx, const auto& serie) {
    return acc + temp / serie.size();
}, 0.0);
// avg_temp ≈ 20.95

double max_temp = temperatures.reduce([](double max_so_far, double temp, size_t) {
    return std::max(max_so_far, temp);
}, std::numeric_limits<double>::lowest());
// max_temp = 21.5

// Find timestamps where temperature exceeded a threshold
auto high_temp_times = df::zip(times, temperatures)
    .filter([](const auto& tuple, size_t) {
        auto [_, temp] = tuple;
        return temp > 21.0;
    })
    .map([](const auto& tuple, size_t) {
        auto [time, _] = tuple;
        return time;
    });
// high_temp_times = {2.0, 3.0}

Combining zip and unzip Operations

// Create three coordinate Series for 3D points
df::Serie<double> x{1.0, 2.0, 3.0, 4.0, 5.0};
df::Serie<double> y{1.5, 2.5, 3.5, 4.5, 5.5};
df::Serie<double> z{2.0, 3.0, 4.0, 5.0, 6.0};

// Zip coordinates into 3D points
auto points_3d = df::zip(x, y, z);
// points_3d is a Serie<std::tuple<double, double, double>>

// Project 3D points to 2D by dropping the z-coordinate
auto points_2d = points_3d.map([](const auto& point, size_t) {
    auto [x, y, _] = point;  // Ignore z-coordinate
    return std::make_tuple(x, y);
});
// points_2d is a Serie<std::tuple<double, double>>

// Unzip 2D points back to separate coordinate Series
auto [new_x, new_y] = df::unzip(points_2d);

// Compute distances from origin in 2D
auto distances_2d = df::zip(new_x, new_y).map([](const auto& point, size_t) {
    auto [x, y] = point;
    return std::sqrt(x*x + y*y);
});
// distances_2d contains the Euclidean distances of the 2D points from origin

// Transform and regroup data
auto transformed_points = df::zip(x, y, z).map([](const auto& point, size_t) {
    auto [x, y, z] = point;
    // Scale x, rotate y and z
    double new_x = x * 2.0;
    double new_y = y * std::cos(0.5) - z * std::sin(0.5);
    double new_z = y * std::sin(0.5) + z * std::cos(0.5);
    return std::make_tuple(new_x, new_y, new_z);
});

// Extract new coordinates
auto [scaled_x, rotated_y, rotated_z] = df::unzip(transformed_points);

Working with Structures Using Derived Tuples

// Define a data structure
struct Measurement {
    double timestamp;
    double temperature;
    double pressure;
    double humidity;
    bool is_valid;
};

// Create a Serie of measurements
df::Serie<Measurement> measurements{
    {0.0, 25.3, 1013.2, 65.4, true},
    {1.0, 25.7, 1012.8, 66.0, true},
    {2.0, 26.1, 1012.5, 67.2, true},
    {3.0, 26.5, 1012.0, 68.1, false},  // Invalid measurement
    {4.0, 26.8, 1011.6, 68.5, true}
};

// Convert structures to tuples for unzipping
auto measurement_tuples = measurements.map([](const Measurement& m, size_t) {
    return std::make_tuple(m.timestamp, m.temperature, m.pressure, m.humidity, m.is_valid);
});

// Unzip into separate Series
auto [timestamps, temperatures, pressures, humidities, validity_flags] = df::unzip(measurement_tuples);

// Filter out invalid measurements while preserving data relationships
auto valid_data = df::zip(timestamps, temperatures, pressures, humidities, validity_flags)
    .filter([](const auto& tuple, size_t) {
        return std::get<4>(tuple);  // Check validity flag
    })
    .map([](const auto& tuple, size_t) {
        auto [t, temp, press, humid, _] = tuple;
        return std::make_tuple(t, temp, press, humid);  // Exclude validity flag
    });

// Unzip the valid measurements back into separate Series
auto [valid_times, valid_temps, valid_pressures, valid_humidities] = df::unzip(valid_data);

// Process each measurement type independently
auto temp_derivatives = calculateDerivatives(valid_times, valid_temps);
auto pressure_derivatives = calculateDerivatives(valid_times, valid_pressures);
auto humidity_derivatives = calculateDerivatives(valid_times, valid_humidities);

// Where calculateDerivatives is a function like:
template <typename T>
df::Serie<double> calculateDerivatives(const df::Serie<double>& times, const df::Serie<T>& values) {
    return df::zip(times, values).map([](auto tuple, size_t idx, const auto& series) {
        if (idx == 0) return 0.0;  // No derivative for first point
        auto [current_time, current_value] = tuple;
        auto [prev_time, prev_value] = series[idx-1];
        return (current_value - prev_value) / (current_time - prev_time);
    });
}

Implementation Notes

  • The unzip function creates new Series without modifying the original Serie of tuples.
  • The number of Series returned by unzip matches the number of components in each tuple.
  • Every element of the input Serie has the same tuple type (same number and types of components); the Serie<Tuple> element type enforces this at compile time.
  • The unzip function is the inverse of the zip function and is often used to separate combined data.
  • When consuming the return value of unzip, structured bindings (a C++17 feature) are recommended for clearer code.
  • The tuple returned by unzip is a value, not a reference. Each Serie in the tuple is independent.
  • The implementation uses template metaprogramming to support tuples of any size, with the component types resolved at compile time.
  • There is no bound version (bind_unzip) since unzip is typically used as a direct function call.
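To illustrate the notes above, here is one way a variadic unzip could be written with `std::index_sequence`. It is a minimal sketch, not the library's actual implementation: `std::vector` (aliased as `SerieLike`) stands in for `df::Serie`, and the names `extract_component`, `unzip_impl`, `unzip_sketch`, and `unzip_sketch_demo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Stand-in container (assumption): std::vector plays the role of df::Serie.
template <typename T>
using SerieLike = std::vector<T>;

// Hypothetical helper: copy component I of every tuple into its own container.
template <std::size_t I, typename... Ts>
auto extract_component(const SerieLike<std::tuple<Ts...>>& input) {
    using Element = std::tuple_element_t<I, std::tuple<Ts...>>;
    SerieLike<Element> out;
    out.reserve(input.size());
    for (const auto& item : input) {
        out.push_back(std::get<I>(item));
    }
    return out;
}

// Expand the index pack 0..N-1 into one extract_component call per component.
template <typename... Ts, std::size_t... Is>
auto unzip_impl(const SerieLike<std::tuple<Ts...>>& input,
                std::index_sequence<Is...>) {
    return std::make_tuple(extract_component<Is>(input)...);
}

// Entry point: returns a std::tuple holding one container per tuple component.
template <typename... Ts>
auto unzip_sketch(const SerieLike<std::tuple<Ts...>>& input) {
    return unzip_impl(input, std::index_sequence_for<Ts...>{});
}

// Self-check: the results are independent copies and the input is not modified.
inline void unzip_sketch_demo() {
    SerieLike<std::tuple<std::string, int>> people{{"Alice", 25}, {"Bob", 32}};
    auto [names, ages] = unzip_sketch(people);
    ages[0] = 99;                          // mutate the unzipped copy
    assert(std::get<1>(people[0]) == 25);  // the original tuple is unchanged
    assert(names.size() == 2);
}
```

In this sketch each component is extracted in its own pass over the input, which keeps the code short; a single pass that appends to all outputs at once would be an alternative design.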

Related Functions