zip Function - DataFrame Library Documentation

Overview

The zip function combines multiple Serie objects into a single Serie of tuples. Each tuple holds the elements found at the same index position in every input Serie. This keeps related values together, so separate but related Serie objects can be processed as one structure without losing the correspondence between their elements.

Function Signatures


// Binary zip (two Series)
template <typename T, typename U>
auto zip(const Serie<T> &serie1, const Serie<U> &serie2);

// Variadic zip (multiple Series)
template <typename T, typename... Args>
auto zip(const Serie<T> &first, const Args &...rest);

Parameters

Parameter	Type	Description
serie1, serie2	const Serie<T>&, const Serie<U>&	Two Series to zip together (binary version).
first, rest...	const Serie<T>&, const Args&...	Multiple Series to zip together (variadic version).

Return Value

A new Serie in which each element is a tuple of the corresponding elements from all input Series. For the binary version, the return type is Serie<std::tuple<T, U>>. For the variadic version, the return type is Serie<std::tuple<T, typename Args::value_type...>>, where each type in Args is itself a Serie.
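To make the return type concrete, here is a minimal sketch of how the variadic tuple type is assembled. It uses std::vector as a stand-in for Serie so that it compiles without the library; zip_sketch and zip_sketch_example are hypothetical names, and this is not the library's implementation. Input sizes are assumed equal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the variadic df::zip, using std::vector in
// place of Serie. No size checking: all inputs are assumed equally long.
template <typename T, typename... Args>
auto zip_sketch(const std::vector<T>& first, const Args&... rest) {
    // One tuple slot per input: T, then the value_type of each other input.
    using Row = std::tuple<T, typename Args::value_type...>;
    std::vector<Row> out;
    out.reserve(first.size());
    for (std::size_t i = 0; i < first.size(); ++i) {
        out.emplace_back(first[i], rest[i]...);  // element i of every input
    }
    return out;
}

// Small self-check showing the deduced element type.
inline void zip_sketch_example() {
    std::vector<std::string> names{"Alice", "Bob"};
    std::vector<int> ages{25, 32};
    std::vector<double> heights{165.5, 180.2};

    auto people = zip_sketch(names, ages, heights);
    static_assert(std::is_same_v<decltype(people)::value_type,
                                 std::tuple<std::string, int, double>>);
    assert(people.size() == 2);
    assert(std::get<1>(people[1]) == 32);
}
```

The element type is fixed at compile time from the input types alone, which is why structured bindings work on each tuple without any casts.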

Example Usage

Basic Zipping of Two Series


// Create two Series
df::Serie<std::string> names{"Alice", "Bob", "Charlie"};
df::Serie<int> ages{25, 32, 28};

// Zip them together
auto people = df::zip(names, ages);
// people is a Serie<std::tuple<std::string, int>> containing:
// {{"Alice", 25}, {"Bob", 32}, {"Charlie", 28}}

// Access the tuples using structured binding
people.forEach([](const auto& tuple, size_t idx) {
    auto [name, age] = tuple;
    std::cout << "Person " << idx << ": " << name << ", " << age << " years old" << std::endl;
});

// Output:
// Person 0: Alice, 25 years old
// Person 1: Bob, 32 years old
// Person 2: Charlie, 28 years old

Zipping Multiple Series


// Create three Series representing different attributes
df::Serie<std::string> names{"Alice", "Bob", "Charlie", "Diana"};
df::Serie<int> ages{25, 32, 28, 41};
df::Serie<double> heights{165.5, 180.2, 175.0, 162.8};

// Zip all three Series together
auto people = df::zip(names, ages, heights);
// people is a Serie<std::tuple<std::string, int, double>>

// Process the combined data
people.forEach([](const auto& tuple, size_t) {
    auto [name, age, height] = tuple;
    std::cout << name << " is " << age << " years old and " 
              << height << " cm tall" << std::endl;
});

// Output:
// Alice is 25 years old and 165.5 cm tall
// Bob is 32 years old and 180.2 cm tall
// Charlie is 28 years old and 175 cm tall
// Diana is 41 years old and 162.8 cm tall

Processing Zipped Series with map


// Create coordinate Series
df::Serie<double> x_coords{1.0, 2.0, 3.0, 4.0, 5.0};
df::Serie<double> y_coords{4.0, 5.0, 6.0, 7.0, 8.0};

// Zip coordinates and calculate distances from origin
auto points = df::zip(x_coords, y_coords);
auto distances = points.map([](const auto& point, size_t) {
    auto [x, y] = point;
    return std::sqrt(x*x + y*y);  // Euclidean distance
});
// distances = {4.123, 5.385, 6.708, 8.062, 9.434}

// Sum the x and y components of each point
auto vector_sums = df::zip(x_coords, y_coords).map([](const auto& tuple, size_t) {
    auto [x, y] = tuple;
    return x + y;  // Sum of x and y components
});
// vector_sums = {5.0, 7.0, 9.0, 11.0, 13.0}

// Scale the x and y coordinates independently
auto scaled_points = df::zip(x_coords, y_coords).map([](const auto& tuple, size_t) {
    auto [x, y] = tuple;
    return std::make_tuple(x * 2.0, y * 0.5);  // Scale x and y differently
});
// scaled_points is a Serie<std::tuple<double, double>> with scaled coordinates

Working with Different Types


// Combining different data types
df::Serie<std::string> categories{"A", "B", "C", "D"};
df::Serie<double> values{10.5, 20.3, 15.7, 8.2};
df::Serie<bool> flags{true, false, true, false};

// Zip diverse types together
auto mixed_data = df::zip(categories, values, flags);

// Process heterogeneous data
auto processed = mixed_data.map([](const auto& tuple, size_t) {
    auto [category, value, flag] = tuple;
    
    // Create a formatted string based on tuple contents
    std::string result = "Category " + category + ": ";
    
    if (flag) {
        result += "Priority item, value = " + std::to_string(value);
    } else {
        result += "Regular item, value = " + std::to_string(value);
    }
    
    return result;
});
// processed is a Serie<std::string> containing formatted descriptions

// Filter zipped data
auto filtered = mixed_data.filter([](const auto& tuple, size_t) {
    auto [_, value, flag] = tuple;  // Underscore for unused category
    return flag && value > 10.0;    // Select only priority items with high value
});
// filtered contains only elements where flag is true and value > 10.0

Time Series Example


// Create timestamp and measurement Series for a time series
df::Serie<double> timestamps{0.0, 1.0, 2.0, 3.0, 4.0, 5.0};  // In seconds
df::Serie<double> temperatures{20.5, 21.0, 21.5, 21.3, 20.8, 20.6};  // In Celsius

// Zip time series data
auto time_series = df::zip(timestamps, temperatures);

// Calculate temperature changes (derivative)
auto temperature_changes = time_series.map([](const auto& point, size_t idx, const auto& serie) {
    if (idx == 0) return 0.0;  // No previous point for the first element
    
    auto [current_time, current_temp] = point;
    auto [prev_time, prev_temp] = serie[idx-1];
    
    double time_delta = current_time - prev_time;
    double temp_delta = current_temp - prev_temp;
    
    return temp_delta / time_delta;  // Temperature change rate (°C/s)
});
// temperature_changes = {0.0, 0.5, 0.5, -0.2, -0.5, -0.2}

// Find periods of increasing temperature
auto increasing_periods = df::zip(timestamps, temperature_changes)
    .filter([](const auto& tuple, size_t) {
        auto [_, rate] = tuple;
        return rate > 0.0;
    });
// increasing_periods contains time points where temperature was increasing

Implementation Notes

All input Series must have the same size. If the sizes differ, a std::runtime_error is thrown.
The order of elements in the input Series is preserved in the output Serie of tuples.
The zip function is often used together with map to process related data points as a unit.
The element types of the input Series can be completely different.
To extract the individual Series back from a zipped Serie, use the unzip function.
Tuples created by zip are read-only; to modify values, you'll need to create new tuples with map.
When accessing tuple elements, you can use structured bindings (C++17 feature) for clearer code.
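The size check and the zip/unzip round trip described in these notes can be sketched as follows. std::vector again stands in for Serie, and zip_checked, unzip_sketch and zip_checked_example are hypothetical names used only for illustration; the actual signature of the library's unzip function is not shown in this page and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for the binary df::zip, including the size check.
template <typename T, typename U>
std::vector<std::tuple<T, U>> zip_checked(const std::vector<T>& a,
                                          const std::vector<U>& b) {
    if (a.size() != b.size()) {
        throw std::runtime_error("zip: input sizes differ");
    }
    std::vector<std::tuple<T, U>> out;
    out.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out.emplace_back(a[i], b[i]);  // input order is preserved
    }
    return out;
}

// Hypothetical inverse: split a zipped sequence back into its two columns.
template <typename T, typename U>
std::pair<std::vector<T>, std::vector<U>>
unzip_sketch(const std::vector<std::tuple<T, U>>& zipped) {
    std::vector<T> first;
    std::vector<U> second;
    first.reserve(zipped.size());
    second.reserve(zipped.size());
    for (const auto& [x, y] : zipped) {  // C++17 structured binding
        first.push_back(x);
        second.push_back(y);
    }
    return {std::move(first), std::move(second)};
}

// Round trip plus the mismatched-size failure case.
inline void zip_checked_example() {
    std::vector<int> ids{1, 2, 3};
    std::vector<double> scores{0.5, 1.5, 2.5};

    auto [ids_back, scores_back] = unzip_sketch(zip_checked(ids, scores));
    assert(ids_back == ids);
    assert(scores_back == scores);

    bool threw = false;
    try {
        zip_checked(ids, std::vector<double>{0.5});  // 3 elements vs 1
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

Checking sizes before building any tuples means a mismatch fails early, before a partially filled result can exist.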
