flatMap

Overview

The flatMap function applies a callback to each element of a Serie, where the callback returns a Serie for each element, and then concatenates those Series into a single flat Serie. It is equivalent to a map followed by a flatten operation.
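
As a rough illustration of the "map followed by flatten" reading, the sketch below builds a flatMap from those two steps. It is not the library's implementation: Seq is a hypothetical alias for std::vector standing in for Serie, and mapSeq, flattenSeq, and flatMapSeq are names invented for this example.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for Serie<T>: a plain std::vector (an assumption).
template <typename T>
using Seq = std::vector<T>;

// map: exactly one output per input element.
template <typename T, typename U>
Seq<U> mapSeq(const Seq<T>& in, const std::function<U(const T&)>& f) {
    Seq<U> out;
    out.reserve(in.size());
    for (const T& x : in) out.push_back(f(x));
    return out;
}

// flatten: concatenate the nested sequences in their original order.
template <typename T>
Seq<T> flattenSeq(const Seq<Seq<T>>& nested) {
    Seq<T> out;
    for (const Seq<T>& inner : nested)
        out.insert(out.end(), inner.begin(), inner.end());
    return out;
}

// flatMap expressed as map followed by flatten.
template <typename T, typename R>
Seq<R> flatMapSeq(const Seq<T>& in, const std::function<Seq<R>(const T&)>& f) {
    return flattenSeq<R>(mapSeq<T, Seq<R>>(in, f));
}
```

Calling flatMapSeq<std::string, char> on the words "ab", "" and "c" with a callback that returns each word's characters gives a, b, c: the empty word contributes nothing, and the order of the inputs is kept.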

Function Signatures


// With index parameter
template <typename T, typename R>
Serie<R> flatMap(const Serie<T>& serie, 
                 std::function<Serie<R>(const T&, size_t)> callback);

// Without index parameter
template <typename T, typename R>
Serie<R> flatMap(const Serie<T>& serie, 
                 std::function<Serie<R>(const T&)> callback);

// Member function version
template <typename T>
template <typename R>
Serie<R> Serie<T>::flatMap(std::function<Serie<R>(const T&, size_t)> callback) const;

// Bound version for pipeline operations with index
template <typename T, typename R>
auto bind_flatMap(std::function<Serie<R>(const T&, size_t)> callback);

// Bound version for pipeline operations without index
template <typename T, typename R>
auto bind_flatMap(std::function<Serie<R>(const T&)> callback);
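
One plausible way the two callback shapes relate is sketched below: the index-free overload wraps its callback and forwards to the indexed one. This is an assumption about structure, not a description of the library's source; Seq and flatMapSeq are hypothetical stand-ins built on std::vector.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for Serie<T> (an assumption, not the library type).
template <typename T>
using Seq = std::vector<T>;

// Indexed form: the callback sees each element and its position.
template <typename T, typename R>
Seq<R> flatMapSeq(const Seq<T>& in,
                  const std::function<Seq<R>(const T&, std::size_t)>& callback) {
    Seq<R> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        Seq<R> part = callback(in[i], i);
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}

// Index-free form: adapt the callback, then forward to the indexed form.
template <typename T, typename R>
Seq<R> flatMapSeq(const Seq<T>& in,
                  const std::function<Seq<R>(const T&)>& callback) {
    return flatMapSeq<T, R>(
        in, std::function<Seq<R>(const T&, std::size_t)>(
                [&callback](const T& value, std::size_t) { return callback(value); }));
}
```

Because a lambda is not itself a std::function, the compiler cannot deduce R from it. With signatures of this shape the template arguments have to be written out at the call site, as in flatMapSeq<int, int>(numbers, ...), which matches how the examples on this page call df::flatMap.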
        

Parameters

Parameter | Type | Description
serie | const Serie<T>& | The input Serie to process. Not passed to the member function or bind_flatMap versions.
callback | std::function<Serie<R>(const T&, size_t)> or std::function<Serie<R>(const T&)> | Function applied to each element, returning a Serie<R>. It receives the element value and, in the indexed overloads, the element's index.

Return Value

A new Serie of type R containing all elements from the Series returned by the callback function, flattened into a single Serie.

Example Usage

Basic Example: Expanding Strings to Characters

#include <dataframe/Serie.h>
#include <dataframe/flatMap.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Create a Serie of strings
    df::Serie<std::string> words{"hello", "world"};
    
    // Use flatMap to split each string into characters
    auto characters = df::flatMap<std::string, char>(words, [](const std::string& word, size_t) {
        std::vector<char> chars(word.begin(), word.end());
        return df::Serie<char>(chars);
    });
    
    // Print the result
    std::cout << "Original words: " << words << std::endl;
    std::cout << "Flattened characters: " << characters << std::endl;
    
    return 0;
}

// Output:
// Original words: [hello, world]
// Flattened characters: [h, e, l, l, o, w, o, r, l, d]

Complex Example: Generating Multiple Elements per Input

#include <dataframe/Serie.h>
#include <dataframe/flatMap.h>
#include <iostream>
#include <vector>

int main() {
    // Create a Serie of numbers
    df::Serie<int> numbers{1, 2, 3};
    
    // Use flatMap to repeat each number according to its value
    auto repeated = df::flatMap<int, int>(numbers, [](int n) {
        std::vector<int> repeats(n, n);
        return df::Serie<int>(repeats);
    });
    
    // Print the result
    std::cout << "Original numbers: " << numbers << std::endl;
    std::cout << "Repeated values: " << repeated << std::endl;
    
    return 0;
}

// Output:
// Original numbers: [1, 2, 3]
// Repeated values: [1, 2, 2, 3, 3, 3]

Pipeline Example: Sentence Splitting

#include <dataframe/Serie.h>
#include <dataframe/flatMap.h>
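#include <dataframe/filter.h>  // assumed header name for df::bind_filter, by analogy with flatMap.h
#include <dataframe/pipe.h>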
#include <iostream>
#include <string>
#include <vector>
#include <sstream>

// Split a sentence into words
std::vector<std::string> splitSentence(const std::string& sentence) {
    std::vector<std::string> words;
    std::istringstream iss(sentence);
    std::string word;
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}

int main() {
    // Create a Serie of sentences
    df::Serie<std::string> sentences{
        "Hello world",
        "This is a test",
        "DataFrame library is awesome"
    };
    
    // Create a pipeline to:
    // 1. Split each sentence into words
    // 2. Keep only words of at least 4 characters
    auto long_words = sentences
        | df::bind_flatMap<std::string, std::string>([](const std::string& sentence) {
            return df::Serie<std::string>(splitSentence(sentence));
        })
        | df::bind_filter<std::string>([](const std::string& word) {
            return word.length() >= 4;
        });
    
    // Print the result
    std::cout << "Original sentences: " << sentences << std::endl;
    std::cout << "Long words: " << long_words << std::endl;
    
    return 0;
}

// Output:
// Original sentences: [Hello world, This is a test, DataFrame library is awesome]
// Long words: [Hello, world, This, test, DataFrame, library, awesome]
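
The pipeline form can be read as partial application: bind_flatMap captures the callback and returns a callable that waits for its input, and the pipe operator feeds the left-hand side into that callable. The sketch below shows the idea under stated assumptions; the sketch namespace, its Seq wrapper, and this operator| are hypothetical and are not taken from the library.

```cpp
#include <functional>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for Serie<T>; a thin wrapper so operator| is found by ADL.
template <typename T>
struct Seq {
    std::vector<T> items;
};

// Direct form: apply the callback to each element and concatenate the results.
template <typename T, typename R>
Seq<R> flatMap(const Seq<T>& in, const std::function<Seq<R>(const T&)>& f) {
    Seq<R> out;
    for (const T& x : in.items) {
        Seq<R> part = f(x);
        out.items.insert(out.items.end(), part.items.begin(), part.items.end());
    }
    return out;
}

// Bound form: capture the callback now, supply the input later.
template <typename T, typename R>
auto bind_flatMap(std::function<Seq<R>(const T&)> f) {
    return [f = std::move(f)](const Seq<T>& in) { return flatMap<T, R>(in, f); };
}

// Pipe operator: `seq | stage` simply calls stage(seq).
template <typename T, typename Stage>
auto operator|(const Seq<T>& in, Stage&& stage) {
    return std::forward<Stage>(stage)(in);
}

}  // namespace sketch
```

With these definitions, piping a Seq<int> holding 1, 2, 3 into sketch::bind_flatMap<int, int> with a callback that returns {n, n * 10} produces 1, 10, 2, 20, 3, 30. Since each stage returns a new Seq, stages can be chained with further | operators.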

Implementation Notes

  • The flatMap function applies a transformation to each element and concatenates all the resulting Series.
  • Unlike map, which produces a one-to-one mapping, flatMap allows for one-to-many mappings.
  • If the callback returns an empty Serie for an input element, that element contributes nothing to the result.
  • The callback function can return Series of different sizes for different input elements.
  • The function preserves the relative order of elements: all elements from the first input element come first, followed by elements from the second input element, and so on.

Common Use Cases

  • String Tokenization: Splitting strings into words or characters.
  • Exploding Nested Data: Flattening collections of collections into a single collection.
  • Data Expansion: Generating multiple output elements for each input element.
  • One-to-Many Transformations: Mapping each input element to a variable number of output elements, including none.
  • Path Expansion: Generating all possible paths from a tree or graph structure.
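
To make the path expansion use case concrete, here is a minimal sketch that enumerates root-to-leaf paths by applying a flatMap step repeatedly. flatMapVec, expandOnce, allPaths, and the Tree and Path aliases are hypothetical names for this example, std::vector stands in for Serie, and the input is assumed to be acyclic.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Path = std::vector<std::string>;
using Tree = std::map<std::string, std::vector<std::string>>;  // node -> children

// Hypothetical stand-in for df::flatMap, operating on std::vector.
template <typename T, typename R>
std::vector<R> flatMapVec(const std::vector<T>& in,
                          const std::function<std::vector<R>(const T&)>& f) {
    std::vector<R> out;
    for (const T& x : in) {
        std::vector<R> part = f(x);
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}

// One expansion step: each path becomes one longer path per child of its
// last node; a path that already ends at a leaf is kept unchanged.
std::vector<Path> expandOnce(const Tree& tree, const std::vector<Path>& paths) {
    return flatMapVec<Path, Path>(paths, [&tree](const Path& path) -> std::vector<Path> {
        assert(!path.empty());  // every path holds at least its starting node
        auto it = tree.find(path.back());
        if (it == tree.end() || it->second.empty()) {
            return std::vector<Path>{path};  // leaf: nothing to expand
        }
        std::vector<Path> extended;
        for (const std::string& child : it->second) {
            Path next = path;
            next.push_back(child);
            extended.push_back(next);
        }
        return extended;
    });
}

// Repeat the step until no path grows. Assumes the structure has no cycles.
std::vector<Path> allPaths(const Tree& tree, const std::string& root) {
    std::vector<Path> paths{Path{root}};
    while (true) {
        std::vector<Path> next = expandOnce(tree, paths);
        if (next == paths) return paths;
        paths = next;
    }
}
```

For a tree where a has children b and c, and b has the single child d, allPaths(tree, "a") returns the two paths a, b, d and a, c. The one-to-many step is what lets a single path fan out into several, while leaf paths pass through untouched.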

Related Functions