flatMap
Overview
The flatMap function applies a callback to each element of a Serie. The callback returns a Serie for each element, and flatMap concatenates those Series, in input order, into a single Serie. It is equivalent to map followed by a flatten operation.
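To make the "map followed by flatten" reading concrete, here is a minimal sketch that models a Serie with std::vector so it compiles without the library headers. The names map_sketch, flatten_sketch, flat_map_sketch, and to_chars are hypothetical illustrations, not part of the library, and the library's own implementation may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins built on std::vector; they are not the library's API.

// map: produce one output container per input element.
template <typename T, typename F>
auto map_sketch(const std::vector<T>& input, F callback) {
    using Piece = decltype(callback(input[0]));
    std::vector<Piece> out;
    out.reserve(input.size());
    for (const T& value : input) {
        out.push_back(callback(value));
    }
    return out;
}

// flatten: concatenate the inner containers in order.
template <typename R>
std::vector<R> flatten_sketch(const std::vector<std::vector<R>>& nested) {
    std::vector<R> out;
    for (const auto& piece : nested) {
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}

// flatMap modeled as map followed by flatten.
template <typename T, typename F>
auto flat_map_sketch(const std::vector<T>& input, F callback) {
    return flatten_sketch(map_sketch(input, callback));
}

// Example callback: split a word into its characters.
inline std::vector<char> to_chars(const std::string& word) {
    return std::vector<char>(word.begin(), word.end());
}
```

Calling flat_map_sketch(words, to_chars) on a vector of words yields one flat vector of characters, while map_sketch alone would yield one inner vector per word.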
Function Signatures
// With index parameter
template <typename T, typename R>
Serie<R> flatMap(const Serie<T>& serie,
                 std::function<Serie<R>(const T&, size_t)> callback);

// Without index parameter
template <typename T, typename R>
Serie<R> flatMap(const Serie<T>& serie,
                 std::function<Serie<R>(const T&)> callback);

// Member function version
template <typename T>
template <typename R>
Serie<R> Serie<T>::flatMap(std::function<Serie<R>(const T&, size_t)> callback) const;

// Bound version for pipeline operations with index
template <typename T, typename R>
auto bind_flatMap(std::function<Serie<R>(const T&, size_t)> callback);

// Bound version for pipeline operations without index
template <typename T, typename R>
auto bind_flatMap(std::function<Serie<R>(const T&)> callback);
Parameters
| Parameter | Type | Description |
|---|---|---|
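| serie | const Serie<T>& | The input Serie to process. Not a parameter of the member function or of bind_flatMap, which take their input from the object or from the pipeline. |
| callback | std::function<Serie<R>(const T&, size_t)> or std::function<Serie<R>(const T&)> | Function applied to each element, returning a Serie<R>. It receives the element value and, in the two-argument form, the element's index. |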
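The two callback forms differ only in whether the index is passed. As a sketch of how they could relate, the block below models Serie with std::vector and lets the index-free form delegate to the indexed one. The names flat_map_indexed and flat_map_plain are hypothetical, and the library may implement its overloads differently.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical std::vector stand-in for the indexed form; not the library's code.
template <typename T, typename R>
std::vector<R> flat_map_indexed(
    const std::vector<T>& input,
    const std::function<std::vector<R>(const T&, std::size_t)>& callback) {
    std::vector<R> out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        // The callback sees both the element and its position.
        const std::vector<R> piece = callback(input[i], i);
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}

// Index-free form: wrap the callback so the index is ignored, then delegate.
template <typename T, typename R>
std::vector<R> flat_map_plain(
    const std::vector<T>& input,
    const std::function<std::vector<R>(const T&)>& callback) {
    return flat_map_indexed<T, R>(
        input,
        [&callback](const T& value, std::size_t) { return callback(value); });
}
```

As in the signatures above, T and R are spelled out at the call site (for example flat_map_indexed<int, int>(...)), because a lambda cannot be deduced against a std::function parameter.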
Return Value
A new Serie of type R containing all elements from the Series returned by the callback function, flattened into a single Serie.
Example Usage
Basic Example: Expanding Strings to Characters
#include <dataframe/Serie.h>
#include <dataframe/flatMap.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Create a Serie of strings
    df::Serie<std::string> words{"hello", "world"};

    // Use flatMap to split each string into characters
    auto characters = df::flatMap<std::string, char>(words, [](const std::string& word, size_t) {
        std::vector<char> chars(word.begin(), word.end());
        return df::Serie<char>(chars);
    });

    // Print the result
    std::cout << "Original words: " << words << std::endl;
    std::cout << "Flattened characters: " << characters << std::endl;
    return 0;
}

// Output:
// Original words: [hello, world]
// Flattened characters: [h, e, l, l, o, w, o, r, l, d]
Complex Example: Generating Multiple Elements per Input
#include <dataframe/Serie.h>
#include <dataframe/flatMap.h>
#include <iostream>
#include <vector>

int main() {
    // Create a Serie of numbers
    df::Serie<int> numbers{1, 2, 3};

    // Use flatMap to repeat each number according to its value
    auto repeated = df::flatMap<int, int>(numbers, [](int n) {
        std::vector<int> repeats(n, n);
        return df::Serie<int>(repeats);
    });

    // Print the result
    std::cout << "Original numbers: " << numbers << std::endl;
    std::cout << "Repeated values: " << repeated << std::endl;
    return 0;
}

// Output:
// Original numbers: [1, 2, 3]
// Repeated values: [1, 2, 2, 3, 3, 3]
Pipeline Example: Sentence Splitting
#include <dataframe/Serie.h>
#include <dataframe/flatMap.h>
#include <dataframe/pipe.h>
#include <iostream>
#include <string>
#include <vector>
#include <sstream>

// Split a sentence into words
std::vector<std::string> splitSentence(const std::string& sentence) {
    std::vector<std::string> words;
    std::istringstream iss(sentence);
    std::string word;
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}

int main() {
    // Create a Serie of sentences
    df::Serie<std::string> sentences{
        "Hello world",
        "This is a test",
        "DataFrame library is awesome"
    };

    // Create a pipeline to:
    // 1. Split each sentence into words
    // 2. Filter out short words (less than 4 characters)
    auto long_words = sentences
        | df::bind_flatMap<std::string, std::string>([](const std::string& sentence) {
            return df::Serie<std::string>(splitSentence(sentence));
        })
        | df::bind_filter<std::string>([](const std::string& word) {
            return word.length() >= 4;
        });

    // Print the result
    std::cout << "Original sentences: " << sentences << std::endl;
    std::cout << "Long words: " << long_words << std::endl;
    return 0;
}

// Output:
// Original sentences: [Hello world, This is a test, DataFrame library is awesome]
// Long words: [Hello, world, This, test, DataFrame, library, awesome]
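For readers curious how a bound operation can slot into a pipeline, the following is a rough sketch of the general pattern: the bound function stores the callback in a closure, and a pipe operator feeds the left-hand container into it. It uses std::vector in place of Serie, and bind_flat_map_sketch, flat_map_sketch, and this operator| are all hypothetical. The library's bind_flatMap and pipe operator may be built differently.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers on std::vector; not the library's implementation.

// Apply the callback to each element and concatenate the results.
template <typename T, typename F>
auto flat_map_sketch(const std::vector<T>& input, F callback) {
    using Piece = decltype(callback(input[0]));
    Piece out;
    for (const T& value : input) {
        const Piece piece = callback(value);
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}

// Bound form: keep the callback now, receive the input later.
template <typename F>
auto bind_flat_map_sketch(F callback) {
    return [callback](const auto& input) {
        return flat_map_sketch(input, callback);
    };
}

// Pipe operator: pass the left-hand container to the bound operation.
template <typename T, typename Op>
auto operator|(const std::vector<T>& input, Op op) -> decltype(op(input)) {
    return op(input);
}
```

Because each bound operation returns a new container, stages can be chained left to right, with the output type of one stage becoming the input type of the next.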
Implementation Notes
- The flatMap function applies a transformation to each element and concatenates all the resulting Series.
- Unlike map, which produces a one-to-one mapping, flatMap allows for one-to-many mappings.
- If any of the Series returned by the callback is empty, no elements will be added to the result for that input element.
- The callback function can return Series of different sizes for different input elements.
- The function preserves the relative order of elements: all elements from the first input element come first, followed by elements from the second input element, and so on.
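The notes above can be checked on a small model. The sketch below again uses std::vector in place of Serie, with hypothetical names flat_map_sketch and repeat_value. The callback returns n copies of n, so an input of 0 contributes nothing, results have different sizes, and output order follows input order. It assumes n is never negative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in on std::vector; not the library's implementation.
template <typename T, typename F>
auto flat_map_sketch(const std::vector<T>& input, F callback) {
    using Piece = decltype(callback(input[0]));
    Piece out;
    for (const T& value : input) {
        // Each result may be empty or have any size.
        const Piece piece = callback(value);
        // Appending in loop order preserves the relative order of elements.
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}

// Callback: n copies of n, so 0 yields an empty result (assumes n >= 0).
inline std::vector<int> repeat_value(int n) {
    return std::vector<int>(static_cast<std::size_t>(n), n);
}
```

With this model, an input of {2, 0, 3} produces five elements: two from the first input, none from the second, and three from the third.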
Common Use Cases
- String Tokenization: Splitting strings into words or characters.
- Exploding Nested Data: Flattening collections of collections into a single collection.
- Data Expansion: Generating multiple output elements for each input element.
- One-to-Many Transformations: When each input element maps to a variable number of output elements, possibly none.
- Path Expansion: Generating all possible paths from a tree or graph structure.
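As an illustration of the path expansion use case, here is a minimal sketch in which each flatMap step replaces a path with its one-node extensions, and repeating the step yields all root-to-leaf paths. It models Serie with std::vector, and the names Path, Tree, flat_map_sketch, expand_paths_once, and all_paths are hypothetical. It assumes the structure is acyclic, since a cycle would make the loop run forever.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Path = std::vector<std::string>;
using Tree = std::map<std::string, std::vector<std::string>>;  // node -> children

// Hypothetical stand-in on std::vector; not the library's implementation.
template <typename T, typename F>
auto flat_map_sketch(const std::vector<T>& input, F callback) {
    using Piece = decltype(callback(input[0]));
    Piece out;
    for (const T& value : input) {
        const Piece piece = callback(value);
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}

// One flatMap step: a path ending at a leaf is kept as is;
// any other path is replaced by one extended path per child.
std::vector<Path> expand_paths_once(const std::vector<Path>& paths, const Tree& tree) {
    return flat_map_sketch(paths, [&tree](const Path& path) -> std::vector<Path> {
        assert(!path.empty());
        const auto it = tree.find(path.back());
        if (it == tree.end() || it->second.empty()) {
            return std::vector<Path>{path};
        }
        std::vector<Path> extended;
        for (const std::string& child : it->second) {
            Path next = path;
            next.push_back(child);
            extended.push_back(next);
        }
        return extended;
    });
}

// Repeat the step until no path changes (assumes an acyclic structure).
std::vector<Path> all_paths(const std::string& root, const Tree& tree) {
    std::vector<Path> paths{Path{root}};
    while (true) {
        std::vector<Path> next = expand_paths_once(paths, tree);
        if (next == paths) {
            return next;
        }
        paths = next;
    }
}
```

Keeping leaf paths unchanged, instead of returning an empty result for them, is a design choice of this sketch so that finished paths survive later steps.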