Understanding Attribute Decomposition in DataFrame
A guide to extracting and managing components from complex data
Introduction
When working with scientific, engineering, or geospatial data, we often encounter complex data structures like vectors, tensors, or specialized objects. While these structures are perfect for calculations, there are times when we need to access their individual components for visualization, statistical analysis, or other component-specific operations.
The DataFrame library provides a powerful mechanism called Attribute Decomposition that allows you to:
- Access individual components of complex data types without losing the original structure
- Create different views or interpretations of the same data
- Perform transformations on decomposed data and reflect changes back to the original
- Simplify the handling of multi-dimensional or multi-attribute data
This tutorial will guide you through the attribute decomposition system in DataFrame, showing you how to use it effectively in your data analysis workflows.
What is Attribute Decomposition?
Attribute decomposition is the process of breaking down complex data types into simpler, more accessible components while maintaining the relationships between them. For example:
- A 3D vector can be decomposed into its x, y, and z components
- A tensor can be decomposed into its individual elements or principal components
- A color value might be decomposed into RGB or HSL components
- Geospatial coordinates might be decomposed into latitude, longitude, and elevation
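The first case in this list can be sketched with the standard library alone. This is a minimal illustration of the idea rather than the library's implementation: `Vec3` and `componentColumn` are hypothetical names introduced here, and a plain `std::vector` stands in for a Serie.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3D vector type (not the library's definition).
using Vec3 = std::array<double, 3>;

// Hypothetical helper: collect one component (0 = x, 1 = y, 2 = z)
// from every vector, producing a column of scalars.
std::vector<double> componentColumn(const std::vector<Vec3> &values,
                                    std::size_t index) {
    std::vector<double> out;
    out.reserve(values.size());
    for (const Vec3 &v : values) {
        out.push_back(v.at(index)); // at() throws std::out_of_range if index > 2
    }
    return out;
}
```

The original vectors are left untouched, so the scalar column is an extra view of the same data rather than a replacement for it.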
The DataFrame library provides a flexible system for attribute decomposition through two main classes:
- Manager: Coordinates the decomposition process and provides a unified interface
- Decomposer: Implements the actual decomposition logic for specific data types
Additionally, the library comes with several built-in decomposers for common scenarios:
- Components: Extracts individual elements from vectors, matrices, and arrays
- Coordinates: Specialized decomposer for working with spatial coordinates
The Manager Class
The df::attributes::Manager class is the main interface for attribute decomposition. It manages a collection of decomposers and provides methods to extract and work with decomposed attributes.
Key Manager Methods
namespace df {
namespace attributes {

class Manager {
public:
    // Constructor taking a dataframe to manage
    explicit Manager(const Dataframe &df);

    // Copy constructor
    Manager(const Manager &other);

    // Register a decomposer
    void addDecomposer(const Decomposer &decomposer);

    // Get all attribute names for a target dimension
    std::vector<std::string> getNames(DecompDimension targetDim) const;

    // Get a specific Serie by name
    template <typename T> Serie<T> getSerie(const std::string &name) const;

    // Check if an attribute exists
    bool hasAttribute(DecompDimension, const std::string &) const;

    // Clear all decomposers
    void clear();

    // Get the number of registered decomposers
    size_t decomposerCount() const;
};

// Helper function to create a Manager
template <typename... T>
Manager createManager(const std::vector<std::string> &names,
                      const Serie<T> &...series);

} // namespace attributes
} // namespace df
DecompDimension Enum
The DecompDimension enum defines the target dimension for decomposition:
// Mathematical decomposition dimension
enum class DecompDimension {
    Scalar = 1, // Individual components (x, y, z, etc.)
    Vector,     // N-dimensional vectors
    Matrix      // N-dimensional matrices/tensors
};
This enum allows you to specify the dimensionality of the decomposed attributes you want to retrieve.
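Because Scalar is explicitly set to 1 and the following enumerators carry no initializer, standard C++ numbering gives Vector the value 2 and Matrix the value 3. The sketch below repeats the declaration so that it compiles on its own; `toString` is a hypothetical helper added for illustration and is not part of the library.

```cpp
#include <string>

// Same declaration as in the text, repeated so the sketch is self-contained.
enum class DecompDimension {
    Scalar = 1, // Individual components (x, y, z, etc.)
    Vector,     // Implicitly 2
    Matrix      // Implicitly 3
};

// Hypothetical helper (not a library function): readable label for logging.
std::string toString(DecompDimension dim) {
    switch (dim) {
    case DecompDimension::Scalar: return "Scalar";
    case DecompDimension::Vector: return "Vector";
    case DecompDimension::Matrix: return "Matrix";
    }
    return "Unknown";
}
```

Since this is a scoped enum, converting it to an integer requires an explicit `static_cast<int>(...)`.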
Decomposer Basics
The Decomposer class is the base class for all decomposers in the DataFrame library. It defines the interface that all concrete decomposers must implement.
class Decomposer {
public:
    virtual ~Decomposer() = default;

    // Create a clone of this decomposer
    virtual std::unique_ptr<Decomposer> clone() const = 0;

    // Get the names of all decomposed attributes for a given serie
    virtual Strings names(const Dataframe &dataframe, DecompDimension targetDim,
                          const SerieBase &serie, const String &name) const = 0;

    // Get a specific decomposed serie
    virtual Serie<double> serie(const Dataframe &dataframe,
                                DecompDimension targetDim,
                                const std::string &name) const = 0;

protected:
    // Helper methods for derived classes
    template <typename T> static size_t getComponentCount();
    template <typename T>
    static Serie<double> extractComponent(const Serie<T> &serie, size_t index);
};
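The combination of a virtual clone() on the decomposer and an addDecomposer() that takes a const reference suggests that the manager keeps its own copy of each registered decomposer. The sketch below shows that pattern in isolation. It rests on that assumption and uses hypothetical, heavily simplified classes (`ToyDecomposer`, `SuffixDecomposer`, `ToyManager`) that are not the library's types.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a decomposer interface.
class ToyDecomposer {
public:
    virtual ~ToyDecomposer() = default;
    virtual std::unique_ptr<ToyDecomposer> clone() const = 0;
    virtual std::vector<std::string> names(const std::string &serieName) const = 0;
};

// Concrete toy decomposer: appends _x, _y, _z to the serie name.
class SuffixDecomposer : public ToyDecomposer {
public:
    std::unique_ptr<ToyDecomposer> clone() const override {
        return std::make_unique<SuffixDecomposer>(*this);
    }
    std::vector<std::string> names(const std::string &serieName) const override {
        return {serieName + "_x", serieName + "_y", serieName + "_z"};
    }
};

// Hypothetical manager: owns clones, so callers may pass temporaries.
class ToyManager {
public:
    void addDecomposer(const ToyDecomposer &d) {
        decomposers_.push_back(d.clone());
    }
    std::size_t decomposerCount() const { return decomposers_.size(); }

    // Gather the names offered by every registered decomposer.
    std::vector<std::string> getNames(const std::string &serieName) const {
        std::vector<std::string> all;
        for (const auto &d : decomposers_) {
            for (const auto &n : d->names(serieName)) {
                all.push_back(n);
            }
        }
        return all;
    }

private:
    std::vector<std::unique_ptr<ToyDecomposer>> decomposers_;
};
```

With this design, a call such as `m.addDecomposer(SuffixDecomposer());` is safe even though the argument is a temporary, because the manager stores the clone and not a reference to the original.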
To make it easier to create custom decomposers, DataFrame provides a generic decomposer template:
template <typename C> class GenDecomposer : public Decomposer {
public:
    std::unique_ptr<Decomposer> clone() const override;

    Strings names(const Dataframe &dataframe, DecompDimension targetDim,
                  const SerieBase &serie, const String &name) const override;

    Serie<double> serie(const Dataframe &dataframe, DecompDimension targetDim,
                        const std::string &name) const override;
};
This template handles the common boilerplate and delegates the actual implementation to the derived class.
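One common way for a template layer to absorb boilerplate while delegating to the derived class is the curiously recurring template pattern (CRTP). The sketch below shows that technique on toy types. It is an assumption that GenDecomposer works this way, and `ToyBase`, `ToyGen`, `Lengths`, and `labelImpl` are hypothetical names used only for illustration.

```cpp
#include <memory>
#include <string>

// Hypothetical polymorphic base with a clone() requirement.
struct ToyBase {
    virtual ~ToyBase() = default;
    virtual std::unique_ptr<ToyBase> clone() const = 0;
    virtual std::string label() const = 0;
};

// Generic layer: implements the virtual functions once and forwards
// the type-specific work to Derived through a static_cast.
template <typename Derived>
struct ToyGen : ToyBase {
    std::unique_ptr<ToyBase> clone() const override {
        return std::make_unique<Derived>(static_cast<const Derived &>(*this));
    }
    std::string label() const override {
        return static_cast<const Derived &>(*this).labelImpl();
    }
};

// A concrete type only supplies the part that is specific to it.
struct Lengths : ToyGen<Lengths> {
    std::string labelImpl() const { return "lengths"; }
};
```

Each concrete type then writes only its own logic, and clone() never has to be repeated by hand.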
Built-in Decomposers
The DataFrame library includes several built-in decomposers for common use cases:
Components Decomposer
The Components decomposer extracts individual elements from arrays, vectors, and matrices. It's useful for working with geometric data, tensors, or any array-like structure.
For example, given a Serie of Vector3 objects named P, the Components decomposer generates the scalar attributes P_x, P_y, and P_z.
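Looking an attribute up by a name such as P_x means mapping that name back to a source serie and a component index. The sketch below shows one way to do this for the x, y, z suffixes used in this tutorial. `parseComponentName` is a hypothetical helper, not a library function, and the library may resolve names differently. It requires C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: split "P_x" into {"P", 0}, "V_z" into {"V", 2}.
// Returns std::nullopt when the name does not end in _x, _y, or _z.
std::optional<std::pair<std::string, std::size_t>>
parseComponentName(const std::string &name) {
    // Need at least one base character, an underscore, and a suffix letter.
    if (name.size() < 3 || name[name.size() - 2] != '_') {
        return std::nullopt;
    }
    const std::string suffixes = "xyz";
    const std::size_t index = suffixes.find(name.back());
    if (index == std::string::npos) {
        return std::nullopt;
    }
    return std::make_pair(name.substr(0, name.size() - 2), index);
}
```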
Coordinates Decomposer
The Coordinates decomposer is specialized for working with spatial coordinates. It can decompose coordinates into different representations:
- Cartesian (x, y, z)
- Spherical (r, theta, phi)
- Cylindrical (r, theta, z)
This is particularly useful for geospatial data, physics simulations, or any application where multiple coordinate representations are needed.
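The conversions between these representations are standard formulas, sketched below with the standard library only. The angle convention is an assumption: theta is taken as the azimuth in the xy-plane for both the spherical and cylindrical forms, and phi as the polar angle measured from the +z axis. The library may follow a different convention. `Vec3`, `toSpherical`, and `toCylindrical` are hypothetical names.

```cpp
#include <array>
#include <cmath>

// Hypothetical stand-in for a 3D point (not the library's type).
using Vec3 = std::array<double, 3>;

// Cartesian (x, y, z) -> spherical (r, theta, phi), angles in radians.
// Assumed convention: theta = azimuth in the xy-plane, phi = polar angle from +z.
Vec3 toSpherical(const Vec3 &p) {
    const double r = std::sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
    const double theta = std::atan2(p[1], p[0]);
    const double phi = (r > 0.0) ? std::acos(p[2] / r) : 0.0; // avoid 0/0 at the origin
    return {r, theta, phi};
}

// Cartesian (x, y, z) -> cylindrical (r, theta, z), theta in radians.
Vec3 toCylindrical(const Vec3 &p) {
    return {std::hypot(p[0], p[1]), std::atan2(p[1], p[0]), p[2]};
}
```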
Usage Example
Let's walk through a complete example of using attribute decomposition to work with a dataframe containing 3D vectors:
#include <iostream>
// Also include the DataFrame headers that declare df::Dataframe, df::Serie,
// df::attributes::Manager, and df::attributes::Components.

int main() {
    // Create a dataframe with 3D vectors
    df::Dataframe data;

    // Create a Serie of 3D position vectors
    df::Serie<Vector3> positions{
        {1.0, 0.0, 0.0},
        {0.0, 1.0, 0.0},
        {0.0, 0.0, 1.0},
        {1.0, 1.0, 1.0},
        {2.0, 3.0, 4.0}
    };

    // Add the position Serie to the dataframe
    data.add("P", positions);

    // Create a Serie of velocity vectors
    df::Serie<Vector3> velocities{
        {0.1, 0.0, 0.0},
        {0.0, 0.1, 0.0},
        {0.0, 0.0, 0.1},
        {0.1, 0.1, 0.1},
        {0.2, 0.3, 0.4}
    };

    // Add the velocity Serie to the dataframe
    data.add("V", velocities);

    // Create an attribute manager for the dataframe
    df::attributes::Manager manager(data);

    // Add a Components decomposer
    manager.addDecomposer(df::attributes::Components());

    // Get all scalar attribute names
    auto scalar_names = manager.getNames(df::attributes::DecompDimension::Scalar);

    // Print the available scalar attributes
    std::cout << "Available scalar attributes:" << std::endl;
    for (const auto& name : scalar_names) {
        std::cout << " - " << name << std::endl;
    }

    // Get the x-component of position (the element type must be given explicitly)
    auto P_x = manager.getSerie<double>("P_x");

    // Get the z-component of velocity
    auto V_z = manager.getSerie<double>("V_z");

    // Print the extracted components
    std::cout << "\nPosition X: " << P_x << std::endl;
    std::cout << "Velocity Z: " << V_z << std::endl;

    // Check if a specific attribute exists
    if (manager.hasAttribute(df::attributes::DecompDimension::Scalar, "P_y")) {
        std::cout << "\nThe P_y attribute exists." << std::endl;
    }

    return 0;
}
The output of this example begins with the list of available scalar attributes:
Available scalar attributes:
- P_x
- P_y
- P_z
- V_x
- V_y
- V_z