Understanding Attribute Decomposition in DataFrame

A guide to extracting and managing components from complex data

Introduction

When working with scientific, engineering, or geospatial data, we often encounter complex data structures such as vectors, tensors, or specialized objects. These structures are well suited to calculations, but we sometimes need to access their individual components for visualization, statistical analysis, or other component-specific operations.

The DataFrame library provides a powerful mechanism called Attribute Decomposition that allows you to:

  • Access individual components of complex data types without losing the original structure
  • Create different views or interpretations of the same data
  • Perform transformations on decomposed data and reflect changes back to the original
  • Simplify the handling of multi-dimensional or multi-attribute data

This tutorial will guide you through the attribute decomposition system in DataFrame, showing you how to use it effectively in your data analysis workflows.

What is Attribute Decomposition?

Attribute decomposition is the process of breaking down complex data types into simpler, more accessible components while maintaining the relationships between them. For example:

  • A 3D vector can be decomposed into its x, y, and z components
  • A tensor can be decomposed into its individual elements or principal components
  • A color value might be decomposed into RGB or HSL components
  • Geospatial coordinates might be decomposed into latitude, longitude, and elevation

The DataFrame library provides a flexible system for attribute decomposition through two main classes:

  • Manager: Coordinates the decomposition process and provides a unified interface
  • Decomposer: Implements the actual decomposition logic for specific data types

Additionally, the library comes with several built-in decomposers for common scenarios:

  • Components: Extracts individual elements from vectors, matrices, and arrays
  • Coordinates: Specialized decomposer for working with spatial coordinates

The Manager Class

The df::attributes::Manager class is the main interface for attribute decomposition. It manages a collection of decomposers and provides methods to extract and work with decomposed attributes.

Key Manager Methods

Manager Class API
namespace df {
namespace attributes {

class Manager {
public:
    // Constructor taking a dataframe to manage
    explicit Manager(const Dataframe &df);
    
    // Copy constructor
    Manager(const Manager &other);
    
    // Register a decomposer
    void addDecomposer(const Decomposer &decomposer);
    
    // Get all attribute names for a target dimension
    std::vector<std::string> getNames(DecompDimension targetDim) const;

    // Get a specific Serie by name
    template <typename T> Serie<T> getSerie(const std::string &name) const;
    
    // Check if an attribute exists
    bool hasAttribute(DecompDimension, const std::string &) const;
    
    // Clear all decomposers
    void clear();
    
    // Get the number of registered decomposers
    size_t decomposerCount() const;
};

// Helper function to create a Manager
template <typename... Ts>
Manager createManager(const std::vector<std::string> &names,
                      const Serie<Ts> &...series);

} // namespace attributes
} // namespace df

DecompDimension Enum

The DecompDimension enum defines the target dimension for decomposition:

DecompDimension Enum
// Mathematical decomposition dimension
enum class DecompDimension {
    Scalar = 1, // Individual components (x, y, z, etc.)
    Vector,     // N-dimensional vectors
    Matrix      // N-dimensional matrices/tensors
};

This enum allows you to specify the dimensionality of the decomposed attributes you want to retrieve.
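Because only Scalar is given an explicit value, standard C++ enumeration rules assign the following enumerators consecutive values, so Vector is 2 and Matrix is 3. The short sketch below copies the enum so that it compiles on its own and checks those values at compile time. The dimensionLabel helper is hypothetical and is not part of the library.

```cpp
#include <string>

// Copied from the listing above so the sketch is self-contained.
enum class DecompDimension {
    Scalar = 1, // Individual components (x, y, z, etc.)
    Vector,     // N-dimensional vectors
    Matrix      // N-dimensional matrices/tensors
};

// Enumerators without an initializer continue from the previous value.
static_assert(static_cast<int>(DecompDimension::Vector) == 2, "Vector follows Scalar");
static_assert(static_cast<int>(DecompDimension::Matrix) == 3, "Matrix follows Vector");

// Hypothetical helper (not a library function): a readable label per value.
std::string dimensionLabel(DecompDimension dim) {
    switch (dim) {
    case DecompDimension::Scalar: return "scalar";
    case DecompDimension::Vector: return "vector";
    case DecompDimension::Matrix: return "matrix";
    }
    return "unknown";
}
```

A label helper of this kind can be handy when printing which dimension a list of attribute names was requested for.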

Decomposer Basics

The Decomposer class is the base class for all decomposers in the DataFrame library. It defines the interface that all concrete decomposers must implement.

Decomposer Base Class
class Decomposer {
public:
    virtual ~Decomposer() = default;

    // Create a clone of this decomposer
    virtual std::unique_ptr<Decomposer> clone() const = 0;

    // Get the names of all decomposed attributes for a given serie
    virtual Strings names(const Dataframe &dataframe, DecompDimension targetDim,
                          const SerieBase &serie, const String &name) const = 0;

    // Get a specific decomposed serie
    virtual Serie<double> serie(const Dataframe &dataframe,
                                DecompDimension targetDim,
                                const std::string &name) const = 0;

protected:
    // Helper methods for derived classes
    template <typename T> static size_t getComponentCount();
    template <typename T> static Serie<double> extractComponent(const Serie<T> &serie, size_t index);
};
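The manager accepts decomposers by const reference through addDecomposer, and every decomposer must implement a pure virtual clone(). One common reason for that pairing is that the manager can keep its own polymorphic copy of whatever it is handed, so temporaries can be passed safely. The sketch below illustrates that pattern under stated assumptions: ToyDecomposer, XYZDecomposer, and ToyManager are hypothetical stand-ins written for this example, and the library's actual implementation may differ.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a decomposer interface (not the library class).
class ToyDecomposer {
public:
    virtual ~ToyDecomposer() = default;

    // Return a heap-allocated copy with the same dynamic type.
    virtual std::unique_ptr<ToyDecomposer> clone() const = 0;

    // Names this decomposer can derive from a serie called `name`.
    virtual std::vector<std::string> names(const std::string &name) const = 0;
};

// Hypothetical concrete decomposer that exposes x, y, and z components.
class XYZDecomposer : public ToyDecomposer {
public:
    std::unique_ptr<ToyDecomposer> clone() const override {
        return std::make_unique<XYZDecomposer>(*this);
    }

    std::vector<std::string> names(const std::string &name) const override {
        return {name + "_x", name + "_y", name + "_z"};
    }
};

// Hypothetical stand-in for a manager that owns cloned decomposers.
class ToyManager {
public:
    // Store a private copy, so the caller's object may be a temporary.
    void addDecomposer(const ToyDecomposer &decomposer) {
        decomposers_.push_back(decomposer.clone());
    }

    std::size_t decomposerCount() const { return decomposers_.size(); }

    void clear() { decomposers_.clear(); }

    // Collect the names offered by every registered decomposer.
    std::vector<std::string> getNames(const std::string &serieName) const {
        std::vector<std::string> all;
        for (const auto &d : decomposers_) {
            const auto n = d->names(serieName);
            all.insert(all.end(), n.begin(), n.end());
        }
        return all;
    }

private:
    std::vector<std::unique_ptr<ToyDecomposer>> decomposers_;
};
```

With these stand-ins, calling addDecomposer(XYZDecomposer()) on a ToyManager followed by getNames("P") yields P_x, P_y, and P_z, which mirrors the naming used by the Components decomposer described later in this tutorial.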

To make it easier to create custom decomposers, DataFrame provides a generic decomposer template:

Generic Decomposer Template
template <typename Derived> class GenDecomposer : public Decomposer {
public:
    std::unique_ptr<Decomposer> clone() const override;

    Strings names(const Dataframe &dataframe, DecompDimension targetDim,
                  const SerieBase &serie, const String &name) const override;

    Serie<double> serie(const Dataframe &dataframe, DecompDimension targetDim,
                        const std::string &name) const override;
};

This template handles the common boilerplate and delegates the actual implementation to the derived class.
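A base template that forwards work to the class deriving from it is usually written with the curiously recurring template pattern (CRTP), where the derived class passes itself as the template argument. The sketch below shows that idea with hypothetical names (BaseDecomposer, GenericDecomposer, RgbDecomposer, namesImpl). It assumes GenDecomposer works along these lines, which the source does not confirm in detail.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal interface (stand-in for a decomposer base class).
class BaseDecomposer {
public:
    virtual ~BaseDecomposer() = default;
    virtual std::unique_ptr<BaseDecomposer> clone() const = 0;
    virtual std::vector<std::string> names(const std::string &name) const = 0;
};

// CRTP layer: implements the boilerplate once and forwards the real work
// to Derived, which supplies namesImpl().
template <typename Derived>
class GenericDecomposer : public BaseDecomposer {
public:
    // Copy-construct the most-derived type without each subclass repeating it.
    std::unique_ptr<BaseDecomposer> clone() const override {
        return std::make_unique<Derived>(static_cast<const Derived &>(*this));
    }

    // Delegate to the derived class's implementation.
    std::vector<std::string> names(const std::string &name) const override {
        return static_cast<const Derived &>(*this).namesImpl(name);
    }
};

// A custom decomposer only has to provide the part that is specific to it.
class RgbDecomposer : public GenericDecomposer<RgbDecomposer> {
public:
    std::vector<std::string> namesImpl(const std::string &name) const {
        return {name + "_r", name + "_g", name + "_b"};
    }
};
```

The benefit of this layout is that clone() is written once in the template, while each custom decomposer contains only its own naming and extraction logic.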

Built-in Decomposers

The DataFrame library includes several built-in decomposers for common use cases:

Components Decomposer

The Components decomposer extracts individual elements from arrays, vectors, and matrices. It's useful for working with geometric data, tensors, or any array-like structure.

For example, given a Serie of Vector3 objects stored in the dataframe under the name P, the Components decomposer would generate the following scalar attributes:

  • P_x
  • P_y
  • P_z
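The naming scheme above can be sketched without the library. The example below assumes a 3D vector is a std::array of three doubles and models a serie as a std::vector. Vec3, componentNames, and extractColumn are hypothetical names chosen for illustration, not DataFrame functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a 3D vector type (assumption: three doubles).
using Vec3 = std::array<double, 3>;

// Build the scalar attribute names "<name>_x", "<name>_y", "<name>_z".
std::vector<std::string> componentNames(const std::string &name) {
    static const char *suffixes[] = {"x", "y", "z"};
    std::vector<std::string> out;
    for (const char *s : suffixes) {
        out.push_back(name + "_" + s);
    }
    return out;
}

// Copy component `index` of every vector into a flat column of doubles.
std::vector<double> extractColumn(const std::vector<Vec3> &serie, std::size_t index) {
    assert(index < 3); // only x (0), y (1), and z (2) exist
    std::vector<double> column;
    column.reserve(serie.size());
    for (const Vec3 &v : serie) {
        column.push_back(v[index]);
    }
    return column;
}
```

In this sketch, the attribute P_x corresponds to extractColumn(P, 0): one double per row, taken from the first component of each vector.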

Coordinates Decomposer

The Coordinates decomposer is specialized for working with spatial coordinates. It can decompose coordinates into different representations:

  • Cartesian (x, y, z)
  • Spherical (r, theta, phi)
  • Cylindrical (r, theta, z)

This is particularly useful for geospatial data, physics simulations, or any application where multiple coordinate representations are needed.
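To make the three representations concrete, here is a minimal sketch of the underlying conversions in plain C++. It assumes the physics convention for spherical coordinates (theta is the polar angle measured from the +z axis, phi is the azimuth in the x-y plane) and uses theta as the azimuth for cylindrical coordinates. The library may use a different convention, and Point3, Spherical, Cylindrical, toSpherical, and toCylindrical are hypothetical names.

```cpp
#include <array>
#include <cmath>

// Stand-in for a Cartesian point (x, y, z).
using Point3 = std::array<double, 3>;

struct Spherical {
    double r;     // distance from the origin
    double theta; // polar angle from the +z axis, in radians (assumed convention)
    double phi;   // azimuth in the x-y plane, in radians
};

struct Cylindrical {
    double r;     // distance from the z axis
    double theta; // azimuth in the x-y plane, in radians
    double z;     // height along the z axis
};

// Cartesian -> spherical; the origin maps to all zeros.
Spherical toSpherical(const Point3 &p) {
    const double r = std::sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
    const double theta = (r > 0.0) ? std::acos(p[2] / r) : 0.0;
    const double phi = std::atan2(p[1], p[0]);
    return {r, theta, phi};
}

// Cartesian -> cylindrical.
Cylindrical toCylindrical(const Point3 &p) {
    return {std::hypot(p[0], p[1]), std::atan2(p[1], p[0]), p[2]};
}
```

Under these assumptions, the point (1, 0, 0) has spherical coordinates r = 1, theta = pi/2, phi = 0, and the point (3, 4, 5) has cylindrical radius 5 and height 5.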

Usage Example

Let's walk through a complete example of using attribute decomposition to work with a dataframe containing 3D vectors:

Basic Attribute Decomposition Example
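// Include the DataFrame headers that declare df::Dataframe, df::Serie,
// df::attributes::Manager, and df::attributes::Components.
#include <iostream>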

int main() {
    // Create a dataframe with 3D vectors
    df::Dataframe data;
    
    // Create a Serie of 3D position vectors
    df::Serie<Vector3> positions{
        {1.0, 0.0, 0.0},
        {0.0, 1.0, 0.0},
        {0.0, 0.0, 1.0},
        {1.0, 1.0, 1.0},
        {2.0, 3.0, 4.0}
    };
    
    // Add the position Serie to the dataframe
    data.add("P", positions);
    
    // Create a Serie of velocity vectors
    df::Serie<Vector3> velocities{
        {0.1, 0.0, 0.0},
        {0.0, 0.1, 0.0},
        {0.0, 0.0, 0.1},
        {0.1, 0.1, 0.1},
        {0.2, 0.3, 0.4}
    };
    
    // Add the velocity Serie to the dataframe
    data.add("V", velocities);
    
    // Create an attribute manager for the dataframe
    df::attributes::Manager manager(data);
    
    // Add a Components decomposer
    manager.addDecomposer(df::attributes::Components());
    
    // Get all scalar attribute names
    auto scalar_names = manager.getNames(df::attributes::DecompDimension::Scalar);
    
    // Print the available scalar attributes
    std::cout << "Available scalar attributes:" << std::endl;
    for (const auto& name : scalar_names) {
        std::cout << "  - " << name << std::endl;
    }
    
    // Get the x-component of position (the element type must be given explicitly)
    auto P_x = manager.getSerie<double>("P_x");
    
    // Get the z-component of velocity
    auto V_z = manager.getSerie<double>("V_z");
    
    // Print the extracted components
    std::cout << "\nPosition X: " << P_x << std::endl;
    std::cout << "Velocity Z: " << V_z << std::endl;
    
    // Check if a specific attribute exists
    if (manager.hasAttribute(df::attributes::DecompDimension::Scalar, "P_y")) {
        std::cout << "\nThe P_y attribute exists." << std::endl;
    }
    
    return 0;
}

The output of this example would be:

Available scalar attributes:
  - P_x
  - P_y
  - P_z
  - V_x
  - V_y
  - V_z