
Binary Serialization

Overview

The binary serialization module provides fast, compact, OS-independent serialization of Serie objects to and from binary files. Each file stores type metadata alongside the data, and endianness conversion is handled automatically, so files are portable across different architectures. Because binary I/O copies values directly instead of formatting and parsing text, it is typically much faster than text-based formats such as CSV or JSON for large datasets.
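
Automatic endianness conversion usually comes down to reversing the byte order of each value when the file's byte order differs from the host's. The following is a minimal sketch of that idea, not the library's implementation; the helpers `host_is_little_endian` and `byteswap_value` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: detect the host byte order at run time
// by inspecting the first byte of a known 16-bit value.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 0x0001;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

// Hypothetical helper: reverse the bytes of a trivially copyable scalar.
// A reader would apply this to each value when file and host byte orders differ.
template <typename T>
T byteswap_value(T value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte swapping requires a trivially copyable type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}
```

For a composite element such as Vector3D, each component would be swapped individually; reversing the whole struct in one pass would also reverse the order of its components.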

Saving

save Function

// Save a Serie to a binary file
// The file stores: type identifier, element count, and raw data
template <typename T>
void save(const Serie<T> &serie, const std::string &filename);
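
The comment above lists three pieces of information per file. The sketch below writes them to any `std::ostream`, using `std::vector<T>` as a stand-in for `df::Serie<T>`. The function name `write_serie_sketch` and the encoding of the type identifier (a 4-byte length followed by its characters) are assumptions, and the sketch omits the magic number and endianness marker listed under File Format.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical writer: type identifier, element count, then raw element bytes.
// Multi-byte integers are written in host byte order.
template <typename T>
void write_serie_sketch(std::ostream &out, const std::string &type_name,
                        const std::vector<T> &values) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw-byte serialization needs a trivially copyable type");

    // Type identifier: assumed 4-byte length prefix followed by the characters.
    const std::uint32_t name_len = static_cast<std::uint32_t>(type_name.size());
    out.write(reinterpret_cast<const char *>(&name_len), sizeof(name_len));
    out.write(type_name.data(), static_cast<std::streamsize>(name_len));

    // Element count: 8 bytes, as in the File Format description.
    const std::uint64_t count = values.size();
    out.write(reinterpret_cast<const char *>(&count), sizeof(count));

    // Raw data: the contiguous element storage, written in a single call.
    out.write(reinterpret_cast<const char *>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(T)));
}
```

When the target is a file, the `std::ofstream` should be opened with `std::ios::binary` so that no newline translation alters the bytes on platforms that perform it.
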
Saving Examples

// Save scalar Serie
df::Serie<double> values{1.0, 2.0, 3.0, 4.0, 5.0};
df::io::save(values, "values.bin");

// Save vector Serie
df::Serie<Vector3D> positions{
    {1.0, 2.0, 3.0},
    {4.0, 5.0, 6.0},
    {7.0, 8.0, 9.0}
};
df::io::save(positions, "positions.bin");

// Save stress tensor Serie (each element is built from 6 values,
// the independent components of a symmetric 3x3 tensor)
df::Serie<SMatrix3D> stresses{
    {100, 10, 0, 50, 5, 30},
    {200, 20, 5, 150, 10, 80}
};
df::io::save(stresses, "stresses.bin");

Loading

load Functions

// Load a Serie with a known type
template <typename T>
Serie<T> load(const std::string &filename);

// Load a Serie with auto-detected type
// Returns a std::variant or type-erased container
auto load(const std::string &filename);
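
The auto-detecting overload is described only as returning a std::variant or a type-erased container. Assuming the std::variant option, the dispatch could look like the C++17 sketch below: the type identifier read from the header selects the alternative that the payload is decoded into. The names `AnySerie`, `decode_payload`, and `load_any_sketch`, the "int32" identifier, and the set of alternatives are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical closed set of element types the auto-detecting loader knows.
using AnySerie = std::variant<std::vector<double>, std::vector<std::int32_t>>;

// Copy a raw payload into a vector of T; its size must be a multiple of sizeof(T).
template <typename T>
std::vector<T> decode_payload(const std::vector<unsigned char> &payload) {
    if (payload.size() % sizeof(T) != 0) {
        throw std::runtime_error("payload size is not a multiple of the element size");
    }
    std::vector<T> out(payload.size() / sizeof(T));
    if (!out.empty()) {
        std::memcpy(out.data(), payload.data(), payload.size());
    }
    return out;
}

// Select the variant alternative from the type identifier stored in the header.
AnySerie load_any_sketch(const std::string &type_name,
                         const std::vector<unsigned char> &payload) {
    if (type_name == "double") return decode_payload<double>(payload);
    if (type_name == "int32") return decode_payload<std::int32_t>(payload);
    throw std::runtime_error("unknown type identifier: " + type_name);
}
```

The caller then branches with `std::visit` or `std::holds_alternative`. A variant's set of alternatives is fixed at compile time, which is one reason a library that offers `registerCustomType` might prefer a type-erased container instead.
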
Loading Examples

// Load with explicit type
auto values = df::io::load<double>("values.bin");
std::cout << "Loaded " << values.size() << " doubles\n";

auto positions = df::io::load<Vector3D>("positions.bin");
std::cout << "Loaded " << positions.size() << " 3D vectors\n";

// Auto-detect type
auto data = df::io::load("values.bin");
// Type is detected from the file header

File Inspection and Custom Types

Utility Functions

// Get the type identifier stored in a binary file
// Returns a string like "double", "Vector3D", "SMatrix3D", etc.
std::string get_file_type(const std::string &filename);

// Register a custom type for auto-detection in load()
template <typename T>
void registerCustomType(const std::string &type_name);
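
`registerCustomType<T>` associates a name with a C++ type so that the auto-detecting `load()` can recognize it at run time. One way to build such an open-ended registry is a map from type name to a decoder returning `std::any`. The C++17 sketch below assumes elements are stored as raw bytes; `type_registry`, `register_type_sketch`, and `decode_by_name` are hypothetical names, not the library's API.

```cpp
#include <any>
#include <cassert>
#include <cstring>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

using Decoder = std::function<std::any(const std::vector<unsigned char> &)>;

// Process-wide registry mapping a type identifier to a decoder for that type.
inline std::map<std::string, Decoder> &type_registry() {
    static std::map<std::string, Decoder> registry;
    return registry;
}

// Hypothetical counterpart of registerCustomType<T>(name).
template <typename T>
void register_type_sketch(const std::string &type_name) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw-byte decoding needs a trivially copyable type");
    type_registry()[type_name] = [](const std::vector<unsigned char> &payload) {
        if (payload.size() % sizeof(T) != 0) {
            throw std::runtime_error("payload size is not a multiple of sizeof(T)");
        }
        std::vector<T> out(payload.size() / sizeof(T));
        if (!out.empty()) {
            std::memcpy(out.data(), payload.data(), payload.size());
        }
        return std::any(std::move(out));
    };
}

// Look up the decoder for a type identifier read from a file header.
inline std::any decode_by_name(const std::string &type_name,
                               const std::vector<unsigned char> &payload) {
    auto it = type_registry().find(type_name);
    if (it == type_registry().end()) {
        throw std::runtime_error("unregistered type: " + type_name);
    }
    return it->second(payload);
}
```

The caller recovers the concrete container with `std::any_cast<std::vector<T>>`, so it still needs to know which type to ask for, for example by first checking the name returned by a function like `get_file_type`.
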
Inspection and Custom Types Example

// Inspect a file before loading
std::string type = df::io::get_file_type("unknown_data.bin");
std::cout << "File contains: " << type << std::endl;

if (type == "double") {
    auto data = df::io::load<double>("unknown_data.bin");
    // process scalar data...
} else if (type == "Vector3D") {
    auto data = df::io::load<Vector3D>("unknown_data.bin");
    // process vector data...
}

// Register a custom type
struct MyData {
    double x, y, z;
    int category;
};
df::io::registerCustomType<MyData>("MyData");

// Now save/load works with MyData
df::Serie<MyData> custom_data{
    {1.0, 2.0, 3.0, 1},
    {4.0, 5.0, 6.0, 2}
};
df::io::save(custom_data, "custom.bin");
auto loaded = df::io::load<MyData>("custom.bin");
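
On common 64-bit platforms, MyData's three doubles and one int occupy 28 bytes, but alignment padding typically makes `sizeof(MyData)` 32, so a raw-byte copy would also store 4 padding bytes of unspecified content. This page does not say how custom types are encoded; if they are copied as raw bytes, one alternative is to serialize each field explicitly, as in the sketch below. The helpers `append_bytes`, `encode_mydata`, and `decode_mydata` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct MyData {
    double x, y, z;
    int category;
};

// Append the bytes of one trivially copyable value to the buffer.
template <typename T>
void append_bytes(std::vector<unsigned char> &buf, const T &value) {
    unsigned char tmp[sizeof(T)];
    std::memcpy(tmp, &value, sizeof(T));
    buf.insert(buf.end(), tmp, tmp + sizeof(T));
}

// Field-by-field encoder: no padding bytes reach the buffer,
// and each field could be byte-swapped on its own if needed.
void encode_mydata(std::vector<unsigned char> &buf, const MyData &d) {
    append_bytes(buf, d.x);
    append_bytes(buf, d.y);
    append_bytes(buf, d.z);
    append_bytes(buf, static_cast<std::int32_t>(d.category));
}

// Matching decoder: reads the fields in the same order, starting at offset.
MyData decode_mydata(const std::vector<unsigned char> &buf, std::size_t offset) {
    MyData d{};
    std::int32_t cat = 0;
    std::memcpy(&d.x, buf.data() + offset, sizeof(double));
    offset += sizeof(double);
    std::memcpy(&d.y, buf.data() + offset, sizeof(double));
    offset += sizeof(double);
    std::memcpy(&d.z, buf.data() + offset, sizeof(double));
    offset += sizeof(double);
    std::memcpy(&cat, buf.data() + offset, sizeof(cat));
    d.category = cat;
    return d;
}
```

Writing `category` as a fixed-width `std::int32_t` also keeps the encoded size independent of the platform's `int` width.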

Complete Example

Binary I/O Pipeline

#include <dataframe/Serie.h>
#include <dataframe/io/binary_serialization.h>
#include <dataframe/math.h>
#include <dataframe/stats.h>
#include <iostream>
#include <chrono>

int main() {
    // Generate a large dataset
    size_t n = 1000000;
    auto data = df::math::random_normal(n, 0.0, 1.0);

    // Save to binary (fast)
    // steady_clock is monotonic, which makes it the right clock for timing
    auto t0 = std::chrono::steady_clock::now();
    df::io::save(data, "large_dataset.bin");
    auto t1 = std::chrono::steady_clock::now();

    double save_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::cout << "Saved " << n << " values in " << save_ms << " ms\n";

    // Load back
    auto t2 = std::chrono::steady_clock::now();
    auto loaded = df::io::load<double>("large_dataset.bin");
    auto t3 = std::chrono::steady_clock::now();

    double load_ms = std::chrono::duration<double, std::milli>(t3 - t2).count();
    std::cout << "Loaded " << loaded.size() << " values in " << load_ms << " ms\n";

    // Sanity-check the round trip: the stored type should be "double",
    // and the mean and standard deviation should be close to 0 and 1
    std::cout << "Type: " << df::io::get_file_type("large_dataset.bin") << "\n";
    std::cout << "Mean: " << df::stats::mean(loaded) << "\n";
    std::cout << "Std:  " << df::stats::std_dev(loaded) << "\n";

    // Save vector data
    df::Serie<Vector3D> displacements(n / 3, [](size_t i) -> Vector3D {
        return {0.001 * i, 0.002 * i, -0.0005 * i};
    });

    df::io::save(displacements, "displacements.bin");
    auto loaded_disp = df::io::load<Vector3D>("displacements.bin");

    std::cout << "Displacement vectors: " << loaded_disp.size() << "\n";

    return 0;
}

File Format

The binary file format consists of:

  • Magic number (4 bytes): Identifies the file as a DataFrame binary file.
  • Endianness marker (2 bytes): Indicates byte order for portability.
  • Type identifier (variable): String identifying the element type.
  • Element count (8 bytes): Number of elements in the Serie.
  • Raw data: The actual data bytes, with endianness conversion if needed.

The endianness marker lets a reader detect whether the file's byte order differs from the host's, so a file written on a little-endian machine can be read correctly on a big-endian machine, and vice versa.
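
A reader can tell whether swapping is needed by writing a known 2-byte value as the endianness marker and checking how it reads back. The sketch below assumes a marker value of 0x0102 and a 4-byte magic string "DFBN"; neither value comes from the library, and `write_header_prefix` and `read_header_prefix` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Assumed constants; the library's actual magic number and marker are not documented here.
constexpr char kMagic[4] = {'D', 'F', 'B', 'N'};
constexpr std::uint16_t kEndianMarker = 0x0102;

// Write the magic number and the endianness marker in host byte order.
void write_header_prefix(std::ostream &out) {
    out.write(kMagic, sizeof(kMagic));
    out.write(reinterpret_cast<const char *>(&kEndianMarker), sizeof(kEndianMarker));
}

// Validate the magic number and report whether the file's byte order differs
// from the host's (true means every multi-byte field must be byte-swapped).
bool read_header_prefix(std::istream &in) {
    char magic[4];
    std::uint16_t marker = 0;
    in.read(magic, sizeof(magic));
    in.read(reinterpret_cast<char *>(&marker), sizeof(marker));
    if (!in || std::memcmp(magic, kMagic, sizeof(kMagic)) != 0) {
        throw std::runtime_error("not a DataFrame binary file");
    }
    if (marker == kEndianMarker) return false;  // same byte order as the host
    if (marker == 0x0201) return true;          // opposite byte order
    throw std::runtime_error("corrupt endianness marker");
}
```

Choosing a marker whose two bytes differ is what makes the check work: a symmetric value such as 0x0101 would read back identically in either byte order.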

Related Functions