Binning / Histograms

Overview

The bins function computes a histogram by counting how many values of a Serie<double> fall into each of nb equally spaced bins. The range can be detected automatically from the data's minimum and maximum, or specified explicitly. A pipeline-compatible variant, bind_bins, is also provided.

Function Signatures

Bins Functions

// Compute histogram with automatic range (min to max of the data)
Serie<size_t> bins(const Serie<double> &serie, size_t nb);

// Compute histogram with custom range
Serie<size_t> bins(const Serie<double> &serie, size_t nb,
                    double min_val, double max_val);

// Pipeline version (auto-range)
auto bind_bins(size_t nb);

// Pipeline version (custom range)
auto bind_bins(size_t nb, double min_val, double max_val);

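The returned Serie<size_t> holds one count per bin: element i is the number of values that fall in bin i. Each bin has width (max - min) / nb, where min and max are either the data's extremes (auto range) or min_val and max_val (custom range). Bins are half-open, [lower, upper), except the last one, which also includes the maximum, so a value exactly equal to the maximum is counted in the last bin.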
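The binning rule described above can be sketched in plain C++. This is a minimal illustration, not the library's implementation: bins_sketch is a hypothetical name, std::vector stands in for Serie, and skipping values that lie outside the range is an assumption, since the behaviour for out-of-range values is not specified here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): count values into nb
// equal-width bins spanning [min_val, max_val].
std::vector<std::size_t> bins_sketch(const std::vector<double>& values,
                                     std::size_t nb,
                                     double min_val, double max_val) {
    std::vector<std::size_t> counts(nb, 0);
    // Degenerate input (no bins or an empty range): nothing to count.
    if (nb == 0 || !(max_val > min_val)) return counts;

    const double width = (max_val - min_val) / static_cast<double>(nb);
    for (double v : values) {
        // Assumption: values outside the range (and NaN) are ignored.
        if (!(v >= min_val && v <= max_val)) continue;
        // The quotient is non-negative, so truncation acts as floor.
        auto idx = static_cast<std::size_t>((v - min_val) / width);
        // A value equal to max_val maps to index nb; fold it into the last bin.
        if (idx >= nb) idx = nb - 1;
        ++counts[idx];
    }
    return counts;
}
```

The clamp on the last index is what makes the final bin closed on both sides while every other bin stays half-open.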

Examples

Auto-Range Binning

df::Serie<double> values{1.0, 2.3, 2.7, 3.1, 4.5, 5.0, 5.5, 7.2, 8.0, 9.9};

// Create 5 bins (auto range from 1.0 to 9.9)
auto histogram = df::bins(values, 5);
// Bin width: (9.9 - 1.0) / 5 = 1.78
// Bin edges: [1.0, 2.78), [2.78, 4.56), [4.56, 6.34), [6.34, 8.12), [8.12, 9.9]
// histogram = {3, 2, 2, 2, 1}

for (size_t i = 0; i < histogram.size(); ++i) {
    std::cout << "Bin " << i << ": " << histogram[i] << " values\n";
}

Custom Range Binning

df::Serie<double> values{1.0, 2.3, 2.7, 3.1, 4.5, 5.0, 5.5, 7.2, 8.0, 9.9};

// Create 10 bins with a fixed range [0, 10]
auto histogram = df::bins(values, 10, 0.0, 10.0);
// Each bin spans width 1.0: [0,1), [1,2), ..., [9,10]
// histogram = {0, 1, 2, 1, 1, 2, 0, 1, 1, 1}

for (size_t i = 0; i < histogram.size(); ++i) {
    double lo = 0.0 + i * 1.0;
    double hi = lo + 1.0;
    std::cout << "[" << lo << ", " << hi << "): "
              << histogram[i] << " values\n";
}

Pipeline Usage

df::Serie<double> data{1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5};

// Use in a pipeline
auto hist = data | df::bind_bins(4);

// Chain with other operations
auto random_data = df::math::random_normal(1000, 0.0, 1.0);
auto distribution = random_data | df::bind_bins(20, -4.0, 4.0);
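How bind_bins plugs into the | operator is not shown on this page. One common way to build such a binder is to return a closure that captures the parameters and to let operator| apply it to the data. The sketch below illustrates that pattern only; Samples, count_bins, and bind_count_bins are hypothetical names, and the library's actual mechanism may differ.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Serie<double>.
struct Samples {
    std::vector<double> values;
};

// Hypothetical histogram function over a fixed range [min_val, max_val].
std::vector<std::size_t> count_bins(const Samples& s, std::size_t nb,
                                    double min_val, double max_val) {
    std::vector<std::size_t> counts(nb, 0);
    if (nb == 0 || !(max_val > min_val)) return counts;
    const double width = (max_val - min_val) / static_cast<double>(nb);
    for (double v : s.values) {
        if (!(v >= min_val && v <= max_val)) continue;  // assumption: skip out-of-range values
        auto idx = static_cast<std::size_t>((v - min_val) / width);
        if (idx >= nb) idx = nb - 1;  // the maximum goes into the last bin
        ++counts[idx];
    }
    return counts;
}

// Binder: captures the parameters now, receives the data later.
auto bind_count_bins(std::size_t nb, double min_val, double max_val) {
    return [=](const Samples& s) { return count_bins(s, nb, min_val, max_val); };
}

// Pipe operator: `data | f` is simply `f(data)`.
template <typename F>
auto operator|(const Samples& s, F&& f) {
    return std::forward<F>(f)(s);
}
```

With these definitions, Samples{{1.5, 2.5, 3.5}} | bind_count_bins(2, 1.5, 3.5) evaluates to the same result as calling count_bins directly with the same arguments.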

Complete Example

Histogram Analysis

#include <dataframe/Serie.h>
#include <dataframe/bins.h>
#include <dataframe/math.h>
#include <dataframe/stats.h>
#include <iostream>
#include <iomanip>

int main() {
    // Generate normally distributed data
    auto data = df::math::random_normal(10000, 50.0, 10.0);

    // Compute basic statistics
    double mean   = df::stats::mean(data);
    double stddev = df::stats::std_dev(data);  // avoid naming a variable `std`

    std::cout << "Mean: " << mean << ", Std Dev: " << stddev << "\n\n";

    // Create histogram with 20 bins
    size_t num_bins = 20;
    auto [lo, hi] = df::math::bounds(data);
    auto histogram = df::bins(data, num_bins, lo, hi);

    double bin_width = (hi - lo) / num_bins;

    // Print ASCII histogram.
    // First find the tallest bin so bars can be scaled to at most 40 characters.
    size_t max_count = 0;
    for (size_t i = 0; i < histogram.size(); ++i) {
        if (histogram[i] > max_count) max_count = histogram[i];
    }

    for (size_t i = 0; i < histogram.size(); ++i) {
        double bin_lo = lo + i * bin_width;
        int bar_len = static_cast<int>(40.0 * histogram[i] / max_count);

        std::cout << std::fixed << std::setprecision(1)
                  << std::setw(6) << bin_lo << " | ";
        for (int j = 0; j < bar_len; ++j) std::cout << "#";
        std::cout << " (" << histogram[i] << ")\n";
    }

    return 0;
}
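The printing loop above recomputes each bin's lower edge from lo and bin_width. When edges or bin centers are needed in several places (axis labels, plotting), it can be convenient to compute them once. The helpers below are a sketch with hypothetical names (bin_edges, bin_centers); they are not library functions and use std::vector rather than Serie.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the nb + 1 edges of nb equal-width bins over
// [min_val, max_val]. Returns an empty vector when nb is 0.
std::vector<double> bin_edges(std::size_t nb, double min_val, double max_val) {
    if (nb == 0) return {};
    std::vector<double> edges(nb + 1);
    const double width = (max_val - min_val) / static_cast<double>(nb);
    for (std::size_t i = 0; i < nb; ++i) {
        edges[i] = min_val + static_cast<double>(i) * width;
    }
    edges[nb] = max_val;  // set the last edge exactly, avoiding rounding drift
    return edges;
}

// Hypothetical helper: the midpoint of each pair of adjacent edges.
std::vector<double> bin_centers(const std::vector<double>& edges) {
    std::vector<double> centers;
    for (std::size_t i = 0; i + 1 < edges.size(); ++i) {
        centers.push_back(0.5 * (edges[i] + edges[i + 1]));
    }
    return centers;
}
```

For a histogram with nb bins, edges[i] and edges[i + 1] bound bin i, so the result of bin_centers has the same length as the histogram.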

Related Functions