Binning / Histograms
Overview
The bins function computes a histogram by distributing the values of a Serie into
equal-width bins. The range is either detected automatically (from the data's min/max) or
specified by the caller. A pipeline-compatible version, bind_bins, is also provided.
Function Signatures
Bins Functions
// Compute histogram with automatic range (min to max of the data)
Serie<size_t> bins(const Serie<double> &serie, size_t nb);
// Compute histogram with custom range
Serie<size_t> bins(const Serie<double> &serie, size_t nb,
double min_val, double max_val);
// Pipeline version (auto-range)
auto bind_bins(size_t nb);
// Pipeline version (custom range)
auto bind_bins(size_t nb, double min_val, double max_val);
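The returned Serie<size_t> has one entry per bin; entry i is the number of values that fall into bin i.
The range between the minimum and maximum values is divided into nb bins of equal width, (max - min) / nb.
Each bin is half-open, [lo, hi), except the last one, which is closed so that values exactly at the maximum are counted in it.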
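The binning rule described above can be sketched without the library. The following is a minimal illustration written against std::vector, not the actual implementation of df::bins. The name count_bins is hypothetical, and skipping values outside the range (and NaN) is an assumption, since this page does not say how such values are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the custom-range overload of bins.
// Assumption: values outside [min_val, max_val] (and NaN) are skipped.
std::vector<std::size_t> count_bins(const std::vector<double>& values,
                                    std::size_t nb,
                                    double min_val, double max_val) {
    assert(nb > 0 && max_val > min_val);
    std::vector<std::size_t> counts(nb, 0);
    const double width = (max_val - min_val) / static_cast<double>(nb);
    for (double v : values) {
        if (!(v >= min_val && v <= max_val)) continue;  // also rejects NaN
        auto idx = static_cast<std::size_t>((v - min_val) / width);
        idx = std::min(idx, nb - 1);  // v == max_val lands in the last bin
        ++counts[idx];
    }
    return counts;
}

// Hypothetical stand-in for the auto-range overload: the range is the
// minimum and maximum of the data itself.
std::vector<std::size_t> count_bins(const std::vector<double>& values,
                                    std::size_t nb) {
    assert(!values.empty());
    auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    return count_bins(values, nb, *lo, *hi);
}
```

For the values {1.0, 2.3, 2.7, 3.1, 4.5, 5.0, 5.5, 7.2, 8.0, 9.9}, calling count_bins(values, 10, 0.0, 10.0) gives {0, 1, 2, 1, 1, 2, 0, 1, 1, 1}. The clamp to nb - 1 is what closes the last bin. The sketch rejects a zero-width range (all values equal), a case a real implementation would need to decide how to handle.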
Examples
Auto-Range Binning
df::Serie<double> values{1.0, 2.3, 2.7, 3.1, 4.5, 5.0, 5.5, 7.2, 8.0, 9.9};
// Create 5 bins (auto range from 1.0 to 9.9)
auto histogram = df::bins(values, 5);
// Bin edges: [1.0, 2.78), [2.78, 4.56), [4.56, 6.34), [6.34, 8.12), [8.12, 9.9]
// histogram = {3, 2, 2, 2, 1}
for (size_t i = 0; i < histogram.size(); ++i) {
    std::cout << "Bin " << i << ": " << histogram[i] << " values\n";
}
Custom Range Binning
df::Serie<double> values{1.0, 2.3, 2.7, 3.1, 4.5, 5.0, 5.5, 7.2, 8.0, 9.9};
// Create 10 bins with a fixed range [0, 10]
auto histogram = df::bins(values, 10, 0.0, 10.0);
// Each bin spans width 1.0: [0,1), [1,2), ..., [9,10]
for (size_t i = 0; i < histogram.size(); ++i) {
    double lo = 0.0 + i * 1.0;
    double hi = lo + 1.0;
    std::cout << "[" << lo << ", " << hi << "): "
              << histogram[i] << " values\n";
}
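The loop above hard-codes the bin width of 1.0. For other ranges, the edges can be computed once and reused for labelling. The helper below is a sketch only: bin_edges is a hypothetical name and is not part of the API documented on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the documented API): returns the nb + 1
// edges of nb equal-width bins spanning [min_val, max_val].
std::vector<double> bin_edges(std::size_t nb, double min_val, double max_val) {
    assert(nb > 0 && max_val > min_val);
    const double width = (max_val - min_val) / static_cast<double>(nb);
    std::vector<double> edges(nb + 1);
    for (std::size_t i = 0; i < nb; ++i) {
        edges[i] = min_val + static_cast<double>(i) * width;
    }
    edges[nb] = max_val;  // pin the last edge so rounding cannot move it
    return edges;
}
```

Bin i then spans edges[i] to edges[i + 1], so a label loop can print both ends without repeating the width arithmetic.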
Pipeline Usage
df::Serie<double> data{1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5};
// Use in a pipeline
auto hist = data | df::bind_bins(4);
// Chain with other operations
auto random_data = df::math::random_normal(1000, 0.0, 1.0);
auto distribution = random_data | df::bind_bins(20, -4.0, 4.0);
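To show how a bind_-style function can plug into the | syntax, here is a minimal, library-independent sketch. Everything in the sketch namespace (Series, bins, bind_bins, operator|) is a hypothetical stand-in; the real library may implement its pipeline differently. The idea is that bind_bins captures the parameters and returns a callable, and operator| applies that callable to the data on its left.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a series of doubles.
struct Series {
    std::vector<double> values;
};

// Equal-width binning over [lo, hi]; values outside the range are skipped
// (an assumption of this sketch). Expects nb > 0 and hi > lo.
inline std::vector<std::size_t> bins(const Series& s, std::size_t nb,
                                     double lo, double hi) {
    std::vector<std::size_t> counts(nb, 0);
    const double width = (hi - lo) / static_cast<double>(nb);
    for (double v : s.values) {
        if (!(v >= lo && v <= hi)) continue;
        auto idx = static_cast<std::size_t>((v - lo) / width);
        ++counts[std::min(idx, nb - 1)];  // v == hi goes to the last bin
    }
    return counts;
}

// Captures the parameters now; the data is supplied later by operator|.
inline auto bind_bins(std::size_t nb, double lo, double hi) {
    return [=](const Series& s) { return bins(s, nb, lo, hi); };
}

// Applies any callable to the series on the left-hand side.
template <typename Op>
auto operator|(const Series& s, Op&& op)
    -> decltype(std::forward<Op>(op)(s)) {
    return std::forward<Op>(op)(s);
}

}  // namespace sketch
```

With this in place, sketch::Series{{1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5}} | sketch::bind_bins(4, 1.5, 8.5) evaluates to {2, 2, 2, 2}. Because the bound operation is an ordinary callable, it can be stored in a variable and reused across several series.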
Complete Example
Histogram Analysis
#include <dataframe/Serie.h>
#include <dataframe/bins.h>
#include <dataframe/math.h>
#include <dataframe/stats.h>
#include <iostream>
#include <iomanip>
int main() {
    // Generate normally distributed data
    auto data = df::math::random_normal(10000, 50.0, 10.0);
    // Compute basic statistics
    double mean = df::stats::mean(data);
    double stddev = df::stats::std_dev(data);  // avoid naming a variable "std"
    std::cout << "Mean: " << mean << ", Std Dev: " << stddev << "\n\n";
    // Create histogram with 20 bins
    size_t num_bins = 20;
    auto [lo, hi] = df::math::bounds(data);
    auto histogram = df::bins(data, num_bins, lo, hi);
    double bin_width = (hi - lo) / num_bins;
    // Find the largest bin count, used to scale the bars
    auto max_count = static_cast<size_t>(df::math::max(
        df::Serie<double>(histogram.size(), [&](size_t i) {
            return static_cast<double>(histogram[i]);
        })
    ));
    // Print ASCII histogram (the tallest bar is 40 characters wide)
    for (size_t i = 0; i < histogram.size(); ++i) {
        double bin_lo = lo + i * bin_width;
        int bar_len = static_cast<int>(40.0 * histogram[i] / max_count);
        std::cout << std::fixed << std::setprecision(1)
                  << std::setw(6) << bin_lo << " | ";
        for (int j = 0; j < bar_len; ++j) std::cout << "#";
        std::cout << " (" << histogram[i] << ")\n";
    }
    return 0;
}