Calibrax
Unified Benchmarking Framework
A unified benchmarking framework for the JAX scientific ML ecosystem with profiling, statistical analysis, regression detection, and CI integration.
Repository Coming Soon
This project is under active development and will be open-sourced soon.
Overview
Calibrax (Calibrate + JAX) is a unified benchmarking framework for the JAX scientific ML ecosystem. It extracts and consolidates shared benchmarking, profiling, and statistical analysis functionality from Datarax, Artifex, and Opifex into a single, reusable package.
The framework provides a full profiling suite including timing with warm-up awareness, resource monitoring, GPU memory/clock/power tracking, energy measurement, FLOPS counting, roofline analysis, XLA compilation profiling, complexity analysis, hardware detection, and carbon tracking. All measurements come with rigorous statistical analysis — bootstrap confidence intervals, hypothesis testing, effect sizes, and outlier detection.
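Warm-up awareness matters in JAX because the first call to a jitted function triggers XLA compilation, and dispatch is asynchronous, so naive timing measures neither steady-state speed nor completed work. The sketch below illustrates the idea with plain `time.perf_counter`; it is not the Calibrax API.

```python
import time

import jax
import jax.numpy as jnp


def time_jitted(fn, *args, warmup=3, repeats=10):
    """Time a JAX function, discarding warm-up (compilation) runs.

    Illustrative sketch only -- not the Calibrax API.
    """
    for _ in range(warmup):
        # First call compiles via XLA; later calls hit the compiled cache.
        jax.block_until_ready(fn(*args))
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # block_until_ready forces the async dispatch to actually finish.
        jax.block_until_ready(fn(*args))
        samples.append(time.perf_counter() - start)
    return samples


f = jax.jit(lambda x: (x @ x.T).sum())
x = jnp.ones((256, 256))
samples = time_jitted(f, x)
```

Without the warm-up loop, the first sample would include compilation time and dominate the statistics.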
Calibrax includes direction-aware regression detection with configurable severity levels, cross-configuration comparison with Pareto front analysis and aggregate scoring, and a validation framework for convergence analysis and accuracy assessment. Results can be stored in a JSON-per-run file backend with baseline management, and exported to W&B, MLflow, or publication-ready LaTeX/HTML/CSV tables and matplotlib plots.
For CI integration, Calibrax provides a regression gate with git bisect automation, and production monitoring with configurable alerting thresholds. The full CLI supports ingest, export, check, baseline, trend, summary, and profile commands.
Key Features
Profiling
Timing with warm-up awareness, GPU memory/clock/power tracking, FLOPS counting, roofline analysis, XLA compilation profiling, and carbon tracking.
Statistical Analysis
Bootstrap confidence intervals, hypothesis testing, effect sizes, and outlier detection for rigorous benchmarking results.
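A percentile bootstrap makes no normality assumption, which suits benchmark timings (skewed, outlier-prone). A generic numpy sketch of the technique, not the Calibrax API:

```python
import numpy as np


def bootstrap_ci(samples, stat=np.median, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval -- generic sketch, not the Calibrax API."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    # Resample with replacement and recompute the statistic for each resample.
    idx = rng.integers(0, len(samples), size=(n_boot, len(samples)))
    stats = stat(samples[idx], axis=1)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi


times_ms = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 30.5, 12.1]  # one outlier
lo, hi = bootstrap_ci(times_ms)
```

Using the median as the statistic keeps the interval robust to the 30.5 ms outlier.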
Regression Detection
Direction-aware performance regression detection with configurable severity levels. Catch performance regressions before they ship.
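"Direction-aware" means the detector knows whether a metric should go up (throughput) or down (latency) before flagging a change. A hypothetical sketch of that logic; the function name and thresholds are illustrative, not the Calibrax API:

```python
def check_regression(baseline, candidate, higher_is_better, warn_pct=5.0, fail_pct=10.0):
    """Classify a metric change by direction and severity.

    Hypothetical sketch; names and thresholds are not the Calibrax API.
    """
    change_pct = 100.0 * (candidate - baseline) / baseline
    # A regression is a drop when higher is better, a rise when lower is better.
    regression_pct = -change_pct if higher_is_better else change_pct
    if regression_pct >= fail_pct:
        return "fail"
    if regression_pct >= warn_pct:
        return "warn"
    return "ok"


check_regression(100.0, 88.0, higher_is_better=True)   # throughput dropped 12% -> "fail"
check_regression(50.0, 53.0, higher_is_better=False)   # latency rose 6% -> "warn"
```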
Comparison & Ranking
Cross-configuration comparison, Pareto front analysis, aggregate scoring, and scaling analysis for informed architecture decisions.
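A Pareto front keeps only the configurations not dominated on every objective at once, e.g. latency versus peak memory. A minimal sketch of the computation (assuming lower is better on each axis), not the Calibrax API:

```python
def pareto_front(points):
    """Return the non-dominated points, minimizing every coordinate.

    Generic sketch, not the Calibrax API.
    """
    front = []
    for i, p in enumerate(points):
        # p is dominated if some other point is no worse everywhere and differs.
        dominated = any(
            q != p and all(q[k] <= p[k] for k in range(len(p)))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front


# (latency_ms, peak_mem_gb) per configuration -- lower is better on both axes.
configs = [(10.0, 4.0), (12.0, 3.0), (11.0, 5.0), (15.0, 6.0)]
front = pareto_front(configs)  # (11, 5) and (15, 6) are dominated by (10, 4)
```

The surviving points are the defensible trade-offs; everything else is strictly worse than some alternative.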
Publication Export
W&B and MLflow integration, publication-ready LaTeX/HTML/CSV tables, and matplotlib plots for papers and reports.
CLI & CI Integration
Full CLI (ingest, export, check, baseline, trend, summary, profile) with CI regression gate and git bisect automation.
Use Cases
Benchmarking JAX model performance across hardware configurations
Detecting performance regressions in CI/CD pipelines
Comparative analysis across Artifex, Datarax, and Opifex projects
Publication-ready performance tables and plots for research papers
GPU memory and energy profiling for resource optimization
Roofline analysis for identifying computational bottlenecks
Statistical validation of performance improvements
Production monitoring with configurable alerting thresholds
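The roofline analysis mentioned above bounds attainable performance by the lesser of peak compute and arithmetic intensity times memory bandwidth. A one-line model with illustrative, A100-like hardware numbers (assumptions, not Calibrax output):

```python
def attainable_gflops(intensity, peak_gflops, mem_bw_gb_s):
    """Roofline model: performance is capped by compute or by memory bandwidth.

    Hardware numbers below are illustrative assumptions, not Calibrax output.
    """
    return min(peak_gflops, intensity * mem_bw_gb_s)


# A daxpy-like kernel: 2*N FLOPs over 3*8*N bytes -> intensity = 1/12 FLOP/byte.
ai = 2.0 / 24.0
gflops = attainable_gflops(ai, peak_gflops=19_500.0, mem_bw_gb_s=1_555.0)
```

Here the kernel sits far left of the ridge point, so it is memory-bound: raising intensity, not peak FLOPS, is what would speed it up.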
Installation
# Basic installation
uv pip install calibrax
# With statistical analysis (scipy)
uv pip install "calibrax[stats]"
# With GPU monitoring
uv pip install "calibrax[gpu]"
# With publication export (matplotlib)
uv pip install "calibrax[publication]"
Quick Start
# Clone and set up development environment
git clone https://github.com/avitai/calibrax.git
cd calibrax
# Automatic setup with GPU detection
./setup.sh
source ./activate.sh
# CLI usage
calibrax profile --model my_model.py
calibrax check --baseline main
calibrax summary --format html
calibrax trend --metric throughput --window 30d
Ready to Get Started?
Explore the documentation, try examples, or contribute to the project.