
DiffBio

Differentiable Bioinformatics Pipelines

End-to-end differentiable bioinformatics pipelines built on JAX. Replace discrete operations with differentiable relaxations for gradient-based optimization.

Repository Coming Soon

This project is under active development and will be open-sourced soon

Overview

DiffBio is a framework for building end-to-end differentiable bioinformatics pipelines. Traditional bioinformatics pipelines use discrete operations (hard thresholds, argmax decisions) that block gradient flow. DiffBio addresses this by replacing these operations with differentiable relaxations, enabling gradient-based optimization through entire analysis workflows.
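To see why hard decisions block gradients, compare an argmax-based call with a softmax relaxation in plain JAX. This is a minimal sketch, not DiffBio code. The functions `hard_call` and `soft_call`, the `temperature` argument, and the per-class `DOSAGE` values are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

# Illustrative per-class values, e.g. alternate-allele dosage for (ref, het, hom-alt).
DOSAGE = jnp.array([0.0, 1.0, 2.0])


def hard_call(logits):
    """Pick the top class with argmax; the result is piecewise constant in the logits."""
    choice = jax.nn.one_hot(jnp.argmax(logits), logits.shape[-1])
    return jnp.dot(choice, DOSAGE)


def soft_call(logits, temperature=1.0):
    """Replace argmax with a temperature-controlled softmax and take the expected value."""
    weights = jax.nn.softmax(logits / temperature)
    return jnp.dot(weights, DOSAGE)


logits = jnp.array([0.2, 1.5, 0.3])
print(jax.grad(hard_call)(logits))  # all zeros: no signal reaches the logits
print(jax.grad(soft_call)(logits))  # nonzero: the call can be trained end to end
```

Lowering the temperature pushes `soft_call` toward `hard_call`. Higher temperatures trade fidelity to the discrete decision for smoother gradients.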

The framework provides 35+ differentiable operators covering alignment, variant calling, single-cell analysis, epigenomics, RNA-seq, preprocessing, normalization, and multi-omics. Key innovations include soft quality filtering, which uses sigmoid-based weights instead of hard cutoffs; a differentiable pileup, which assigns reads to positions softly via a temperature-controlled softmax; and soft alignment scoring, which replaces discrete Smith-Waterman with a continuous relaxation.
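A differentiable pileup can be sketched by letting each read base spread one unit of mass over reference positions with softmax weights, instead of committing to a single position. The sketch assumes per-base position logits and per-base nucleotide probabilities as inputs. `soft_pileup`, its arguments, and the array shapes are illustrative assumptions and do not reflect the API of DiffBio's `DifferentiablePileup`.

```python
import jax
import jax.numpy as jnp


def soft_pileup(position_logits, base_probs, temperature=1.0):
    """Accumulate soft base counts per reference position.

    position_logits: (num_reads, read_len, ref_len) scores for placing each base.
    base_probs:      (num_reads, read_len, 4) probabilities over A, C, G, T.
    Returns:         (ref_len, 4) expected base counts at each reference position.
    """
    # Temperature-controlled softmax over reference positions: each base
    # distributes its mass across positions rather than picking exactly one.
    assign = jax.nn.softmax(position_logits / temperature, axis=-1)
    # Sum over reads (r) and read offsets (l) of assignment x base probability.
    return jnp.einsum("rlp,rlb->pb", assign, base_probs)


# Usage: one read with two bases (A, G) over a 3-position reference.
position_logits = jnp.array([[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]])
base_probs = jax.nn.one_hot(jnp.array([[0, 2]]), 4)
counts = soft_pileup(position_logits, base_probs, temperature=0.5)
```

As the temperature approaches zero, the soft counts approach an ordinary hard pileup. At moderate temperatures, gradients reach the position logits.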

DiffBio includes 5 end-to-end pipelines for variant calling, single-cell analysis, differential expression, and preprocessing. Each pipeline can be trained by gradient descent with custom loss functions and gradient clipping, and synthetic data generation is available for bootstrapping. Pipeline parameters can therefore be learned directly from data instead of being tuned by hand.
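The training idea can be sketched in plain JAX. The example generates synthetic reads with known labels, defines a custom loss, clips gradients by global norm, and takes gradient steps on a learnable quality threshold. All of these pieces are illustrative assumptions: the synthetic-data model, the fixed `TEMPERATURE`, the loss, the learning rate, and the clipping norm. This is not DiffBio's trainer. Optax, which DiffBio is built with, provides a ready-made `optax.clip_by_global_norm`; the sketch writes the clipping out by hand to show what it does.

```python
import jax
import jax.numpy as jnp

TEMPERATURE = 2.0  # assumed fixed softness of the relaxed cutoff


def make_synthetic_reads(key, n=2048, true_threshold=25.0):
    """Synthetic bootstrap data: mean read qualities and 'keep' labels from a known cutoff."""
    quality = jax.random.uniform(key, (n,), minval=0.0, maxval=40.0)
    labels = (quality > true_threshold).astype(jnp.float32)
    return quality, labels


def loss_fn(params, quality, labels):
    """Binary cross-entropy between sigmoid keep-probabilities and the labels."""
    z = (quality - params["threshold"]) / TEMPERATURE
    # log_sigmoid keeps the loss numerically stable when z is large in magnitude.
    return -jnp.mean(labels * jax.nn.log_sigmoid(z) + (1.0 - labels) * jax.nn.log_sigmoid(-z))


def clip_by_global_norm(grads, max_norm):
    """Rescale the whole gradient pytree so its global L2 norm is at most max_norm."""
    leaves = jax.tree_util.tree_leaves(grads)
    global_norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
    scale = jnp.minimum(1.0, max_norm / (global_norm + 1e-12))
    return jax.tree_util.tree_map(lambda g: g * scale, grads)


@jax.jit
def train_step(params, quality, labels):
    grads = jax.grad(loss_fn)(params, quality, labels)
    grads = clip_by_global_norm(grads, max_norm=0.1)
    # Plain SGD with learning rate 10.0.
    return jax.tree_util.tree_map(lambda p, g: p - 10.0 * g, params, grads)


quality, labels = make_synthetic_reads(jax.random.PRNGKey(0))
params = {"threshold": jnp.array(10.0)}  # deliberately poor starting point
for _ in range(200):
    params = train_step(params, quality, labels)
```

With these settings, the learned threshold should settle near the cutoff used to generate the labels. Clipping mainly limits the size of the early steps, when the threshold is still far from that cutoff.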

Built on Datarax's operator framework and powered by JAX/Flax NNX, DiffBio inherits a composable architecture with automatic vectorization, batch processing, and GPU acceleration. Each operator implements the standard apply interface, so operators can be chained into larger analysis workflows.
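The composition pattern can be sketched without Datarax. Here, each stand-in operator exposes an `apply(data, state, metadata)` method that returns the same triple. This signature is an assumption based on the `pipeline.apply(batch_data, {}, None)` call in the Quick Start. The classes `LogNormalize`, `SoftGate`, and `Chain` are hypothetical and are not Datarax's base classes.

```python
import jax
import jax.numpy as jnp


class LogNormalize:
    """Scale counts to a fixed library size, then apply log1p (a common scRNA-seq step)."""

    def __init__(self, target_sum=1e4):
        self.target_sum = target_sum

    def apply(self, data, state, metadata):
        counts = data["counts"]
        size = counts.sum(axis=-1, keepdims=True)
        x = jnp.log1p(counts / size * self.target_sum)
        return {**data, "x": x}, state, metadata


class SoftGate:
    """Weight each feature by sigmoid((x - threshold) / temperature)."""

    def __init__(self, threshold, temperature=1.0):
        self.threshold = threshold
        self.temperature = temperature

    def apply(self, data, state, metadata):
        w = jax.nn.sigmoid((data["x"] - self.threshold) / self.temperature)
        return {**data, "x": data["x"] * w}, state, metadata


class Chain:
    """Run operators in order, threading (data, state, metadata) through each one."""

    def __init__(self, operators):
        self.operators = operators

    def apply(self, data, state, metadata):
        for op in self.operators:
            data, state, metadata = op.apply(data, state, metadata)
        return data, state, metadata


def pipeline_output(threshold, counts):
    pipeline = Chain([LogNormalize(), SoftGate(threshold)])
    data, _, _ = pipeline.apply({"counts": counts}, {}, None)
    return data["x"].sum()


counts = jnp.array([[10.0, 0.0, 5.0, 85.0], [1.0, 1.0, 1.0, 1.0]])
# Gradients flow through the whole chain back to the gate threshold.
grad_threshold = jax.grad(pipeline_output)(2.0, counts)
```

Because every operator is plain JAX code behind a uniform interface, `jax.grad` can differentiate the chained output with respect to a parameter of any stage.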

Key Features

35+ Differentiable Operators

Covering alignment, variant calling, single-cell analysis, epigenomics, RNA-seq, preprocessing, normalization, and multi-omics.

Soft Quality Filtering

Sigmoid-based weights instead of hard cutoffs. Learnable thresholds allow gradient-based optimization of quality control parameters.
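A sigmoid-based quality weight with a learnable threshold takes only a few lines. This sketch assumes the weight is `sigmoid((q - threshold) / temperature)` applied to each read's mean base quality. It mirrors the `threshold` and `temperature` arguments of `DifferentiableQualityFilter` in the Quick Start but is not that class's implementation.

```python
import jax
import jax.numpy as jnp


def soft_quality_weights(quality, threshold, temperature):
    """Per-read weights in (0, 1): near 1 well above the threshold, near 0 well below."""
    return jax.nn.sigmoid((quality - threshold) / temperature)


def hard_quality_mask(quality, threshold):
    """Conventional cutoff: exactly 0 or 1, so the gradient w.r.t. threshold is zero."""
    return jnp.where(quality >= threshold, 1.0, 0.0)


mean_quality = jnp.array([12.0, 19.0, 21.0, 35.0])  # mean Phred score per read

soft = soft_quality_weights(mean_quality, threshold=20.0, temperature=1.0)
hard = hard_quality_mask(mean_quality, threshold=20.0)

# Fraction of reads kept, differentiated with respect to the threshold.
d_soft = jax.grad(lambda t: soft_quality_weights(mean_quality, t, 1.0).mean())(20.0)
d_hard = jax.grad(lambda t: hard_quality_mask(mean_quality, t).mean())(20.0)
```

Reads near the cutoff (19 and 21 here) get intermediate weights, and that is exactly where `d_soft` picks up its signal.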

Differentiable Alignment

Soft Smith-Waterman scoring replacing discrete alignments with continuous relaxations. Temperature-controlled softmax for smooth gradient flow.
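One common way to relax Smith-Waterman is to replace every `max` in the recurrence with a temperature-scaled log-sum-exp, whose gradient is a softmax over the competing moves. The sketch below uses a linear gap penalty and a simple match/mismatch similarity matrix. Whether DiffBio uses exactly this relaxation is an assumption, and `soft_smith_waterman`, `smooth_max`, and `similarity_matrix` are hypothetical names.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def smooth_max(values, temperature):
    """Temperature-scaled log-sum-exp: approaches max(values) as temperature -> 0."""
    return temperature * logsumexp(jnp.stack(values) / temperature)


def soft_smith_waterman(similarity, gap, temperature):
    """Soft local-alignment score for an (n, m) similarity matrix with a linear gap penalty."""
    n, m = similarity.shape
    zero = jnp.zeros(())
    H = [[zero] * (m + 1) for _ in range(n + 1)]
    cells = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = smooth_max(
                [
                    zero,  # start a new local alignment here
                    H[i - 1][j - 1] + similarity[i - 1, j - 1],  # match or mismatch
                    H[i - 1][j] - gap,  # a[i-1] aligned to a gap
                    H[i][j - 1] - gap,  # b[j-1] aligned to a gap
                ],
                temperature,
            )
            cells.append(H[i][j])
    # Soft version of "best-scoring cell anywhere in the matrix".
    return smooth_max(cells, temperature)


def similarity_matrix(a, b, match=2.0, mismatch=-1.0):
    a_idx = jnp.array(["ACGT".index(c) for c in a])
    b_idx = jnp.array(["ACGT".index(c) for c in b])
    return jnp.where(a_idx[:, None] == b_idx[None, :], match, mismatch)


sim = similarity_matrix("ACGT", "ACGT")
score = soft_smith_waterman(sim, 1.0, 0.01)  # close to the hard score, 4 matches x 2.0
grads = jax.grad(soft_smith_waterman)(sim, 1.0, 1.0)  # soft alignment weights
```

As the temperature shrinks, the score approaches the hard Smith-Waterman score. At larger temperatures, the gradient with respect to the similarity matrix can be read as how strongly each pair of positions is aligned. The Python double loop keeps the recurrence readable but only suits short sequences; a scan-based formulation would be needed at scale.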

End-to-End Pipelines

5 ready-to-use pipelines for variant calling, single-cell analysis, differential expression, and preprocessing — all trainable with gradient descent.

GPU-Accelerated

Built on JAX for XLA-compiled computation. Process large genomic datasets efficiently on GPUs and TPUs.

Built on Datarax

Composable architecture using the Datarax operator framework. Chain operators into pipelines with automatic vectorization and batch processing.
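With JAX's automatic vectorization, a per-read function can be written once, mapped over a batch with `jax.vmap`, and compiled with `jax.jit`. The per-read statistic below (expected GC fraction computed from soft base probabilities) and the array shapes are illustrative assumptions, not a DiffBio operator.

```python
import jax
import jax.numpy as jnp


def expected_gc(base_probs):
    """Expected GC fraction of one read from (read_len, 4) probabilities over A, C, G, T."""
    return jnp.mean(base_probs[:, 1] + base_probs[:, 2])


# Map over the leading (read) axis, then compile the batched function with XLA.
batched_gc = jax.jit(jax.vmap(expected_gc))

logits = jax.random.normal(jax.random.PRNGKey(0), (8, 150, 4))  # 8 reads x 150 bp
probs = jax.nn.softmax(logits, axis=-1)
gc = batched_gc(probs)  # one value per read, shape (8,)
```

The same compiled function runs on CPU, GPU, or TPU, depending on the JAX backend available.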

Use Cases

1. Variant calling with learnable quality thresholds and pileup parameters
2. Single-cell RNA-seq analysis with differentiable preprocessing
3. Differential expression analysis with end-to-end optimization
4. Epigenomics peak calling with soft boundary detection
5. Learning optimal pipeline parameters directly from labeled data
6. Multi-omics integration with gradient-based feature selection
7. Benchmarking differentiable vs. discrete bioinformatics approaches
8. Training custom bioinformatics operators with task-specific losses

Installation

# Clone the repository
git clone https://github.com/avitai/DiffBio.git
cd DiffBio

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Quick Start

import jax.numpy as jnp
from flax import nnx
from diffbio.operators import (
    DifferentiableQualityFilter,
    DifferentiablePileup,
)
from diffbio.pipelines import create_variant_calling_pipeline

# Quality filtering with learnable threshold
quality_filter = DifferentiableQualityFilter(
    threshold=20.0, temperature=1.0, rngs=nnx.Rngs(0),
)

# Create end-to-end variant calling pipeline
pipeline = create_variant_calling_pipeline(
    reference_length=100,
    num_classes=3,  # ref, SNP, indel
    hidden_dim=32,
    seed=42,
)

# Process a batch of reads. batch_data must be supplied in the input format
# the pipeline expects (not shown here); result holds per-position variant
# predictions.
result, _, _ = pipeline.apply(batch_data, {}, None)

Built With

JAX, Flax NNX, Optax, Datarax, jaxtyping, NumPy
