Rate Limiting

RateLimitLayer caps how often a single principal can hit the router. The shipped algorithm is a per-key token bucket with configurable burst size and refill rate. Banks use it to dampen abuse on customer-facing channels without writing per-route guards.

Wiring

use cratestack_axum::ratelimit::{
    InMemoryRateLimitStore, RateLimitConfig, RateLimitLayer,
};
use std::sync::Arc;

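// One shared store holds every caller's bucket.
let store = Arc::new(InMemoryRateLimitStore::new());
// Burst of 60 requests, refilled at 1 token per second.
let config = RateLimitConfig::new(/* burst */ 60, /* refill */ 1.0);

// Layer the limiter over the router so it runs in front of the routes.
let router = cratestack_schema::axum::router(db, procedures, JsonCodec, auth)
    .layer(RateLimitLayer::new(store, config));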

RateLimitConfig carries:

burst — maximum tokens in the bucket (the largest peak the layer accepts)
refill_per_second — tokens added back per wall-clock second

A bucket configured with (60, 1.0) lets a caller burst up to 60 requests, then sustains 1 request per second once the bucket is empty.
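The arithmetic behind those two parameters can be sketched as follows. This is a minimal illustration, not the layer's actual implementation: the `TokenBucket` type and its methods are hypothetical, and time is passed in as plain seconds so the refill math is easy to follow.

```rust
/// Hypothetical token bucket, for illustration only.
#[derive(Debug, Clone)]
pub struct TokenBucket {
    burst: f64,
    refill_per_second: f64,
    tokens: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// A new bucket starts full, so the first `burst` requests pass immediately.
    pub fn new(burst: u32, refill_per_second: f64, now_secs: f64) -> Self {
        Self {
            burst: burst as f64,
            refill_per_second,
            tokens: burst as f64,
            last_refill_secs: now_secs,
        }
    }

    /// Credits tokens for the elapsed time (never exceeding `burst`),
    /// then tries to take one token. Returns `true` if the request may proceed.
    pub fn try_consume(&mut self, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_second).min(self.burst);
        self.last_refill_secs = now_secs;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With `TokenBucket::new(60, 1.0, 0.0)`, sixty calls at time 0 succeed, the sixty-first fails, and one more call succeeds after a full second has passed.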

Request flow

For every request the layer:

derives a key from the request (default: Authorization header SHA-256 fingerprint)
asks the store to consume one token
either forwards the request and adds X-RateLimit-Limit + X-RateLimit-Remaining headers to the response
or returns 429 Too Many Requests with Retry-After: <seconds> and an explanation body
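The last two steps amount to mapping the store's decision onto a response. The sketch below shows that mapping in isolation. The `Decision` enum and `rate_limit_headers` function are hypothetical stand-ins, the field types are assumptions, and reporting the configured burst as X-RateLimit-Limit is also an assumption rather than something the layer is documented to do.

```rust
/// Hypothetical mirror of the two outcomes a store can report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allowed { remaining: u32 },
    Throttled { retry_after_secs: u64 },
}

/// Returns the status to short-circuit with (if any) and the headers to attach.
/// `None` means "forward the request and add the headers to its response".
pub fn rate_limit_headers(
    decision: Decision,
    burst: u32,
) -> (Option<u16>, Vec<(&'static str, String)>) {
    match decision {
        // Assumption: the advertised limit is the configured burst size.
        Decision::Allowed { remaining } => (
            None,
            vec![
                ("X-RateLimit-Limit", burst.to_string()),
                ("X-RateLimit-Remaining", remaining.to_string()),
            ],
        ),
        // Throttled requests never reach the handler.
        Decision::Throttled { retry_after_secs } => (
            Some(429),
            vec![("Retry-After", retry_after_secs.to_string())],
        ),
    }
}
```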

Key function

The default fingerprint matches the idempotency layer’s. Banks running tenant-scoped budgeting override it:

RateLimitLayer::new(store, config).with_key_fn(|req| {
    req.headers()
        .get("x-tenant-id")
        .and_then(|v| v.to_str().ok())
        .unwrap_or("anonymous")
        .to_owned()
})

Two callers sharing a tenant share a bucket. Two callers from different tenants get independent buckets. Requests that arrive without an x-tenant-id header all fall into the single "anonymous" bucket.

Stores

The shipped implementation is InMemoryRateLimitStore — a HashMap of buckets behind a Mutex. It is appropriate for:

single-replica deployments
development and testing
per-pod fairness in deployments where the upstream load balancer already shards by principal

Multi-replica deployments need a shared store. The RateLimitStore trait is async and dyn-compatible — Redis-backed implementations are the typical choice:

#[async_trait::async_trait]
pub trait RateLimitStore: Send + Sync + 'static {
    async fn consume(
        &self,
        key: &str,
        config: RateLimitConfig,
    ) -> Result<RateLimitDecision, CoolError>;
}

RateLimitDecision is either Allowed { remaining } or Throttled { retry_after_secs }.
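To make the contract concrete, here is a simplified, synchronous sketch of a map-behind-a-mutex store. It is not the shipped InMemoryRateLimitStore and does not implement the real trait: `SketchStore`, `SketchConfig`, and `SketchDecision` are hypothetical, the field types are assumptions, and the current time is passed in explicitly instead of being read from a clock.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchDecision {
    Allowed { remaining: u32 },
    Throttled { retry_after_secs: u64 },
}

#[derive(Debug, Clone, Copy)]
pub struct SketchConfig {
    pub burst: u32,
    pub refill_per_second: f64,
}

#[derive(Default)]
pub struct SketchStore {
    // key -> (tokens currently in the bucket, time of last refill in seconds)
    buckets: Mutex<HashMap<String, (f64, f64)>>,
}

impl SketchStore {
    pub fn consume(&self, key: &str, config: SketchConfig, now_secs: f64) -> SketchDecision {
        // Panics if another thread panicked while holding the lock.
        let mut buckets = self.buckets.lock().unwrap();
        let burst = config.burst as f64;
        // Unknown keys start with a full bucket.
        let (tokens, last) = buckets.entry(key.to_owned()).or_insert((burst, now_secs));
        let elapsed = (now_secs - *last).max(0.0);
        *tokens = (*tokens + elapsed * config.refill_per_second).min(burst);
        *last = now_secs;
        if *tokens >= 1.0 {
            *tokens -= 1.0;
            SketchDecision::Allowed { remaining: (*tokens).floor() as u32 }
        } else {
            // Seconds until the bucket holds one whole token again, rounded up.
            let wait = (1.0 - *tokens) / config.refill_per_second;
            SketchDecision::Throttled { retry_after_secs: wait.ceil() as u64 }
        }
    }
}
```

A shared store, such as a Redis-backed one, would need to perform the same refill-then-consume step atomically per key, since several replicas may touch one bucket at once.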

Choosing parameters

Practical starting points:

customer-facing read endpoints: burst 30, refill 2.0 — accommodates page-load bursts
mutating endpoints: burst 10, refill 0.5 — same caller can do meaningful work but not script floods
operator/back-office endpoints: burst 600, refill 10.0 — humans behind a workstation, not bots
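Two quick calculations help when comparing candidate settings: how long an empty bucket takes to refill, and the most requests one caller can get through in a given window. The helper names below are hypothetical and the functions are plain arithmetic on the two config values, not part of the library.

```rust
/// Seconds for an empty bucket to refill completely.
pub fn fill_window_secs(burst: u32, refill_per_second: f64) -> f64 {
    burst as f64 / refill_per_second
}

/// Upper bound on requests a single key can pass in `window_secs`:
/// a full burst up front plus everything refilled during the window.
pub fn max_requests_in_window(burst: u32, refill_per_second: f64, window_secs: f64) -> f64 {
    burst as f64 + refill_per_second * window_secs
}
```

For the mutating-endpoint starting point (burst 10, refill 0.5), an empty bucket refills in 20 seconds and a single caller can pass at most 40 requests in any one-minute window.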

Banks layer the rate limit with idempotency: the rate limit caps how fast retries arrive, and the idempotency layer caps how many of those retries actually run the handler.

Caveats

InMemoryRateLimitStore does not bound the key map. Long-running processes facing a high-cardinality key space (per-IP, per-session) should swap to a TTL-aware store.
The token bucket is wall-clock-driven; a process pause longer than one bucket-fill window (burst divided by refill_per_second, in seconds) grants a fresh burst on resume.
The shipped store does not persist across restarts. That is the right choice for per-pod fairness and the wrong choice for global enforcement.
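For the unbounded key map, one possible mitigation in a custom store is to evict buckets that have been idle for longer than a TTL. The sketch below assumes a hypothetical `IdleBucket` record and `evict_idle` helper; neither is part of the shipped store.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-key record that remembers when the key was last used.
pub struct IdleBucket {
    pub tokens: f64,
    pub last_seen: Instant,
}

/// Drops every bucket idle for at least `ttl` and returns how many were removed.
pub fn evict_idle(
    buckets: &mut HashMap<String, IdleBucket>,
    now: Instant,
    ttl: Duration,
) -> usize {
    let before = buckets.len();
    // saturating_duration_since yields zero if `last_seen` is later than `now`.
    buckets.retain(|_, b| now.saturating_duration_since(b.last_seen) < ttl);
    before - buckets.len()
}
```

If new buckets start full, choosing a TTL of at least burst / refill_per_second seconds means an evicted bucket would have been full again anyway, so eviction does not change any decision.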
