Overview

API rate limiting in Monime is a fundamental control mechanism designed to maintain platform reliability, equitable usage, and predictable performance. Every request is evaluated against rate limits to prevent system abuse, contain noisy workloads, and ensure that resources are shared fairly across all tenants. Unlike single-threshold throttling, Monime applies limits at multiple contextual layers: the identity of the caller, the scope of the tenant, and the type of request being made. This provides a balanced enforcement model that protects both the infrastructure and individual tenants, while giving clients precise feedback for adaptive behavior.

Enforcement Model

Rate limiting operates as a chain of checks. Each incoming request is intercepted before it reaches the core services. At this stage, Monime evaluates the request against several limits, each responsible for a different dimension of usage control. If any limit rejects the request:
  • The request is not forwarded to the target service.
  • No billing or other side effects occur downstream.
  • A structured response is returned with rate limit headers that identify the triggered limit and how long to wait before retrying.
This ensures that rejected requests are cheap for the platform and safe for the client.
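
The short-circuit behavior described above can be pictured with a minimal sketch. It is illustrative only and is not Monime's implementation: the Rejection type, the check functions, the handle function, and the order in which checks run are all hypothetical. Only the 429 status and the two header names come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Rejection:
    limit_name: str   # e.g. "token-limit", "space-limit", "endpoint-limit"
    retry_after: int  # seconds the client should wait


# A check returns None to allow the request, or a Rejection to block it.
Check = Callable[[dict], Optional[Rejection]]


def handle(request: dict, checks: Sequence[Check],
           forward: Callable[[dict], dict]) -> dict:
    """Run every check in order; forward only if none of them rejects."""
    for check in checks:
        rejection = check(request)
        if rejection is not None:
            # Short-circuit: the target service is never called, so no
            # billing or side effects can happen downstream.
            return {
                "status": 429,
                "headers": {
                    "Monime-Rate-Limit": rejection.limit_name,
                    # Retry-After is always at least 1 second.
                    "Retry-After": str(max(1, rejection.retry_after)),
                },
            }
    return forward(request)
```

Because a rejected request returns before forward is called, the cost of a throttled request is limited to the checks themselves.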

Error Response

When a request is throttled, Monime always returns a structured error so your client knows exactly what happened and how to recover.

Status Code

429 Too Many Requests: the standard HTTP status code indicating that the client has hit a rate limit.

Headers

  1. Monime-Rate-Limit → shows which limit was triggered:
    • token-limit — the per-token quota was exceeded.
    • space-limit — the per-space quota was exceeded.
    • endpoint-limit — the per-endpoint quota for the token was exceeded.
  2. Retry-After → the number of seconds to wait before retrying. Always at least 1.
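
As a sketch of how a client might read these two headers, the helper below takes any mapping of header names to values. The names ThrottleInfo and parse_throttle_headers are hypothetical, and falling back to a 1-second wait when Retry-After is missing or unparsable is an assumption based on the "at least 1" rule above.

```python
from typing import Mapping, NamedTuple, Optional


class ThrottleInfo(NamedTuple):
    limit: Optional[str]  # "token-limit", "space-limit", "endpoint-limit", or None
    retry_after: int      # seconds to wait, never less than 1


def parse_throttle_headers(headers: Mapping[str, str]) -> ThrottleInfo:
    """Extract the triggered limit and the wait time from response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = lowered.get("monime-rate-limit")
    if limit is not None:
        limit = limit.strip()
    try:
        retry_after = int(lowered.get("retry-after", "1"))
    except ValueError:
        retry_after = 1  # assumption: fall back to the documented minimum
    return ThrottleInfo(limit, max(1, retry_after))
```

Logging the limit field alongside the token and route makes it easier to tell which of the three dimensions a client is pressing against.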

Body

The response body is JSON and mirrors the structure used across Monime errors:
{
  "success": false,
  "messages": [],
  "error": {
    "code": 429,
    "reason": "too_many_requests",
    "message": "Too many requests sent in a short period. Request blocked due to rate limiting",
    "details": []
  }
}
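
A client can pull the error object out of this body with a few lines of defensive parsing. This is a minimal sketch: the function name extract_error is hypothetical, and only the field names come from the sample above.

```python
import json
from typing import Optional


def extract_error(body_text: str) -> Optional[dict]:
    """Return the "error" object from a Monime error body, or None."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return None  # not JSON, so nothing to extract
    if not isinstance(body, dict) or body.get("success") is not False:
        return None  # not an error envelope
    error = body.get("error")
    return error if isinstance(error, dict) else None


sample = (
    '{"success": false, "messages": [], "error": {"code": 429, '
    '"reason": "too_many_requests", "message": "Too many requests sent in a '
    'short period. Request blocked due to rate limiting", "details": []}}'
)
error = extract_error(sample)
if error is not None and error.get("reason") == "too_many_requests":
    print("throttled:", error["message"])
```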

Limits by Dimension

Our API rate limiting is built on a multidimensional enforcement model, rather than a single global cap. This means every request is evaluated through multiple layers of control, each representing a different perspective of usage. At its core, the model ensures that traffic is constrained at the edges where pressure naturally builds:
  • At the caller level, so no individual token can monopolize capacity.
  • At the tenant level, so traffic bursts from one Space don’t affect others.
  • At the route level, so individual endpoints remain reliable under heavy access.
The key principle behind this design is balance. By layering limits across dimensions, Monime can:
  • Protect platform stability by ensuring no single dimension overwhelms the system.
  • Preserve fairness so tenants (Spaces) share resources equitably.
  • Provide granularity, allowing clients to understand exactly where throttling occurred (token, space, or endpoint).
Instead of treating all traffic equally, Monime evaluates requests in context. For example, a Space may still have capacity overall, but if a specific endpoint is being hammered, the endpoint limit can apply targeted pressure. Conversely, a token might be consuming more than its share even though the endpoint is underutilized — the token limit steps in at that layer. This context-aware layering makes the protection mechanism adaptive: it scales with diverse workloads, isolates noisy patterns, and gives developers actionable feedback when limits are exceeded.
Think of it as a series of guardrails: if traffic skews too far in one direction — whether by token, by space, or by endpoint — the limit at that dimension catches it before it destabilizes the rest of the system.
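
To make the layering concrete, the sketch below compares observed request rates against the per-second quotas listed in the sections that follow. It is a client-side reasoning aid under stated assumptions: exceeded_limits is a hypothetical helper, and it says nothing about the order in which Monime evaluates the limits or which one it reports when several are exceeded at once.

```python
from typing import List

# Per-second quotas as listed on this page, keyed by limit name and mode.
QUOTAS = {
    "space-limit": {"test": 100, "live": 500},
    "token-limit": {"test": 20, "live": 80},
    "endpoint-limit": {"test": 5, "live": 20},
}


def exceeded_limits(mode: str, space_rps: float, token_rps: float,
                    endpoint_rps: float) -> List[str]:
    """Return the names of all limits that the given request rates exceed."""
    observed = {
        "space-limit": space_rps,        # all tokens in the Space combined
        "token-limit": token_rps,        # one token across all endpoints
        "endpoint-limit": endpoint_rps,  # one token on one endpoint
    }
    return [name for name, rate in observed.items()
            if rate > QUOTAS[name][mode]]


# The Space and the token have headroom, but one route is being hammered.
print(exceeded_limits("live", space_rps=300, token_rps=50, endpoint_rps=25))
# The endpoint is lightly used, but the token as a whole is over its share.
print(exceeded_limits("live", space_rps=300, token_rps=90, endpoint_rps=10))
```

The first call reports only endpoint-limit and the second only token-limit, matching the two scenarios described above.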

Space-Based Limit

Limit name: space-limit

A Space in Monime represents an organizational boundary, typically a tenant or business account. To protect stability and ensure fair use, Monime applies rate limits at the Space level.

Per-second Quotas

  • Test mode: up to 100 requests per second across all requests made within the Space.
  • Live mode: up to 500 requests per second across all requests made within the Space.
This aggregate limit ensures that even if multiple tokens are used, the combined throughput of the Space does not exceed the allowed budget.

Why Space Limit Exists

  • Tenant fairness: Prevents a single tenant from monopolizing platform resources.
  • Poison containment: Controls noisy workloads where multiple apps or services operating under a Space generate bursts simultaneously.

Developer Implications

  • Monitor Space-wide usage, since limits apply across all tokens and endpoints.
  • When multiple apps share a Space, spread out their workloads and add jitter to avoid synchronized spikes.
  • Keep test traffic in test mode to stay within the lower quota and avoid affecting production.
  • One app exceeding its share can cause throttling for the entire Space, so coordinate usage across teams.
Even if tokens are individually within their own quotas, exceeding the Space quota (500 req/s for live, 100 req/s for test) will result in throttling across the entire Space.
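
When several apps share a Space, one way to coordinate is to give each app a fixed share of the Space quota. The arithmetic can be sketched as below, assuming every app uses its own token and that some headroom is left unallocated. The per_app_budget helper and the 0.8 headroom factor are illustrative assumptions; the token quotas are the ones listed in the Token-Based Limit section.

```python
SPACE_QUOTA = {"test": 100, "live": 500}  # req/s per Space (from this page)
TOKEN_QUOTA = {"test": 20, "live": 80}    # req/s per token (from this page)


def per_app_budget(mode: str, app_count: int, headroom: float = 0.8) -> float:
    """Suggested request rate per app when app_count apps share one Space.

    Each app is assumed to use its own token, so the result is also capped
    at the per-token quota.
    """
    if app_count < 1:
        raise ValueError("app_count must be at least 1")
    space_share = SPACE_QUOTA[mode] * headroom / app_count
    return min(space_share, float(TOKEN_QUOTA[mode]))


# 4 live apps: the Space share would be 100 req/s, but each token caps at 80.
print(per_app_budget("live", 4))
# 10 live apps: the Space becomes the bottleneck at about 40 req/s per app.
print(per_app_budget("live", 10))
```

With few apps the token quota is the binding constraint. As more apps join the Space, the shared Space quota takes over, and that is the point at which coordinating across teams matters most.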

Token-Based Limit

Limit name: token-limit

Each access token has its own request-per-second budget. Live and test tokens are evaluated separately, ensuring development workloads don’t interfere with production traffic.

Per-second Quotas

  • Test tokens: up to 20 requests per second
  • Live tokens: up to 80 requests per second
This separation ensures that production workloads can operate at higher throughput, while test and staging environments remain safely throttled.

Why Token Limit Exists

  • Fairness: Prevents any single application from monopolizing shared infrastructure.
  • Isolation: Ensures that runaway traffic from one integration won’t spill over into others, even inside the same Space.
  • Environment separation: Keeps test workloads clearly isolated from production, avoiding cross-impact.

Developer Implications

  • Use separate tokens for independent applications or services. This guarantees each gets its own request budget.
  • Always use test tokens in staging, CI, and development environments.
  • Instrument your clients to log throttling events at the token level, so you can quickly identify integrations exceeding their fair share.
Even if your Space still has available capacity, exceeding the per-token quota (80 req/s live, 20 req/s test) will cause requests to be throttled at the token layer.
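
A client can stay under its per-token quota by pacing requests locally before they are sent. The sketch below uses a token bucket, a common client-side technique. It is not a description of how Monime counts requests on the server, which this page does not specify. TokenBucket and its parameters are hypothetical names, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side pacing: at most `rate` requests/s, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = self.capacity  # start with a full bucket
        self._last = clock()

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, False if it should wait."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        refill = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Pace a live token slightly below its 80 req/s quota, with small bursts.
bucket = TokenBucket(rate=70, capacity=5)
if bucket.try_acquire():
    pass  # send the request here
```

Keeping capacity small limits how many requests can leave in a single instant, which leaves a margin however the server measures its one-second budget.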

Endpoint-Based Limit

Limit name: endpoint-limit

In addition to token and Space limits, Monime also enforces limits at the endpoint level, applied per token. This ensures that no single token can overload a specific API route, regardless of which Space it operates in.

Per-second Quotas

  • Test tokens: up to 5 requests per second per token per endpoint
  • Live tokens: up to 20 requests per second per token per endpoint
Each token has its own independent budget for every endpoint. If a token exceeds the limit on one route, only that endpoint is throttled for that token — other endpoints (or other tokens) remain unaffected.

Why Endpoint Limits Exist

  • Protect sensitive APIs: Prevents abuse of routes that are commonly polled or accessed at high frequency.
  • Fair distribution: Ensures all clients using the same endpoint can share capacity fairly.
  • Localized throttling: Restricts overload at the narrowest scope (token and endpoint), without penalizing unrelated traffic.

Developer Implications

  • Avoid rapid polling of a single endpoint; use webhooks or conditional requests where possible.
  • If polling is required, spread requests with jitter to avoid breaching the endpoint cap (20 req/s for live tokens, 5 req/s for test tokens).
  • Monitor endpoint-level throttling per token to identify “hot routes” that need optimization.
Even if token-level and Space-level quotas are not reached, exceeding the endpoint quota (20 req/s live, 5 req/s test) will result in throttling on that specific route for the token.
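
Where polling cannot be avoided, jittered intervals keep several pollers from hitting the same route in lockstep. The generator below is a minimal sketch. The poll_delays name, the 2-second base interval, and the 20% jitter fraction are illustrative assumptions, not values recommended by Monime.

```python
import itertools
import random
from typing import Iterator, Optional


def poll_delays(base_interval: float, jitter: float = 0.2,
                rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield an endless series of delays of base_interval +/- jitter (a fraction)."""
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    rng = rng or random.Random()
    while True:
        yield base_interval * rng.uniform(1 - jitter, 1 + jitter)


# Roughly one poll every 2 seconds per resource, well below 5 req/s (test)
# or 20 req/s (live) on the endpoint.
for delay in itertools.islice(poll_delays(2.0), 3):
    print(f"next poll in {delay:.2f}s")
```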

Why Rate Limiting Happens & How to Avoid It

Rate-limiting errors (429 Too Many Requests) occur when traffic exceeds Monime’s defined quotas at the Space, token, or endpoint level.

Common Causes

  • Polling status endpoints (e.g., GET /v1/payouts/{payout-id}) too frequently instead of using webhooks.
  • Making many small API calls in loops, or paginating with very small page sizes.
  • Multiple apps sharing the same token.
  • Staging traffic using live tokens/Spaces.
  • High concurrency from workers/consumers.

Best Practices

  • Use webhooks as your primary signal; poll sparingly.
  • Batch & paginate smartly — fewer, larger requests.
  • Issue separate tokens per app/service; keep test traffic in test mode.
  • Cap concurrency and add jitter to spread requests.
  • Respect Retry-After and log the Monime-Rate-Limit header to know what triggered the block.
In Monime, 429s are not failures; they are flow-control signals. Handle them with backoff and jitter to keep your integrations resilient and reliable.
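
A retry loop that respects Retry-After and adds jittered exponential backoff might look like the sketch below. It does not depend on a particular HTTP library. The send callable is a placeholder for whatever performs the request and returns a status code with headers, and call_with_retry, max_attempts, base_delay, and max_delay are hypothetical names and defaults.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    base_delay: float = 0.5, max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> Response:
    """Call send(), retrying on 429 with Retry-After plus jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        # Return on anything other than 429, or when attempts are exhausted.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        lowered = {name.lower(): value for name, value in headers.items()}
        try:
            retry_after = max(1.0, float(lowered.get("retry-after", "1")))
        except ValueError:
            retry_after = 1.0  # assumption: fall back to the 1-second minimum
        # Exponential backoff, capped, with a random fraction added as jitter.
        backoff = min(max_delay, base_delay * (2 ** attempt))
        sleep(retry_after + rng() * backoff)
    raise AssertionError("unreachable")
```

The wait is never shorter than the server's Retry-After value. The jitter is added on top so that clients throttled at the same moment do not all retry at the same instant.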