Mar 16, 2026 · 2 min read

What is Rate Limiting? A Simple Explanation for Developers

Rate limiting means restricting how many requests a user or client can make to your API within a time window.

Example: “100 requests per minute per user.” The 101st request within that minute is rejected with a 429 Too Many Requests status code.

Why rate limit?

Prevent abuse — stop bots from hammering your API
Protect your server — one user shouldn’t be able to crash your service
Fair usage — ensure all users get a fair share of resources
Cost control — if you pay per API call (database, AI models), rate limiting caps your bill
Security — slow down brute-force login attempts

Common strategies

Fixed window

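Count requests in fixed time blocks (e.g., per minute) and reset the counter when a new block starts. Simple, but it allows bursts at window boundaries: a client can send 100 requests at the end of one minute and 100 more at the start of the next, which is roughly 200 requests within a few seconds.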

Minute 1: 0-60s → 100 requests allowed
Minute 2: 60-120s → 100 requests allowed
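
To make the boundary burst concrete, here is a minimal in-memory sketch of a fixed window counter. The class name `FixedWindowLimiter` and its `clock` parameter are illustrative assumptions, not part of any library, and old windows are never evicted here, which a real implementation would need to handle.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical sketch: at most `limit` requests per client per fixed window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be demonstrated
        # (client_id, window_index) -> requests counted in that window
        self.counts = defaultdict(int)

    def allow(self, client_id):
        window_index = int(self.clock() // self.window_seconds)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False  # the caller would respond with 429 here
        self.counts[key] += 1
        return True


# Demonstrate the boundary burst with a fake clock.
now = [59.0]
limiter = FixedWindowLimiter(limit=100, window_seconds=60, clock=lambda: now[0])
first_burst = sum(limiter.allow("alice") for _ in range(101))  # sent at t=59s
now[0] = 60.0
second_burst = sum(limiter.allow("alice") for _ in range(100))  # sent at t=60s
```

With the fake clock, 100 of the 101 requests at 59s are allowed, and all 100 requests at 60s are allowed too, because the counter starts over in the new window.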

Sliding window

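Smooths out the fixed window boundary problem by counting requests over a rolling time period (e.g., the 60 seconds leading up to each request) instead of resetting the counter on the clock.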
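
One way to implement a rolling period is to keep a log of recent request timestamps per client. The sketch below assumes that approach; `SlidingWindowLimiter` is a hypothetical name, and storing one timestamp per request trades memory for accuracy.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical sketch: at most `limit` requests in any rolling window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be demonstrated
        self.hits = defaultdict(deque)  # client_id -> timestamps of allowed requests

    def allow(self, client_id):
        now = self.clock()
        hits = self.hits[client_id]
        # Drop timestamps that have fallen out of the rolling window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Same scenario as the fixed window boundary burst, using a fake clock.
now = [59.0]
limiter = SlidingWindowLimiter(limit=100, window_seconds=60, clock=lambda: now[0])
allowed_at_59 = sum(limiter.allow("alice") for _ in range(101))
now[0] = 60.0
allowed_at_60 = limiter.allow("alice")   # still inside the last 60 seconds
now[0] = 119.0
allowed_at_119 = limiter.allow("alice")  # the burst at 59s has aged out
```

Here the request at 60s is rejected because the 100 requests from 59s still fall inside the rolling window, so the boundary burst no longer gets through.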

Token bucket

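Each user has a bucket that holds up to a fixed number of tokens and refills at a steady rate. Each request costs a token. If the bucket is empty, the request is rejected or has to wait until a token is available. This allows short bursts (up to the bucket size) while maintaining an average rate.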
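
The refill logic can be sketched in a few lines. This is a minimal single-client version under some assumptions: the name `TokenBucket`, the `rate` and `capacity` parameters, and the choice to reject rather than wait when the bucket is empty are all illustrative.

```python
import time


class TokenBucket:
    """Hypothetical sketch of a token bucket for a single client."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.clock = clock              # injectable so the behavior can be demonstrated
        self.tokens = float(capacity)   # the bucket starts full
        self.updated = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens < cost:
            return False  # out of tokens: reject (or make the caller wait)
        self.tokens -= cost
        return True


# 10 tokens per second with a burst of up to 20, driven by a fake clock.
now = [0.0]
bucket = TokenBucket(rate=10, capacity=20, clock=lambda: now[0])
initial_burst = sum(bucket.allow() for _ in range(21))
now[0] = 0.5  # half a second later, 5 tokens have been refilled
after_refill = sum(bucket.allow() for _ in range(6))
```

The full bucket absorbs a burst of 20 requests, after which throughput is bounded by the refill rate: half a second later only 5 more requests are allowed.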

How to implement it

Express.js (Node.js)

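import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

// Allow each client (keyed by IP address by default) 100 requests per minute
const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per window (named `limit` in express-rate-limit v7+)
  message: { error: 'Too many requests, try again later' },
});

// Apply the limiter to every route under /api/
app.use('/api/', limiter);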

Nginx

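http {
    # Track clients by IP address in a 10 MB shared zone, 10 requests per second each
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    server {
        location /api/ {
            # Serve bursts of up to 20 extra requests immediately instead of queueing them
            limit_req zone=api burst=20 nodelay;
            # Reject with 429 instead of the default 503
            limit_req_status 429;
        }
    }
}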

Python (Flask)

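from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Key limits by client IP address; routes get 100 per minute unless they set their own limit
limiter = Limiter(get_remote_address, app=app, default_limits=["100 per minute"])

@app.route("/api/data")
@limiter.limit("10 per second")  # replaces the default limit for this route
def get_data():
    return {"data": "..."}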

Rate limit headers

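Many APIs report your rate limit status in response headers. The X-RateLimit-* names are a common convention rather than a standard, so check each API's documentation for the exact names and units: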

X-RateLimit-Limit: 100        # Max requests allowed
X-RateLimit-Remaining: 42     # Requests left in this window
X-RateLimit-Reset: 1710590400 # When the window resets (Unix timestamp)
Retry-After: 30               # Seconds to wait (on 429 responses)
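
A client can turn these headers into a wait time. The helper below is a rough sketch: `seconds_to_wait` is a hypothetical name, it assumes the headers arrive as a plain dict with exactly the names shown above, and it only handles Retry-After given as whole seconds (HTTP also allows a date there).

```python
def seconds_to_wait(headers, now):
    """Return how long to pause before the next request, in seconds.

    `headers` is assumed to be a plain dict of response headers and `now`
    the current Unix timestamp.
    """
    # Prefer Retry-After when the server sends it as a number of seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)

    # Otherwise, if the window is used up, wait until it resets.
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is not None and reset is not None and int(remaining) <= 0:
        return max(0.0, float(reset) - now)

    return 0.0  # requests are still available, no need to wait


wait = seconds_to_wait(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1710590400"},
    now=1710590390,
)
```

In this example the window resets 10 seconds after `now`, so `wait` is 10.0.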

As an API consumer

If you’re calling someone else’s API and hitting rate limits:

Respect Retry-After — wait the specified time before retrying
Add exponential backoff — wait 1s, then 2s, then 4s, etc.
Cache responses — don’t re-fetch data you already have
Batch requests — combine multiple calls into one where possible
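
The first two points can be combined into one retry loop. This is a sketch under stated assumptions: `call_with_retries` and the `send` callable (returning a status code, a headers dict, and a body) are hypothetical stand-ins for whatever HTTP client you use, and the random jitter added to the backoff is an extra not mentioned above, included so that many clients don't retry at the same instant.

```python
import random
import time


def call_with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt

        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            # The server said how long to wait, so respect it.
            delay = float(retry_after)
        else:
            # Exponential backoff: 1s, 2s, 4s, ... plus up to 50% jitter.
            delay = base_delay * (2 ** attempt)
            delay += random.uniform(0, delay / 2)
        sleep(delay)

    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function easy to test; in normal use you would leave it as `time.sleep`.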