Rate limiting means restricting how many requests a user or client can make to your API within a time window.
Example: "100 requests per minute per user." Request #101 within that minute gets rejected with a 429 Too Many Requests status code.
Why rate limit?
- Prevent abuse: stop bots from hammering your API
- Protect your server: one user shouldn't be able to crash your service
- Fair usage: ensure all users get a fair share of resources
- Cost control: if you pay per API call (database, AI models), rate limiting caps your bill
- Security: slow down brute-force login attempts
Common strategies
Fixed window
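Count requests in fixed time blocks (e.g., per minute). Simple, but it allows bursts at window boundaries: a client can send 100 requests at 0:59 and another 100 at 1:01, which is 200 requests in about two seconds.
Minute 1: 0-60s → 100 requests allowed
Minute 2: 60-120s → 100 requests allowed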
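A minimal sketch of a fixed-window counter, assuming a single process and in-memory state. The class name `FixedWindowLimiter`, its parameters, and the injectable `clock` are hypothetical and exist only for this illustration.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed time block."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be checked without sleeping
        self.state = {}  # client_id -> (window_index, count)

    def allow(self, client_id):
        # Every timestamp inside the same block maps to the same window index
        window_index = int(self.clock() // self.window_seconds)
        stored_index, count = self.state.get(client_id, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new block has started, so the counter resets
        if count >= self.limit:
            return False  # the caller would answer with 429 Too Many Requests
        self.state[client_id] = (window_index, count + 1)
        return True
```

Because the counter resets the moment a new block starts, a client that used its full quota at second 59 gets a fresh quota at second 60. That is the boundary burst described above.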
Sliding window
Smooths out the fixed-window boundary problem by counting requests over a rolling time period (e.g., the last 60 seconds) instead of fixed blocks.
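One common way to get a rolling window is to keep a log of recent request timestamps per client. The sketch below assumes in-memory state in a single process; `SlidingWindowLimiter` and its parameters are hypothetical names for this example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any rolling window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be checked without sleeping
        self.timestamps = defaultdict(deque)  # client_id -> recent request times

    def allow(self, client_id):
        now = self.clock()
        log = self.timestamps[client_id]
        # Drop timestamps that have aged out of the rolling window
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False  # still too many requests in the last window_seconds
        log.append(now)
        return True
```

The trade-off is memory: this variant stores one timestamp per accepted request, where a simple counter stores one number per client.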
Token bucket
Users get tokens at a steady rate, up to a maximum bucket size. Each request costs a token. If you're out of tokens, you wait. Allows short bursts while maintaining an average rate.
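A minimal token bucket sketch for a single client, assuming in-memory state. The `TokenBucket` class and its `rate` and `capacity` parameters are hypothetical names. This version rejects a request when the bucket is empty rather than making the caller wait, which is a simplification.

```python
import time


class TokenBucket:
    """Refill `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate  # tokens added per second (the long-run average rate)
        self.capacity = capacity  # bucket size (the largest possible burst)
        self.clock = clock  # injectable so the behavior can be checked without sleeping
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Add the tokens earned since the last call, capped at the bucket size
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # bucket is empty; the caller has to wait for a refill
```

With `rate=1` and `capacity=3`, a client can send three requests back to back, then roughly one per second after that.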
How to implement it
Express.js (Node.js)
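import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

// Allow each client (keyed by IP address by default) 100 requests per minute
const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per window
  message: { error: 'Too many requests, try again later' },
});

app.use('/api/', limiter);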
Nginx
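http {
    # 10 MB shared zone keyed by client IP, refilled at 10 requests per second
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    server {
        location /api/ {
            # Allow bursts of up to 20 extra requests without delaying them
            limit_req zone=api burst=20 nodelay;
            # Nginx rejects with 503 by default; return 429 instead
            limit_req_status 429;
        }
    }
}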
Python (Flask)
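from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
# Key each client by remote IP address (Flask-Limiter 3.x signature)
limiter = Limiter(get_remote_address, app=app, default_limits=["100 per minute"])

@app.route("/api/data")
@limiter.limit("10 per second")  # replaces the default limit for this route
def get_data():
    return {"data": "..."}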
Rate limit headers
Many APIs report your rate limit status in response headers:
X-RateLimit-Limit: 100 # Max requests allowed
X-RateLimit-Remaining: 42 # Requests left in this window
X-RateLimit-Reset: 1710590400 # When the window resets (Unix timestamp)
Retry-After: 30 # Seconds to wait (on 429 responses)
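On the server side, these headers can be derived from whatever state the limiter already tracks. The helper below is a sketch under that assumption; `rate_limit_headers` and its parameters are hypothetical names, not part of any library.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch, now_epoch, rejected=False):
    """Build rate limit response headers from the limiter's current state."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # Unix timestamp
    }
    if rejected:
        # Only 429 responses carry Retry-After: whole seconds until the reset
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return headers
```

For a rejected request 30 seconds before the reset, this yields `Retry-After: 30` alongside the three `X-RateLimit-*` headers.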
As an API consumer
If you're calling someone else's API and hitting rate limits:
- Respect Retry-After: wait the specified time before retrying
- Add exponential backoff: wait 1s, then 2s, then 4s, etc.
- Cache responses: don't re-fetch data you already have
- Batch requests: combine multiple calls into one where possible
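The first two points can be combined in one retry loop. This is a sketch, not a definitive client: `call_with_backoff` is a hypothetical helper, and `send` stands in for whatever function performs the request and returns a `(status, headers, body)` tuple. It handles only the seconds form of Retry-After, not the HTTP-date form.

```python
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry on 429, preferring Retry-After and falling back to 1s, 2s, 4s, ..."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; do not sleep again
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # the server said how long to wait
        else:
            delay = base_delay * (2 ** attempt)  # exponential backoff
        sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```

Production clients often add a small random jitter to each delay so that many clients do not retry at the same instant.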