thrilltrack-explorer/docs/RATE_LIMITING.md

# Rate Limiting Policy

**Last Updated**: November 3, 2025
**Status**: ACTIVE
**Coverage**: All public edge functions

---

## Overview

ThrillWiki enforces rate limiting on all public edge functions to prevent abuse, ensure fair usage, and protect against denial-of-service (DoS) attacks.

---

## Rate Limit Tiers

### Strict (5 requests/minute per IP)
**Use Case**: Expensive operations that consume significant resources

**Protected Endpoints**:
- `/upload-image` - File upload operations
- Future: Data exports, account deletion

**Reasoning**: File uploads are resource-intensive and should be limited to prevent storage abuse and bandwidth exhaustion.

---

### Standard (10 requests/minute per IP)
**Use Case**: Most API endpoints with moderate resource usage

**Protected Endpoints**:
- `/detect-location` - IP geolocation service
- Future: Public search/filter endpoints

**Reasoning**: Standard protection for endpoints that query external APIs or perform moderate processing.

---

### Lenient (30 requests/minute per IP)
**Use Case**: Read-only, cached endpoints with minimal resource usage

**Protected Endpoints**:
- Future: Cached entity data queries
- Future: Static content endpoints

**Reasoning**: Allow higher throughput for lightweight operations that don't strain resources.

---

### Per-User (Configurable, default 20 requests/minute)
**Use Case**: Authenticated endpoints where rate limiting by user ID provides better protection

**Protected Endpoints**:
- `/process-selective-approval` - 10 requests/minute per moderator
- Future: User-specific API endpoints

**Reasoning**: Moderators have different usage patterns than public users. Per-user limiting prevents credential sharing while allowing legitimate high-volume usage.

**Implementation**:
```typescript
const approvalRateLimiter = rateLimiters.perUser(10); // Custom limit
```

---

## Rate Limit Headers

All responses include rate limit information:

```http
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
```

**On Rate Limit Exceeded** (HTTP 429):
```http
Retry-After: 45
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
```

---

## Error Response Format

When rate limit is exceeded, you'll receive:

```json
{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please try again later.",
  "retryAfter": 45
}
```

**HTTP Status Code**: 429 Too Many Requests

---

## Client Implementation

### Handling Rate Limits

```typescript
async function uploadImage(file: File) {
  try {
    const response = await fetch('/upload-image', {
      method: 'POST',
      body: formData,
    });

    if (response.status === 429) {
      const data = await response.json();
      const retryAfter = data.retryAfter || 60;

      console.warn(`Rate limited. Retry in ${retryAfter} seconds`);

      // Wait and retry
      await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
      return uploadImage(file); // Retry
    }

    return response.json();
  } catch (error) {
    console.error('Upload failed:', error);
    throw error;
  }
}
```

### Exponential Backoff

For production clients, implement exponential backoff:

```typescript
async function uploadWithBackoff(file: File, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch('/upload-image', {
        method: 'POST',
        body: formData,
      });

      if (response.status !== 429) {
        return response.json();
      }

      // Exponential backoff: 1s, 2s, 4s
      const backoffDelay = Math.pow(2, attempt) * 1000;
      await new Promise(resolve => setTimeout(resolve, backoffDelay));
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
    }
  }
  throw new Error('Max retries exceeded');
}
```

---

## Monitoring & Metrics

### Key Metrics to Track

1. **Rate Limit Hit Rate**: Percentage of requests hitting limits
2. **429 Response Count**: Total rate limit errors by endpoint
3. **Top Rate Limited IPs**: Identify potential abuse patterns
4. **False Positive Rate**: Legitimate users hitting limits

### Alerting Thresholds

**Warning Alerts**:
- Rate limit hit rate > 5% on any endpoint
- Single IP hits rate limit > 10 times in 1 hour

**Critical Alerts**:
- Rate limit hit rate > 20% (may indicate DDoS)
- Multiple IPs hitting limits simultaneously (coordinated attack)

---

## Rate Limit Adjustments

### Increasing Limits for Legitimate Use

If you have a legitimate use case requiring higher limits:

1. **Contact Support**: Describe your use case and expected volume
2. **Verification**: We'll verify your account and usage patterns
3. **Temporary Increase**: May grant temporary limit increase
4. **Custom Tier**: High-volume verified accounts may get custom limits

**Examples of Valid Requests**:
- Bulk data migration project
- Integration with external service
- High-traffic public API client

---

## Technical Implementation

### Architecture

Rate limiting is implemented using in-memory rate limiting with:
- **Storage**: Map-based storage (IP → {count, resetAt})
- **Cleanup**: Periodic cleanup of expired entries (every 30 seconds)
- **Capacity Management**: LRU eviction when map exceeds 10,000 entries
- **Emergency Handling**: Automatic cleanup if memory pressure detected

### Memory Management

**Map Capacity**: 10,000 unique IPs tracked simultaneously
**Cleanup Interval**: Every 30 seconds or half the rate limit window
**LRU Eviction**: Removes 30% oldest entries when at capacity

### Shared Middleware

All edge functions use the shared rate limiter:

```typescript
import { rateLimiters, withRateLimit } from '../_shared/rateLimiter.ts';

const limiter = rateLimiters.strict; // or .standard, .lenient, .perUser(n)

serve(withRateLimit(async (req) => {
  // Your edge function logic
}, limiter, corsHeaders));
```

---

## Security Considerations

### IP Spoofing Protection

Rate limiting uses `X-Forwarded-For` header (first IP in chain):
- Trusts proxy headers in production (Cloudflare, Supabase)
- Prevents IP spoofing by using first IP only
- Falls back to `X-Real-IP` if `X-Forwarded-For` unavailable

### Distributed Attacks

**Current Limitation**: In-memory rate limiting is per-edge-function instance
- Distributed attacks across multiple instances may bypass limits
- Future: Consider distributed rate limiting (Redis, Supabase table)

**Mitigation**:
- Monitor aggregate request rates across all instances
- Use Cloudflare rate limiting as first line of defense
- Alert on unusual traffic patterns

---

## Bypassing Rate Limits

**Important**: Rate limits CANNOT be bypassed, even for authenticated users.

**Why No Bypass?**:
- Prevents credential compromise from affecting system stability
- Ensures fair usage across all users
- Protects backend infrastructure

**Moderator/Admin Considerations**:
- Per-user rate limiting allows higher individual limits
- Moderators have different tiers for moderation actions
- No complete bypass to prevent abuse of compromised accounts

---

## Testing Rate Limits

### Manual Testing

```bash
# Test upload-image rate limit (5 req/min)
for i in {1..6}; do
  curl -X POST https://api.thrillwiki.com/functions/v1/upload-image \
    -H "Authorization: Bearer $TOKEN" \
    -d '{}' && echo "Request $i succeeded"
done
# Expected: First 5 succeed, 6th returns 429
```

### Automated Testing

```typescript
describe('Rate Limiting', () => {
  test('enforces strict limits on upload-image', async () => {
    const requests = [];

    // Make 6 requests (limit is 5)
    for (let i = 0; i < 6; i++) {
      requests.push(fetch('/upload-image', { method: 'POST' }));
    }

    const responses = await Promise.all(requests);
    const statuses = responses.map(r => r.status);

    expect(statuses.filter(s => s === 200).length).toBe(5);
    expect(statuses.filter(s => s === 429).length).toBe(1);
  });
});
```

---

## Future Enhancements

### Planned Improvements

1. **Database-Backed Rate Limiting**: Persistent rate limiting across edge function instances
2. **Dynamic Rate Limits**: Adjust limits based on system load
3. **User Reputation System**: Higher limits for trusted users
4. **API Keys**: Rate limiting by API key for integrations
5. **Cost-Based Limiting**: Different limits for different operation costs

---

## Related Documentation

- [Security Fixes (P0)](./SECURITY_FIXES_P0.md)
- [Edge Function Development](./EDGE_FUNCTIONS.md)
- [Error Tracking](./ERROR_TRACKING.md)

---

## Troubleshooting

### "Rate limit exceeded" when I haven't made many requests

**Possible Causes**:
1. **Shared IP**: You're behind a NAT/VPN sharing an IP with others
2. **Recent Requests**: Rate limit window hasn't reset yet
3. **Multiple Tabs**: Multiple browser tabs making requests

**Solutions**:
- Wait for rate limit window to reset (shown in `Retry-After` header)
- Check browser dev tools for unexpected background requests
- Disable browser extensions that might be making requests

### Rate limit seems inconsistent

**Explanation**: Rate limiting is per-edge-function instance
- Multiple instances may have separate rate limit counters
- Distributed traffic may see different limits
- This is expected behavior for in-memory rate limiting

---

## Contact

For rate limit issues or increase requests:
- **Support**: [Contact form on ThrillWiki]
- **Documentation**: https://docs.thrillwiki.com
- **Status**: https://status.thrillwiki.com