feat: Implement security fix plan

2025-12-27 14:26:58 -05:00 · 2025-11-03 15:50:07 +00:00
parent a86da6e833
commit fdfe141f31
6 changed files with 988 additions and 6 deletions
--- a/docs/RATE_LIMITING.md
+++ b/docs/RATE_LIMITING.md
@@ -0,0 +1,355 @@
+# Rate Limiting Policy
+
+**Last Updated**: November 3, 2025  
+**Status**: ACTIVE  
+**Coverage**: All public edge functions
+
+---
+
+## Overview
+
+ThrillWiki enforces rate limiting on all public edge functions to prevent abuse, ensure fair usage, and protect against denial-of-service (DoS) attacks.
+
+---
+
+## Rate Limit Tiers
+
+### Strict (5 requests/minute per IP)
+**Use Case**: Expensive operations that consume significant resources
+
+**Protected Endpoints**:
+- `/upload-image` - File upload operations
+- Future: Data exports, account deletion
+
+**Reasoning**: File uploads are resource-intensive and should be limited to prevent storage abuse and bandwidth exhaustion.
+
+---
+
+### Standard (10 requests/minute per IP)
+**Use Case**: Most API endpoints with moderate resource usage
+
+**Protected Endpoints**:
+- `/detect-location` - IP geolocation service
+- Future: Public search/filter endpoints
+
+**Reasoning**: Standard protection for endpoints that query external APIs or perform moderate processing.
+
+---
+
+### Lenient (30 requests/minute per IP)
+**Use Case**: Read-only, cached endpoints with minimal resource usage
+
+**Protected Endpoints**:
+- Future: Cached entity data queries
+- Future: Static content endpoints
+
+**Reasoning**: Allow higher throughput for lightweight operations that don't strain resources.
+
+---
+
+### Per-User (Configurable, default 20 requests/minute)
+**Use Case**: Authenticated endpoints where rate limiting by user ID provides better protection
+
+**Protected Endpoints**:
+- `/process-selective-approval` - 10 requests/minute per moderator
+- Future: User-specific API endpoints
+
+**Reasoning**: Moderators have different usage patterns than public users. Per-user limiting prevents credential sharing while allowing legitimate high-volume usage.
+
+**Implementation**:
+```typescript
+const approvalRateLimiter = rateLimiters.perUser(10); // Custom limit
+```
+
+---
+
+## Rate Limit Headers
+
+All responses include rate limit information:
+
+```http
+X-RateLimit-Limit: 10
+X-RateLimit-Remaining: 7
+```
+
+**On Rate Limit Exceeded** (HTTP 429):
+```http
+Retry-After: 45
+X-RateLimit-Limit: 10
+X-RateLimit-Remaining: 0
+```
+
+---
+
+## Error Response Format
+
+When rate limit is exceeded, you'll receive:
+
+```json
+{
+  "error": "Rate limit exceeded",
+  "message": "Too many requests. Please try again later.",
+  "retryAfter": 45
+}
+```
+
+**HTTP Status Code**: 429 Too Many Requests
+
+---
+
+## Client Implementation
+
+### Handling Rate Limits
+
+```typescript
+async function uploadImage(file: File) {
+  try {
+    const response = await fetch('/upload-image', {
+      method: 'POST',
+      body: formData,
+    });
+
+    if (response.status === 429) {
+      const data = await response.json();
+      const retryAfter = data.retryAfter || 60;
+      
+      console.warn(`Rate limited. Retry in ${retryAfter} seconds`);
+      
+      // Wait and retry
+      await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
+      return uploadImage(file); // Retry
+    }
+
+    return response.json();
+  } catch (error) {
+    console.error('Upload failed:', error);
+    throw error;
+  }
+}
+```
+
+### Exponential Backoff
+
+For production clients, implement exponential backoff:
+
+```typescript
+async function uploadWithBackoff(file: File, maxRetries = 3) {
+  for (let attempt = 0; attempt < maxRetries; attempt++) {
+    try {
+      const response = await fetch('/upload-image', {
+        method: 'POST',
+        body: formData,
+      });
+
+      if (response.status !== 429) {
+        return response.json();
+      }
+
+      // Exponential backoff: 1s, 2s, 4s
+      const backoffDelay = Math.pow(2, attempt) * 1000;
+      await new Promise(resolve => setTimeout(resolve, backoffDelay));
+    } catch (error) {
+      if (attempt === maxRetries - 1) throw error;
+    }
+  }
+  throw new Error('Max retries exceeded');
+}
+```
+
+---
+
+## Monitoring & Metrics
+
+### Key Metrics to Track
+
+1. **Rate Limit Hit Rate**: Percentage of requests hitting limits
+2. **429 Response Count**: Total rate limit errors by endpoint
+3. **Top Rate Limited IPs**: Identify potential abuse patterns
+4. **False Positive Rate**: Legitimate users hitting limits
+
+### Alerting Thresholds
+
+**Warning Alerts**:
+- Rate limit hit rate > 5% on any endpoint
+- Single IP hits rate limit > 10 times in 1 hour
+
+**Critical Alerts**:
+- Rate limit hit rate > 20% (may indicate DDoS)
+- Multiple IPs hitting limits simultaneously (coordinated attack)
+
+---
+
+## Rate Limit Adjustments
+
+### Increasing Limits for Legitimate Use
+
+If you have a legitimate use case requiring higher limits:
+
+1. **Contact Support**: Describe your use case and expected volume
+2. **Verification**: We'll verify your account and usage patterns
+3. **Temporary Increase**: May grant temporary limit increase
+4. **Custom Tier**: High-volume verified accounts may get custom limits
+
+**Examples of Valid Requests**:
+- Bulk data migration project
+- Integration with external service
+- High-traffic public API client
+
+---
+
+## Technical Implementation
+
+### Architecture
+
+Rate limiting is implemented using in-memory rate limiting with:
+- **Storage**: Map-based storage (IP → {count, resetAt})
+- **Cleanup**: Periodic cleanup of expired entries (every 30 seconds)
+- **Capacity Management**: LRU eviction when map exceeds 10,000 entries
+- **Emergency Handling**: Automatic cleanup if memory pressure detected
+
+### Memory Management
+
+**Map Capacity**: 10,000 unique IPs tracked simultaneously
+**Cleanup Interval**: Every 30 seconds or half the rate limit window
+**LRU Eviction**: Removes 30% oldest entries when at capacity
+
+### Shared Middleware
+
+All edge functions use the shared rate limiter:
+
+```typescript
+import { rateLimiters, withRateLimit } from '../_shared/rateLimiter.ts';
+
+const limiter = rateLimiters.strict; // or .standard, .lenient, .perUser(n)
+
+serve(withRateLimit(async (req) => {
+  // Your edge function logic
+}, limiter, corsHeaders));
+```
+
+---
+
+## Security Considerations
+
+### IP Spoofing Protection
+
+Rate limiting uses `X-Forwarded-For` header (first IP in chain):
+- Trusts proxy headers in production (Cloudflare, Supabase)
+- Prevents IP spoofing by using first IP only
+- Falls back to `X-Real-IP` if `X-Forwarded-For` unavailable
+
+### Distributed Attacks
+
+**Current Limitation**: In-memory rate limiting is per-edge-function instance
+- Distributed attacks across multiple instances may bypass limits
+- Future: Consider distributed rate limiting (Redis, Supabase table)
+
+**Mitigation**:
+- Monitor aggregate request rates across all instances
+- Use Cloudflare rate limiting as first line of defense
+- Alert on unusual traffic patterns
+
+---
+
+## Bypassing Rate Limits
+
+**Important**: Rate limits CANNOT be bypassed, even for authenticated users.
+
+**Why No Bypass?**:
+- Prevents credential compromise from affecting system stability
+- Ensures fair usage across all users
+- Protects backend infrastructure
+
+**Moderator/Admin Considerations**:
+- Per-user rate limiting allows higher individual limits
+- Moderators have different tiers for moderation actions
+- No complete bypass to prevent abuse of compromised accounts
+
+---
+
+## Testing Rate Limits
+
+### Manual Testing
+
+```bash
+# Test upload-image rate limit (5 req/min)
+for i in {1..6}; do
+  curl -X POST https://api.thrillwiki.com/functions/v1/upload-image \
+    -H "Authorization: Bearer $TOKEN" \
+    -d '{}' && echo "Request $i succeeded"
+done
+# Expected: First 5 succeed, 6th returns 429
+```
+
+### Automated Testing
+
+```typescript
+describe('Rate Limiting', () => {
+  test('enforces strict limits on upload-image', async () => {
+    const requests = [];
+    
+    // Make 6 requests (limit is 5)
+    for (let i = 0; i < 6; i++) {
+      requests.push(fetch('/upload-image', { method: 'POST' }));
+    }
+    
+    const responses = await Promise.all(requests);
+    const statuses = responses.map(r => r.status);
+    
+    expect(statuses.filter(s => s === 200).length).toBe(5);
+    expect(statuses.filter(s => s === 429).length).toBe(1);
+  });
+});
+```
+
+---
+
+## Future Enhancements
+
+### Planned Improvements
+
+1. **Database-Backed Rate Limiting**: Persistent rate limiting across edge function instances
+2. **Dynamic Rate Limits**: Adjust limits based on system load
+3. **User Reputation System**: Higher limits for trusted users
+4. **API Keys**: Rate limiting by API key for integrations
+5. **Cost-Based Limiting**: Different limits for different operation costs
+
+---
+
+## Related Documentation
+
+- [Security Fixes (P0)](./SECURITY_FIXES_P0.md)
+- [Edge Function Development](./EDGE_FUNCTIONS.md)
+- [Error Tracking](./ERROR_TRACKING.md)
+
+---
+
+## Troubleshooting
+
+### "Rate limit exceeded" when I haven't made many requests
+
+**Possible Causes**:
+1. **Shared IP**: You're behind a NAT/VPN sharing an IP with others
+2. **Recent Requests**: Rate limit window hasn't reset yet
+3. **Multiple Tabs**: Multiple browser tabs making requests
+
+**Solutions**:
+- Wait for rate limit window to reset (shown in `Retry-After` header)
+- Check browser dev tools for unexpected background requests
+- Disable browser extensions that might be making requests
+
+### Rate limit seems inconsistent
+
+**Explanation**: Rate limiting is per-edge-function instance
+- Multiple instances may have separate rate limit counters
+- Distributed traffic may see different limits
+- This is expected behavior for in-memory rate limiting
+
+---
+
+## Contact
+
+For rate limit issues or increase requests:
+- **Support**: [Contact form on ThrillWiki]
+- **Documentation**: https://docs.thrillwiki.com
+- **Status**: https://status.thrillwiki.com