# Rate Limit Monitoring Setup

This document explains how to set up automated rate limit monitoring with alerts.

## Overview

The rate limit monitoring system consists of:
1. **Metrics Collection** - Tracks all rate limit checks in-memory
2. **Alert Configuration** - Database table with configurable thresholds
3. **Monitor Function** - Edge function that checks metrics and triggers alerts
4. **Cron Job** - Scheduled job that runs the monitor function periodically

## Setup Instructions

### Step 1: Enable Required Extensions

Run this SQL in your Supabase SQL Editor:

```sql
-- Enable pg_cron for scheduling
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Enable pg_net for HTTP requests
CREATE EXTENSION IF NOT EXISTS pg_net;
```

### Step 2: Create the Cron Job

Run this SQL to schedule the monitor to run every 5 minutes:

```sql
SELECT cron.schedule(
  'monitor-rate-limits',
  '*/5 * * * *', -- Every 5 minutes
  $$
  SELECT
    net.http_post(
      url:='https://api.thrillwiki.com/functions/v1/monitor-rate-limits',
      headers:='{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
      body:='{}'::jsonb
    ) as request_id;
  $$
);
```

### Step 3: Verify the Cron Job

Check that the cron job was created:

```sql
SELECT * FROM cron.job WHERE jobname = 'monitor-rate-limits';
```

### Step 4: Configure Alert Thresholds

Visit the admin dashboard at `/admin/rate-limit-metrics` and navigate to the "Configuration" tab to:

- Enable/disable specific alerts
- Adjust threshold values
- Modify time windows

Default configurations are automatically created:
- **Block Rate Alert**: Triggers when >50% of requests are blocked in 5 minutes
- **Total Requests Alert**: Triggers when >1000 requests/minute
- **Unique IPs Alert**: Triggers when >100 unique IPs in 5 minutes (disabled by default)

## How It Works

### 1. Metrics Collection

Every rate limit check (both allowed and blocked) is recorded with:
- Timestamp
- Function name
- Client IP
- User ID (if authenticated)
- Result (allowed/blocked)
- Remaining quota
- Rate limit tier

Metrics are stored in-memory for the last 10,000 checks.

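The capped in-memory store described above could be sketched roughly as follows. This is an illustration only: the `RateLimitMetric` interface, its field names, and the `recordMetric` and `metricsInWindow` helpers are assumptions derived from the list above, not the project's actual code.

```typescript
// Hypothetical shape of one recorded check; field names are assumptions.
interface RateLimitMetric {
  timestamp: number; // epoch milliseconds
  functionName: string;
  clientIp: string;
  userId?: string; // only present for authenticated requests
  allowed: boolean; // true = allowed, false = blocked
  remaining: number; // remaining quota after this check
  tier: string; // rate limit tier
}

// Cap taken from the text above: only the last 10,000 checks are kept.
const MAX_METRICS = 10_000;
const metrics: RateLimitMetric[] = [];

// Append a metric and drop the oldest entries once the cap is exceeded.
function recordMetric(metric: RateLimitMetric): void {
  metrics.push(metric);
  if (metrics.length > MAX_METRICS) {
    metrics.splice(0, metrics.length - MAX_METRICS);
  }
}

// Return the metrics recorded within the last `windowMs` milliseconds.
function metricsInWindow(
  windowMs: number,
  now: number = Date.now(),
): RateLimitMetric[] {
  return metrics.filter((m) => now - m.timestamp <= windowMs);
}
```

Trimming from the front keeps the newest entries, which are the only ones an alert time window can cover.
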
### 2. Monitoring Process

Every 5 minutes, the monitor function:
1. Fetches enabled alert configurations from the database
2. Analyzes current metrics for each configuration's time window
3. Compares metrics against configured thresholds
4. For exceeded thresholds:
   - Skips the alert if a recent unresolved alert already exists (prevents spam)
   - Otherwise, records the alert in the `rate_limit_alerts` table
   - Sends a notification to moderators via Novu

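The comparison in steps 2 and 3 can be sketched as below. All names here (`AlertConfig`, `Check`, `computeMetric`, `findExceeded`, and the three metric type strings) are hypothetical, and the sketch assumes that block rate is expressed as a percentage and that the request count is measured over the configuration's own window; the real edge function may differ.

```typescript
// Hypothetical metric kinds matching the three default alerts.
type MetricType = "block_rate" | "total_requests" | "unique_ips";

// Assumed shape of a row from the alert configuration table.
interface AlertConfig {
  id: string;
  metricType: MetricType;
  threshold: number;
  windowMs: number;
  enabled: boolean;
}

// Minimal subset of a recorded rate limit check.
interface Check {
  timestamp: number; // epoch milliseconds
  clientIp: string;
  allowed: boolean;
}

// Reduce the checks inside one window to a single comparable number.
function computeMetric(type: MetricType, checks: Check[]): number {
  switch (type) {
    case "block_rate": {
      if (checks.length === 0) return 0;
      const blocked = checks.filter((c) => !c.allowed).length;
      return (blocked / checks.length) * 100; // percentage of blocked requests
    }
    case "total_requests":
      return checks.length;
    case "unique_ips":
      return new Set(checks.map((c) => c.clientIp)).size;
  }
}

// Return every enabled configuration whose threshold is exceeded.
function findExceeded(
  configs: AlertConfig[],
  checks: Check[],
  now: number = Date.now(),
): { config: AlertConfig; value: number }[] {
  return configs
    .filter((config) => config.enabled)
    .map((config) => {
      const inWindow = checks.filter((c) => now - c.timestamp <= config.windowMs);
      return { config, value: computeMetric(config.metricType, inWindow) };
    })
    .filter(({ config, value }) => value > config.threshold);
}
```

Each configuration filters the checks by its own window, so a 5-minute alert and a 1-minute alert can be evaluated in the same run.
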
### 3. Alert Deduplication

Alerts are deduplicated using a 15-minute window. If an alert for the same configuration was triggered in the last 15 minutes and hasn't been resolved, no new alert is sent.

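As a minimal sketch of this rule, the check below decides whether a new alert should be sent. The `AlertRecord` shape and the `shouldSendAlert` helper are hypothetical; in the real system the same condition would presumably be a query against the `rate_limit_alerts` table rather than an in-memory array.

```typescript
// Assumed subset of an alert row; field names are illustrative.
interface AlertRecord {
  configId: string;
  createdAt: number; // epoch milliseconds
  resolvedAt: number | null; // null while the alert is unresolved
}

// 15-minute deduplication window from the text above.
const DEDUP_WINDOW_MS = 15 * 60 * 1000;

// Send only if no unresolved alert for this config exists inside the window.
function shouldSendAlert(
  configId: string,
  existing: AlertRecord[],
  now: number = Date.now(),
): boolean {
  const hasRecentUnresolved = existing.some(
    (alert) =>
      alert.configId === configId &&
      alert.resolvedAt === null &&
      now - alert.createdAt <= DEDUP_WINDOW_MS,
  );
  return !hasRecentUnresolved;
}
```

Under this rule a resolved alert never suppresses a new one, so resolving an alert re-arms it immediately.
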
### 4. Notifications

Alerts are sent to all moderators via the "moderators" topic in Novu, including:
- Email notifications
- In-app notifications (if configured)
- Custom notification channels (if configured)

## Monitoring the Monitor

### Check Cron Job Status

```sql
-- View recent cron job runs
SELECT * FROM cron.job_run_details
WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits')
ORDER BY start_time DESC
LIMIT 10;
```

### View Function Logs

Check the edge function logs in Supabase Dashboard:
`https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/functions/monitor-rate-limits/logs`

### Test Manually

You can test the monitor function manually by calling it via HTTP:

```bash
curl -X POST https://api.thrillwiki.com/functions/v1/monitor-rate-limits \
  -H "Content-Type: application/json"
```

## Adjusting the Schedule

To change how often the monitor runs, update the cron schedule. `cron.alter_job` takes the numeric job ID rather than the job name, so look the ID up by name:

```sql
-- Update to run every 10 minutes instead
SELECT cron.alter_job((SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits'), schedule := '*/10 * * * *');

-- Update to run every hour
SELECT cron.alter_job((SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits'), schedule := '0 * * * *');

-- Update to run every minute (not recommended - may generate too many alerts)
SELECT cron.alter_job((SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits'), schedule := '* * * * *');
```

## Removing the Cron Job

If you need to disable monitoring:

```sql
SELECT cron.unschedule('monitor-rate-limits');
```

## Troubleshooting

### No Alerts Being Triggered

1. Check if any alert configurations are enabled:
   ```sql
   SELECT * FROM rate_limit_alert_config WHERE enabled = true;
   ```

2. Check if metrics are being collected:
   - Visit `/admin/rate-limit-metrics` and check the "Recent Activity" tab
   - If no activity, the rate limiter might not be in use

3. Check monitor function logs for errors

### Too Many Alerts

- Increase threshold values in the configuration
- Increase time windows for less sensitive detection
- Disable specific alert types that are too noisy

### Monitor Not Running

1. Verify cron job exists and is active
2. Check `cron.job_run_details` for error messages
3. Verify edge function deployed successfully
4. Check network connectivity between cron scheduler and edge function

## Database Tables

### `rate_limit_alert_config`
Stores alert threshold configurations. Only admins can modify.

### `rate_limit_alerts`
Stores history of all triggered alerts. Moderators can view and resolve.

## Security

- Alert configurations can only be modified by admin/superuser roles
- Alert history is only accessible to moderators and above
- The monitor function runs without JWT verification (as a cron job)
- All database operations respect Row Level Security policies

## Performance Considerations

- In-memory metrics store max 10,000 entries (auto-trimmed)
- Metrics older than the longest configured time window are never evaluated, so retaining them adds no value
- Monitor function typically runs in <500ms
- No significant database load (simple queries on small tables)

## Future Enhancements

Possible improvements:
- Function-specific alert thresholds
- Alert aggregation (daily/weekly summaries)
- Custom notification channels per alert type
- Machine learning-based anomaly detection
- Integration with external monitoring tools (Datadog, New Relic, etc.)