thrilltrack-explorer/RATE_LIMIT_MONITORING_SETUP.md
gpt-engineer-app[bot] 28fa2fd0d4 Monitor rate limits progress
Implement monitor-rate-limits edge function to compare metrics against alert configurations, trigger notifications, and record alerts; update config and groundwork for admin UI integration.
2025-11-11 00:19:13 +00:00

Rate Limit Monitoring Setup

This document explains how to set up automated rate limit monitoring with alerts.

Overview

The rate limit monitoring system consists of:

  1. Metrics Collection - Tracks all rate limit checks in memory
  2. Alert Configuration - Database table with configurable thresholds
  3. Monitor Function - Edge function that checks metrics and triggers alerts
  4. Cron Job - Scheduled job that runs the monitor function periodically

Setup Instructions

Step 1: Enable Required Extensions

Run this SQL in your Supabase SQL Editor:

-- Enable pg_cron for scheduling
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Enable pg_net for HTTP requests
CREATE EXTENSION IF NOT EXISTS pg_net;

Step 2: Create the Cron Job

Run this SQL to schedule the monitor to run every 5 minutes:

SELECT cron.schedule(
  'monitor-rate-limits',
  '*/5 * * * *', -- Every 5 minutes
  $$
  SELECT
    net.http_post(
        url:='https://api.thrillwiki.com/functions/v1/monitor-rate-limits',
        headers:='{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
        body:='{}'::jsonb
    ) as request_id;
  $$
);

Step 3: Verify the Cron Job

Check that the cron job was created:

SELECT * FROM cron.job WHERE jobname = 'monitor-rate-limits';

Step 4: Configure Alert Thresholds

Visit the admin dashboard at /admin/rate-limit-metrics and navigate to the "Configuration" tab to:

  • Enable/disable specific alerts
  • Adjust threshold values
  • Modify time windows

Default configurations are automatically created:

  • Block Rate Alert: Triggers when >50% of requests are blocked in 5 minutes
  • Total Requests Alert: Triggers when >1000 requests/minute
  • Unique IPs Alert: Triggers when >100 unique IPs in 5 minutes (disabled by default)

How It Works

1. Metrics Collection

Every rate limit check (both allowed and blocked) is recorded with:

  • Timestamp
  • Function name
  • Client IP
  • User ID (if authenticated)
  • Result (allowed/blocked)
  • Remaining quota
  • Rate limit tier

Metrics are stored in-memory for the last 10,000 checks.
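
As a rough illustration of the storage described above, the following TypeScript sketch keeps a bounded in-memory buffer of checks. The interface, class, and field names are hypothetical and chosen for this example; only the recorded fields and the 10,000-entry cap come from this document.

```typescript
// Hypothetical shape of one recorded check; field names are illustrative only.
interface RateLimitMetric {
  timestamp: number;     // epoch milliseconds
  functionName: string;
  clientIp: string;
  userId?: string;       // present only for authenticated requests
  allowed: boolean;      // true = allowed, false = blocked
  remaining: number;     // remaining quota after this check
  tier: string;          // rate limit tier
}

const MAX_METRICS = 10000; // cap stated in this document

class MetricsBuffer {
  private entries: RateLimitMetric[] = [];
  private readonly capacity: number;

  constructor(capacity: number = MAX_METRICS) {
    this.capacity = capacity;
  }

  // Append a check and drop the oldest entries once the cap is exceeded.
  record(metric: RateLimitMetric): void {
    this.entries.push(metric);
    if (this.entries.length > this.capacity) {
      this.entries.splice(0, this.entries.length - this.capacity);
    }
  }

  // Return the checks recorded within the last `windowMs` milliseconds.
  since(windowMs: number, now: number = Date.now()): RateLimitMetric[] {
    return this.entries.filter((m) => now - m.timestamp <= windowMs);
  }

  get size(): number {
    return this.entries.length;
  }
}
```

Trimming from the front keeps the most recent checks, which are the only ones a time-windowed alert needs.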

2. Monitoring Process

Every 5 minutes, the monitor function:

  1. Fetches enabled alert configurations from the database
  2. Analyzes current metrics for each configuration's time window
  3. Compares metrics against configured thresholds
  4. For exceeded thresholds:
    • Skips the alert if a recent unresolved alert already exists for the same configuration (prevents spam)
    • Otherwise records the alert in the rate_limit_alerts table
    • Sends a notification to moderators via Novu
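
The comparison in steps 2 and 3 can be sketched in TypeScript as below. This is a minimal sketch, not the deployed function: the type names, the `metricType` values, and the choice to express the block rate as a fraction between 0 and 1 are all assumptions made for illustration.

```typescript
// Hypothetical metric kinds, mirroring the three default alerts in this document.
type MetricType = "block_rate" | "total_requests" | "unique_ips";

interface Check {
  timestamp: number; // epoch milliseconds
  clientIp: string;
  allowed: boolean;
}

interface AlertConfig {
  metricType: MetricType;
  threshold: number; // fraction (0..1) for block_rate, a count otherwise
  windowMs: number;  // time window to analyze
  enabled: boolean;
}

// Compute the current value of one metric over the config's time window.
function measure(checks: Check[], config: AlertConfig, now: number): number {
  const inWindow = checks.filter((c) => now - c.timestamp <= config.windowMs);
  switch (config.metricType) {
    case "block_rate": {
      if (inWindow.length === 0) return 0; // no traffic means nothing was blocked
      const blocked = inWindow.filter((c) => !c.allowed).length;
      return blocked / inWindow.length;
    }
    case "total_requests":
      return inWindow.length;
    case "unique_ips":
      return new Set(inWindow.map((c) => c.clientIp)).size;
  }
}

// Return the enabled configurations whose threshold is currently exceeded.
function exceededConfigs(checks: Check[], configs: AlertConfig[], now: number): AlertConfig[] {
  return configs.filter((cfg) => cfg.enabled && measure(checks, cfg, now) > cfg.threshold);
}
```

Under these assumptions, the default Block Rate Alert corresponds to a threshold of 0.5 with a 300,000 ms window.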

3. Alert Deduplication

Alerts are deduplicated using a 15-minute window. If an alert for the same configuration was triggered in the last 15 minutes and hasn't been resolved, no new alert is sent.
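
The deduplication rule can be expressed as a small predicate. The sketch below is in TypeScript with hypothetical record fields (`configId`, `createdAt`, `resolvedAt`); the actual columns of rate_limit_alerts are not listed in this document, and treating an alert exactly 15 minutes old as outside the window is an assumption.

```typescript
// Hypothetical view of one row in rate_limit_alerts; field names are illustrative.
interface AlertRecord {
  configId: string;
  createdAt: number;         // epoch milliseconds
  resolvedAt: number | null; // null while the alert is unresolved
}

const DEDUP_WINDOW_MS = 15 * 60 * 1000; // 15-minute window from this document

// A new alert is sent only if no unresolved alert for the same
// configuration was triggered within the deduplication window.
function shouldSendAlert(configId: string, history: AlertRecord[], now: number): boolean {
  const hasRecentUnresolved = history.some(
    (a) =>
      a.configId === configId &&
      a.resolvedAt === null &&
      now - a.createdAt < DEDUP_WINDOW_MS,
  );
  return !hasRecentUnresolved;
}
```

Resolving an alert re-arms its configuration immediately, because resolved alerts are ignored by the check.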

4. Notifications

Alerts are sent to all moderators via the "moderators" topic in Novu, including:

  • Email notifications
  • In-app notifications (if configured)
  • Custom notification channels (if configured)

Monitoring the Monitor

Check Cron Job Status

-- View recent cron job runs
SELECT * FROM cron.job_run_details 
WHERE jobid = (SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits')
ORDER BY start_time DESC 
LIMIT 10;

View Function Logs

Check the edge function logs in Supabase Dashboard: https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/functions/monitor-rate-limits/logs

Test Manually

You can test the monitor function manually by calling it via HTTP:

curl -X POST https://api.thrillwiki.com/functions/v1/monitor-rate-limits \
  -H "Content-Type: application/json"

Adjusting the Schedule

To change how often the monitor runs, update the cron schedule:

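-- cron.alter_job takes the numeric job ID, so look it up by job name first.

-- Update to run every 10 minutes instead
SELECT cron.alter_job((SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits'), schedule := '*/10 * * * *');

-- Update to run every hour
SELECT cron.alter_job((SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits'), schedule := '0 * * * *');

-- Update to run every minute (not recommended - may generate too many alerts)
SELECT cron.alter_job((SELECT jobid FROM cron.job WHERE jobname = 'monitor-rate-limits'), schedule := '* * * * *');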

Removing the Cron Job

If you need to disable monitoring:

SELECT cron.unschedule('monitor-rate-limits');

Troubleshooting

No Alerts Being Triggered

  1. Check if any alert configurations are enabled:

SELECT * FROM rate_limit_alert_config WHERE enabled = true;

  2. Check if metrics are being collected:

    • Visit /admin/rate-limit-metrics and check the "Recent Activity" tab
    • If no activity, the rate limiter might not be in use
  3. Check monitor function logs for errors

Too Many Alerts

  • Increase threshold values in the configuration
  • Increase time windows for less sensitive detection
  • Disable specific alert types that are too noisy

Monitor Not Running

  1. Verify cron job exists and is active
  2. Check cron.job_run_details for error messages
  3. Verify edge function deployed successfully
  4. Check network connectivity between cron scheduler and edge function

Database Tables

rate_limit_alert_config

Stores alert threshold configurations. Only admins can modify.

rate_limit_alerts

Stores history of all triggered alerts. Moderators can view and resolve.

Security

  • Alert configurations can only be modified by admin/superuser roles
  • Alert history is only accessible to moderators and above
  • The monitor function runs without JWT verification (as a cron job)
  • All database operations respect Row Level Security policies

Performance Considerations

  • In-memory metrics store max 10,000 entries (auto-trimmed)
  • Metrics older than the longest configured time window are not used by the monitor
  • Monitor function typically runs in <500ms
  • No significant database load (simple queries on small tables)

Future Enhancements

Possible improvements:

  • Function-specific alert thresholds
  • Alert aggregation (daily/weekly summaries)
  • Custom notification channels per alert type
  • Machine learning-based anomaly detection
  • Integration with external monitoring tools (Datadog, New Relic, etc.)