# 🎯 Advanced ML Anomaly Detection & Automated Monitoring

## ✅ What's Now Active

### 1. Advanced ML Algorithms

Your anomaly detection now uses **six base algorithms plus an ensemble method**:

#### Statistical Algorithms
- **Z-Score**: Standard deviation-based outlier detection
- **Moving Average**: Trend deviation detection
- **Rate of Change**: Sudden change detection

#### Advanced ML Algorithms (NEW!)
- **Isolation Forest**: Anomaly detection based on data point isolation
  - Works by measuring how "isolated" a point is from the rest
  - Excellent for detecting outliers in multi-dimensional space
- **Seasonal Decomposition**: Pattern-aware anomaly detection
  - Detects anomalies while accounting for daily/weekly patterns
  - Configurable period (default: 24 hours)
  - Identifies seasonal spikes and drops
- **Predictive Anomaly (LSTM-inspired)**: Time-series prediction
  - Uses triple exponential smoothing (Holt-Winters)
  - Predicts the next value from level and trend
  - Flags unexpected deviations from predictions
- **Ensemble Method**: Multi-algorithm consensus
  - Combines all six base algorithms for maximum accuracy
  - Requires at least 40% of the algorithms to agree before an anomaly is reported
  - Provides weighted confidence scores

### 2. Automated Cron Jobs

**NOW RUNNING AUTOMATICALLY:**

| Job | Schedule | Purpose |
|-----|----------|---------|
| `detect-anomalies-every-5-minutes` | Every 5 minutes (`*/5 * * * *`) | Run ML anomaly detection on all metrics |
| `collect-metrics-every-minute` | Every minute (`* * * * *`) | Collect system metrics (errors, queues, API times) |
| `data-retention-cleanup-daily` | Daily at 3 AM (`0 3 * * *`) | Clean up old data to manage DB size |

### 3. Algorithm Configuration

Each metric can be configured with different algorithms in the `anomaly_detection_config` table:

```sql
-- Example: Configure a metric to use all advanced algorithms
UPDATE anomaly_detection_config
SET detection_algorithms = ARRAY['z_score', 'moving_average', 'isolation_forest', 'seasonal', 'predictive', 'ensemble']
WHERE metric_name = 'api_response_time';
```

**Algorithm Selection Guide:**
- **z_score**: Best for normally distributed data and general outlier detection
- **moving_average**: Best for trending data and smooth patterns
- **rate_of_change**: Best for detecting sudden spikes and drops
- **isolation_forest**: Best for complex, multi-modal distributions
- **seasonal**: Best for cyclic patterns (hourly, daily, weekly)
- **predictive**: Best for time series with clear trends
- **ensemble**: Best for maximum accuracy; combines all methods

### 4. Sensitivity Tuning

**Sensitivity Parameter** (in `anomaly_detection_config`; see the sketch after this list for how the value maps onto a detection threshold):
- Lower values (1.5-2.0): More sensitive; catches subtle anomalies but produces more false positives
- Medium values (2.5-3.0): Balanced; recommended default
- Higher values (3.5-5.0): Less sensitive; flags only major anomalies, with fewer false positives
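To make the sensitivity value concrete, here is a minimal sketch, written directly in SQL against `metric_time_series`, of the kind of check the z-score detector performs. The detection itself runs inside the `detect-anomalies` edge function, and the numeric `value` column name is an assumption about the table's schema, so treat this as illustrative only.

```sql
-- Illustrative sketch only: flag readings whose z-score exceeds the configured
-- sensitivity (2.5 here) within a 60-minute lookback window.
-- Assumes metric_time_series exposes a numeric "value" column.
WITH stats AS (
  SELECT AVG(value)         AS mean_value,
         STDDEV_SAMP(value) AS stddev_value
  FROM metric_time_series
  WHERE metric_name = 'api_error_count'
    AND timestamp > NOW() - INTERVAL '60 minutes'
)
SELECT m.timestamp,
       m.value,
       ABS(m.value - s.mean_value) / NULLIF(s.stddev_value, 0) AS z_score
FROM metric_time_series m
CROSS JOIN stats s
WHERE m.metric_name = 'api_error_count'
  AND m.timestamp > NOW() - INTERVAL '60 minutes'
  AND ABS(m.value - s.mean_value) / NULLIF(s.stddev_value, 0) > 2.5  -- sensitivity
ORDER BY m.timestamp DESC;
```

Raising the threshold from 2.5 to 4.0 shrinks the flagged set to only the most extreme deviations, which is exactly the trade-off described in the list above.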
### 5. Monitoring Dashboard

View all anomaly detections in the admin panel:
- Navigate to `/admin/monitoring`
- See the "ML Anomaly Detection" panel
- Real-time updates every 30 seconds
- Manual trigger button available

**Anomaly Details Include:**
- Algorithm used
- Anomaly type (spike, drop, outlier, seasonal, etc.)
- Severity (low, medium, high, critical)
- Deviation score (how far from normal)
- Confidence score (algorithm certainty)
- Baseline vs actual values

## 🔍 How It Works

### Data Flow

```
1. Metrics Collection (every minute)
        ↓
2. Store in metric_time_series table
        ↓
3. Anomaly Detection (every 5 minutes)
        ↓
4. Run ML algorithms on recent data
        ↓
5. Detect anomalies & calculate scores
        ↓
6. Insert into anomaly_detections table
        ↓
7. Auto-create system alerts (if critical/high)
        ↓
8. Display in admin dashboard
        ↓
9. Data Retention Cleanup (daily 3 AM)
```

### Algorithm Comparison

| Algorithm | Strength | Best For | Time Complexity |
|-----------|----------|----------|-----------------|
| Z-Score | Simple, fast | Normal distributions | O(n) |
| Moving Average | Trend-aware | Gradual changes | O(n) |
| Rate of Change | Change detection | Sudden shifts | O(1) |
| Isolation Forest | Multi-dimensional | Complex patterns | O(n log n) |
| Seasonal | Pattern-aware | Cyclic data | O(n) |
| Predictive | Forecast-based | Time-series | O(n) |
| Ensemble | Highest accuracy | Any pattern | O(n log n) |

## 📊 Current Metrics Being Monitored

### Supabase Metrics (collected every minute)
- `api_error_count`: Recent API errors
- `rate_limit_violations`: Rate limit blocks
- `pending_submissions`: Submissions awaiting moderation
- `active_incidents`: Open/investigating incidents
- `unresolved_alerts`: Unresolved system alerts
- `submission_approval_rate`: Approval percentage
- `avg_moderation_time`: Average moderation time

### Django Metrics (collected every minute, if configured)
- `error_rate`: Error log percentage
- `api_response_time`: Average API response time (ms)
- `celery_queue_size`: Queued Celery tasks
- `database_connections`: Active DB connections
- `cache_hit_rate`: Cache hit percentage

## 🎛️ Configuration

### Add New Metrics for Detection

```sql
INSERT INTO anomaly_detection_config (
  metric_name,
  metric_category,
  enabled,
  sensitivity,
  lookback_window_minutes,
  detection_algorithms,
  min_data_points,
  alert_threshold_score,
  auto_create_alert
) VALUES (
  'custom_metric_name',
  'performance',
  true,
  2.5,
  60,
  ARRAY['ensemble', 'predictive', 'seasonal'],
  10,
  3.0,
  true
);
```

### Adjust Sensitivity

```sql
-- Make detection more sensitive for critical metrics
UPDATE anomaly_detection_config
SET sensitivity = 2.0, alert_threshold_score = 2.5
WHERE metric_name = 'api_error_count';

-- Make detection less sensitive for noisy metrics
UPDATE anomaly_detection_config
SET sensitivity = 4.0, alert_threshold_score = 4.0
WHERE metric_name = 'cache_hit_rate';
```

### Disable Detection for Specific Metrics

```sql
UPDATE anomaly_detection_config
SET enabled = false
WHERE metric_name = 'some_metric';
```

## 🔧 Troubleshooting

### Check Cron Job Status

```sql
-- Most recent runs of the monitoring-related cron jobs
-- (run details live in cron.job_run_details; schedule and name live in cron.job)
SELECT j.jobid, j.jobname, j.schedule, j.active,
       d.status, d.return_message, d.start_time, d.end_time
FROM cron.job_run_details d
JOIN cron.job j ON j.jobid = d.jobid
WHERE j.jobname LIKE '%anomal%' OR j.jobname LIKE '%metric%'
ORDER BY d.start_time DESC
LIMIT 20;
```

### View Recent Anomalies

```sql
SELECT *
FROM recent_anomalies_view
ORDER BY detected_at DESC
LIMIT 20;
```

### Check Metric Collection

```sql
SELECT
  metric_name,
  COUNT(*) AS count,
  MIN(timestamp) AS oldest,
  MAX(timestamp) AS newest
FROM metric_time_series
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY metric_name
ORDER BY metric_name;
```

### Manual Anomaly Detection Trigger

```sql
-- Call the edge function directly
SELECT net.http_post(
  url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/detect-anomalies',
  headers := '{"Content-Type": "application/json", "Authorization": "Bearer YOUR_ANON_KEY"}'::jsonb,
  body := '{}'::jsonb
);
```

## 📈 Performance Considerations

### Data Volume
- Metrics: ~1,440 records/day per metric (one per minute)
- With 12 metrics: ~17,280 records/day
- 30-day retention: ~518,400 records
- Automatic cleanup prevents unbounded growth (a sketch of the retention pass follows this list)
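The exact SQL run by `data-retention-cleanup-daily` is not shown in this document. As a rough sketch of what a 30-day retention pass looks like, assuming the job prunes both the raw metrics and the detection history (the tables covered and the retention window are assumptions, so check the deployed job before relying on this):

```sql
-- Assumption-based sketch of the daily retention pass; the real cleanup job
-- may cover different tables, columns, or retention windows.
DELETE FROM metric_time_series
WHERE timestamp < NOW() - INTERVAL '30 days';

DELETE FROM anomaly_detections
WHERE detected_at < NOW() - INTERVAL '30 days';
```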
### Detection Performance
- Each detection run processes all enabled metrics
- The ensemble algorithm is the most CPU-intensive
- Recommended: use ensemble only for critical metrics (a sketch appears at the end of this document)
- Typical detection time: <5 seconds for 12 metrics

### Database Impact
- Indexes on timestamp columns optimize queries
- Regular cleanup maintains query performance
- Consider partitioning for very high-volume deployments

## 🚀 Next Steps

1. **Monitor the Dashboard**: Visit `/admin/monitoring` to see anomalies
2. **Fine-tune Sensitivity**: Adjust based on the false positive rate
3. **Add Custom Metrics**: Monitor application-specific KPIs
4. **Set Up Alerts**: Configure notifications for critical anomalies
5. **Review Weekly**: Check patterns and adjust algorithms

## 📚 Additional Resources

- [Edge Function Logs](https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/functions/detect-anomalies/logs)
- [Cron Jobs Dashboard](https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/sql/new)
- Django README: `django/README_MONITORING.md`
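Following up on the detection-performance recommendation above, here is a hedged sketch of keeping the ensemble algorithm only on critical metrics. The metric names in the `NOT IN` list are examples, not a prescribed set.

```sql
-- Sketch only: remove the CPU-heavy ensemble algorithm from lower-priority
-- metrics; adjust the list of critical metrics to your deployment.
UPDATE anomaly_detection_config
SET detection_algorithms = array_remove(detection_algorithms, 'ensemble')
WHERE metric_name NOT IN ('api_error_count', 'api_response_time');
```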