Mirror of https://github.com/pacnpal/thrilltrack-explorer.git (synced 2025-12-20 04:31:13 -05:00)
🎯 Advanced ML Anomaly Detection & Automated Monitoring
✅ What's Now Active
1. Advanced ML Algorithms
Your anomaly detection now uses seven algorithms:
Statistical Algorithms
- Z-Score: Standard deviation-based outlier detection
- Moving Average: Trend deviation detection
- Rate of Change: Sudden change detection
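The three statistical checks are simple enough to sketch directly. The following is illustrative Python, not the production edge-function code; function names and default thresholds here are assumptions:

```python
from statistics import mean, stdev

def z_score_anomaly(values, sensitivity=3.0):
    """Flag the latest point if it sits more than `sensitivity`
    standard deviations from the mean of the preceding history."""
    history, latest = values[:-1], values[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return False
    return abs(latest - mu) / sigma > sensitivity

def moving_average_anomaly(values, window=5, sensitivity=3.0):
    """Flag the latest point if it deviates from the trailing
    moving average by more than `sensitivity` * recent stdev."""
    history, latest = values[:-1], values[-1]
    recent = history[-window:]
    ma, sigma = mean(recent), stdev(recent)
    if sigma == 0:
        return False
    return abs(latest - ma) / sigma > sensitivity

def rate_of_change_anomaly(values, threshold=0.5):
    """Flag a sudden jump: the relative change between the last
    two points exceeds `threshold` (50% by default)."""
    prev, latest = values[-2], values[-1]
    if prev == 0:
        return latest != 0
    return abs(latest - prev) / abs(prev) > threshold
```

On a steady series around 10, all three flag a spike to 30 while leaving a normal reading unflagged.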
Advanced ML Algorithms (NEW!)
- Isolation Forest: anomaly detection based on data-point isolation
  - Works by measuring how "isolated" a point is from the rest
  - Excellent for detecting outliers in multi-dimensional space
- Seasonal Decomposition: pattern-aware anomaly detection
  - Detects anomalies while accounting for daily/weekly patterns
  - Configurable period (default: 24 hours)
  - Identifies seasonal spikes and drops
- Predictive Anomaly (LSTM-inspired): time-series prediction
  - Uses triple exponential smoothing (Holt-Winters)
  - Predicts the next value from level and trend
  - Flags unexpected deviations from predictions
- Ensemble Method: multi-algorithm consensus
  - Combines the other six algorithms for maximum accuracy
  - Requires 40%+ of the algorithms to agree before flagging an anomaly
  - Provides weighted confidence scores
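The ensemble's consensus step can be sketched in a few lines. This is illustrative Python; only the 40% quorum comes from the description above, while the function name and the mean-confidence weighting are assumptions:

```python
def ensemble_anomaly(flags, quorum=0.4):
    """Combine per-algorithm verdicts. `flags` maps an algorithm name
    to (is_anomaly, confidence in [0, 1]). The point counts as
    anomalous when at least `quorum` of the algorithms agree; the
    ensemble confidence is the mean confidence of the agreeing ones."""
    votes = [name for name, (hit, _) in flags.items() if hit]
    agreement = len(votes) / len(flags)
    if agreement < quorum:
        return False, 0.0
    confidence = sum(flags[name][1] for name in votes) / len(votes)
    return True, round(confidence, 3)
```

With five algorithms, two agreeing votes (40%) are enough to reach the quorum.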
2. Automated Cron Jobs
NOW RUNNING AUTOMATICALLY:
| Job | Schedule | Purpose |
|---|---|---|
| detect-anomalies-every-5-minutes | Every 5 minutes (*/5 * * * *) | Run ML anomaly detection on all metrics |
| collect-metrics-every-minute | Every minute (* * * * *) | Collect system metrics (errors, queues, API times) |
| data-retention-cleanup-daily | Daily at 3 AM (0 3 * * *) | Clean up old data to manage DB size |
3. Algorithm Configuration
Each metric can be configured with different algorithms in the anomaly_detection_config table:
-- Example: Configure a metric to use all advanced algorithms
UPDATE anomaly_detection_config
SET detection_algorithms = ARRAY['z_score', 'moving_average', 'isolation_forest', 'seasonal', 'predictive', 'ensemble']
WHERE metric_name = 'api_response_time';
Algorithm Selection Guide:
- z_score: Best for normally distributed data, general outlier detection
- moving_average: Best for trending data, smooth patterns
- rate_of_change: Best for detecting sudden spikes/drops
- isolation_forest: Best for complex multi-modal distributions
- seasonal: Best for cyclic patterns (hourly, daily, weekly)
- predictive: Best for time-series with clear trends
- ensemble: Best for maximum accuracy, combines all methods
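As an intuition for the seasonal option: compare the latest value only against earlier observations from the same phase of the cycle (the same hour on previous days), so a reading that is normal at 3 AM can still be anomalous at peak hours. A minimal phase-comparison sketch in Python (the real implementation's decomposition may differ; names and defaults are assumptions):

```python
from statistics import mean, stdev

def seasonal_anomaly(values, period=24, sensitivity=3.0):
    """Flag the latest point by comparing it to the same phase of
    previous cycles, e.g. the same hour-of-day across prior days."""
    latest = values[-1]
    phase = (len(values) - 1) % period
    # All earlier observations at the same phase of the cycle.
    peers = [v for i, v in enumerate(values[:-1]) if i % period == phase]
    if len(peers) < 2:
        return False  # not enough history for this phase yet
    mu, sigma = mean(peers), stdev(peers)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > sensitivity
```

With a 4-step cycle like low/mid/high/mid, a "high" value arriving in a "low" slot is flagged even though the same number is normal elsewhere in the cycle.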
4. Sensitivity Tuning
Sensitivity Parameter (in anomaly_detection_config):
- Lower value (1.5-2.0): More sensitive, catches subtle anomalies, more false positives
- Medium value (2.5-3.0): Balanced, recommended default
- Higher value (3.5-5.0): Less sensitive, only major anomalies, fewer false positives
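A toy illustration of how the sensitivity value shifts this trade-off, using a plain z-score check against a fixed baseline (all numbers are made up for illustration):

```python
from statistics import mean, stdev

def flags_at(history, candidates, sensitivity):
    """Return the candidate values a z-score check flags against
    the baseline `history` at the given sensitivity."""
    mu, sigma = mean(history), stdev(history)
    return [v for v in candidates if abs(v - mu) / sigma > sensitivity]

baseline = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]  # mean ~10.3
# Lower sensitivity flags all three candidates; higher sensitivity
# keeps only the most extreme one.
flags_at(baseline, [12, 14, 18], 1.5)  # sensitive: flags 12, 14, 18
flags_at(baseline, [12, 14, 18], 2.5)  # balanced: flags 14, 18
flags_at(baseline, [12, 14, 18], 4.0)  # strict:   flags only 18
```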
5. Monitoring Dashboard
View all anomaly detections in the admin panel:
- Navigate to /admin/monitoring and open the "ML Anomaly Detection" panel
- Real-time updates every 30 seconds
- Manual trigger button available
Anomaly Details Include:
- Algorithm used
- Anomaly type (spike, drop, outlier, seasonal, etc.)
- Severity (low, medium, high, critical)
- Deviation score (how far from normal)
- Confidence score (algorithm certainty)
- Baseline vs actual values
🔍 How It Works
Data Flow
1. Metrics Collection (every minute)
↓
2. Store in metric_time_series table
↓
3. Anomaly Detection (every 5 minutes)
↓
4. Run ML algorithms on recent data
↓
5. Detect anomalies & calculate scores
↓
6. Insert into anomaly_detections table
↓
7. Auto-create system alerts (if critical/high)
↓
8. Display in admin dashboard
↓
9. Data Retention Cleanup (daily 3 AM)
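Detection steps 3 through 7 can be pictured as one loop per run. This is an illustrative Python skeleton with injected stand-ins for the real database and alerting calls, not the edge function's actual code; all names here are hypothetical:

```python
from typing import Callable, Iterable

def run_detection_cycle(
    fetch_recent: Callable[[str], list],        # metric name -> recent values
    detect: Callable[[list], tuple],            # values -> (is_anomaly, score)
    record_anomaly: Callable[[str, float], None],
    create_alert: Callable[[str, float], None],
    metrics: Iterable[str],
    alert_threshold: float = 3.0,
) -> int:
    """One detection pass: pull recent points for each metric, run the
    detection algorithms, persist anomalies, and raise alerts when the
    score crosses the alert threshold. Returns the anomaly count."""
    found = 0
    for name in metrics:
        values = fetch_recent(name)
        if len(values) < 2:
            continue  # mirrors the min_data_points guard in the config
        is_anomaly, score = detect(values)
        if is_anomaly:
            found += 1
            record_anomaly(name, score)
            if score >= alert_threshold:
                create_alert(name, score)
    return found
```

In production the fetch reads metric_time_series, the record step writes anomaly_detections, and the alert step creates system alerts for critical/high severities.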
Algorithm Comparison
| Algorithm | Strength | Best For | Time Complexity |
|---|---|---|---|
| Z-Score | Simple, fast | Normal distributions | O(n) |
| Moving Average | Trend-aware | Gradual changes | O(n) |
| Rate of Change | Change detection | Sudden shifts | O(1) |
| Isolation Forest | Multi-dimensional | Complex patterns | O(n log n) |
| Seasonal | Pattern-aware | Cyclic data | O(n) |
| Predictive | Forecast-based | Time-series | O(n) |
| Ensemble | Highest accuracy | Any pattern | O(n log n) |
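For intuition on the isolation-based row above: an outlier can be separated from the rest of the data with very few random splits, so its average isolation depth is low. Below is a deliberately simplified one-dimensional sketch, not the real Isolation Forest (which builds binary trees over random sub-samples and features):

```python
import random

def isolation_depth(point, sample, rng, max_depth=16):
    """Number of random splits needed to isolate `point` in `sample`.
    Outliers end up alone after few splits; inliers take longer."""
    depth, current = 0, list(sample)
    while len(current) > 1 and depth < max_depth:
        lo, hi = min(current), max(current)
        if lo == hi:
            break  # remaining points are identical, cannot split further
        cut = rng.uniform(lo, hi)
        # Keep only the points on the same side of the cut as `point`.
        current = [v for v in current if (v <= cut) == (point <= cut)]
        depth += 1
    return depth

def isolation_score(point, data, trees=50, seed=0):
    """Average isolation depth over many random 'trees';
    lower scores indicate more anomalous points."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(trees)) / trees
```

An extreme value like 100 in a cluster around 10 is typically isolated in one split, giving it a much lower score than any inlier.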
📊 Current Metrics Being Monitored
Supabase Metrics (collected every minute)
- api_error_count: Recent API errors
- rate_limit_violations: Rate limit blocks
- pending_submissions: Submissions awaiting moderation
- active_incidents: Open/investigating incidents
- unresolved_alerts: Unresolved system alerts
- submission_approval_rate: Approval percentage
- avg_moderation_time: Average moderation time
Django Metrics (collected every minute, if configured)
- error_rate: Error log percentage
- api_response_time: Average API response time (ms)
- celery_queue_size: Queued Celery tasks
- database_connections: Active DB connections
- cache_hit_rate: Cache hit percentage
🎛️ Configuration
Add New Metrics for Detection
INSERT INTO anomaly_detection_config (
metric_name,
metric_category,
enabled,
sensitivity,
lookback_window_minutes,
detection_algorithms,
min_data_points,
alert_threshold_score,
auto_create_alert
) VALUES (
'custom_metric_name',
'performance',
true,
2.5,
60,
ARRAY['ensemble', 'predictive', 'seasonal'],
10,
3.0,
true
);
Adjust Sensitivity
-- Make detection more sensitive for critical metrics
UPDATE anomaly_detection_config
SET sensitivity = 2.0, alert_threshold_score = 2.5
WHERE metric_name = 'api_error_count';
-- Make detection less sensitive for noisy metrics
UPDATE anomaly_detection_config
SET sensitivity = 4.0, alert_threshold_score = 4.0
WHERE metric_name = 'cache_hit_rate';
Disable Detection for Specific Metrics
UPDATE anomaly_detection_config
SET enabled = false
WHERE metric_name = 'some_metric';
🔧 Troubleshooting
Check Cron Job Status
-- Job definitions live in cron.job; run history lives in cron.job_run_details
SELECT j.jobid, j.jobname, j.schedule, j.active,
       d.status, d.start_time, d.end_time
FROM cron.job j
LEFT JOIN cron.job_run_details d USING (jobid)
WHERE j.jobname LIKE '%anomal%' OR j.jobname LIKE '%metric%'
ORDER BY d.start_time DESC NULLS LAST
LIMIT 20;
View Recent Anomalies
SELECT * FROM recent_anomalies_view
ORDER BY detected_at DESC
LIMIT 20;
Check Metric Collection
SELECT metric_name, COUNT(*) as count,
MIN(timestamp) as oldest,
MAX(timestamp) as newest
FROM metric_time_series
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY metric_name
ORDER BY metric_name;
Manual Anomaly Detection Trigger
-- Call the edge function directly
SELECT net.http_post(
url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/detect-anomalies',
headers := '{"Content-Type": "application/json", "Authorization": "Bearer YOUR_ANON_KEY"}'::jsonb,
body := '{}'::jsonb
);
📈 Performance Considerations
Data Volume
- Metrics: ~1440 records/day per metric (every minute)
- With 12 metrics: ~17,280 records/day
- 30-day retention: ~518,400 records
- Automatic cleanup prevents unbounded growth
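The volume figures above follow from simple arithmetic:

```python
minutes_per_day = 24 * 60              # one sample per minute per metric
per_metric_daily = minutes_per_day     # 1,440 records/day per metric
total_daily = per_metric_daily * 12    # 17,280 records/day for 12 metrics
retained = total_daily * 30            # 518,400 records at 30-day retention
```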
Detection Performance
- Each detection run processes all enabled metrics
- Ensemble algorithm is most CPU-intensive
- Recommended: Use ensemble only for critical metrics
- Typical detection time: <5 seconds for 12 metrics
Database Impact
- Indexes on timestamp columns optimize queries
- Regular cleanup maintains query performance
- Consider partitioning for very high-volume deployments
🚀 Next Steps
- Monitor the Dashboard: Visit /admin/monitoring to see anomalies
- Fine-tune Sensitivity: Adjust based on false positive rate
- Add Custom Metrics: Monitor application-specific KPIs
- Set Up Alerts: Configure notifications for critical anomalies
- Review Weekly: Check patterns and adjust algorithms
📚 Additional Resources
- Edge Function Logs
- Cron Jobs Dashboard
- Django README: django/README_MONITORING.md