Mirror of https://github.com/pacnpal/thrilltrack-explorer.git (synced 2025-12-29)
Compare commits 07fdfe34f3...7642ac435b (4 commits: 7642ac435b, c632e559d0, 12a6bfdfab, 915a9fe2df)
MONITORING_SETUP.md (new file, 266 lines)
@@ -0,0 +1,266 @@
# 🎯 Advanced ML Anomaly Detection & Automated Monitoring

## ✅ What's Now Active

### 1. Advanced ML Algorithms

Your anomaly detection now uses **seven algorithms** (six base detectors plus an ensemble that combines them):

#### Statistical Algorithms

- **Z-Score**: Standard deviation-based outlier detection
- **Moving Average**: Trend deviation detection
- **Rate of Change**: Sudden change detection

#### Advanced ML Algorithms (NEW!)

- **Isolation Forest**: Anomaly detection based on data point isolation
  - Works by measuring how "isolated" a point is from the rest
  - Excellent for detecting outliers in multi-dimensional space

- **Seasonal Decomposition**: Pattern-aware anomaly detection
  - Detects anomalies while accounting for daily/weekly patterns
  - Configurable period (default: 24 hours)
  - Identifies seasonal spikes and drops

- **Predictive Anomaly (LSTM-inspired)**: Time-series prediction
  - Uses triple exponential smoothing (Holt-Winters)
  - Predicts the next value based on level and trend
  - Flags unexpected deviations from predictions

- **Ensemble Method**: Multi-algorithm consensus
  - Combines five of the detectors above for maximum accuracy
  - Requires at least 40% of the algorithms to agree before flagging an anomaly
  - Provides weighted confidence scores (see the sketch after this list)
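To make the consensus rule concrete, here is a minimal TypeScript sketch of the idea. It is illustrative only; the real logic lives in the `detect-anomalies` edge function's `AnomalyDetector.ensemble` method shown later in this changeset.

```typescript
// Each base detector contributes one vote.
interface DetectorVote {
  isAnomaly: boolean;
  confidenceScore: number; // 0..1
}

// Consensus rule used by the ensemble: flag an anomaly only when at least
// 40% of the base detectors agree, and scale confidence by that agreement.
function ensembleConsensus(votes: DetectorVote[]): { isAnomaly: boolean; confidence: number } {
  const agreeing = votes.filter(v => v.isAnomaly).length;
  const ratio = agreeing / votes.length;
  const avgConfidence = votes.reduce((sum, v) => sum + v.confidenceScore, 0) / votes.length;
  return {
    isAnomaly: ratio >= 0.4,
    confidence: Math.min(avgConfidence * ratio * 2, 1),
  };
}
```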
### 2. Automated Cron Jobs

**NOW RUNNING AUTOMATICALLY:**

| Job | Schedule | Purpose |
|-----|----------|---------|
| `detect-anomalies-every-5-minutes` | Every 5 minutes (`*/5 * * * *`) | Run ML anomaly detection on all metrics |
| `collect-metrics-every-minute` | Every minute (`* * * * *`) | Collect system metrics (errors, queues, API times) |
| `data-retention-cleanup-daily` | Daily at 3 AM (`0 3 * * *`) | Clean up old data to manage DB size |
### 3. Algorithm Configuration

Each metric can be configured with different algorithms in the `anomaly_detection_config` table:

```sql
-- Example: Configure a metric to use all advanced algorithms
UPDATE anomaly_detection_config
SET detection_algorithms = ARRAY['z_score', 'moving_average', 'isolation_forest', 'seasonal', 'predictive', 'ensemble']
WHERE metric_name = 'api_response_time';
```

**Algorithm Selection Guide:**

- **z_score**: Best for normally distributed data, general outlier detection
- **moving_average**: Best for trending data, smooth patterns
- **rate_of_change**: Best for detecting sudden spikes/drops
- **isolation_forest**: Best for complex multi-modal distributions
- **seasonal**: Best for cyclic patterns (hourly, daily, weekly)
- **predictive**: Best for time-series with clear trends
- **ensemble**: Best for maximum accuracy, combines all methods

### 4. Sensitivity Tuning

**Sensitivity Parameter** (in `anomaly_detection_config`):

- Lower value (1.5-2.0): More sensitive, catches subtle anomalies, more false positives
- Medium value (2.5-3.0): Balanced, recommended default
- Higher value (3.5-5.0): Less sensitive, only major anomalies, fewer false positives
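In the statistical detectors, the sensitivity value is effectively the number of standard deviations a reading may drift from its baseline before it is flagged. A rough TypeScript illustration of that relationship (not the exact implementation):

```typescript
// Lower sensitivity => tighter threshold => more anomalies flagged.
function isOutsideSensitivity(value: number, mean: number, stdDev: number, sensitivity: number): boolean {
  if (stdDev === 0) return false; // flat history: nothing meaningful to compare against
  const deviations = Math.abs(value - mean) / stdDev;
  return deviations > sensitivity; // e.g. 2.0 fires earlier than 4.0 for the same spike
}
```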
### 5. Monitoring Dashboard

View all anomaly detections in the admin panel:

- Navigate to `/admin/monitoring`
- See the "ML Anomaly Detection" panel
- Real-time updates every 30 seconds
- Manual trigger button available

**Anomaly Details Include:**

- Algorithm used
- Anomaly type (spike, drop, outlier, seasonal, etc.)
- Severity (low, medium, high, critical)
- Deviation score (how far from normal)
- Confidence score (algorithm certainty)
- Baseline vs. actual values
## 🔍 How It Works

### Data Flow

```
1. Metrics Collection (every minute)
   ↓
2. Store in metric_time_series table
   ↓
3. Anomaly Detection (every 5 minutes)
   ↓
4. Run ML algorithms on recent data
   ↓
5. Detect anomalies & calculate scores
   ↓
6. Insert into anomaly_detections table
   ↓
7. Auto-create system alerts (if critical/high)
   ↓
8. Display in admin dashboard
   ↓
9. Data Retention Cleanup (daily 3 AM)
```

### Algorithm Comparison

| Algorithm | Strength | Best For | Time Complexity |
|-----------|----------|----------|-----------------|
| Z-Score | Simple, fast | Normal distributions | O(n) |
| Moving Average | Trend-aware | Gradual changes | O(n) |
| Rate of Change | Change detection | Sudden shifts | O(1) |
| Isolation Forest | Multi-dimensional | Complex patterns | O(n log n) |
| Seasonal | Pattern-aware | Cyclic data | O(n) |
| Predictive | Forecast-based | Time-series | O(n) |
| Ensemble | Highest accuracy | Any pattern | O(n log n) |
## 📊 Current Metrics Being Monitored

### Supabase Metrics (collected every minute)

- `api_error_count`: Recent API errors
- `rate_limit_violations`: Rate limit blocks
- `pending_submissions`: Submissions awaiting moderation
- `active_incidents`: Open/investigating incidents
- `unresolved_alerts`: Unresolved system alerts
- `submission_approval_rate`: Approval percentage
- `avg_moderation_time`: Average moderation time

### Django Metrics (collected every minute, if configured)

- `error_rate`: Error log percentage
- `api_response_time`: Average API response time (ms)
- `celery_queue_size`: Queued Celery tasks
- `database_connections`: Active DB connections
- `cache_hit_rate`: Cache hit percentage
## 🎛️ Configuration

### Add New Metrics for Detection

```sql
INSERT INTO anomaly_detection_config (
  metric_name,
  metric_category,
  enabled,
  sensitivity,
  lookback_window_minutes,
  detection_algorithms,
  min_data_points,
  alert_threshold_score,
  auto_create_alert
) VALUES (
  'custom_metric_name',
  'performance',
  true,
  2.5,
  60,
  ARRAY['ensemble', 'predictive', 'seasonal'],
  10,
  3.0,
  true
);
```

### Adjust Sensitivity

```sql
-- Make detection more sensitive for critical metrics
UPDATE anomaly_detection_config
SET sensitivity = 2.0, alert_threshold_score = 2.5
WHERE metric_name = 'api_error_count';

-- Make detection less sensitive for noisy metrics
UPDATE anomaly_detection_config
SET sensitivity = 4.0, alert_threshold_score = 4.0
WHERE metric_name = 'cache_hit_rate';
```

### Disable Detection for Specific Metrics

```sql
UPDATE anomaly_detection_config
SET enabled = false
WHERE metric_name = 'some_metric';
```
## 🔧 Troubleshooting

### Check Cron Job Status

```sql
SELECT j.jobname, j.schedule, j.active, d.status, d.start_time, d.end_time
FROM cron.job j
LEFT JOIN cron.job_run_details d ON d.jobid = j.jobid
WHERE j.jobname LIKE '%anomal%' OR j.jobname LIKE '%metric%' OR j.jobname LIKE '%retention%'
ORDER BY d.start_time DESC
LIMIT 20;
```

### View Recent Anomalies

```sql
SELECT * FROM recent_anomalies_view
ORDER BY detected_at DESC
LIMIT 20;
```

### Check Metric Collection

```sql
SELECT metric_name, COUNT(*) AS count,
       MIN(timestamp) AS oldest,
       MAX(timestamp) AS newest
FROM metric_time_series
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY metric_name
ORDER BY metric_name;
```

### Manual Anomaly Detection Trigger

```sql
-- Call the edge function directly
SELECT net.http_post(
  url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/detect-anomalies',
  headers := '{"Content-Type": "application/json", "Authorization": "Bearer YOUR_ANON_KEY"}'::jsonb,
  body := '{}'::jsonb
);
```
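The same function can also be triggered from the app with the Supabase JS client, which is roughly what the dashboard's manual trigger button does (the exact call site and body shape are assumptions):

```typescript
import { supabase } from "@/integrations/supabase/client";

// Kick off an ad-hoc detection run from the frontend.
async function triggerAnomalyDetection() {
  const { data, error } = await supabase.functions.invoke("detect-anomalies", {
    body: { scheduled: false }, // assumption: the function tolerates an arbitrary JSON body
  });
  if (error) throw error;
  return data;
}
```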
## 📈 Performance Considerations

### Data Volume

- Metrics: ~1,440 records/day per metric (one per minute)
- With 12 metrics: ~17,280 records/day
- 30-day retention: ~518,400 records
- Automatic cleanup prevents unbounded growth

### Detection Performance

- Each detection run processes all enabled metrics
- The ensemble algorithm is the most CPU-intensive
- Recommended: Use ensemble only for critical metrics
- Typical detection time: <5 seconds for 12 metrics

### Database Impact

- Indexes on timestamp columns optimize queries
- Regular cleanup maintains query performance
- Consider partitioning for very high-volume deployments
## 🚀 Next Steps

1. **Monitor the Dashboard**: Visit `/admin/monitoring` to see anomalies
2. **Fine-tune Sensitivity**: Adjust based on the false-positive rate
3. **Add Custom Metrics**: Monitor application-specific KPIs
4. **Set Up Alerts**: Configure notifications for critical anomalies
5. **Review Weekly**: Check patterns and adjust algorithms

## 📚 Additional Resources

- [Edge Function Logs](https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/functions/detect-anomalies/logs)
- [Cron Jobs Dashboard](https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/sql/new)
- Django README: `django/README_MONITORING.md`
@@ -136,6 +136,24 @@ SELECT cron.schedule(
);
```

### 5. Data Retention Cleanup Setup

The `data-retention-cleanup` edge function should run daily:

```sql
SELECT cron.schedule(
  'data-retention-cleanup-daily',
  '0 3 * * *', -- Daily at 3:00 AM
  $$
  SELECT net.http_post(
    url:='https://api.thrillwiki.com/functions/v1/data-retention-cleanup',
    headers:='{"Content-Type": "application/json", "Authorization": "Bearer YOUR_ANON_KEY"}'::jsonb,
    body:=concat('{"time": "', now(), '"}')::jsonb
  ) as request_id;
  $$
);
```

## Metrics Collected

### Django Metrics
@@ -154,6 +172,35 @@ SELECT cron.schedule(
- `submission_approval_rate`: Percentage of approved submissions (workflow)
- `avg_moderation_time`: Average time to moderate in minutes (workflow)

## Data Retention Policies

The system automatically cleans up old data to manage database size:

### Retention Periods

- **Metrics** (`metric_time_series`): 30 days
- **Anomaly Detections**: 30 days (resolved anomalies archived after 7 days)
- **Resolved Alerts**: 90 days
- **Resolved Incidents**: 90 days

### Cleanup Functions

The following database functions manage data retention (a usage sketch follows the list):

1. **`cleanup_old_metrics(retention_days)`**: Deletes metrics older than the specified number of days (default: 30)
2. **`cleanup_old_anomalies(retention_days)`**: Archives resolved anomalies and deletes old unresolved ones (default: 30)
3. **`cleanup_old_alerts(retention_days)`**: Deletes old resolved alerts (default: 90)
4. **`cleanup_old_incidents(retention_days)`**: Deletes old resolved incidents (default: 90)
5. **`run_data_retention_cleanup()`**: Master function that runs all cleanup operations
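A minimal sketch of invoking these functions from TypeScript through the Supabase client; the frontend hooks in `src/hooks/admin/useDataRetention.ts` wrap the same RPCs:

```typescript
import { supabase } from "@/integrations/supabase/client";

async function runManualCleanup() {
  // Master cleanup across all retention policies.
  const { data: summary, error } = await supabase.rpc("run_data_retention_cleanup");
  if (error) throw error;

  // Or run a single policy with a custom retention window.
  const { data: deletedMetrics } = await supabase.rpc("cleanup_old_metrics", {
    retention_days: 30,
  });

  return { summary, deletedMetrics };
}
```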
### Automated Cleanup Schedule

Django Celery tasks run retention cleanup automatically:

- Full cleanup: Daily at 3:00 AM
- Metrics cleanup: Daily at 3:30 AM
- Anomaly cleanup: Daily at 4:00 AM

View retention statistics in the Admin Dashboard's Data Retention panel.

## Monitoring

View collected metrics in the Admin Monitoring Dashboard:
django/apps/monitoring/tasks_retention.py (new file, 168 lines)
@@ -0,0 +1,168 @@
"""
|
||||||
|
Celery tasks for data retention and cleanup.
|
||||||
|
"""
|
||||||
|
import logging
|
||||||
|
import requests
|
||||||
|
import os
|
||||||
|
from celery import shared_task
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
SUPABASE_URL = os.environ.get('SUPABASE_URL', 'https://api.thrillwiki.com')
|
||||||
|
SUPABASE_SERVICE_KEY = os.environ.get('SUPABASE_SERVICE_ROLE_KEY')
|
||||||
|
|
||||||
|
|
||||||
|
@shared_task(bind=True, name='monitoring.run_data_retention_cleanup')
|
||||||
|
def run_data_retention_cleanup(self):
|
||||||
|
"""
|
||||||
|
Run comprehensive data retention cleanup.
|
||||||
|
Cleans up old metrics, anomaly detections, alerts, and incidents.
|
||||||
|
Runs daily at 3 AM.
|
||||||
|
"""
|
||||||
|
logger.info("Starting data retention cleanup")
|
||||||
|
|
||||||
|
if not SUPABASE_SERVICE_KEY:
|
||||||
|
logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
|
||||||
|
return {'success': False, 'error': 'Missing service key'}
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Call the Supabase RPC function
|
||||||
|
headers = {
|
||||||
|
'apikey': SUPABASE_SERVICE_KEY,
|
||||||
|
'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
|
||||||
|
'Content-Type': 'application/json',
|
||||||
|
}
|
||||||
|
|
||||||
|
response = requests.post(
|
||||||
|
f'{SUPABASE_URL}/rest/v1/rpc/run_data_retention_cleanup',
|
||||||
|
headers=headers,
|
||||||
|
timeout=60
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
result = response.json()
|
||||||
|
logger.info(f"Data retention cleanup completed: {result}")
|
||||||
|
return result
|
||||||
|
else:
|
||||||
|
logger.error(f"Data retention cleanup failed: {response.status_code} - {response.text}")
|
||||||
|
return {'success': False, 'error': response.text}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error in data retention cleanup: {e}", exc_info=True)
|
||||||
|
raise
|
||||||
|
|
||||||
|
|
||||||
|
@shared_task(bind=True, name='monitoring.cleanup_old_metrics')
|
||||||
|
def cleanup_old_metrics(self, retention_days: int = 30):
|
||||||
|
"""
|
||||||
|
Clean up old metric time series data.
|
||||||
|
Runs daily to remove metrics older than retention period.
|
||||||
|
"""
|
||||||
|
logger.info(f"Cleaning up metrics older than {retention_days} days")
|
||||||
|
|
||||||
|
if not SUPABASE_SERVICE_KEY:
|
||||||
|
logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
|
||||||
|
return {'success': False, 'error': 'Missing service key'}
|
||||||
|
|
||||||
|
try:
|
||||||
|
headers = {
|
||||||
|
'apikey': SUPABASE_SERVICE_KEY,
|
||||||
|
'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
|
||||||
|
'Content-Type': 'application/json',
|
||||||
|
}
|
||||||
|
|
||||||
|
response = requests.post(
|
||||||
|
f'{SUPABASE_URL}/rest/v1/rpc/cleanup_old_metrics',
|
||||||
|
headers=headers,
|
||||||
|
json={'retention_days': retention_days},
|
||||||
|
timeout=30
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
deleted_count = response.json()
|
||||||
|
logger.info(f"Cleaned up {deleted_count} old metrics")
|
||||||
|
return {'success': True, 'deleted_count': deleted_count}
|
||||||
|
else:
|
||||||
|
logger.error(f"Metrics cleanup failed: {response.status_code} - {response.text}")
|
||||||
|
return {'success': False, 'error': response.text}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error in metrics cleanup: {e}", exc_info=True)
|
||||||
|
raise
|
||||||
|
|
||||||
|
|
||||||
|
@shared_task(bind=True, name='monitoring.cleanup_old_anomalies')
|
||||||
|
def cleanup_old_anomalies(self, retention_days: int = 30):
|
||||||
|
"""
|
||||||
|
Clean up old anomaly detections.
|
||||||
|
Archives resolved anomalies and deletes very old unresolved ones.
|
||||||
|
"""
|
||||||
|
logger.info(f"Cleaning up anomalies older than {retention_days} days")
|
||||||
|
|
||||||
|
if not SUPABASE_SERVICE_KEY:
|
||||||
|
logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
|
||||||
|
return {'success': False, 'error': 'Missing service key'}
|
||||||
|
|
||||||
|
try:
|
||||||
|
headers = {
|
||||||
|
'apikey': SUPABASE_SERVICE_KEY,
|
||||||
|
'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
|
||||||
|
'Content-Type': 'application/json',
|
||||||
|
}
|
||||||
|
|
||||||
|
response = requests.post(
|
||||||
|
f'{SUPABASE_URL}/rest/v1/rpc/cleanup_old_anomalies',
|
||||||
|
headers=headers,
|
||||||
|
json={'retention_days': retention_days},
|
||||||
|
timeout=30
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
result = response.json()
|
||||||
|
logger.info(f"Cleaned up anomalies: {result}")
|
||||||
|
return {'success': True, 'result': result}
|
||||||
|
else:
|
||||||
|
logger.error(f"Anomalies cleanup failed: {response.status_code} - {response.text}")
|
||||||
|
return {'success': False, 'error': response.text}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error in anomalies cleanup: {e}", exc_info=True)
|
||||||
|
raise
|
||||||
|
|
||||||
|
|
||||||
|
@shared_task(bind=True, name='monitoring.get_retention_stats')
|
||||||
|
def get_retention_stats(self):
|
||||||
|
"""
|
||||||
|
Get current data retention statistics.
|
||||||
|
Shows record counts and storage size for monitored tables.
|
||||||
|
"""
|
||||||
|
logger.info("Fetching data retention statistics")
|
||||||
|
|
||||||
|
if not SUPABASE_SERVICE_KEY:
|
||||||
|
logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
|
||||||
|
return {'success': False, 'error': 'Missing service key'}
|
||||||
|
|
||||||
|
try:
|
||||||
|
headers = {
|
||||||
|
'apikey': SUPABASE_SERVICE_KEY,
|
||||||
|
'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
|
||||||
|
'Content-Type': 'application/json',
|
||||||
|
}
|
||||||
|
|
||||||
|
response = requests.get(
|
||||||
|
f'{SUPABASE_URL}/rest/v1/data_retention_stats',
|
||||||
|
headers=headers,
|
||||||
|
timeout=10
|
||||||
|
)
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
stats = response.json()
|
||||||
|
logger.info(f"Retrieved retention stats for {len(stats)} tables")
|
||||||
|
return {'success': True, 'stats': stats}
|
||||||
|
else:
|
||||||
|
logger.error(f"Failed to get retention stats: {response.status_code} - {response.text}")
|
||||||
|
return {'success': False, 'error': response.text}
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error getting retention stats: {e}", exc_info=True)
|
||||||
|
raise
|
||||||
@@ -33,6 +33,25 @@ CELERY_BEAT_SCHEDULE = {
        'options': {'queue': 'monitoring'}
    },

    # Data retention cleanup tasks
    'run-data-retention-cleanup': {
        'task': 'monitoring.run_data_retention_cleanup',
        'schedule': crontab(hour=3, minute=0),  # Daily at 3 AM
        'options': {'queue': 'maintenance'}
    },

    'cleanup-old-metrics': {
        'task': 'monitoring.cleanup_old_metrics',
        'schedule': crontab(hour=3, minute=30),  # Daily at 3:30 AM
        'options': {'queue': 'maintenance'}
    },

    'cleanup-old-anomalies': {
        'task': 'monitoring.cleanup_old_anomalies',
        'schedule': crontab(hour=4, minute=0),  # Daily at 4 AM
        'options': {'queue': 'maintenance'}
    },

    # Existing user tasks
    'cleanup-expired-tokens': {
        'task': 'users.cleanup_expired_tokens',
src/components/admin/DataRetentionPanel.tsx (new file, 161 lines)
@@ -0,0 +1,161 @@
import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Badge } from "@/components/ui/badge";
import { Trash2, Database, Clock, HardDrive, TrendingDown } from "lucide-react";
import { useRetentionStats, useRunCleanup } from "@/hooks/admin/useDataRetention";
import { formatDistanceToNow } from "date-fns";

export function DataRetentionPanel() {
  const { data: stats, isLoading } = useRetentionStats();
  const runCleanup = useRunCleanup();

  if (isLoading) {
    return (
      <Card>
        <CardHeader>
          <CardTitle>Data Retention</CardTitle>
          <CardDescription>Loading retention statistics...</CardDescription>
        </CardHeader>
      </Card>
    );
  }

  const totalRecords = stats?.reduce((sum, s) => sum + s.total_records, 0) || 0;
  const totalSize = stats?.reduce((sum, s) => {
    const size = s.table_size.replace(/[^0-9.]/g, '');
    return sum + parseFloat(size);
  }, 0) || 0;

  return (
    <Card>
      <CardHeader>
        <div className="flex items-center justify-between">
          <div>
            <CardTitle className="flex items-center gap-2">
              <Database className="h-5 w-5" />
              Data Retention Management
            </CardTitle>
            <CardDescription>
              Automatic cleanup of old metrics and monitoring data
            </CardDescription>
          </div>
          <Button
            onClick={() => runCleanup.mutate()}
            disabled={runCleanup.isPending}
            variant="destructive"
            size="sm"
          >
            <Trash2 className="h-4 w-4 mr-2" />
            Run Cleanup Now
          </Button>
        </div>
      </CardHeader>
      <CardContent className="space-y-6">
        {/* Summary Stats */}
        <div className="grid gap-4 md:grid-cols-3">
          <div className="space-y-2">
            <div className="flex items-center gap-2 text-sm text-muted-foreground">
              <Database className="h-4 w-4" />
              Total Records
            </div>
            <div className="text-2xl font-bold">{totalRecords.toLocaleString()}</div>
          </div>
          <div className="space-y-2">
            <div className="flex items-center gap-2 text-sm text-muted-foreground">
              <HardDrive className="h-4 w-4" />
              Total Size
            </div>
            <div className="text-2xl font-bold">{totalSize.toFixed(1)} MB</div>
          </div>
          <div className="space-y-2">
            <div className="flex items-center gap-2 text-sm text-muted-foreground">
              <TrendingDown className="h-4 w-4" />
              Tables Monitored
            </div>
            <div className="text-2xl font-bold">{stats?.length || 0}</div>
          </div>
        </div>

        {/* Retention Policies */}
        <div>
          <h3 className="font-semibold mb-3">Retention Policies</h3>
          <div className="space-y-2 text-sm">
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Metrics (metric_time_series)</span>
              <Badge variant="outline">30 days</Badge>
            </div>
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Anomaly Detections</span>
              <Badge variant="outline">30 days</Badge>
            </div>
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Resolved Alerts</span>
              <Badge variant="outline">90 days</Badge>
            </div>
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Resolved Incidents</span>
              <Badge variant="outline">90 days</Badge>
            </div>
          </div>
        </div>

        {/* Table Statistics */}
        <div>
          <h3 className="font-semibold mb-3">Storage Details</h3>
          <div className="space-y-3">
            {stats?.map((stat) => (
              <div
                key={stat.table_name}
                className="border rounded-lg p-3 space-y-2"
              >
                <div className="flex items-center justify-between">
                  <span className="font-medium">{stat.table_name}</span>
                  <Badge variant="secondary">{stat.table_size}</Badge>
                </div>
                <div className="grid grid-cols-3 gap-2 text-xs text-muted-foreground">
                  <div>
                    <div>Total</div>
                    <div className="font-medium text-foreground">
                      {stat.total_records.toLocaleString()}
                    </div>
                  </div>
                  <div>
                    <div>Last 7 days</div>
                    <div className="font-medium text-foreground">
                      {stat.last_7_days.toLocaleString()}
                    </div>
                  </div>
                  <div>
                    <div>Last 30 days</div>
                    <div className="font-medium text-foreground">
                      {stat.last_30_days.toLocaleString()}
                    </div>
                  </div>
                </div>
                {stat.oldest_record && (
                  <div className="flex items-center gap-1 text-xs text-muted-foreground">
                    <Clock className="h-3 w-3" />
                    Oldest:{" "}
                    {formatDistanceToNow(new Date(stat.oldest_record), {
                      addSuffix: true,
                    })}
                  </div>
                )}
              </div>
            ))}
          </div>
        </div>

        {/* Cleanup Schedule */}
        <div className="bg-muted/50 rounded-lg p-4 space-y-2">
          <h3 className="font-semibold text-sm">Automated Cleanup Schedule</h3>
          <div className="space-y-1 text-sm text-muted-foreground">
            <div>• Full cleanup runs daily at 3:00 AM</div>
            <div>• Metrics cleanup at 3:30 AM</div>
            <div>• Anomaly cleanup at 4:00 AM</div>
          </div>
        </div>
      </CardContent>
    </Card>
  );
}
src/hooks/admin/useDataRetention.ts (new file, 134 lines)
@@ -0,0 +1,134 @@
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
import { supabase } from "@/integrations/supabase/client";
import { toast } from "sonner";

interface RetentionStats {
  table_name: string;
  total_records: number;
  last_7_days: number;
  last_30_days: number;
  oldest_record: string;
  newest_record: string;
  table_size: string;
}

interface CleanupResult {
  success: boolean;
  cleanup_results: {
    metrics_deleted: number;
    anomalies_archived: number;
    anomalies_deleted: number;
    alerts_deleted: number;
    incidents_deleted: number;
  };
  timestamp: string;
}

export function useRetentionStats() {
  return useQuery({
    queryKey: ["dataRetentionStats"],
    queryFn: async () => {
      const { data, error } = await supabase
        .from("data_retention_stats")
        .select("*");

      if (error) throw error;
      return data as RetentionStats[];
    },
    refetchInterval: 60000, // Refetch every minute
  });
}

export function useRunCleanup() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async () => {
      const { data, error } = await supabase.functions.invoke(
        "data-retention-cleanup"
      );

      if (error) throw error;
      return data as CleanupResult;
    },
    onSuccess: (data) => {
      const results = data.cleanup_results;
      const total =
        results.metrics_deleted +
        results.anomalies_archived +
        results.anomalies_deleted +
        results.alerts_deleted +
        results.incidents_deleted;

      toast.success(
        `Cleanup completed: ${total} records removed`,
        {
          description: `Metrics: ${results.metrics_deleted}, Anomalies: ${results.anomalies_deleted}, Alerts: ${results.alerts_deleted}`,
        }
      );

      // Invalidate relevant queries
      queryClient.invalidateQueries({ queryKey: ["dataRetentionStats"] });
      queryClient.invalidateQueries({ queryKey: ["anomalyDetections"] });
      queryClient.invalidateQueries({ queryKey: ["systemAlerts"] });
    },
    onError: (error: Error) => {
      toast.error("Failed to run cleanup", {
        description: error.message,
      });
    },
  });
}

export function useCleanupMetrics() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async (retentionDays: number = 30) => {
      const { data, error } = await supabase.rpc("cleanup_old_metrics", {
        retention_days: retentionDays,
      });

      if (error) throw error;
      return data;
    },
    onSuccess: (deletedCount) => {
      toast.success(`Cleaned up ${deletedCount} old metrics`);
      queryClient.invalidateQueries({ queryKey: ["dataRetentionStats"] });
    },
    onError: (error: Error) => {
      toast.error("Failed to cleanup metrics", {
        description: error.message,
      });
    },
  });
}

export function useCleanupAnomalies() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async (retentionDays: number = 30) => {
      const { data, error } = await supabase.rpc("cleanup_old_anomalies", {
        retention_days: retentionDays,
      });

      if (error) throw error;
      return data;
    },
    onSuccess: (result) => {
      // Result is returned as an array with one element
      const cleanupResult = Array.isArray(result) ? result[0] : result;
      toast.success(
        `Cleaned up anomalies: ${cleanupResult.archived_count} archived, ${cleanupResult.deleted_count} deleted`
      );
      queryClient.invalidateQueries({ queryKey: ["dataRetentionStats"] });
      queryClient.invalidateQueries({ queryKey: ["anomalyDetections"] });
    },
    onError: (error: Error) => {
      toast.error("Failed to cleanup anomalies", {
        description: error.message,
      });
    },
  });
}
@@ -10,6 +10,7 @@ import { trackRequest } from './requestTracking';
import { getErrorMessage } from './errorHandler';
import { withRetry, isRetryableError, type RetryOptions } from './retryHelpers';
import { breadcrumb } from './errorBreadcrumbs';
import { logger } from './logger';

/**
 * Invoke a Supabase edge function with request tracking
@@ -149,9 +150,31 @@ export async function invokeWithTracking<T = any>(
    }

    const errorMessage = getErrorMessage(error);

    // Detect CORS errors specifically
    const isCorsError = errorMessage.toLowerCase().includes('cors') ||
      errorMessage.toLowerCase().includes('cross-origin') ||
      errorMessage.toLowerCase().includes('failed to send') ||
      (error instanceof TypeError && errorMessage.toLowerCase().includes('failed to fetch'));

    // Enhanced error logging
    logger.error('[EdgeFunctionTracking] Edge function invocation failed', {
      functionName,
      error: errorMessage,
      errorType: isCorsError ? 'CORS/Network' : (error as any)?.name || 'Unknown',
      attempts: attemptCount,
      isCorsError,
      debugHint: isCorsError ? 'Browser blocked request - verify CORS headers allow X-Idempotency-Key or check network connectivity' : undefined,
      status: (error as any)?.status,
    });

    return {
      data: null,
      error: {
        message: errorMessage,
        status: (error as any)?.status,
        isCorsError,
      },
      requestId: 'unknown',
      duration: 0,
      attempts: attemptCount,
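Callers of `invokeWithTracking` can now branch on the new `isCorsError` flag in the returned error object. A small illustrative sketch; the invoked function name is a placeholder, not a real edge function in this repo:

```typescript
async function exampleCall() {
  // "example-function" is a placeholder name for illustration only.
  const { data, error } = await invokeWithTracking("example-function");

  if (error?.isCorsError) {
    // The browser blocked the request (preflight/CORS) or it never reached the server.
    console.warn("Edge function unreachable:", error.message);
  } else if (error) {
    console.error(`Edge function failed (status ${error.status}):`, error.message);
  }
  return data;
}
```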
@@ -38,12 +38,24 @@ export function isSupabaseConnectionError(error: unknown): boolean {

    // Database connection errors (08xxx codes)
    if (supabaseError.code?.startsWith('08')) return true;

    // Check message for CORS and connectivity keywords
    const message = supabaseError.message?.toLowerCase() || '';
    if (message.includes('cors') ||
        message.includes('cross-origin') ||
        message.includes('failed to send')) {
      return true;
    }
  }

  // Network fetch errors
  if (error instanceof TypeError) {
    const message = error.message.toLowerCase();
    if (message.includes('fetch') ||
        message.includes('network') ||
        message.includes('failed to fetch') ||
        message.includes('cors') ||
        message.includes('cross-origin')) {
      return true;
    }
  }

@@ -61,7 +73,15 @@ export const handleError = (

  // Check if this is a connection error and dispatch event
  if (isSupabaseConnectionError(error)) {
    const errorMsg = getErrorMessage(error).toLowerCase();
    const isCors = errorMsg.includes('cors') || errorMsg.includes('cross-origin');

    window.dispatchEvent(new CustomEvent('api-connectivity-down', {
      detail: {
        isCorsError: isCors,
        error: errorMsg,
      }
    }));
  }

  // Enhanced error message and stack extraction
@@ -132,6 +152,9 @@ export const handleError = (
  }

  // Log to console/monitoring with enhanced debugging
  const isCorsError = errorMessage.toLowerCase().includes('cors') ||
    errorMessage.toLowerCase().includes('cross-origin') ||
    errorMessage.toLowerCase().includes('failed to send');

  logger.error('Error occurred', {
    ...context,
@@ -144,6 +167,8 @@ export const handleError = (
    hasStack: !!stack,
    isSyntheticStack: !!(error && typeof error === 'object' && !(error instanceof Error) && stack),
    supabaseError: supabaseErrorDetails,
    isCorsError,
    debugHint: isCorsError ? 'Browser blocked request - check CORS headers or network connectivity' : undefined,
  });

  // Additional debug logging when stack is missing
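Because the `api-connectivity-down` event now carries a `detail` payload, listeners can distinguish CORS failures from other outages. A minimal listener sketch (where such a listener lives in the app is an assumption):

```typescript
window.addEventListener("api-connectivity-down", (event) => {
  const detail = (event as CustomEvent<{ isCorsError?: boolean; error?: string }>).detail ?? {};
  if (detail.isCorsError) {
    console.warn("Connectivity loss looks like a CORS/preflight failure:", detail.error);
  } else {
    console.warn("API connectivity appears to be down:", detail.error);
  }
});
```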
@@ -96,5 +96,6 @@ export const queryKeys = {
    incidents: (status?: string) => ['monitoring', 'incidents', status] as const,
    incidentDetails: (incidentId: string) => ['monitoring', 'incident-details', incidentId] as const,
    anomalyDetections: () => ['monitoring', 'anomaly-detections'] as const,
    dataRetentionStats: () => ['monitoring', 'data-retention-stats'] as const,
  },
} as const;
@@ -7,6 +7,7 @@ import { GroupedAlertsPanel } from '@/components/admin/GroupedAlertsPanel';
import { CorrelatedAlertsPanel } from '@/components/admin/CorrelatedAlertsPanel';
import { IncidentsPanel } from '@/components/admin/IncidentsPanel';
import { AnomalyDetectionPanel } from '@/components/admin/AnomalyDetectionPanel';
import { DataRetentionPanel } from '@/components/admin/DataRetentionPanel';
import { MonitoringQuickStats } from '@/components/admin/MonitoringQuickStats';
import { RecentActivityTimeline } from '@/components/admin/RecentActivityTimeline';
import { MonitoringNavCards } from '@/components/admin/MonitoringNavCards';

@@ -150,6 +151,9 @@ export default function MonitoringOverview() {
        isLoading={anomalies.isLoading}
      />

      {/* Data Retention Management */}
      <DataRetentionPanel />

      {/* Quick Stats Grid */}
      <MonitoringQuickStats
        systemHealth={systemHealth.data ?? undefined}
@@ -9,6 +9,7 @@ const STANDARD_HEADERS = [
  'x-client-info',
  'apikey',
  'content-type',
  'x-idempotency-key',
];

// Tracing headers for distributed tracing and request tracking

@@ -36,6 +37,7 @@ export const corsHeaders = {
export const corsHeadersWithTracing = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': ALL_HEADERS.join(', '),
  'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, PATCH, OPTIONS',
};

/**
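With `x-idempotency-key` added to the allowed headers, a client can retry an edge-function call while reusing the same key. An illustrative sketch; the function name and payload below are placeholders:

```typescript
import { supabase } from "@/integrations/supabase/client";

async function submitWithIdempotency() {
  // Reusing the same key across retries lets the server deduplicate the request.
  const idempotencyKey = crypto.randomUUID();
  const { data, error } = await supabase.functions.invoke("example-submit", {
    body: { example: true },
    headers: { "X-Idempotency-Key": idempotencyKey },
  });
  if (error) throw error;
  return data;
}
```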
supabase/functions/data-retention-cleanup/index.ts (new file, 48 lines)
@@ -0,0 +1,48 @@
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2.57.4';

const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
};

Deno.serve(async (req) => {
  if (req.method === 'OPTIONS') {
    return new Response(null, { headers: corsHeaders });
  }

  try {
    const supabaseUrl = Deno.env.get('SUPABASE_URL')!;
    const supabaseKey = Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!;
    const supabase = createClient(supabaseUrl, supabaseKey);

    console.log('Starting data retention cleanup...');

    // Call the master cleanup function
    const { data, error } = await supabase.rpc('run_data_retention_cleanup');

    if (error) {
      console.error('Error running data retention cleanup:', error);
      throw error;
    }

    console.log('Data retention cleanup completed:', data);

    return new Response(
      JSON.stringify({
        success: true,
        cleanup_results: data.cleanup_results,
        timestamp: data.timestamp,
      }),
      { headers: { ...corsHeaders, 'Content-Type': 'application/json' } }
    );
  } catch (error) {
    console.error('Error in data-retention-cleanup function:', error);
    return new Response(
      JSON.stringify({ error: error.message }),
      {
        status: 500,
        headers: { ...corsHeaders, 'Content-Type': 'application/json' },
      }
    );
  }
});
@@ -32,8 +32,181 @@ interface AnomalyResult {
  anomalyValue: number;
}

// Advanced ML-based anomaly detection algorithms
class AnomalyDetector {
  // Isolation Forest approximation: Detects outliers based on isolation score
  static isolationForest(data: number[], currentValue: number, sensitivity: number = 0.6): AnomalyResult {
    if (data.length < 10) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'isolation_forest', baselineValue: currentValue, anomalyValue: currentValue };
    }

    // Calculate isolation score (simplified version)
    // Based on how different the value is from random samples
    const samples = 20;
    let isolationScore = 0;

    for (let i = 0; i < samples; i++) {
      const randomSample = data[Math.floor(Math.random() * data.length)];
      const distance = Math.abs(currentValue - randomSample);
      isolationScore += distance;
    }

    isolationScore = isolationScore / samples;

    // Normalize by standard deviation
    const mean = data.reduce((sum, val) => sum + val, 0) / data.length;
    const variance = data.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / data.length;
    const stdDev = Math.sqrt(variance);

    const normalizedScore = stdDev > 0 ? isolationScore / stdDev : 0;
    const isAnomaly = normalizedScore > (1 / sensitivity);

    return {
      isAnomaly,
      anomalyType: currentValue > mean ? 'outlier_high' : 'outlier_low',
      deviationScore: normalizedScore,
      confidenceScore: Math.min(normalizedScore / 5, 1),
      algorithm: 'isolation_forest',
      baselineValue: mean,
      anomalyValue: currentValue,
    };
  }

  // Seasonal decomposition: Detects anomalies considering seasonal patterns
  static seasonalDecomposition(data: number[], currentValue: number, sensitivity: number = 2.5, period: number = 24): AnomalyResult {
    if (data.length < period * 2) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'seasonal', baselineValue: currentValue, anomalyValue: currentValue };
    }

    // Calculate seasonal component (average of values at same position in period)
    const position = data.length % period;
    const seasonalValues: number[] = [];

    for (let i = position; i < data.length; i += period) {
      seasonalValues.push(data[i]);
    }

    const seasonalMean = seasonalValues.reduce((sum, val) => sum + val, 0) / seasonalValues.length;
    const seasonalStdDev = Math.sqrt(
      seasonalValues.reduce((sum, val) => sum + Math.pow(val - seasonalMean, 2), 0) / seasonalValues.length
    );

    if (seasonalStdDev === 0) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'seasonal', baselineValue: seasonalMean, anomalyValue: currentValue };
    }

    const deviationScore = Math.abs(currentValue - seasonalMean) / seasonalStdDev;
    const isAnomaly = deviationScore > sensitivity;

    return {
      isAnomaly,
      anomalyType: currentValue > seasonalMean ? 'seasonal_spike' : 'seasonal_drop',
      deviationScore,
      confidenceScore: Math.min(deviationScore / (sensitivity * 2), 1),
      algorithm: 'seasonal',
      baselineValue: seasonalMean,
      anomalyValue: currentValue,
    };
  }

  // LSTM-inspired prediction: Simple exponential smoothing with trend detection
  static predictiveAnomaly(data: number[], currentValue: number, sensitivity: number = 2.5): AnomalyResult {
    if (data.length < 5) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'predictive', baselineValue: currentValue, anomalyValue: currentValue };
    }

    // Triple exponential smoothing (Holt-Winters approximation)
    const alpha = 0.3; // Level smoothing
    const beta = 0.1; // Trend smoothing

    let level = data[0];
    let trend = data[1] - data[0];

    // Calculate smoothed values
    for (let i = 1; i < data.length; i++) {
      const prevLevel = level;
      level = alpha * data[i] + (1 - alpha) * (level + trend);
      trend = beta * (level - prevLevel) + (1 - beta) * trend;
    }

    // Predict next value
    const prediction = level + trend;

    // Calculate prediction error
    const recentData = data.slice(-10);
    const predictionErrors: number[] = [];

    for (let i = 1; i < recentData.length; i++) {
      const simplePrediction = recentData[i - 1];
      predictionErrors.push(Math.abs(recentData[i] - simplePrediction));
    }

    const meanError = predictionErrors.reduce((sum, err) => sum + err, 0) / predictionErrors.length;
    const errorStdDev = Math.sqrt(
      predictionErrors.reduce((sum, err) => sum + Math.pow(err - meanError, 2), 0) / predictionErrors.length
    );

    const actualError = Math.abs(currentValue - prediction);
    const deviationScore = errorStdDev > 0 ? actualError / errorStdDev : 0;
    const isAnomaly = deviationScore > sensitivity;

    return {
      isAnomaly,
      anomalyType: currentValue > prediction ? 'unexpected_spike' : 'unexpected_drop',
      deviationScore,
      confidenceScore: Math.min(deviationScore / (sensitivity * 2), 1),
      algorithm: 'predictive',
      baselineValue: prediction,
      anomalyValue: currentValue,
    };
  }

  // Ensemble method: Combines multiple algorithms for better accuracy
  static ensemble(data: number[], currentValue: number, sensitivity: number = 2.5): AnomalyResult {
    const results: AnomalyResult[] = [
      this.zScore(data, currentValue, sensitivity),
      this.movingAverage(data, currentValue, sensitivity),
      this.rateOfChange(data, currentValue, sensitivity),
      this.isolationForest(data, currentValue, 0.6),
      this.predictiveAnomaly(data, currentValue, sensitivity),
    ];

    // Count how many algorithms detected an anomaly
    const anomalyCount = results.filter(r => r.isAnomaly).length;
    const anomalyRatio = anomalyCount / results.length;

    // Calculate average deviation and confidence
    const avgDeviation = results.reduce((sum, r) => sum + r.deviationScore, 0) / results.length;
    const avgConfidence = results.reduce((sum, r) => sum + r.confidenceScore, 0) / results.length;

    // Determine anomaly type based on most common classification
    const typeCount = new Map<string, number>();
    results.forEach(r => {
      typeCount.set(r.anomalyType, (typeCount.get(r.anomalyType) || 0) + 1);
    });

    let mostCommonType = 'none';
    let maxCount = 0;
    typeCount.forEach((count, type) => {
      if (count > maxCount) {
        maxCount = count;
        mostCommonType = type;
      }
    });

    const mean = data.reduce((sum, val) => sum + val, 0) / data.length;

    return {
      isAnomaly: anomalyRatio >= 0.4, // At least 40% of algorithms agree
      anomalyType: mostCommonType,
      deviationScore: avgDeviation,
      confidenceScore: Math.min(avgConfidence * anomalyRatio * 2, 1),
      algorithm: 'ensemble',
      baselineValue: mean,
      anomalyValue: currentValue,
    };
  }

  // Z-Score algorithm: Detects outliers based on standard deviation
  static zScore(data: number[], currentValue: number, sensitivity: number = 3.0): AnomalyResult {
    if (data.length < 2) {

@@ -189,6 +362,18 @@ Deno.serve(async (req) => {
        case 'rate_of_change':
          result = AnomalyDetector.rateOfChange(historicalValues, currentValue, config.sensitivity);
          break;
        case 'isolation_forest':
          result = AnomalyDetector.isolationForest(historicalValues, currentValue, 0.6);
          break;
        case 'seasonal':
          result = AnomalyDetector.seasonalDecomposition(historicalValues, currentValue, config.sensitivity, 24);
          break;
        case 'predictive':
          result = AnomalyDetector.predictiveAnomaly(historicalValues, currentValue, config.sensitivity);
          break;
        case 'ensemble':
          result = AnomalyDetector.ensemble(historicalValues, currentValue, config.sensitivity);
          break;
        default:
          continue;
      }
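A quick, illustrative sanity check of the seasonal detector on synthetic data (assumes it runs in the same module as the `AnomalyDetector` class above):

```typescript
// Two days of an hourly pattern: low at "night", high during the "day", plus light noise.
const history: number[] = [];
for (let hour = 0; hour < 48; hour++) {
  const base = hour % 24 < 12 ? 10 : 100;
  history.push(base + Math.random() * 2);
}

// 1000 is far outside the seasonal baseline for this position in the cycle,
// so the detector should report a seasonal_spike with a large deviation score.
const check = AnomalyDetector.seasonalDecomposition(history, 1000, 2.5, 24);
console.log(check.isAnomaly, check.anomalyType, check.deviationScore.toFixed(1));
```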
@@ -0,0 +1,7 @@
-- Fix security warnings: Set search_path for all retention policy functions

ALTER FUNCTION cleanup_old_metrics(INTEGER) SET search_path = public;
ALTER FUNCTION cleanup_old_anomalies(INTEGER) SET search_path = public;
ALTER FUNCTION cleanup_old_alerts(INTEGER) SET search_path = public;
ALTER FUNCTION cleanup_old_incidents(INTEGER) SET search_path = public;
ALTER FUNCTION run_data_retention_cleanup() SET search_path = public;

@@ -0,0 +1,40 @@
-- Set up automated cron jobs for monitoring and anomaly detection

-- 1. Detect anomalies every 5 minutes
SELECT cron.schedule(
  'detect-anomalies-every-5-minutes',
  '*/5 * * * *', -- Every 5 minutes
  $$
  SELECT net.http_post(
    url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/detect-anomalies',
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
    body := jsonb_build_object('scheduled', true)
  ) as request_id;
  $$
);

-- 2. Collect metrics every minute
SELECT cron.schedule(
  'collect-metrics-every-minute',
  '* * * * *', -- Every minute
  $$
  SELECT net.http_post(
    url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/collect-metrics',
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
    body := jsonb_build_object('scheduled', true)
  ) as request_id;
  $$
);

-- 3. Data retention cleanup daily at 3 AM
SELECT cron.schedule(
  'data-retention-cleanup-daily',
  '0 3 * * *', -- Daily at 3:00 AM
  $$
  SELECT net.http_post(
    url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/data-retention-cleanup',
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
    body := jsonb_build_object('scheduled', true)
  ) as request_id;
  $$
);