mirror of
https://github.com/pacnpal/thrilltrack-explorer.git
synced 2025-12-27 19:27:05 -05:00
Compare commits: 4 commits, 07fdfe34f3 ... 7642ac435b
(7642ac435b, c632e559d0, 12a6bfdfab, 915a9fe2df)

MONITORING_SETUP.md (new file, 266 lines)
@@ -0,0 +1,266 @@
# 🎯 Advanced ML Anomaly Detection & Automated Monitoring

## ✅ What's Now Active

### 1. Advanced ML Algorithms

Your anomaly detection now uses **6 sophisticated algorithms**:

#### Statistical Algorithms
- **Z-Score**: Standard deviation-based outlier detection
- **Moving Average**: Trend deviation detection
- **Rate of Change**: Sudden change detection

#### Advanced ML Algorithms (NEW!)
- **Isolation Forest**: Anomaly detection based on data point isolation
  - Works by measuring how "isolated" a point is from the rest
  - Excellent for detecting outliers in multi-dimensional space

- **Seasonal Decomposition**: Pattern-aware anomaly detection
  - Detects anomalies considering daily/weekly patterns
  - Configurable period (default: 24 hours)
  - Identifies seasonal spikes and drops

- **Predictive Anomaly (LSTM-inspired)**: Time-series prediction
  - Uses triple exponential smoothing (Holt-Winters)
  - Predicts the next value from level and trend
  - Flags unexpected deviations from predictions

- **Ensemble Method**: Multi-algorithm consensus
  - Combines the other 5 algorithms for maximum accuracy
  - Requires 40%+ of algorithms to agree before flagging an anomaly
  - Provides weighted confidence scores
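As a rough illustration of the z-score rule above (a minimal sketch, not the deployed edge-function code; the function and variable names are hypothetical), the check reduces to flagging points whose deviation from the mean exceeds a threshold number of standard deviations:

```python
from statistics import mean, stdev

def z_score_anomalies(values, sensitivity=2.5):
    """Return (index, z-score) for points beyond the sensitivity threshold.

    Illustrative only; the production detector works on metric_time_series
    windows and also computes severity and confidence scores.
    """
    if len(values) < 3:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [
        (i, (v - mu) / sigma)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > sensitivity
    ]

# A flat series with one spike: only the spike (index 6) is flagged.
series = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
print(z_score_anomalies(series))
```

The same threshold idea underlies the `sensitivity` column described in the configuration sections of this document.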
### 2. Automated Cron Jobs

**NOW RUNNING AUTOMATICALLY:**

| Job | Schedule | Purpose |
|-----|----------|---------|
| `detect-anomalies-every-5-minutes` | Every 5 minutes (`*/5 * * * *`) | Run ML anomaly detection on all metrics |
| `collect-metrics-every-minute` | Every minute (`* * * * *`) | Collect system metrics (errors, queues, API times) |
| `data-retention-cleanup-daily` | Daily at 3 AM (`0 3 * * *`) | Clean up old data to manage DB size |

### 3. Algorithm Configuration

Each metric can be configured with different algorithms in the `anomaly_detection_config` table:

```sql
-- Example: Configure a metric to use all advanced algorithms
UPDATE anomaly_detection_config
SET detection_algorithms = ARRAY['z_score', 'moving_average', 'isolation_forest', 'seasonal', 'predictive', 'ensemble']
WHERE metric_name = 'api_response_time';
```

**Algorithm Selection Guide:**

- **z_score**: Best for normally distributed data, general outlier detection
- **moving_average**: Best for trending data, smooth patterns
- **rate_of_change**: Best for detecting sudden spikes/drops
- **isolation_forest**: Best for complex multi-modal distributions
- **seasonal**: Best for cyclic patterns (hourly, daily, weekly)
- **predictive**: Best for time-series with clear trends
- **ensemble**: Best for maximum accuracy, combines all methods

### 4. Sensitivity Tuning

**Sensitivity Parameter** (in `anomaly_detection_config`):
- Lower values (1.5-2.0): More sensitive; catches subtle anomalies but produces more false positives
- Medium values (2.5-3.0): Balanced; recommended default
- Higher values (3.5-5.0): Less sensitive; flags only major anomalies, with fewer false positives

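The trade-off can be seen in a toy z-score count (illustrative sketch only, on made-up data; the production thresholds also feed severity and ensemble scoring): the same series yields fewer flagged points as the sensitivity value rises.

```python
from statistics import mean, stdev

def count_anomalies(values, sensitivity):
    """Count points whose |z-score| exceeds the sensitivity threshold."""
    mu, sigma = mean(values), stdev(values)
    return sum(1 for v in values if abs(v - mu) / sigma > sensitivity)

# Mostly-flat hypothetical series with three spikes of increasing size.
series = [10] * 17 + [14, 16, 20]
for s in (1.5, 2.5, 3.5):
    # Flags 2, 1, and 0 points respectively as the threshold loosens.
    print(s, count_anomalies(series, s))
```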
### 5. Monitoring Dashboard

View all anomaly detections in the admin panel:
- Navigate to `/admin/monitoring`
- See the "ML Anomaly Detection" panel
- Real-time updates every 30 seconds
- Manual trigger button available

**Anomaly Details Include:**
- Algorithm used
- Anomaly type (spike, drop, outlier, seasonal, etc.)
- Severity (low, medium, high, critical)
- Deviation score (how far from normal)
- Confidence score (algorithm certainty)
- Baseline vs actual values

## 🔍 How It Works

### Data Flow

```
1. Metrics Collection (every minute)
   ↓
2. Store in metric_time_series table
   ↓
3. Anomaly Detection (every 5 minutes)
   ↓
4. Run ML algorithms on recent data
   ↓
5. Detect anomalies & calculate scores
   ↓
6. Insert into anomaly_detections table
   ↓
7. Auto-create system alerts (if critical/high)
   ↓
8. Display in admin dashboard
   ↓
9. Data Retention Cleanup (daily 3 AM)
```

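The "predictive" step can be sketched with Holt's linear (level + trend) smoothing; this is a simplified, hypothetical version of the idea, since the deployed function reportedly uses triple exponential smoothing (Holt-Winters), which adds a seasonal component on top:

```python
def holt_forecast(values, alpha=0.5, beta=0.3):
    """One-step-ahead forecast from smoothed level and trend.

    A point whose actual value deviates far from this forecast would be
    flagged as a predictive anomaly. Parameters are illustrative defaults.
    """
    level, trend = values[0], values[1] - values[0]
    for v in values[1:]:
        prev_level = level
        # Blend the new observation with the previous forecast (level + trend).
        level = alpha * v + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend  # prediction for the next point

# A steadily rising series: the forecast simply continues the trend.
series = [10, 12, 14, 16, 18, 20]
print(holt_forecast(series))
```

For this perfectly linear series the smoothed level tracks the data exactly and the forecast is 22, so an actual next value of, say, 40 would score as a large deviation.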
### Algorithm Comparison

| Algorithm | Strength | Best For | Time Complexity |
|-----------|----------|----------|-----------------|
| Z-Score | Simple, fast | Normal distributions | O(n) |
| Moving Average | Trend-aware | Gradual changes | O(n) |
| Rate of Change | Change detection | Sudden shifts | O(1) |
| Isolation Forest | Multi-dimensional | Complex patterns | O(n log n) |
| Seasonal | Pattern-aware | Cyclic data | O(n) |
| Predictive | Forecast-based | Time-series | O(n) |
| Ensemble | Highest accuracy | Any pattern | O(n log n) |

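The ensemble row boils down to a consensus vote. A minimal sketch of the 40%-agreement rule described earlier (illustrative only; the vote structure and names here are assumptions, not the deployed code):

```python
def ensemble_flag(votes, min_agreement=0.4):
    """Flag an anomaly when at least 40% of algorithm votes agree.

    `votes` maps algorithm name -> (is_anomaly, confidence); the returned
    confidence is the mean confidence of the agreeing algorithms.
    """
    flagged = [conf for is_anom, conf in votes.values() if is_anom]
    if len(flagged) / len(votes) >= min_agreement:
        return True, sum(flagged) / len(flagged)
    return False, 0.0

votes = {
    "z_score": (True, 0.9),
    "moving_average": (False, 0.0),
    "rate_of_change": (True, 0.7),
    "seasonal": (False, 0.0),
    "predictive": (False, 0.0),
}
# 2 of 5 algorithms agree (exactly 40%), so the point is flagged.
print(ensemble_flag(votes))
```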
## 📊 Current Metrics Being Monitored

### Supabase Metrics (collected every minute)
- `api_error_count`: Recent API errors
- `rate_limit_violations`: Rate limit blocks
- `pending_submissions`: Submissions awaiting moderation
- `active_incidents`: Open/investigating incidents
- `unresolved_alerts`: Unresolved system alerts
- `submission_approval_rate`: Approval percentage
- `avg_moderation_time`: Average moderation time

### Django Metrics (collected every minute, if configured)
- `error_rate`: Error log percentage
- `api_response_time`: Average API response time (ms)
- `celery_queue_size`: Queued Celery tasks
- `database_connections`: Active DB connections
- `cache_hit_rate`: Cache hit percentage

## 🎛️ Configuration

### Add New Metrics for Detection

```sql
INSERT INTO anomaly_detection_config (
  metric_name,
  metric_category,
  enabled,
  sensitivity,
  lookback_window_minutes,
  detection_algorithms,
  min_data_points,
  alert_threshold_score,
  auto_create_alert
) VALUES (
  'custom_metric_name',
  'performance',
  true,
  2.5,
  60,
  ARRAY['ensemble', 'predictive', 'seasonal'],
  10,
  3.0,
  true
);
```

### Adjust Sensitivity

```sql
-- Make detection more sensitive for critical metrics
UPDATE anomaly_detection_config
SET sensitivity = 2.0, alert_threshold_score = 2.5
WHERE metric_name = 'api_error_count';

-- Make detection less sensitive for noisy metrics
UPDATE anomaly_detection_config
SET sensitivity = 4.0, alert_threshold_score = 4.0
WHERE metric_name = 'cache_hit_rate';
```

### Disable Detection for Specific Metrics

```sql
UPDATE anomaly_detection_config
SET enabled = false
WHERE metric_name = 'some_metric';
```

## 🔧 Troubleshooting

### Check Cron Job Status

```sql
-- jobname, schedule, and active live in cron.job;
-- per-run timing and status live in cron.job_run_details.
SELECT d.jobid, j.jobname, j.schedule, j.active, d.start_time, d.status
FROM cron.job_run_details d
JOIN cron.job j ON j.jobid = d.jobid
WHERE j.jobname LIKE '%anomal%' OR j.jobname LIKE '%metric%'
ORDER BY d.start_time DESC
LIMIT 20;
```

### View Recent Anomalies

```sql
SELECT * FROM recent_anomalies_view
ORDER BY detected_at DESC
LIMIT 20;
```

### Check Metric Collection

```sql
SELECT metric_name, COUNT(*) AS count,
       MIN(timestamp) AS oldest,
       MAX(timestamp) AS newest
FROM metric_time_series
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY metric_name
ORDER BY metric_name;
```

### Manual Anomaly Detection Trigger

```sql
-- Call the edge function directly
SELECT net.http_post(
  url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/detect-anomalies',
  headers := '{"Content-Type": "application/json", "Authorization": "Bearer YOUR_ANON_KEY"}'::jsonb,
  body := '{}'::jsonb
);
```

## 📈 Performance Considerations

### Data Volume
- Metrics: ~1,440 records/day per metric (one per minute)
- With 12 metrics: ~17,280 records/day
- 30-day retention: ~518,400 records
- Automatic cleanup prevents unbounded growth
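The volume estimates above follow directly from the collection cadence:

```python
MINUTES_PER_DAY = 24 * 60                              # one sample per minute
records_per_metric_per_day = MINUTES_PER_DAY           # 1,440 rows per metric
records_per_day = 12 * records_per_metric_per_day      # 17,280 with 12 metrics
records_at_30_days = 30 * records_per_day              # 518,400 retained rows
print(records_per_metric_per_day, records_per_day, records_at_30_days)
```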

### Detection Performance
- Each detection run processes all enabled metrics
- The ensemble algorithm is the most CPU-intensive
- Recommended: use ensemble only for critical metrics
- Typical detection time: <5 seconds for 12 metrics

### Database Impact
- Indexes on timestamp columns optimize queries
- Regular cleanup maintains query performance
- Consider partitioning for very high-volume deployments

## 🚀 Next Steps

1. **Monitor the Dashboard**: Visit `/admin/monitoring` to see anomalies
2. **Fine-tune Sensitivity**: Adjust based on false positive rate
3. **Add Custom Metrics**: Monitor application-specific KPIs
4. **Set Up Alerts**: Configure notifications for critical anomalies
5. **Review Weekly**: Check patterns and adjust algorithms

## 📚 Additional Resources

- [Edge Function Logs](https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/functions/detect-anomalies/logs)
- [Cron Jobs Dashboard](https://supabase.com/dashboard/project/ydvtmnrszybqnbcqbdcy/sql/new)
- Django README: `django/README_MONITORING.md`

@@ -136,6 +136,24 @@ SELECT cron.schedule(
);
```

### 5. Data Retention Cleanup Setup

The `data-retention-cleanup` edge function should run daily:

```sql
SELECT cron.schedule(
  'data-retention-cleanup-daily',
  '0 3 * * *', -- Daily at 3:00 AM
  $$
  SELECT net.http_post(
    url:='https://api.thrillwiki.com/functions/v1/data-retention-cleanup',
    headers:='{"Content-Type": "application/json", "Authorization": "Bearer YOUR_ANON_KEY"}'::jsonb,
    body:=concat('{"time": "', now(), '"}')::jsonb
  ) as request_id;
  $$
);
```

## Metrics Collected

### Django Metrics

@@ -154,6 +172,35 @@ SELECT cron.schedule(
- `submission_approval_rate`: Percentage of approved submissions (workflow)
- `avg_moderation_time`: Average time to moderate in minutes (workflow)

## Data Retention Policies

The system automatically cleans up old data to manage database size:

### Retention Periods
- **Metrics** (`metric_time_series`): 30 days
- **Anomaly Detections**: 30 days (resolved anomalies archived after 7 days)
- **Resolved Alerts**: 90 days
- **Resolved Incidents**: 90 days

### Cleanup Functions

The following database functions manage data retention:

1. **`cleanup_old_metrics(retention_days)`**: Deletes metrics older than the specified number of days (default: 30)
2. **`cleanup_old_anomalies(retention_days)`**: Archives resolved anomalies and deletes old unresolved ones (default: 30)
3. **`cleanup_old_alerts(retention_days)`**: Deletes old resolved alerts (default: 90)
4. **`cleanup_old_incidents(retention_days)`**: Deletes old resolved incidents (default: 90)
5. **`run_data_retention_cleanup()`**: Master function that runs all cleanup operations

### Automated Cleanup Schedule

Django Celery tasks run retention cleanup automatically:
- Full cleanup: Daily at 3:00 AM
- Metrics cleanup: Daily at 3:30 AM
- Anomaly cleanup: Daily at 4:00 AM

View retention statistics in the Admin Dashboard's Data Retention panel.

## Monitoring

View collected metrics in the Admin Monitoring Dashboard:

django/apps/monitoring/tasks_retention.py (new file, 168 lines)
@@ -0,0 +1,168 @@
"""
Celery tasks for data retention and cleanup.
"""
import logging
import os

import requests
from celery import shared_task

logger = logging.getLogger(__name__)

SUPABASE_URL = os.environ.get('SUPABASE_URL', 'https://api.thrillwiki.com')
SUPABASE_SERVICE_KEY = os.environ.get('SUPABASE_SERVICE_ROLE_KEY')


@shared_task(bind=True, name='monitoring.run_data_retention_cleanup')
def run_data_retention_cleanup(self):
    """
    Run comprehensive data retention cleanup.
    Cleans up old metrics, anomaly detections, alerts, and incidents.
    Runs daily at 3 AM.
    """
    logger.info("Starting data retention cleanup")

    if not SUPABASE_SERVICE_KEY:
        logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
        return {'success': False, 'error': 'Missing service key'}

    try:
        # Call the Supabase RPC function
        headers = {
            'apikey': SUPABASE_SERVICE_KEY,
            'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
            'Content-Type': 'application/json',
        }

        response = requests.post(
            f'{SUPABASE_URL}/rest/v1/rpc/run_data_retention_cleanup',
            headers=headers,
            timeout=60
        )

        if response.status_code == 200:
            result = response.json()
            logger.info(f"Data retention cleanup completed: {result}")
            return result
        else:
            logger.error(f"Data retention cleanup failed: {response.status_code} - {response.text}")
            return {'success': False, 'error': response.text}

    except Exception as e:
        logger.error(f"Error in data retention cleanup: {e}", exc_info=True)
        raise


@shared_task(bind=True, name='monitoring.cleanup_old_metrics')
def cleanup_old_metrics(self, retention_days: int = 30):
    """
    Clean up old metric time series data.
    Runs daily to remove metrics older than retention period.
    """
    logger.info(f"Cleaning up metrics older than {retention_days} days")

    if not SUPABASE_SERVICE_KEY:
        logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
        return {'success': False, 'error': 'Missing service key'}

    try:
        headers = {
            'apikey': SUPABASE_SERVICE_KEY,
            'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
            'Content-Type': 'application/json',
        }

        response = requests.post(
            f'{SUPABASE_URL}/rest/v1/rpc/cleanup_old_metrics',
            headers=headers,
            json={'retention_days': retention_days},
            timeout=30
        )

        if response.status_code == 200:
            deleted_count = response.json()
            logger.info(f"Cleaned up {deleted_count} old metrics")
            return {'success': True, 'deleted_count': deleted_count}
        else:
            logger.error(f"Metrics cleanup failed: {response.status_code} - {response.text}")
            return {'success': False, 'error': response.text}

    except Exception as e:
        logger.error(f"Error in metrics cleanup: {e}", exc_info=True)
        raise


@shared_task(bind=True, name='monitoring.cleanup_old_anomalies')
def cleanup_old_anomalies(self, retention_days: int = 30):
    """
    Clean up old anomaly detections.
    Archives resolved anomalies and deletes very old unresolved ones.
    """
    logger.info(f"Cleaning up anomalies older than {retention_days} days")

    if not SUPABASE_SERVICE_KEY:
        logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
        return {'success': False, 'error': 'Missing service key'}

    try:
        headers = {
            'apikey': SUPABASE_SERVICE_KEY,
            'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
            'Content-Type': 'application/json',
        }

        response = requests.post(
            f'{SUPABASE_URL}/rest/v1/rpc/cleanup_old_anomalies',
            headers=headers,
            json={'retention_days': retention_days},
            timeout=30
        )

        if response.status_code == 200:
            result = response.json()
            logger.info(f"Cleaned up anomalies: {result}")
            return {'success': True, 'result': result}
        else:
            logger.error(f"Anomalies cleanup failed: {response.status_code} - {response.text}")
            return {'success': False, 'error': response.text}

    except Exception as e:
        logger.error(f"Error in anomalies cleanup: {e}", exc_info=True)
        raise


@shared_task(bind=True, name='monitoring.get_retention_stats')
def get_retention_stats(self):
    """
    Get current data retention statistics.
    Shows record counts and storage size for monitored tables.
    """
    logger.info("Fetching data retention statistics")

    if not SUPABASE_SERVICE_KEY:
        logger.error("SUPABASE_SERVICE_ROLE_KEY not configured")
        return {'success': False, 'error': 'Missing service key'}

    try:
        headers = {
            'apikey': SUPABASE_SERVICE_KEY,
            'Authorization': f'Bearer {SUPABASE_SERVICE_KEY}',
            'Content-Type': 'application/json',
        }

        response = requests.get(
            f'{SUPABASE_URL}/rest/v1/data_retention_stats',
            headers=headers,
            timeout=10
        )

        if response.status_code == 200:
            stats = response.json()
            logger.info(f"Retrieved retention stats for {len(stats)} tables")
            return {'success': True, 'stats': stats}
        else:
            logger.error(f"Failed to get retention stats: {response.status_code} - {response.text}")
            return {'success': False, 'error': response.text}

    except Exception as e:
        logger.error(f"Error getting retention stats: {e}", exc_info=True)
        raise
@@ -33,6 +33,25 @@ CELERY_BEAT_SCHEDULE = {
        'options': {'queue': 'monitoring'}
    },

    # Data retention cleanup tasks
    'run-data-retention-cleanup': {
        'task': 'monitoring.run_data_retention_cleanup',
        'schedule': crontab(hour=3, minute=0),  # Daily at 3 AM
        'options': {'queue': 'maintenance'}
    },

    'cleanup-old-metrics': {
        'task': 'monitoring.cleanup_old_metrics',
        'schedule': crontab(hour=3, minute=30),  # Daily at 3:30 AM
        'options': {'queue': 'maintenance'}
    },

    'cleanup-old-anomalies': {
        'task': 'monitoring.cleanup_old_anomalies',
        'schedule': crontab(hour=4, minute=0),  # Daily at 4 AM
        'options': {'queue': 'maintenance'}
    },

    # Existing user tasks
    'cleanup-expired-tokens': {
        'task': 'users.cleanup_expired_tokens',
src/components/admin/DataRetentionPanel.tsx (new file, 161 lines)
@@ -0,0 +1,161 @@
import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Badge } from "@/components/ui/badge";
import { Trash2, Database, Clock, HardDrive, TrendingDown } from "lucide-react";
import { useRetentionStats, useRunCleanup } from "@/hooks/admin/useDataRetention";
import { formatDistanceToNow } from "date-fns";

export function DataRetentionPanel() {
  const { data: stats, isLoading } = useRetentionStats();
  const runCleanup = useRunCleanup();

  if (isLoading) {
    return (
      <Card>
        <CardHeader>
          <CardTitle>Data Retention</CardTitle>
          <CardDescription>Loading retention statistics...</CardDescription>
        </CardHeader>
      </Card>
    );
  }

  const totalRecords = stats?.reduce((sum, s) => sum + s.total_records, 0) || 0;
  const totalSize = stats?.reduce((sum, s) => {
    const size = s.table_size.replace(/[^0-9.]/g, '');
    return sum + parseFloat(size);
  }, 0) || 0;

  return (
    <Card>
      <CardHeader>
        <div className="flex items-center justify-between">
          <div>
            <CardTitle className="flex items-center gap-2">
              <Database className="h-5 w-5" />
              Data Retention Management
            </CardTitle>
            <CardDescription>
              Automatic cleanup of old metrics and monitoring data
            </CardDescription>
          </div>
          <Button
            onClick={() => runCleanup.mutate()}
            disabled={runCleanup.isPending}
            variant="destructive"
            size="sm"
          >
            <Trash2 className="h-4 w-4 mr-2" />
            Run Cleanup Now
          </Button>
        </div>
      </CardHeader>
      <CardContent className="space-y-6">
        {/* Summary Stats */}
        <div className="grid gap-4 md:grid-cols-3">
          <div className="space-y-2">
            <div className="flex items-center gap-2 text-sm text-muted-foreground">
              <Database className="h-4 w-4" />
              Total Records
            </div>
            <div className="text-2xl font-bold">{totalRecords.toLocaleString()}</div>
          </div>
          <div className="space-y-2">
            <div className="flex items-center gap-2 text-sm text-muted-foreground">
              <HardDrive className="h-4 w-4" />
              Total Size
            </div>
            <div className="text-2xl font-bold">{totalSize.toFixed(1)} MB</div>
          </div>
          <div className="space-y-2">
            <div className="flex items-center gap-2 text-sm text-muted-foreground">
              <TrendingDown className="h-4 w-4" />
              Tables Monitored
            </div>
            <div className="text-2xl font-bold">{stats?.length || 0}</div>
          </div>
        </div>

        {/* Retention Policies */}
        <div>
          <h3 className="font-semibold mb-3">Retention Policies</h3>
          <div className="space-y-2 text-sm">
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Metrics (metric_time_series)</span>
              <Badge variant="outline">30 days</Badge>
            </div>
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Anomaly Detections</span>
              <Badge variant="outline">30 days</Badge>
            </div>
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Resolved Alerts</span>
              <Badge variant="outline">90 days</Badge>
            </div>
            <div className="flex justify-between items-center p-2 bg-muted/50 rounded">
              <span>Resolved Incidents</span>
              <Badge variant="outline">90 days</Badge>
            </div>
          </div>
        </div>

        {/* Table Statistics */}
        <div>
          <h3 className="font-semibold mb-3">Storage Details</h3>
          <div className="space-y-3">
            {stats?.map((stat) => (
              <div
                key={stat.table_name}
                className="border rounded-lg p-3 space-y-2"
              >
                <div className="flex items-center justify-between">
                  <span className="font-medium">{stat.table_name}</span>
                  <Badge variant="secondary">{stat.table_size}</Badge>
                </div>
                <div className="grid grid-cols-3 gap-2 text-xs text-muted-foreground">
                  <div>
                    <div>Total</div>
                    <div className="font-medium text-foreground">
                      {stat.total_records.toLocaleString()}
                    </div>
                  </div>
                  <div>
                    <div>Last 7 days</div>
                    <div className="font-medium text-foreground">
                      {stat.last_7_days.toLocaleString()}
                    </div>
                  </div>
                  <div>
                    <div>Last 30 days</div>
                    <div className="font-medium text-foreground">
                      {stat.last_30_days.toLocaleString()}
                    </div>
                  </div>
                </div>
                {stat.oldest_record && (
                  <div className="flex items-center gap-1 text-xs text-muted-foreground">
                    <Clock className="h-3 w-3" />
                    Oldest:{" "}
                    {formatDistanceToNow(new Date(stat.oldest_record), {
                      addSuffix: true,
                    })}
                  </div>
                )}
              </div>
            ))}
          </div>
        </div>

        {/* Cleanup Schedule */}
        <div className="bg-muted/50 rounded-lg p-4 space-y-2">
          <h3 className="font-semibold text-sm">Automated Cleanup Schedule</h3>
          <div className="space-y-1 text-sm text-muted-foreground">
            <div>• Full cleanup runs daily at 3:00 AM</div>
            <div>• Metrics cleanup at 3:30 AM</div>
            <div>• Anomaly cleanup at 4:00 AM</div>
          </div>
        </div>
      </CardContent>
    </Card>
  );
}
src/hooks/admin/useDataRetention.ts (new file, 134 lines)
@@ -0,0 +1,134 @@
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
import { supabase } from "@/integrations/supabase/client";
import { toast } from "sonner";

interface RetentionStats {
  table_name: string;
  total_records: number;
  last_7_days: number;
  last_30_days: number;
  oldest_record: string;
  newest_record: string;
  table_size: string;
}

interface CleanupResult {
  success: boolean;
  cleanup_results: {
    metrics_deleted: number;
    anomalies_archived: number;
    anomalies_deleted: number;
    alerts_deleted: number;
    incidents_deleted: number;
  };
  timestamp: string;
}

export function useRetentionStats() {
  return useQuery({
    queryKey: ["dataRetentionStats"],
    queryFn: async () => {
      const { data, error } = await supabase
        .from("data_retention_stats")
        .select("*");

      if (error) throw error;
      return data as RetentionStats[];
    },
    refetchInterval: 60000, // Refetch every minute
  });
}

export function useRunCleanup() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async () => {
      const { data, error } = await supabase.functions.invoke(
        "data-retention-cleanup"
      );

      if (error) throw error;
      return data as CleanupResult;
    },
    onSuccess: (data) => {
      const results = data.cleanup_results;
      const total =
        results.metrics_deleted +
        results.anomalies_archived +
        results.anomalies_deleted +
        results.alerts_deleted +
        results.incidents_deleted;

      toast.success(
        `Cleanup completed: ${total} records removed`,
        {
          description: `Metrics: ${results.metrics_deleted}, Anomalies: ${results.anomalies_deleted}, Alerts: ${results.alerts_deleted}`,
        }
      );

      // Invalidate relevant queries
      queryClient.invalidateQueries({ queryKey: ["dataRetentionStats"] });
      queryClient.invalidateQueries({ queryKey: ["anomalyDetections"] });
      queryClient.invalidateQueries({ queryKey: ["systemAlerts"] });
    },
    onError: (error: Error) => {
      toast.error("Failed to run cleanup", {
        description: error.message,
      });
    },
  });
}

export function useCleanupMetrics() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async (retentionDays: number = 30) => {
      const { data, error } = await supabase.rpc("cleanup_old_metrics", {
        retention_days: retentionDays,
      });

      if (error) throw error;
      return data;
    },
    onSuccess: (deletedCount) => {
      toast.success(`Cleaned up ${deletedCount} old metrics`);
      queryClient.invalidateQueries({ queryKey: ["dataRetentionStats"] });
    },
    onError: (error: Error) => {
      toast.error("Failed to cleanup metrics", {
        description: error.message,
      });
    },
  });
}

export function useCleanupAnomalies() {
  const queryClient = useQueryClient();

  return useMutation({
    mutationFn: async (retentionDays: number = 30) => {
      const { data, error } = await supabase.rpc("cleanup_old_anomalies", {
        retention_days: retentionDays,
      });

      if (error) throw error;
      return data;
    },
    onSuccess: (result) => {
      // Result is returned as an array with one element
      const cleanupResult = Array.isArray(result) ? result[0] : result;
      toast.success(
        `Cleaned up anomalies: ${cleanupResult.archived_count} archived, ${cleanupResult.deleted_count} deleted`
      );
      queryClient.invalidateQueries({ queryKey: ["dataRetentionStats"] });
      queryClient.invalidateQueries({ queryKey: ["anomalyDetections"] });
    },
    onError: (error: Error) => {
      toast.error("Failed to cleanup anomalies", {
        description: error.message,
      });
    },
  });
}
@@ -10,6 +10,7 @@ import { trackRequest } from './requestTracking';
import { getErrorMessage } from './errorHandler';
import { withRetry, isRetryableError, type RetryOptions } from './retryHelpers';
import { breadcrumb } from './errorBreadcrumbs';
import { logger } from './logger';

/**
 * Invoke a Supabase edge function with request tracking
@@ -149,9 +150,31 @@ export async function invokeWithTracking<T = any>(
    }

    const errorMessage = getErrorMessage(error);

    // Detect CORS errors specifically
    const isCorsError = errorMessage.toLowerCase().includes('cors') ||
      errorMessage.toLowerCase().includes('cross-origin') ||
      errorMessage.toLowerCase().includes('failed to send') ||
      (error instanceof TypeError && errorMessage.toLowerCase().includes('failed to fetch'));

    // Enhanced error logging
    logger.error('[EdgeFunctionTracking] Edge function invocation failed', {
      functionName,
      error: errorMessage,
      errorType: isCorsError ? 'CORS/Network' : (error as any)?.name || 'Unknown',
      attempts: attemptCount,
      isCorsError,
      debugHint: isCorsError ? 'Browser blocked request - verify CORS headers allow X-Idempotency-Key or check network connectivity' : undefined,
      status: (error as any)?.status,
    });

    return {
      data: null,
      error: {
        message: errorMessage,
        status: (error as any)?.status,
        isCorsError,
      },
      requestId: 'unknown',
      duration: 0,
      attempts: attemptCount,
@@ -38,12 +38,24 @@ export function isSupabaseConnectionError(error: unknown): boolean {
|
||||
|
||||
// Database connection errors (08xxx codes)
|
||||
if (supabaseError.code?.startsWith('08')) return true;
|
||||
|
||||
// Check message for CORS and connectivity keywords
|
||||
const message = supabaseError.message?.toLowerCase() || '';
|
||||
if (message.includes('cors') ||
|
||||
message.includes('cross-origin') ||
|
||||
message.includes('failed to send')) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
||||
// Network fetch errors
|
||||
if (error instanceof TypeError) {
|
||||
const message = error.message.toLowerCase();
|
||||
if (message.includes('fetch') || message.includes('network') || message.includes('failed to fetch')) {
|
||||
if (message.includes('fetch') ||
|
||||
message.includes('network') ||
|
||||
message.includes('failed to fetch') ||
|
||||
message.includes('cors') ||
|
||||
message.includes('cross-origin')) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
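The keyword matching in `isSupabaseConnectionError` can be sketched standalone. `looksLikeConnectivityError` below is a hypothetical helper, not part of the codebase; it only mirrors the message checks the function performs:

```typescript
// Hypothetical standalone version of the connectivity-keyword check
// performed by isSupabaseConnectionError (illustrative name).
function looksLikeConnectivityError(message: string): boolean {
  const m = message.toLowerCase();
  return (
    m.includes('cors') ||
    m.includes('cross-origin') ||
    m.includes('failed to send') ||
    m.includes('failed to fetch') ||
    m.includes('network')
  );
}
```

A database error such as a unique-constraint violation contains none of these keywords, so it is not misclassified as a connectivity outage.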
@@ -61,7 +73,15 @@ export const handleError = (

  // Check if this is a connection error and dispatch event
  if (isSupabaseConnectionError(error)) {
-   window.dispatchEvent(new CustomEvent('api-connectivity-down'));
+   const errorMsg = getErrorMessage(error).toLowerCase();
+   const isCors = errorMsg.includes('cors') || errorMsg.includes('cross-origin');
+
+   window.dispatchEvent(new CustomEvent('api-connectivity-down', {
+     detail: {
+       isCorsError: isCors,
+       error: errorMsg,
+     }
+   }));
  }

  // Enhanced error message and stack extraction
@@ -132,6 +152,9 @@ export const handleError = (
  }

  // Log to console/monitoring with enhanced debugging
  const isCorsError = errorMessage.toLowerCase().includes('cors') ||
    errorMessage.toLowerCase().includes('cross-origin') ||
    errorMessage.toLowerCase().includes('failed to send');

  logger.error('Error occurred', {
    ...context,
@@ -144,6 +167,8 @@ export const handleError = (
    hasStack: !!stack,
    isSyntheticStack: !!(error && typeof error === 'object' && !(error instanceof Error) && stack),
    supabaseError: supabaseErrorDetails,
    isCorsError,
    debugHint: isCorsError ? 'Browser blocked request - check CORS headers or network connectivity' : undefined,
  });

  // Additional debug logging when stack is missing
@@ -96,5 +96,6 @@ export const queryKeys = {
    incidents: (status?: string) => ['monitoring', 'incidents', status] as const,
    incidentDetails: (incidentId: string) => ['monitoring', 'incident-details', incidentId] as const,
    anomalyDetections: () => ['monitoring', 'anomaly-detections'] as const,
    dataRetentionStats: () => ['monitoring', 'data-retention-stats'] as const,
  },
} as const;
@@ -7,6 +7,7 @@ import { GroupedAlertsPanel } from '@/components/admin/GroupedAlertsPanel';
import { CorrelatedAlertsPanel } from '@/components/admin/CorrelatedAlertsPanel';
import { IncidentsPanel } from '@/components/admin/IncidentsPanel';
import { AnomalyDetectionPanel } from '@/components/admin/AnomalyDetectionPanel';
import { DataRetentionPanel } from '@/components/admin/DataRetentionPanel';
import { MonitoringQuickStats } from '@/components/admin/MonitoringQuickStats';
import { RecentActivityTimeline } from '@/components/admin/RecentActivityTimeline';
import { MonitoringNavCards } from '@/components/admin/MonitoringNavCards';
@@ -150,6 +151,9 @@ export default function MonitoringOverview() {
        isLoading={anomalies.isLoading}
      />

      {/* Data Retention Management */}
      <DataRetentionPanel />

      {/* Quick Stats Grid */}
      <MonitoringQuickStats
        systemHealth={systemHealth.data ?? undefined}
@@ -9,6 +9,7 @@ const STANDARD_HEADERS = [
  'x-client-info',
  'apikey',
  'content-type',
  'x-idempotency-key',
];

// Tracing headers for distributed tracing and request tracking
@@ -36,6 +37,7 @@ export const corsHeaders = {
export const corsHeadersWithTracing = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': ALL_HEADERS.join(', '),
  'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, PATCH, OPTIONS',
};

/**
48 supabase/functions/data-retention-cleanup/index.ts (Normal file)
@@ -0,0 +1,48 @@
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2.57.4';

const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
};

Deno.serve(async (req) => {
  if (req.method === 'OPTIONS') {
    return new Response(null, { headers: corsHeaders });
  }

  try {
    const supabaseUrl = Deno.env.get('SUPABASE_URL')!;
    const supabaseKey = Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!;
    const supabase = createClient(supabaseUrl, supabaseKey);

    console.log('Starting data retention cleanup...');

    // Call the master cleanup function
    const { data, error } = await supabase.rpc('run_data_retention_cleanup');

    if (error) {
      console.error('Error running data retention cleanup:', error);
      throw error;
    }

    console.log('Data retention cleanup completed:', data);

    return new Response(
      JSON.stringify({
        success: true,
        cleanup_results: data.cleanup_results,
        timestamp: data.timestamp,
      }),
      { headers: { ...corsHeaders, 'Content-Type': 'application/json' } }
    );
  } catch (error) {
    console.error('Error in data-retention-cleanup function:', error);
    return new Response(
      JSON.stringify({ error: error instanceof Error ? error.message : String(error) }),
      {
        status: 500,
        headers: { ...corsHeaders, 'Content-Type': 'application/json' },
      }
    );
  }
});
@@ -32,8 +32,181 @@ interface AnomalyResult {
  anomalyValue: number;
}

-// Statistical anomaly detection algorithms
+// Advanced ML-based anomaly detection algorithms
class AnomalyDetector {
  // Isolation Forest approximation: Detects outliers based on isolation score
  static isolationForest(data: number[], currentValue: number, sensitivity: number = 0.6): AnomalyResult {
    if (data.length < 10) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'isolation_forest', baselineValue: currentValue, anomalyValue: currentValue };
    }

    // Calculate isolation score (simplified version)
    // Based on how different the value is from random samples
    const samples = 20;
    let isolationScore = 0;

    for (let i = 0; i < samples; i++) {
      const randomSample = data[Math.floor(Math.random() * data.length)];
      const distance = Math.abs(currentValue - randomSample);
      isolationScore += distance;
    }

    isolationScore = isolationScore / samples;

    // Normalize by standard deviation
    const mean = data.reduce((sum, val) => sum + val, 0) / data.length;
    const variance = data.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / data.length;
    const stdDev = Math.sqrt(variance);

    const normalizedScore = stdDev > 0 ? isolationScore / stdDev : 0;
    const isAnomaly = normalizedScore > (1 / sensitivity);

    return {
      isAnomaly,
      anomalyType: currentValue > mean ? 'outlier_high' : 'outlier_low',
      deviationScore: normalizedScore,
      confidenceScore: Math.min(normalizedScore / 5, 1),
      algorithm: 'isolation_forest',
      baselineValue: mean,
      anomalyValue: currentValue,
    };
  }

  // Seasonal decomposition: Detects anomalies considering seasonal patterns
  static seasonalDecomposition(data: number[], currentValue: number, sensitivity: number = 2.5, period: number = 24): AnomalyResult {
    if (data.length < period * 2) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'seasonal', baselineValue: currentValue, anomalyValue: currentValue };
    }

    // Calculate seasonal component (average of values at same position in period)
    const position = data.length % period;
    const seasonalValues: number[] = [];

    for (let i = position; i < data.length; i += period) {
      seasonalValues.push(data[i]);
    }

    const seasonalMean = seasonalValues.reduce((sum, val) => sum + val, 0) / seasonalValues.length;
    const seasonalStdDev = Math.sqrt(
      seasonalValues.reduce((sum, val) => sum + Math.pow(val - seasonalMean, 2), 0) / seasonalValues.length
    );

    if (seasonalStdDev === 0) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'seasonal', baselineValue: seasonalMean, anomalyValue: currentValue };
    }

    const deviationScore = Math.abs(currentValue - seasonalMean) / seasonalStdDev;
    const isAnomaly = deviationScore > sensitivity;

    return {
      isAnomaly,
      anomalyType: currentValue > seasonalMean ? 'seasonal_spike' : 'seasonal_drop',
      deviationScore,
      confidenceScore: Math.min(deviationScore / (sensitivity * 2), 1),
      algorithm: 'seasonal',
      baselineValue: seasonalMean,
      anomalyValue: currentValue,
    };
  }

  // LSTM-inspired prediction: Exponential smoothing with trend detection
  static predictiveAnomaly(data: number[], currentValue: number, sensitivity: number = 2.5): AnomalyResult {
    if (data.length < 5) {
      return { isAnomaly: false, anomalyType: 'none', deviationScore: 0, confidenceScore: 0, algorithm: 'predictive', baselineValue: currentValue, anomalyValue: currentValue };
    }

    // Double exponential smoothing (Holt's linear trend method)
    const alpha = 0.3; // Level smoothing
    const beta = 0.1; // Trend smoothing

    let level = data[0];
    let trend = data[1] - data[0];

    // Calculate smoothed values
    for (let i = 1; i < data.length; i++) {
      const prevLevel = level;
      level = alpha * data[i] + (1 - alpha) * (level + trend);
      trend = beta * (level - prevLevel) + (1 - beta) * trend;
    }

    // Predict next value
    const prediction = level + trend;

    // Calculate prediction error
    const recentData = data.slice(-10);
    const predictionErrors: number[] = [];

    for (let i = 1; i < recentData.length; i++) {
      const simplePrediction = recentData[i - 1];
      predictionErrors.push(Math.abs(recentData[i] - simplePrediction));
    }

    const meanError = predictionErrors.reduce((sum, err) => sum + err, 0) / predictionErrors.length;
    const errorStdDev = Math.sqrt(
      predictionErrors.reduce((sum, err) => sum + Math.pow(err - meanError, 2), 0) / predictionErrors.length
    );

    const actualError = Math.abs(currentValue - prediction);
    const deviationScore = errorStdDev > 0 ? actualError / errorStdDev : 0;
    const isAnomaly = deviationScore > sensitivity;

    return {
      isAnomaly,
      anomalyType: currentValue > prediction ? 'unexpected_spike' : 'unexpected_drop',
      deviationScore,
      confidenceScore: Math.min(deviationScore / (sensitivity * 2), 1),
      algorithm: 'predictive',
      baselineValue: prediction,
      anomalyValue: currentValue,
    };
  }

  // Ensemble method: Combines multiple algorithms for better accuracy
  static ensemble(data: number[], currentValue: number, sensitivity: number = 2.5): AnomalyResult {
    const results: AnomalyResult[] = [
      this.zScore(data, currentValue, sensitivity),
      this.movingAverage(data, currentValue, sensitivity),
      this.rateOfChange(data, currentValue, sensitivity),
      this.isolationForest(data, currentValue, 0.6),
      this.predictiveAnomaly(data, currentValue, sensitivity),
    ];

    // Count how many algorithms detected an anomaly
    const anomalyCount = results.filter(r => r.isAnomaly).length;
    const anomalyRatio = anomalyCount / results.length;

    // Calculate average deviation and confidence
    const avgDeviation = results.reduce((sum, r) => sum + r.deviationScore, 0) / results.length;
    const avgConfidence = results.reduce((sum, r) => sum + r.confidenceScore, 0) / results.length;

    // Determine anomaly type based on most common classification
    const typeCount = new Map<string, number>();
    results.forEach(r => {
      typeCount.set(r.anomalyType, (typeCount.get(r.anomalyType) || 0) + 1);
    });

    let mostCommonType = 'none';
    let maxCount = 0;
    typeCount.forEach((count, type) => {
      if (count > maxCount) {
        maxCount = count;
        mostCommonType = type;
      }
    });

    const mean = data.reduce((sum, val) => sum + val, 0) / data.length;

    return {
      isAnomaly: anomalyRatio >= 0.4, // At least 40% of algorithms agree
      anomalyType: mostCommonType,
      deviationScore: avgDeviation,
      confidenceScore: Math.min(avgConfidence * anomalyRatio * 2, 1),
      algorithm: 'ensemble',
      baselineValue: mean,
      anomalyValue: currentValue,
    };
  }

  // Z-Score algorithm: Detects outliers based on standard deviation
  static zScore(data: number[], currentValue: number, sensitivity: number = 3.0): AnomalyResult {
    if (data.length < 2) {
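The diff truncates the body of `zScore`, so here is a minimal standalone sketch (an assumption, not copied from the repo) of the statistical core that method implements: flag a point whose distance from the historical mean exceeds `sensitivity` standard deviations.

```typescript
// Hypothetical standalone z-score check; the real AnomalyDetector.zScore
// returns a full AnomalyResult object rather than a boolean.
function zScoreAnomaly(data: number[], currentValue: number, sensitivity = 3.0): boolean {
  if (data.length < 2) return false;
  const mean = data.reduce((sum, v) => sum + v, 0) / data.length;
  // Population standard deviation of the historical window
  const stdDev = Math.sqrt(data.reduce((sum, v) => sum + (v - mean) ** 2, 0) / data.length);
  if (stdDev === 0) return false;
  return Math.abs(currentValue - mean) / stdDev > sensitivity;
}
```

For a history hovering around 10-12, a value of 20 sits many standard deviations from the mean and is flagged, while 11 is not.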
@@ -189,6 +362,18 @@ Deno.serve(async (req) => {
      case 'rate_of_change':
        result = AnomalyDetector.rateOfChange(historicalValues, currentValue, config.sensitivity);
        break;
      case 'isolation_forest':
        result = AnomalyDetector.isolationForest(historicalValues, currentValue, 0.6);
        break;
      case 'seasonal':
        result = AnomalyDetector.seasonalDecomposition(historicalValues, currentValue, config.sensitivity, 24);
        break;
      case 'predictive':
        result = AnomalyDetector.predictiveAnomaly(historicalValues, currentValue, config.sensitivity);
        break;
      case 'ensemble':
        result = AnomalyDetector.ensemble(historicalValues, currentValue, config.sensitivity);
        break;
      default:
        continue;
    }
@@ -0,0 +1,7 @@
-- Fix security warnings: Set search_path for all retention policy functions

ALTER FUNCTION cleanup_old_metrics(INTEGER) SET search_path = public;
ALTER FUNCTION cleanup_old_anomalies(INTEGER) SET search_path = public;
ALTER FUNCTION cleanup_old_alerts(INTEGER) SET search_path = public;
ALTER FUNCTION cleanup_old_incidents(INTEGER) SET search_path = public;
ALTER FUNCTION run_data_retention_cleanup() SET search_path = public;
@@ -0,0 +1,40 @@
-- Set up automated cron jobs for monitoring and anomaly detection

-- 1. Detect anomalies every 5 minutes
SELECT cron.schedule(
  'detect-anomalies-every-5-minutes',
  '*/5 * * * *', -- Every 5 minutes
  $$
  SELECT net.http_post(
    url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/detect-anomalies',
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
    body := jsonb_build_object('scheduled', true)
  ) as request_id;
  $$
);

-- 2. Collect metrics every minute
SELECT cron.schedule(
  'collect-metrics-every-minute',
  '* * * * *', -- Every minute
  $$
  SELECT net.http_post(
    url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/collect-metrics',
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
    body := jsonb_build_object('scheduled', true)
  ) as request_id;
  $$
);

-- 3. Data retention cleanup daily at 3 AM
SELECT cron.schedule(
  'data-retention-cleanup-daily',
  '0 3 * * *', -- Daily at 3:00 AM
  $$
  SELECT net.http_post(
    url := 'https://ydvtmnrszybqnbcqbdcy.supabase.co/functions/v1/data-retention-cleanup',
    headers := '{"Content-Type": "application/json", "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InlkdnRtbnJzenlicW5iY3FiZGN5Iiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTgzMjYzNTYsImV4cCI6MjA3MzkwMjM1Nn0.DM3oyapd_omP5ZzIlrT0H9qBsiQBxBRgw2tYuqgXKX4"}'::jsonb,
    body := jsonb_build_object('scheduled', true)
  ) as request_id;
  $$
);
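The three jobs use standard five-field cron expressions, which pg_cron evaluates server-side. Purely as an illustration (this helper is not part of the migration), the firing rule for each specific schedule can be sketched as:

```typescript
// Illustrative matcher for the three schedules above only; not a
// general cron expression parser.
function fires(schedule: string, d: Date): boolean {
  switch (schedule) {
    case '* * * * *':   return true;                                       // every minute
    case '*/5 * * * *': return d.getMinutes() % 5 === 0;                   // every 5 minutes
    case '0 3 * * *':   return d.getMinutes() === 0 && d.getHours() === 3; // daily at 03:00
    default:            return false;
  }
}
```

So metrics collection fires 60 times per hour, anomaly detection 12 times per hour, and retention cleanup once per day.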