Add secret management guide, client-side performance monitoring, and search accessibility enhancements

- Introduced a comprehensive Secret Management Guide detailing best practices, secret classification, development setup, production management, rotation procedures, and emergency protocols.
- Implemented a client-side performance monitoring script to track various metrics including page load performance, paint metrics, layout shifts, and memory usage.
- Enhanced search accessibility with keyboard navigation support for search results, ensuring compliance with WCAG standards and improving user experience.
pacnpal
2025-12-23 16:41:42 -05:00
parent ae31e889d7
commit edcd8f2076
155 changed files with 22046 additions and 4645 deletions

docs/FUTURE_WORK.md Normal file

@@ -0,0 +1,576 @@
# Future Work & Deferred Features
This document tracks features that have been deferred for future implementation. Each item includes context, implementation guidance, and priority.
## Priority Levels
- **P0 (Critical)**: Blocks major functionality or has security implications
- **P1 (High)**: Significantly improves user experience or performance
- **P2 (Medium)**: Nice-to-have features that add value
- **P3 (Low)**: Optional enhancements
## Feature Tracking
### Map Service Enhancements
#### THRILLWIKI-106: Map Clustering Algorithm
**Priority**: P1 (High)
**Estimated Effort**: 3-5 days
**Dependencies**: None
**Context**:
Currently, the map API returns all locations within bounds without clustering. At low zoom levels (zoomed out), this can result in hundreds of overlapping markers, degrading performance and UX.
**Proposed Solution**:
Implement a server-side clustering algorithm using one of these approaches:
1. **Grid-based clustering** (Recommended for simplicity; see the sketch after this list):
- Divide the map into a grid based on zoom level
- Group locations within each grid cell
- Return cluster center and count for cells with multiple locations
2. **DBSCAN clustering** (Better quality, more complex):
- Use scikit-learn's DBSCAN algorithm
- Cluster based on geographic distance
- Adjust epsilon parameter based on zoom level
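A minimal sketch of the grid-based approach, assuming `UnifiedLocation` objects expose `id`, `latitude`, and `longitude` attributes (not confirmed in the codebase):
```python
import math
from collections import defaultdict


def cluster_locations_grid(locations, zoom_level):
    """Group locations into grid cells whose size shrinks as the map zooms in."""
    # Roughly halve the cell size for every zoom step (illustrative heuristic)
    cell_size_degrees = 360.0 / (2 ** zoom_level)

    cells = defaultdict(list)
    for loc in locations:
        key = (
            math.floor(loc.latitude / cell_size_degrees),
            math.floor(loc.longitude / cell_size_degrees),
        )
        cells[key].append(loc)

    clusters = []
    for (row, col), members in cells.items():
        clusters.append({
            "id": f"cluster_{row}_{col}",
            "coordinates": [
                sum(m.latitude for m in members) / len(members),
                sum(m.longitude for m in members) / len(members),
            ],
            "count": len(members),
            "representative_location": members[0].id,
        })
    return clusters
```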
**Implementation Steps**:
1. Create `backend/apps/core/services/map_clustering.py` with clustering logic
2. Add `cluster_locations()` method that accepts:
- List of `UnifiedLocation` objects
- Zoom level (1-20)
- Clustering strategy ('grid' or 'dbscan')
3. Update `MapLocationsAPIView._build_response()` to call clustering service when `params["cluster"]` is True
4. Update `MapClusterSerializer` to include cluster metadata
5. Add tests in `backend/tests/services/test_map_clustering.py`
**API Changes**:
- Response includes `clusters` array with cluster objects
- Each cluster has: `id`, `coordinates`, `count`, `bounds`, `representative_location`
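A DRF serializer matching that shape might look like this (a sketch; `bounds` is assumed to be `[min_lat, min_lng, max_lat, max_lng]`):
```python
from rest_framework import serializers


class MapClusterSerializer(serializers.Serializer):
    """Read-only representation of one cluster in the `clusters` array."""

    id = serializers.CharField()
    coordinates = serializers.ListField(
        child=serializers.FloatField(), min_length=2, max_length=2
    )
    count = serializers.IntegerField()
    bounds = serializers.ListField(
        child=serializers.FloatField(), min_length=4, max_length=4
    )
    representative_location = serializers.CharField()
```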
**Performance Considerations**:
- Cache clustered results separately from unclustered
- Use spatial indexes on location tables
- Limit clustering to zoom levels 1-12 (zoomed out views)
**References**:
- [Supercluster.js](https://github.com/mapbox/supercluster) - JavaScript implementation for reference
- [PostGIS ST_ClusterKMeans](https://postgis.net/docs/ST_ClusterKMeans.html) - Database-level clustering
---
#### THRILLWIKI-107: Nearby Locations
**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None
**Context**:
Location detail views currently don't show nearby parks or rides. This would help users discover attractions in the same area.
**Proposed Solution**:
Use PostGIS spatial queries to find locations within a radius:
```python
from django.contrib.gis.measure import D  # Distance
from django.contrib.gis.db.models.functions import Distance

from apps.parks.models import Park  # adjust the import path to the project layout


def get_nearby_locations(location_obj, radius_miles=25, limit=10):
    """Get nearby locations using spatial query."""
    point = location_obj.point

    # Query parks within radius
    nearby_parks = Park.objects.filter(
        location__point__distance_lte=(point, D(mi=radius_miles))
    ).annotate(
        distance=Distance('location__point', point)
    ).exclude(
        id=location_obj.park.id  # Exclude self
    ).order_by('distance')[:limit]

    return nearby_parks
```
**Implementation Steps**:
1. Add `get_nearby_locations()` method to `backend/apps/core/services/location_service.py`
2. Update `MapLocationDetailAPIView.get()` to call this method
3. Update `MapLocationDetailSerializer.get_nearby_locations()` to return actual data (see the sketch after this list)
4. Add distance field to nearby location objects
5. Add tests for spatial queries
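A hypothetical body for the serializer method in step 3, reusing the query helper above (`obj` is assumed to be the location instance, and `distance` the annotation added by the query):
```python
def get_nearby_locations(self, obj):
    """Serialize nearby parks with their distance from the current location."""
    nearby = get_nearby_locations(obj, radius_miles=25, limit=10)
    return [
        {
            "id": f"park_{park.id}",
            "name": park.name,
            "type": "park",
            "distance_miles": round(park.distance.mi, 1),
            # GEOS points store (x=longitude, y=latitude); the API returns [lat, lng]
            "coordinates": [park.location.point.y, park.location.point.x],
        }
        for park in nearby
    ]
```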
**API Response Example**:
```json
{
    "nearby_locations": [
        {
            "id": "park_123",
            "name": "Cedar Point",
            "type": "park",
            "distance_miles": 5.2,
            "coordinates": [41.4793, -82.6833]
        }
    ]
}
```
**Performance Considerations**:
- Use spatial indexes (already present on `location__point` fields)
- Cache nearby locations for 1 hour
- Limit radius to 50 miles maximum
---
#### THRILLWIKI-108: Search Relevance Scoring
**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None
**Context**:
Search results currently return a hardcoded relevance score of 1.0. Implementing proper relevance scoring would improve search result quality.
**Proposed Solution**:
Implement a weighted scoring algorithm based on:
1. **Text Match Quality** (40% weight):
- Exact name match: 1.0
- Name starts with query: 0.8
- Name contains query: 0.6
- City/state match: 0.4
2. **Popularity** (30% weight):
- Based on `average_rating` and `ride_count`/`coaster_count`
- Normalize to 0-1 scale
3. **Recency** (15% weight):
- Recently opened attractions score higher
- Normalize based on `opening_date`
4. **Status** (15% weight):
- Operating: 1.0
- Seasonal: 0.8
- Closed temporarily: 0.5
- Closed permanently: 0.2
**Implementation Steps**:
1. Create `backend/apps/core/services/search_scoring.py` with scoring logic
2. Add `calculate_relevance_score()` method
3. Update `MapSearchAPIView.get()` to calculate scores
4. Sort results by relevance score (descending)
5. Add tests for scoring algorithm
**Example Implementation**:
```python
def calculate_relevance_score(location, query):
    score = 0.0

    # Text match (40%)
    name_lower = location.name.lower()
    query_lower = query.lower()
    if name_lower == query_lower:
        score += 0.40
    elif name_lower.startswith(query_lower):
        score += 0.32
    elif query_lower in name_lower:
        score += 0.24

    # Popularity (30%)
    if location.average_rating:
        score += (location.average_rating / 5.0) * 0.30

    # Status (15%)
    status_weights = {
        'OPERATING': 1.0,
        'SEASONAL': 0.8,
        'CLOSED_TEMP': 0.5,
        'CLOSED_PERM': 0.2,
    }
    score += status_weights.get(location.status, 0.5) * 0.15

    return min(score, 1.0)
```
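The example above covers text match, popularity, and status but omits the recency component; a sketch of that piece, assuming `opening_date` is a Python `date`:
```python
from datetime import date


def recency_score(opening_date, max_age_years=30):
    """Return 0-1, with newer attractions scoring higher; missing dates score 0."""
    if not opening_date:
        return 0.0
    age_years = (date.today() - opening_date).days / 365.25
    return max(0.0, 1.0 - min(age_years, max_age_years) / max_age_years)
```
The result would then be folded in as `score += recency_score(location.opening_date) * 0.15`.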
**Performance Considerations**:
- Calculate scores in Python (not database) for flexibility
- Cache search results with scores for 5 minutes
- Consider using PostgreSQL full-text search for better performance
---
#### THRILLWIKI-109: Cache Statistics Tracking
**Priority**: P2 (Medium)
**Estimated Effort**: 1-2 hours
**Dependencies**: None
**Status**: IMPLEMENTED
**Context**:
The `MapStatsAPIView` returns hardcoded cache statistics (0 hits, 0 misses). Implementing real cache statistics provides visibility into caching effectiveness.
**Implementation**:
Added `get_cache_statistics()` method to `EnhancedCacheService` that retrieves Redis INFO statistics when available. The `MapStatsAPIView` now returns real cache hit/miss data.
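For reference, reading those statistics from Redis can look roughly like this (a sketch assuming django-redis as the cache backend; the actual implementation may differ):
```python
from django_redis import get_redis_connection


def get_cache_statistics(self):
    """Return Redis hit/miss counters, or zeros when they are unavailable."""
    try:
        info = get_redis_connection("default").info("stats")
        hits = info.get("keyspace_hits", 0)
        misses = info.get("keyspace_misses", 0)
        total = hits + misses
        return {
            "hits": hits,
            "misses": misses,
            "hit_rate": round(hits / total, 4) if total else 0.0,
        }
    except Exception:
        # e.g. a local-memory cache in development has no Redis INFO to query
        return {"hits": 0, "misses": 0, "hit_rate": 0.0}
```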
---
### User Features
#### THRILLWIKI-104: Full User Statistics Tracking
**Priority**: P2 (Medium)
**Estimated Effort**: 3-4 days
**Dependencies**: THRILLWIKI-105 (Photo counting)
**Context**:
Current user statistics are calculated on-demand by querying multiple tables. This is inefficient and doesn't track all desired metrics.
**Proposed Solution**:
Implement a `UserStatistics` model with periodic updates:
```python
from django.db import models
from django.contrib.auth.models import User  # or the project's custom user model


class UserStatistics(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)

    # Content statistics
    parks_visited = models.IntegerField(default=0)
    rides_ridden = models.IntegerField(default=0)
    reviews_written = models.IntegerField(default=0)
    photos_uploaded = models.IntegerField(default=0)
    top_lists_created = models.IntegerField(default=0)

    # Engagement statistics
    helpful_votes_received = models.IntegerField(default=0)
    comments_made = models.IntegerField(default=0)
    badges_earned = models.IntegerField(default=0)

    # Activity tracking
    last_review_date = models.DateTimeField(null=True, blank=True)
    last_photo_upload_date = models.DateTimeField(null=True, blank=True)
    streak_days = models.IntegerField(default=0)

    # Timestamps
    last_calculated = models.DateTimeField(auto_now=True)

    class Meta:
        verbose_name_plural = "User statistics"
```
**Implementation Steps**:
1. Create migration for `UserStatistics` model in `backend/apps/accounts/models.py`
2. Create Celery task `update_user_statistics` in `backend/apps/accounts/tasks.py`
3. Update statistics on user actions using Django signals (sketched after this list):
- `post_save` signal on `ParkReview`, `RideReview` -> increment `reviews_written`
- `post_save` signal on `ParkPhoto`, `RidePhoto` -> increment `photos_uploaded`
4. Add management command `python manage.py recalculate_user_stats` for bulk updates
5. Update `get_user_statistics` view to read from `UserStatistics` model
6. Add periodic Celery task to recalculate statistics daily
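For step 3, a hedged signal-handler sketch (the `apps.parks`/`apps.rides` import paths and the `user` attribute on reviews are assumptions about the codebase):
```python
from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver

from apps.accounts.models import UserStatistics
from apps.parks.models import ParkReview
from apps.rides.models import RideReview


@receiver(post_save, sender=ParkReview)
@receiver(post_save, sender=RideReview)
def increment_reviews_written(sender, instance, created, **kwargs):
    """Bump the author's review counter when a new review is saved."""
    if not created:
        return
    stats, _ = UserStatistics.objects.get_or_create(user=instance.user)
    # F() keeps the increment atomic under concurrent saves
    UserStatistics.objects.filter(pk=stats.pk).update(
        reviews_written=F("reviews_written") + 1
    )
```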
**Performance Benefits**:
- Reduces database queries from 5+ to 1
- Enables leaderboards and ranking features
- Supports gamification (badges, achievements)
**Migration Strategy**:
1. Create model and migration
2. Run `recalculate_user_stats` for existing users
3. Enable signal handlers for new activity
4. Monitor for 1 week before removing old calculation logic
---
#### THRILLWIKI-105: Photo Upload Counting
**Priority**: P2 (Medium)
**Estimated Effort**: 30 minutes
**Dependencies**: None
**Status**: IMPLEMENTED
**Context**:
The user statistics endpoint returns `photos_uploaded: 0` for all users. Photo uploads should be counted from `ParkPhoto` and `RidePhoto` models.
**Implementation**:
Updated `get_user_statistics()` in `backend/apps/api/v1/accounts/views.py` to query `ParkPhoto` and `RidePhoto` models where `uploaded_by=user`.
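The count reduces to a query along these lines (illustrative; field names taken from the description above):
```python
# `user` is the profile whose statistics are being assembled
photos_uploaded = (
    ParkPhoto.objects.filter(uploaded_by=user).count()
    + RidePhoto.objects.filter(uploaded_by=user).count()
)
```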
---
### Infrastructure
#### THRILLWIKI-101: Geocoding Service Integration
**Priority**: P3 (Low)
**Estimated Effort**: 2-3 days
**Dependencies**: None
**Context**:
The `CompanyHeadquarters` model has address fields but no coordinates. This prevents companies from appearing on the map.
**Proposed Solution**:
Integrate a geocoding service to convert addresses to coordinates:
**Recommended Services**:
1. **Google Maps Geocoding API** (Paid, high quality)
2. **Nominatim (OpenStreetMap)** (Free, rate-limited)
3. **Mapbox Geocoding API** (Paid, good quality)
**Implementation Steps**:
1. Create `backend/apps/core/services/geocoding_service.py`:
```python
class GeocodingService:
    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude)."""
        # Implementation using chosen service
        ...
```
2. Add geocoding to `CompanyHeadquarters` model:
- Add `latitude` and `longitude` fields
- Add `geocoded_at` timestamp field
- Create migration
3. Update `CompanyLocationAdapter.to_unified_location()` to use coordinates if available
4. Add management command `python manage.py geocode_companies` for bulk geocoding
5. Add Celery task for automatic geocoding on company creation/update
**Configuration**:
Add to `backend/config/settings/base.py`:
```python
GEOCODING_SERVICE = env('GEOCODING_SERVICE', default='nominatim')
GEOCODING_API_KEY = env('GEOCODING_API_KEY', default='')
GEOCODING_RATE_LIMIT = env.int('GEOCODING_RATE_LIMIT', default=1) # requests per second
```
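With those settings in place, a Nominatim-backed version of the step 1 skeleton could look like this (a sketch; the search endpoint and `lat`/`lon` response fields are Nominatim's public API, everything else is assumed):
```python
import requests


class GeocodingService:
    """Nominatim-backed sketch; swap the backend according to GEOCODING_SERVICE."""

    NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude), or None if nothing matches."""
        response = requests.get(
            self.NOMINATIM_URL,
            params={"q": address, "format": "json", "limit": 1},
            # Nominatim's usage policy requires an identifying User-Agent
            headers={"User-Agent": "thrillwiki-geocoder"},
            timeout=10,
        )
        response.raise_for_status()
        results = response.json()
        if not results:
            return None
        return float(results[0]["lat"]), float(results[0]["lon"])
```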
**Rate Limiting**:
- Implement exponential backoff for API errors
- Cache geocoding results to avoid redundant API calls
- Use Celery for async geocoding to avoid blocking requests
**Cost Considerations**:
- Nominatim: Free but limited to 1 request/second
- Google Maps: $5 per 1000 requests (first $200/month free)
- Mapbox: $0.50 per 1000 requests (first 100k free)
**Alternative Approach**:
Store coordinates manually in the admin interface for the ~50-100 companies in the database.
---
#### THRILLWIKI-110: ClamAV Malware Scanning Integration
**Priority**: P1 (High) - Security feature
**Estimated Effort**: 2-3 days
**Dependencies**: ClamAV daemon installation
**Context**:
File uploads currently use magic number validation and PIL integrity checks, but don't scan for malware. This is a security gap for user-generated content.
**Proposed Solution**:
Integrate ClamAV antivirus scanning for all file uploads.
**Implementation Steps**:
1. **Install ClamAV**:
```bash
# Docker
docker run -d -p 3310:3310 clamav/clamav:latest
# Ubuntu/Debian
sudo apt-get install clamav clamav-daemon
sudo freshclam # Update virus definitions
sudo systemctl start clamav-daemon
```
2. **Install Python client**:
```bash
uv add clamd
```
3. **Update `backend/apps/core/utils/file_scanner.py`**:
```python
import logging
from typing import Tuple

import clamd
from django.core.files.uploadedfile import UploadedFile

logger = logging.getLogger(__name__)


def scan_file_for_malware(file: UploadedFile) -> Tuple[bool, str]:
    """Scan file for malware using ClamAV."""
    try:
        # Connect to ClamAV daemon
        cd = clamd.ClamdUnixSocket()  # or ClamdNetworkSocket for a remote daemon

        # Scan file
        file.seek(0)
        scan_result = cd.instream(file)
        file.seek(0)

        # Check result
        if scan_result['stream'][0] == 'OK':
            return True, ""
        virus_name = scan_result['stream'][1]
        return False, f"Malware detected: {virus_name}"
    except clamd.ConnectionError:
        # ClamAV not available - log warning and allow upload (fail open)
        logger.warning("ClamAV daemon not available, skipping malware scan")
        return True, ""
    except Exception as e:
        logger.error(f"Malware scan error: {e}")
        return False, "Malware scan failed"
```
4. **Configuration**:
Add to `backend/config/settings/base.py`:
```python
CLAMAV_ENABLED = env.bool('CLAMAV_ENABLED', default=False)
CLAMAV_SOCKET = env('CLAMAV_SOCKET', default='/var/run/clamav/clamd.ctl')
CLAMAV_HOST = env('CLAMAV_HOST', default='localhost')
CLAMAV_PORT = env.int('CLAMAV_PORT', default=3310)
```
5. **Update file upload views**:
- Call `scan_file_for_malware()` in avatar upload view
- Call in media upload views
- Log all malware detections for security monitoring
6. **Testing**:
- Use EICAR test file for testing: `X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*`
- Add unit tests with mocked ClamAV responses
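A unit-test sketch for the mocked-ClamAV case (the `apps.core.utils.file_scanner` module path follows the step above; the test is illustrative, not taken from the suite):
```python
from unittest.mock import MagicMock, patch

from apps.core.utils import file_scanner


def test_scan_rejects_infected_file():
    """A mocked ClamAV detection should make the scan fail."""
    fake_daemon = MagicMock()
    fake_daemon.instream.return_value = {"stream": ("FOUND", "Eicar-Test-Signature")}

    upload = MagicMock()  # stands in for an UploadedFile; seek() is a no-op on the mock
    with patch.object(file_scanner.clamd, "ClamdUnixSocket", return_value=fake_daemon):
        ok, message = file_scanner.scan_file_for_malware(upload)

    assert ok is False
    assert "Eicar-Test-Signature" in message
```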
**Deployment Considerations**:
- ClamAV requires ~1GB RAM for virus definitions
- Update virus definitions daily via `freshclam`
- Monitor ClamAV daemon health in production
- Consider using cloud-based scanning service (AWS GuardDuty, VirusTotal) for serverless deployments
**Fallback Strategy**:
If ClamAV is unavailable, log warning and allow upload (fail open). This prevents blocking legitimate uploads if ClamAV daemon crashes.
---
### Management Commands
#### THRILLWIKI-111: Sample Data Creation Command
**Priority**: P3 (Low) - Development utility
**Estimated Effort**: 1-2 days
**Dependencies**: None
**Context**:
The `create_sample_data` management command is incomplete. This command is useful for:
- Local development with realistic data
- Demo environments
- Testing with diverse data sets
**Proposed Solution**:
Complete the implementation with comprehensive sample data:
**Sample Data to Create**:
1. **Parks** (10-15):
- Major theme parks (Disney, Universal, Cedar Point)
- Regional parks
- Water parks
- Mix of operating/closed/seasonal statuses
2. **Rides** (50-100):
- Roller coasters (various types)
- Flat rides
- Water rides
- Dark rides
- Mix of statuses and manufacturers
3. **Companies** (20-30):
- Operators (Disney, Six Flags, Cedar Fair)
- Manufacturers (Intamin, B&M, RMC)
- Mix of active/inactive
4. **Users** (10):
- Admin user
- Regular users with various activity levels
- Test user for authentication testing
5. **Reviews** (100-200):
- Park reviews with ratings
- Ride reviews with ratings
- Mix of helpful/unhelpful votes
6. **Media** (50):
- Park photos
- Ride photos
- Mix of approved/pending/rejected
**Implementation Steps**:
1. Create fixtures in `backend/fixtures/sample_data.json`
2. Update `create_sample_data.py` to load fixtures
3. Add `--clear` flag to delete existing data before creating
4. Add `--minimal` flag for quick setup (10 parks, 20 rides)
5. Document usage in `backend/README.md`
**Usage**:
```bash
# Full sample data
python manage.py create_sample_data
# Minimal data for quick testing
python manage.py create_sample_data --minimal
# Clear existing data first
python manage.py create_sample_data --clear
```
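A sketch of the flag wiring implied by the usage above (the module path, fixture names, and deletion logic are assumptions, not the final implementation):
```python
# e.g. backend/apps/core/management/commands/create_sample_data.py (path assumed)
from django.core.management import call_command
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Create sample parks, rides, companies, users, reviews, and media."

    def add_arguments(self, parser):
        parser.add_argument("--clear", action="store_true",
                            help="Delete existing sample data before creating")
        parser.add_argument("--minimal", action="store_true",
                            help="Create a small data set (10 parks, 20 rides)")

    def handle(self, *args, **options):
        if options["clear"]:
            self._clear_existing_data()
        fixture = "sample_data_minimal" if options["minimal"] else "sample_data"
        call_command("loaddata", fixture)
        self.stdout.write(self.style.SUCCESS("Sample data created."))

    def _clear_existing_data(self):
        # Delete in reverse dependency order (reviews/media before rides/parks)
        ...
```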
**Alternative Approach**:
Use Django fixtures with `loaddata` command:
```bash
python manage.py loaddata sample_parks sample_rides sample_users
```
---
## Completed Items
### THRILLWIKI-103: Admin Permission Checks
**Status**: COMPLETED (Already Implemented)
**Context**:
The `MapCacheView` delete and post methods had TODO comments for adding admin permission checks. Upon review, these checks were already implemented using `request.user.is_authenticated and request.user.is_staff`.
**Resolution**: Removed outdated TODO comments.
---
## Implementation Notes
### Creating GitHub Issues
Each item in this document can be converted to a GitHub issue using this template:
```markdown
## Description
[Copy from Context section]
## Implementation
[Copy from Implementation Steps section]
## Acceptance Criteria
- [ ] Feature implemented as specified
- [ ] Unit tests added with >80% coverage
- [ ] Integration tests pass
- [ ] Documentation updated
- [ ] Code reviewed and approved
## Priority
[Copy Priority value]
## Related
- THRILLWIKI issue number
- Related features or dependencies
```
### Priority Order for Implementation
Based on business value and effort, recommended implementation order:
1. **THRILLWIKI-110**: ClamAV Malware Scanning (P1, security)
2. **THRILLWIKI-106**: Map Clustering (P1, performance)
3. **THRILLWIKI-107**: Nearby Locations (P2, UX)
4. **THRILLWIKI-108**: Search Relevance Scoring (P2, UX)
5. **THRILLWIKI-104**: Full User Statistics (P2, engagement)
6. **THRILLWIKI-101**: Geocoding Service (P3, completeness)
7. **THRILLWIKI-111**: Sample Data Command (P3, development)