Add secret management guide, client-side performance monitoring, and search accessibility enhancements

- Introduced a comprehensive Secret Management Guide detailing best practices, secret classification, development setup, production management, rotation procedures, and emergency protocols.
- Implemented a client-side performance monitoring script to track various metrics including page load performance, paint metrics, layout shifts, and memory usage.
- Enhanced search accessibility with keyboard navigation support for search results, ensuring compliance with WCAG standards and improving user experience.
pacnpal
2025-12-23 16:41:42 -05:00
parent ae31e889d7
commit edcd8f2076
155 changed files with 22046 additions and 4645 deletions

docs/FUTURE_WORK.md Normal file

@@ -0,0 +1,576 @@
# Future Work & Deferred Features
This document tracks features that have been deferred for future implementation. Each item includes context, implementation guidance, and priority.
## Priority Levels
- **P0 (Critical)**: Blocks major functionality or has security implications
- **P1 (High)**: Significantly improves user experience or performance
- **P2 (Medium)**: Nice-to-have features that add value
- **P3 (Low)**: Optional enhancements
## Feature Tracking
### Map Service Enhancements
#### THRILLWIKI-106: Map Clustering Algorithm
**Priority**: P1 (High)
**Estimated Effort**: 3-5 days
**Dependencies**: None
**Context**:
Currently, the map API returns all locations within bounds without clustering. At low zoom levels (zoomed out), this can result in hundreds of overlapping markers, degrading performance and UX.
**Proposed Solution**:
Implement a server-side clustering algorithm using one of these approaches:
1. **Grid-based clustering** (Recommended for simplicity; see the sketch after this list):
- Divide the map into a grid based on zoom level
- Group locations within each grid cell
- Return cluster center and count for cells with multiple locations
2. **DBSCAN clustering** (Better quality, more complex):
- Use scikit-learn's DBSCAN algorithm
- Cluster based on geographic distance
- Adjust epsilon parameter based on zoom level
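A minimal sketch of the grid-based approach, assuming `UnifiedLocation` objects expose `id`, `latitude`, and `longitude` attributes (not confirmed in the codebase):
```python
import math
from collections import defaultdict


def cluster_locations_grid(locations, zoom_level):
    """Group locations into grid cells whose size shrinks as the map zooms in."""
    # Roughly halve the cell size for every zoom step (illustrative heuristic)
    cell_size_degrees = 360.0 / (2 ** zoom_level)

    cells = defaultdict(list)
    for loc in locations:
        key = (
            math.floor(loc.latitude / cell_size_degrees),
            math.floor(loc.longitude / cell_size_degrees),
        )
        cells[key].append(loc)

    clusters = []
    for (row, col), members in cells.items():
        clusters.append({
            "id": f"cluster_{row}_{col}",
            "coordinates": [
                sum(m.latitude for m in members) / len(members),
                sum(m.longitude for m in members) / len(members),
            ],
            "count": len(members),
            "representative_location": members[0].id,
        })
    return clusters
```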
**Implementation Steps**:
1. Create `backend/apps/core/services/map_clustering.py` with clustering logic
2. Add `cluster_locations()` method that accepts:
- List of `UnifiedLocation` objects
- Zoom level (1-20)
- Clustering strategy ('grid' or 'dbscan')
3. Update `MapLocationsAPIView._build_response()` to call clustering service when `params["cluster"]` is True
4. Update `MapClusterSerializer` to include cluster metadata
5. Add tests in `backend/tests/services/test_map_clustering.py`
**API Changes**:
- Response includes `clusters` array with cluster objects
- Each cluster has: `id`, `coordinates`, `count`, `bounds`, `representative_location`
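A DRF serializer matching that shape might look like this (a sketch; `bounds` is assumed to be `[min_lat, min_lng, max_lat, max_lng]`):
```python
from rest_framework import serializers


class MapClusterSerializer(serializers.Serializer):
    """Read-only representation of one cluster in the `clusters` array."""

    id = serializers.CharField()
    coordinates = serializers.ListField(
        child=serializers.FloatField(), min_length=2, max_length=2
    )
    count = serializers.IntegerField()
    bounds = serializers.ListField(
        child=serializers.FloatField(), min_length=4, max_length=4
    )
    representative_location = serializers.CharField()
```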
**Performance Considerations**:
- Cache clustered results separately from unclustered
- Use spatial indexes on location tables
- Limit clustering to zoom levels 1-12 (zoomed out views)
**References**:
- [Supercluster.js](https://github.com/mapbox/supercluster) - JavaScript implementation for reference
- [PostGIS ST_ClusterKMeans](https://postgis.net/docs/ST_ClusterKMeans.html) - Database-level clustering
---
#### THRILLWIKI-107: Nearby Locations
**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None
**Context**:
Location detail views currently don't show nearby parks or rides. This would help users discover attractions in the same area.
**Proposed Solution**:
Use PostGIS spatial queries to find locations within a radius:
```python
from django.contrib.gis.measure import D  # Distance
from django.contrib.gis.db.models.functions import Distance

from apps.parks.models import Park  # adjust the import path to the project layout


def get_nearby_locations(location_obj, radius_miles=25, limit=10):
    """Get nearby locations using spatial query."""
    point = location_obj.point

    # Query parks within radius
    nearby_parks = Park.objects.filter(
        location__point__distance_lte=(point, D(mi=radius_miles))
    ).annotate(
        distance=Distance('location__point', point)
    ).exclude(
        id=location_obj.park.id  # Exclude self
    ).order_by('distance')[:limit]

    return nearby_parks
```
**Implementation Steps**:
1. Add `get_nearby_locations()` method to `backend/apps/core/services/location_service.py`
2. Update `MapLocationDetailAPIView.get()` to call this method
3. Update `MapLocationDetailSerializer.get_nearby_locations()` to return actual data (see the sketch after this list)
4. Add distance field to nearby location objects
5. Add tests for spatial queries
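A hypothetical body for the serializer method in step 3, reusing the query helper above (`obj` is assumed to be the location instance, and `distance` the annotation added by the query):
```python
def get_nearby_locations(self, obj):
    """Serialize nearby parks with their distance from the current location."""
    nearby = get_nearby_locations(obj, radius_miles=25, limit=10)
    return [
        {
            "id": f"park_{park.id}",
            "name": park.name,
            "type": "park",
            "distance_miles": round(park.distance.mi, 1),
            # GEOS points store (x=longitude, y=latitude); the API returns [lat, lng]
            "coordinates": [park.location.point.y, park.location.point.x],
        }
        for park in nearby
    ]
```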
**API Response Example**:
```json
{
    "nearby_locations": [
        {
            "id": "park_123",
            "name": "Cedar Point",
            "type": "park",
            "distance_miles": 5.2,
            "coordinates": [41.4793, -82.6833]
        }
    ]
}
```
**Performance Considerations**:
- Use spatial indexes (already present on `location__point` fields)
- Cache nearby locations for 1 hour
- Limit radius to 50 miles maximum
---
#### THRILLWIKI-108: Search Relevance Scoring
**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None
**Context**:
Search results currently return a hardcoded relevance score of 1.0. Implementing proper relevance scoring would improve search result quality.
**Proposed Solution**:
Implement a weighted scoring algorithm based on:
1. **Text Match Quality** (40% weight):
- Exact name match: 1.0
- Name starts with query: 0.8
- Name contains query: 0.6
- City/state match: 0.4
2. **Popularity** (30% weight):
- Based on `average_rating` and `ride_count`/`coaster_count`
- Normalize to 0-1 scale
3. **Recency** (15% weight):
- Recently opened attractions score higher
- Normalize based on `opening_date`
4. **Status** (15% weight):
- Operating: 1.0
- Seasonal: 0.8
- Closed temporarily: 0.5
- Closed permanently: 0.2
**Implementation Steps**:
1. Create `backend/apps/core/services/search_scoring.py` with scoring logic
2. Add `calculate_relevance_score()` method
3. Update `MapSearchAPIView.get()` to calculate scores
4. Sort results by relevance score (descending)
5. Add tests for scoring algorithm
**Example Implementation**:
```python
def calculate_relevance_score(location, query):
    score = 0.0

    # Text match (40%)
    name_lower = location.name.lower()
    query_lower = query.lower()
    if name_lower == query_lower:
        score += 0.40
    elif name_lower.startswith(query_lower):
        score += 0.32
    elif query_lower in name_lower:
        score += 0.24

    # Popularity (30%)
    if location.average_rating:
        score += (location.average_rating / 5.0) * 0.30

    # Status (15%)
    status_weights = {
        'OPERATING': 1.0,
        'SEASONAL': 0.8,
        'CLOSED_TEMP': 0.5,
        'CLOSED_PERM': 0.2,
    }
    score += status_weights.get(location.status, 0.5) * 0.15

    return min(score, 1.0)
```
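The example above covers text match, popularity, and status but omits the recency component; a sketch of that piece, assuming `opening_date` is a Python `date`:
```python
from datetime import date


def recency_score(opening_date, max_age_years=30):
    """Return 0-1, with newer attractions scoring higher; missing dates score 0."""
    if not opening_date:
        return 0.0
    age_years = (date.today() - opening_date).days / 365.25
    return max(0.0, 1.0 - min(age_years, max_age_years) / max_age_years)
```
The result would then be folded in as `score += recency_score(location.opening_date) * 0.15`.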
**Performance Considerations**:
- Calculate scores in Python (not database) for flexibility
- Cache search results with scores for 5 minutes
- Consider using PostgreSQL full-text search for better performance
---
#### THRILLWIKI-109: Cache Statistics Tracking
**Priority**: P2 (Medium)
**Estimated Effort**: 1-2 hours
**Dependencies**: None
**Status**: IMPLEMENTED
**Context**:
The `MapStatsAPIView` returns hardcoded cache statistics (0 hits, 0 misses). Implementing real cache statistics provides visibility into caching effectiveness.
**Implementation**:
Added `get_cache_statistics()` method to `EnhancedCacheService` that retrieves Redis INFO statistics when available. The `MapStatsAPIView` now returns real cache hit/miss data.
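For reference, reading those statistics from Redis can look roughly like this (a sketch assuming django-redis as the cache backend; the actual implementation may differ):
```python
from django_redis import get_redis_connection


def get_cache_statistics(self):
    """Return Redis hit/miss counters, or zeros when they are unavailable."""
    try:
        info = get_redis_connection("default").info("stats")
        hits = info.get("keyspace_hits", 0)
        misses = info.get("keyspace_misses", 0)
        total = hits + misses
        return {
            "hits": hits,
            "misses": misses,
            "hit_rate": round(hits / total, 4) if total else 0.0,
        }
    except Exception:
        # e.g. a local-memory cache in development has no Redis INFO to query
        return {"hits": 0, "misses": 0, "hit_rate": 0.0}
```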
---
### User Features
#### THRILLWIKI-104: Full User Statistics Tracking
**Priority**: P2 (Medium)
**Estimated Effort**: 3-4 days
**Dependencies**: THRILLWIKI-105 (Photo counting)
**Context**:
Current user statistics are calculated on-demand by querying multiple tables. This is inefficient and doesn't track all desired metrics.
**Proposed Solution**:
Implement a `UserStatistics` model with periodic updates:
```python
from django.db import models
from django.contrib.auth.models import User  # or the project's custom user model


class UserStatistics(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)

    # Content statistics
    parks_visited = models.IntegerField(default=0)
    rides_ridden = models.IntegerField(default=0)
    reviews_written = models.IntegerField(default=0)
    photos_uploaded = models.IntegerField(default=0)
    top_lists_created = models.IntegerField(default=0)

    # Engagement statistics
    helpful_votes_received = models.IntegerField(default=0)
    comments_made = models.IntegerField(default=0)
    badges_earned = models.IntegerField(default=0)

    # Activity tracking
    last_review_date = models.DateTimeField(null=True, blank=True)
    last_photo_upload_date = models.DateTimeField(null=True, blank=True)
    streak_days = models.IntegerField(default=0)

    # Timestamps
    last_calculated = models.DateTimeField(auto_now=True)

    class Meta:
        verbose_name_plural = "User statistics"
```
**Implementation Steps**:
1. Create migration for `UserStatistics` model in `backend/apps/accounts/models.py`
2. Create Celery task `update_user_statistics` in `backend/apps/accounts/tasks.py`
3. Update statistics on user actions using Django signals (sketched after this list):
- `post_save` signal on `ParkReview`, `RideReview` -> increment `reviews_written`
- `post_save` signal on `ParkPhoto`, `RidePhoto` -> increment `photos_uploaded`
4. Add management command `python manage.py recalculate_user_stats` for bulk updates
5. Update `get_user_statistics` view to read from `UserStatistics` model
6. Add periodic Celery task to recalculate statistics daily
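For step 3, a hedged signal-handler sketch (the `apps.parks`/`apps.rides` import paths and the `user` attribute on reviews are assumptions about the codebase):
```python
from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver

from apps.accounts.models import UserStatistics
from apps.parks.models import ParkReview
from apps.rides.models import RideReview


@receiver(post_save, sender=ParkReview)
@receiver(post_save, sender=RideReview)
def increment_reviews_written(sender, instance, created, **kwargs):
    """Bump the author's review counter when a new review is saved."""
    if not created:
        return
    stats, _ = UserStatistics.objects.get_or_create(user=instance.user)
    # F() keeps the increment atomic under concurrent saves
    UserStatistics.objects.filter(pk=stats.pk).update(
        reviews_written=F("reviews_written") + 1
    )
```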
**Performance Benefits**:
- Reduces database queries from 5+ to 1
- Enables leaderboards and ranking features
- Supports gamification (badges, achievements)
**Migration Strategy**:
1. Create model and migration
2. Run `recalculate_user_stats` for existing users
3. Enable signal handlers for new activity
4. Monitor for 1 week before removing old calculation logic
---
#### THRILLWIKI-105: Photo Upload Counting
**Priority**: P2 (Medium)
**Estimated Effort**: 30 minutes
**Dependencies**: None
**Status**: IMPLEMENTED
**Context**:
The user statistics endpoint returns `photos_uploaded: 0` for all users. Photo uploads should be counted from `ParkPhoto` and `RidePhoto` models.
**Implementation**:
Updated `get_user_statistics()` in `backend/apps/api/v1/accounts/views.py` to query `ParkPhoto` and `RidePhoto` models where `uploaded_by=user`.
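The count reduces to a query along these lines (illustrative; field names taken from the description above):
```python
# `user` is the profile whose statistics are being assembled
photos_uploaded = (
    ParkPhoto.objects.filter(uploaded_by=user).count()
    + RidePhoto.objects.filter(uploaded_by=user).count()
)
```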
---
### Infrastructure
#### THRILLWIKI-101: Geocoding Service Integration
**Priority**: P3 (Low)
**Estimated Effort**: 2-3 days
**Dependencies**: None
**Context**:
The `CompanyHeadquarters` model has address fields but no coordinates. This prevents companies from appearing on the map.
**Proposed Solution**:
Integrate a geocoding service to convert addresses to coordinates:
**Recommended Services**:
1. **Google Maps Geocoding API** (Paid, high quality)
2. **Nominatim (OpenStreetMap)** (Free, rate-limited)
3. **Mapbox Geocoding API** (Paid, good quality)
**Implementation Steps**:
1. Create `backend/apps/core/services/geocoding_service.py`:
```python
class GeocodingService:
    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude)."""
        # Implementation using chosen service
        ...
```
2. Add geocoding to `CompanyHeadquarters` model:
- Add `latitude` and `longitude` fields
- Add `geocoded_at` timestamp field
- Create migration
3. Update `CompanyLocationAdapter.to_unified_location()` to use coordinates if available
4. Add management command `python manage.py geocode_companies` for bulk geocoding
5. Add Celery task for automatic geocoding on company creation/update
**Configuration**:
Add to `backend/config/settings/base.py`:
```python
GEOCODING_SERVICE = env('GEOCODING_SERVICE', default='nominatim')
GEOCODING_API_KEY = env('GEOCODING_API_KEY', default='')
GEOCODING_RATE_LIMIT = env.int('GEOCODING_RATE_LIMIT', default=1) # requests per second
```
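With those settings in place, a Nominatim-backed version of the step 1 skeleton could look like this (a sketch; the search endpoint and `lat`/`lon` response fields are Nominatim's public API, everything else is assumed):
```python
import requests


class GeocodingService:
    """Nominatim-backed sketch; swap the backend according to GEOCODING_SERVICE."""

    NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude), or None if nothing matches."""
        response = requests.get(
            self.NOMINATIM_URL,
            params={"q": address, "format": "json", "limit": 1},
            # Nominatim's usage policy requires an identifying User-Agent
            headers={"User-Agent": "thrillwiki-geocoder"},
            timeout=10,
        )
        response.raise_for_status()
        results = response.json()
        if not results:
            return None
        return float(results[0]["lat"]), float(results[0]["lon"])
```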
**Rate Limiting**:
- Implement exponential backoff for API errors
- Cache geocoding results to avoid redundant API calls
- Use Celery for async geocoding to avoid blocking requests
**Cost Considerations**:
- Nominatim: Free but limited to 1 request/second
- Google Maps: $5 per 1000 requests (first $200/month free)
- Mapbox: $0.50 per 1000 requests (first 100k free)
**Alternative Approach**:
Store coordinates manually in the admin interface for the ~50-100 companies in the database.
---
#### THRILLWIKI-110: ClamAV Malware Scanning Integration
**Priority**: P1 (High) - Security feature
**Estimated Effort**: 2-3 days
**Dependencies**: ClamAV daemon installation
**Context**:
File uploads currently use magic number validation and PIL integrity checks, but don't scan for malware. This is a security gap for user-generated content.
**Proposed Solution**:
Integrate ClamAV antivirus scanning for all file uploads.
**Implementation Steps**:
1. **Install ClamAV**:
```bash
# Docker
docker run -d -p 3310:3310 clamav/clamav:latest
# Ubuntu/Debian
sudo apt-get install clamav clamav-daemon
sudo freshclam # Update virus definitions
sudo systemctl start clamav-daemon
```
2. **Install Python client**:
```bash
uv add clamd
```
3. **Update `backend/apps/core/utils/file_scanner.py`**:
```python
import logging
from typing import Tuple

import clamd
from django.core.files.uploadedfile import UploadedFile

logger = logging.getLogger(__name__)


def scan_file_for_malware(file: UploadedFile) -> Tuple[bool, str]:
    """Scan file for malware using ClamAV."""
    try:
        # Connect to ClamAV daemon
        cd = clamd.ClamdUnixSocket()  # or ClamdNetworkSocket for a remote daemon

        # Scan file
        file.seek(0)
        scan_result = cd.instream(file)
        file.seek(0)

        # Check result
        if scan_result['stream'][0] == 'OK':
            return True, ""
        virus_name = scan_result['stream'][1]
        return False, f"Malware detected: {virus_name}"
    except clamd.ConnectionError:
        # ClamAV not available - log warning and allow upload (fail open)
        logger.warning("ClamAV daemon not available, skipping malware scan")
        return True, ""
    except Exception as e:
        logger.error(f"Malware scan error: {e}")
        return False, "Malware scan failed"
```
4. **Configuration**:
Add to `backend/config/settings/base.py`:
```python
CLAMAV_ENABLED = env.bool('CLAMAV_ENABLED', default=False)
CLAMAV_SOCKET = env('CLAMAV_SOCKET', default='/var/run/clamav/clamd.ctl')
CLAMAV_HOST = env('CLAMAV_HOST', default='localhost')
CLAMAV_PORT = env.int('CLAMAV_PORT', default=3310)
```
5. **Update file upload views**:
- Call `scan_file_for_malware()` in avatar upload view
- Call in media upload views
- Log all malware detections for security monitoring
6. **Testing**:
- Use EICAR test file for testing: `X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*`
- Add unit tests with mocked ClamAV responses
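A unit-test sketch for the mocked-ClamAV case (the `apps.core.utils.file_scanner` module path follows the step above; the test is illustrative, not taken from the suite):
```python
from unittest.mock import MagicMock, patch

from apps.core.utils import file_scanner


def test_scan_rejects_infected_file():
    """A mocked ClamAV detection should make the scan fail."""
    fake_daemon = MagicMock()
    fake_daemon.instream.return_value = {"stream": ("FOUND", "Eicar-Test-Signature")}

    upload = MagicMock()  # stands in for an UploadedFile; seek() is a no-op on the mock
    with patch.object(file_scanner.clamd, "ClamdUnixSocket", return_value=fake_daemon):
        ok, message = file_scanner.scan_file_for_malware(upload)

    assert ok is False
    assert "Eicar-Test-Signature" in message
```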
**Deployment Considerations**:
- ClamAV requires ~1GB RAM for virus definitions
- Update virus definitions daily via `freshclam`
- Monitor ClamAV daemon health in production
- Consider using cloud-based scanning service (AWS GuardDuty, VirusTotal) for serverless deployments
**Fallback Strategy**:
If ClamAV is unavailable, log warning and allow upload (fail open). This prevents blocking legitimate uploads if ClamAV daemon crashes.
---
### Management Commands
#### THRILLWIKI-111: Sample Data Creation Command
**Priority**: P3 (Low) - Development utility
**Estimated Effort**: 1-2 days
**Dependencies**: None
**Context**:
The `create_sample_data` management command is incomplete. This command is useful for:
- Local development with realistic data
- Demo environments
- Testing with diverse data sets
**Proposed Solution**:
Complete the implementation with comprehensive sample data:
**Sample Data to Create**:
1. **Parks** (10-15):
- Major theme parks (Disney, Universal, Cedar Point)
- Regional parks
- Water parks
- Mix of operating/closed/seasonal statuses
2. **Rides** (50-100):
- Roller coasters (various types)
- Flat rides
- Water rides
- Dark rides
- Mix of statuses and manufacturers
3. **Companies** (20-30):
- Operators (Disney, Six Flags, Cedar Fair)
- Manufacturers (Intamin, B&M, RMC)
- Mix of active/inactive
4. **Users** (10):
- Admin user
- Regular users with various activity levels
- Test user for authentication testing
5. **Reviews** (100-200):
- Park reviews with ratings
- Ride reviews with ratings
- Mix of helpful/unhelpful votes
6. **Media** (50):
- Park photos
- Ride photos
- Mix of approved/pending/rejected
**Implementation Steps**:
1. Create fixtures in `backend/fixtures/sample_data.json`
2. Update `create_sample_data.py` to load fixtures
3. Add `--clear` flag to delete existing data before creating
4. Add `--minimal` flag for quick setup (10 parks, 20 rides)
5. Document usage in `backend/README.md`
**Usage**:
```bash
# Full sample data
python manage.py create_sample_data
# Minimal data for quick testing
python manage.py create_sample_data --minimal
# Clear existing data first
python manage.py create_sample_data --clear
```
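A sketch of the flag wiring implied by the usage above (the module path, fixture names, and deletion logic are assumptions, not the final implementation):
```python
# e.g. backend/apps/core/management/commands/create_sample_data.py (path assumed)
from django.core.management import call_command
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Create sample parks, rides, companies, users, reviews, and media."

    def add_arguments(self, parser):
        parser.add_argument("--clear", action="store_true",
                            help="Delete existing sample data before creating")
        parser.add_argument("--minimal", action="store_true",
                            help="Create a small data set (10 parks, 20 rides)")

    def handle(self, *args, **options):
        if options["clear"]:
            self._clear_existing_data()
        fixture = "sample_data_minimal" if options["minimal"] else "sample_data"
        call_command("loaddata", fixture)
        self.stdout.write(self.style.SUCCESS("Sample data created."))

    def _clear_existing_data(self):
        # Delete in reverse dependency order (reviews/media before rides/parks)
        ...
```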
**Alternative Approach**:
Use Django fixtures with `loaddata` command:
```bash
python manage.py loaddata sample_parks sample_rides sample_users
```
---
## Completed Items
### THRILLWIKI-103: Admin Permission Checks
**Status**: COMPLETED (Already Implemented)
**Context**:
The `MapCacheView` delete and post methods had TODO comments for adding admin permission checks. Upon review, these checks were already implemented using `request.user.is_authenticated and request.user.is_staff`.
**Resolution**: Removed outdated TODO comments.
---
## Implementation Notes
### Creating GitHub Issues
Each item in this document can be converted to a GitHub issue using this template:
```markdown
## Description
[Copy from Context section]
## Implementation
[Copy from Implementation Steps section]
## Acceptance Criteria
- [ ] Feature implemented as specified
- [ ] Unit tests added with >80% coverage
- [ ] Integration tests pass
- [ ] Documentation updated
- [ ] Code reviewed and approved
## Priority
[Copy Priority value]
## Related
- THRILLWIKI issue number
- Related features or dependencies
```
### Priority Order for Implementation
Based on business value and effort, recommended implementation order:
1. **THRILLWIKI-110**: ClamAV Malware Scanning (P1, security)
2. **THRILLWIKI-106**: Map Clustering (P1, performance)
3. **THRILLWIKI-107**: Nearby Locations (P2, UX)
4. **THRILLWIKI-108**: Search Relevance Scoring (P2, UX)
5. **THRILLWIKI-104**: Full User Statistics (P2, engagement)
6. **THRILLWIKI-101**: Geocoding Service (P3, completeness)
7. **THRILLWIKI-111**: Sample Data Command (P3, development)