# Future Work & Deferred Features
This document tracks features that have been deferred for future implementation. Each item includes context, implementation guidance, and priority.
## Priority Levels
- P0 (Critical): Blocks major functionality or has security implications
- P1 (High): Significantly improves user experience or performance
- P2 (Medium): Nice-to-have features that add value
- P3 (Low): Optional enhancements
## Feature Tracking

### Map Service Enhancements

#### THRILLWIKI-106: Map Clustering Algorithm

Priority: P1 (High) | Estimated Effort: 3-5 days | Dependencies: None
Context: Currently, the map API returns all locations within bounds without clustering. At low zoom levels (zoomed out), this can result in hundreds of overlapping markers, degrading performance and UX.
Proposed Solution: Implement a server-side clustering algorithm using one of these approaches:
- Grid-based clustering (Recommended for simplicity):
  - Divide the map into a grid based on zoom level
  - Group locations within each grid cell
  - Return cluster center and count for cells with multiple locations
- DBSCAN clustering (Better quality, more complex):
  - Use scikit-learn's DBSCAN algorithm
  - Cluster based on geographic distance
  - Adjust the epsilon parameter based on zoom level
Implementation Steps:
- Create `backend/apps/core/services/map_clustering.py` with clustering logic
- Add a `cluster_locations()` method that accepts:
  - List of `UnifiedLocation` objects
  - Zoom level (1-20)
  - Clustering strategy (`'grid'` or `'dbscan'`)
- Update `MapLocationsAPIView._build_response()` to call the clustering service when `params["cluster"]` is True
- Update `MapClusterSerializer` to include cluster metadata
- Add tests in `backend/tests/services/test_map_clustering.py`
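A minimal sketch of the grid strategy, assuming location objects expose `latitude`/`longitude` attributes (standing in for `UnifiedLocation`); the names and cell-size formula are illustrative, not the final service API:

```python
from collections import defaultdict

def cluster_locations(locations, zoom, strategy="grid"):
    """Group locations into zoom-dependent grid cells."""
    if strategy != "grid":
        raise NotImplementedError("Only the grid strategy is sketched here")

    # Cell size halves with each zoom step, so zoomed-out views cluster more.
    cell_size = 360.0 / (2 ** zoom)
    cells = defaultdict(list)
    for loc in locations:
        key = (int(loc.latitude // cell_size), int(loc.longitude // cell_size))
        cells[key].append(loc)

    clusters = []
    for (row, col), members in cells.items():
        lats = [m.latitude for m in members]
        lons = [m.longitude for m in members]
        clusters.append({
            "id": f"cluster_{row}_{col}",
            "coordinates": [sum(lats) / len(lats), sum(lons) / len(lons)],
            "count": len(members),
            "bounds": [[min(lats), min(lons)], [max(lats), max(lons)]],
            "representative_location": members[0],
        })
    return clusters
```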
API Changes:
- Response includes a `clusters` array with cluster objects
- Each cluster has: `id`, `coordinates`, `count`, `bounds`, `representative_location`
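A hypothetical response shape consistent with those fields (the exact serializer output is not specified here):

```json
{
  "clusters": [
    {
      "id": "cluster_41_-83",
      "coordinates": [41.48, -82.68],
      "count": 17,
      "bounds": [[41.21, -83.04], [41.72, -82.31]],
      "representative_location": "park_123"
    }
  ]
}
```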
Performance Considerations:
- Cache clustered results separately from unclustered
- Use spatial indexes on location tables
- Limit clustering to zoom levels 1-12 (zoomed out views)
References:
- Supercluster.js - JavaScript implementation for reference
- PostGIS ST_ClusterKMeans - Database-level clustering
#### THRILLWIKI-107: Nearby Locations

Priority: P2 (Medium) | Estimated Effort: 2-3 days | Dependencies: None
Context: Location detail views currently don't show nearby parks or rides. This would help users discover attractions in the same area.
Proposed Solution: Use PostGIS spatial queries to find locations within a radius:
```python
from django.contrib.gis.measure import D  # Distance
from django.contrib.gis.db.models.functions import Distance

def get_nearby_locations(location_obj, radius_miles=25, limit=10):
    """Get nearby locations using spatial query."""
    point = location_obj.point

    # Query parks within radius
    nearby_parks = Park.objects.filter(
        location__point__distance_lte=(point, D(mi=radius_miles))
    ).annotate(
        distance=Distance('location__point', point)
    ).exclude(
        id=location_obj.park.id  # Exclude self
    ).order_by('distance')[:limit]

    return nearby_parks
```
Implementation Steps:
- Add a `get_nearby_locations()` method to `backend/apps/core/services/location_service.py`
- Update `MapLocationDetailAPIView.get()` to call this method
- Update `MapLocationDetailSerializer.get_nearby_locations()` to return actual data
- Add a distance field to nearby location objects
- Add tests for spatial queries
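A possible shape for the serializer method, assuming the service function above and GeoDjango's `Distance` annotation (the `location_service` import path is illustrative):

```python
from apps.core.services import location_service  # assumed import path

def get_nearby_locations(self, obj):
    nearby = location_service.get_nearby_locations(obj, radius_miles=25)
    return [
        {
            "id": f"park_{park.id}",
            "name": park.name,
            "type": "park",
            # The Distance annotation exposes unit conversions such as .mi
            "distance_miles": round(park.distance.mi, 1),
            # Point stores x=longitude, y=latitude
            "coordinates": [park.location.point.y, park.location.point.x],
        }
        for park in nearby
    ]
```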
API Response Example:
```json
{
  "nearby_locations": [
    {
      "id": "park_123",
      "name": "Cedar Point",
      "type": "park",
      "distance_miles": 5.2,
      "coordinates": [41.4793, -82.6833]
    }
  ]
}
```
Performance Considerations:
- Use spatial indexes (already present on `location__point` fields)
- Cache nearby locations for 1 hour
- Limit radius to 50 miles maximum
#### THRILLWIKI-108: Search Relevance Scoring

Priority: P2 (Medium) | Estimated Effort: 2-3 days | Dependencies: None
Context: Search results currently return a hardcoded relevance score of 1.0. Implementing proper relevance scoring would improve search result quality.
Proposed Solution: Implement a weighted scoring algorithm based on:
- Text Match Quality (40% weight):
  - Exact name match: 1.0
  - Name starts with query: 0.8
  - Name contains query: 0.6
  - City/state match: 0.4
- Popularity (30% weight):
  - Based on `average_rating` and `ride_count`/`coaster_count`
  - Normalize to 0-1 scale
- Recency (15% weight):
  - Recently opened attractions score higher
  - Normalize based on `opening_date`
- Status (15% weight):
  - Operating: 1.0
  - Seasonal: 0.8
  - Closed temporarily: 0.5
  - Closed permanently: 0.2
Implementation Steps:
- Create `backend/apps/core/services/search_scoring.py` with scoring logic
- Add a `calculate_relevance_score()` method
- Update `MapSearchAPIView.get()` to calculate scores
- Sort results by relevance score (descending)
- Add tests for the scoring algorithm
Example Implementation:
```python
def calculate_relevance_score(location, query):
    score = 0.0

    # Text match (40%)
    name_lower = location.name.lower()
    query_lower = query.lower()
    if name_lower == query_lower:
        score += 0.40
    elif name_lower.startswith(query_lower):
        score += 0.32
    elif query_lower in name_lower:
        score += 0.24

    # Popularity (30%)
    if location.average_rating:
        score += (location.average_rating / 5.0) * 0.30

    # Status (15%) -- the recency component (15%) is omitted in this example
    status_weights = {
        'OPERATING': 1.0,
        'SEASONAL': 0.8,
        'CLOSED_TEMP': 0.5,
        'CLOSED_PERM': 0.2,
    }
    score += status_weights.get(location.status, 0.5) * 0.15

    return min(score, 1.0)
```
Performance Considerations:
- Calculate scores in Python (not database) for flexibility
- Cache search results with scores for 5 minutes
- Consider using PostgreSQL full-text search for better performance
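A sketch of the 5-minute result cache, assuming Django's cache framework (the key naming is hypothetical):

```python
from django.core.cache import cache

def cached_search(query, search_fn, timeout=300):
    """Return cached scored results for a query, recomputing on a miss."""
    key = f"map_search:{query.strip().lower()}"
    results = cache.get(key)
    if results is None:
        results = search_fn(query)  # compute and score results
        cache.set(key, results, timeout)
    return results
```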
#### THRILLWIKI-109: Cache Statistics Tracking

Priority: P2 (Medium) | Estimated Effort: 1-2 hours | Dependencies: None | Status: IMPLEMENTED
Context:
The `MapStatsAPIView` previously returned hardcoded cache statistics (0 hits, 0 misses). Real cache statistics provide visibility into caching effectiveness.
Implementation:
Added a `get_cache_statistics()` method to `EnhancedCacheService` that retrieves Redis INFO statistics when available. The `MapStatsAPIView` now returns real cache hit/miss data.
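A sketch of what such a method can look like, assuming `django-redis` is the cache backend; `keyspace_hits`/`keyspace_misses` come directly from Redis INFO:

```python
from django_redis import get_redis_connection

def get_cache_statistics():
    """Return Redis hit/miss counters, or zeros if Redis is unreachable."""
    try:
        info = get_redis_connection("default").info("stats")
    except Exception:
        return {"hits": 0, "misses": 0, "hit_rate": 0.0}
    hits = info.get("keyspace_hits", 0)
    misses = info.get("keyspace_misses", 0)
    total = hits + misses
    return {
        "hits": hits,
        "misses": misses,
        "hit_rate": hits / total if total else 0.0,
    }
```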
### User Features

#### THRILLWIKI-104: Full User Statistics Tracking

Priority: P2 (Medium) | Estimated Effort: 3-4 days | Dependencies: THRILLWIKI-105 (Photo counting)
Context: Current user statistics are calculated on-demand by querying multiple tables. This is inefficient and doesn't track all desired metrics.
Proposed Solution:
Implement a `UserStatistics` model with periodic updates:

```python
class UserStatistics(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)

    # Content statistics
    parks_visited = models.IntegerField(default=0)
    rides_ridden = models.IntegerField(default=0)
    reviews_written = models.IntegerField(default=0)
    photos_uploaded = models.IntegerField(default=0)
    top_lists_created = models.IntegerField(default=0)

    # Engagement statistics
    helpful_votes_received = models.IntegerField(default=0)
    comments_made = models.IntegerField(default=0)
    badges_earned = models.IntegerField(default=0)

    # Activity tracking
    last_review_date = models.DateTimeField(null=True, blank=True)
    last_photo_upload_date = models.DateTimeField(null=True, blank=True)
    streak_days = models.IntegerField(default=0)

    # Timestamps
    last_calculated = models.DateTimeField(auto_now=True)

    class Meta:
        verbose_name_plural = "User statistics"
```
Implementation Steps:
- Create a migration for the `UserStatistics` model in `backend/apps/accounts/models.py`
- Create Celery task `update_user_statistics` in `backend/apps/accounts/tasks.py`
- Update statistics on user actions using Django signals (see the sketch after this list):
  - `post_save` signal on `ParkReview`, `RideReview` -> increment `reviews_written`
  - `post_save` signal on `ParkPhoto`, `RidePhoto` -> increment `photos_uploaded`
- Add management command `python manage.py recalculate_user_stats` for bulk updates
- Update the `get_user_statistics` view to read from the `UserStatistics` model
- Add a periodic Celery task to recalculate statistics daily
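A sketch of one such signal handler, assuming the `ParkReview` import path and its `user`/`created_at` fields; the `F()` expression keeps the increment atomic under concurrent writes:

```python
from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver

from apps.accounts.models import UserStatistics
from apps.parks.models import ParkReview  # assumed import path

@receiver(post_save, sender=ParkReview)
def increment_reviews_written(sender, instance, created, **kwargs):
    if not created:
        return  # count new reviews only, not edits
    stats, _ = UserStatistics.objects.get_or_create(user=instance.user)
    UserStatistics.objects.filter(pk=stats.pk).update(
        reviews_written=F("reviews_written") + 1,
        last_review_date=instance.created_at,  # assumed timestamp field
    )
```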
Performance Benefits:
- Reduces database queries from 5+ to 1
- Enables leaderboards and ranking features
- Supports gamification (badges, achievements)
Migration Strategy:
- Create model and migration
- Run `recalculate_user_stats` for existing users
- Enable signal handlers for new activity
- Monitor for 1 week before removing old calculation logic
#### THRILLWIKI-105: Photo Upload Counting

Priority: P2 (Medium) | Estimated Effort: 30 minutes | Dependencies: None | Status: IMPLEMENTED
Context:
The user statistics endpoint returned `photos_uploaded: 0` for all users. Photo uploads should be counted from the `ParkPhoto` and `RidePhoto` models.
Implementation:
Updated `get_user_statistics()` in `backend/apps/api/v1/accounts/views.py` to query the `ParkPhoto` and `RidePhoto` models where `uploaded_by=user`.
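The counting logic amounts to something like the following (model and field names taken from the description above):

```python
photos_uploaded = (
    ParkPhoto.objects.filter(uploaded_by=user).count()
    + RidePhoto.objects.filter(uploaded_by=user).count()
)
```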
### Infrastructure

#### THRILLWIKI-101: Geocoding Service Integration

Priority: P3 (Low) | Estimated Effort: 2-3 days | Dependencies: None

Context:
The `CompanyHeadquarters` model has address fields but no coordinates. This prevents companies from appearing on the map.
Proposed Solution: Integrate a geocoding service to convert addresses to coordinates:
Recommended Services:
- Google Maps Geocoding API (Paid, high quality)
- Nominatim (OpenStreetMap) (Free, rate-limited)
- Mapbox Geocoding API (Paid, good quality)
Implementation Steps:
- Create `backend/apps/core/services/geocoding_service.py`:

  ```python
  class GeocodingService:
      def geocode_address(self, address: str) -> tuple[float, float] | None:
          """Convert address to (latitude, longitude)."""
          # Implementation using chosen service
  ```

- Add geocoding to the `CompanyHeadquarters` model:
  - Add `latitude` and `longitude` fields
  - Add a `geocoded_at` timestamp field
  - Create migration
- Update `CompanyLocationAdapter.to_unified_location()` to use coordinates if available
- Add management command `python manage.py geocode_companies` for bulk geocoding
- Add a Celery task for automatic geocoding on company creation/update
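A minimal Nominatim-backed sketch, assuming the public endpoint and the 1 request/second budget noted below; swap the URL and parameters for Google or Mapbox:

```python
import requests

class GeocodingService:
    NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude) via Nominatim."""
        response = requests.get(
            self.NOMINATIM_URL,
            params={"q": address, "format": "json", "limit": 1},
            # Nominatim's usage policy requires an identifying User-Agent
            headers={"User-Agent": "thrillwiki-geocoder"},
            timeout=10,
        )
        response.raise_for_status()
        results = response.json()
        if not results:
            return None
        return float(results[0]["lat"]), float(results[0]["lon"])
```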
Configuration:
Add to `backend/config/settings/base.py`:

```python
GEOCODING_SERVICE = env('GEOCODING_SERVICE', default='nominatim')
GEOCODING_API_KEY = env('GEOCODING_API_KEY', default='')
GEOCODING_RATE_LIMIT = env.int('GEOCODING_RATE_LIMIT', default=1)  # requests per second
```
Rate Limiting:
- Implement exponential backoff for API errors
- Cache geocoding results to avoid redundant API calls
- Use Celery for async geocoding to avoid blocking requests
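A generic retry helper along these lines would cover the backoff requirement (the helper name and defaults are illustrative):

```python
import time

import requests

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry fn on transient HTTP errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except requests.RequestException:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```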
Cost Considerations:
- Nominatim: Free but limited to 1 request/second
- Google Maps: $5 per 1000 requests (first $200/month free)
- Mapbox: $0.50 per 1000 requests (first 100k free)
Alternative Approach: Store coordinates manually in admin interface for the ~50-100 companies in the database.
#### THRILLWIKI-110: ClamAV Malware Scanning Integration

Priority: P1 (High, security feature) | Estimated Effort: 2-3 days | Dependencies: ClamAV daemon installation
Context: File uploads currently use magic number validation and PIL integrity checks, but don't scan for malware. This is a security gap for user-generated content.
Proposed Solution: Integrate ClamAV antivirus scanning for all file uploads.
Implementation Steps:
- Install ClamAV:

  ```bash
  # Docker
  docker run -d -p 3310:3310 clamav/clamav:latest

  # Ubuntu/Debian
  sudo apt-get install clamav clamav-daemon
  sudo freshclam  # Update virus definitions
  sudo systemctl start clamav-daemon
  ```

- Install the Python client:

  ```bash
  uv add clamd
  ```

- Update `backend/apps/core/utils/file_scanner.py`:

  ```python
  import logging
  from typing import Tuple

  import clamd
  from django.core.files.uploadedfile import UploadedFile

  logger = logging.getLogger(__name__)

  def scan_file_for_malware(file: UploadedFile) -> Tuple[bool, str]:
      """Scan file for malware using ClamAV."""
      try:
          # Connect to ClamAV daemon
          cd = clamd.ClamdUnixSocket()  # or ClamdNetworkSocket for remote

          # Scan file
          file.seek(0)
          scan_result = cd.instream(file)
          file.seek(0)

          # Check result
          if scan_result['stream'][0] == 'OK':
              return True, ""
          virus_name = scan_result['stream'][1]
          return False, f"Malware detected: {virus_name}"
      except clamd.ConnectionError:
          # ClamAV not available - log warning and allow upload
          logger.warning("ClamAV daemon not available, skipping malware scan")
          return True, ""
      except Exception as e:
          logger.error(f"Malware scan error: {e}")
          return False, "Malware scan failed"
  ```

- Configuration: add to `backend/config/settings/base.py`:

  ```python
  CLAMAV_ENABLED = env.bool('CLAMAV_ENABLED', default=False)
  CLAMAV_SOCKET = env('CLAMAV_SOCKET', default='/var/run/clamav/clamd.ctl')
  CLAMAV_HOST = env('CLAMAV_HOST', default='localhost')
  CLAMAV_PORT = env.int('CLAMAV_PORT', default=3310)
  ```

- Update file upload views:
  - Call `scan_file_for_malware()` in the avatar upload view
  - Call it in media upload views
  - Log all malware detections for security monitoring
- Testing:
  - Use the EICAR test string: `X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*`
  - Add unit tests with mocked ClamAV responses
Deployment Considerations:
- ClamAV requires ~1GB RAM for virus definitions
- Update virus definitions daily via `freshclam`
- Consider using cloud-based scanning service (AWS GuardDuty, VirusTotal) for serverless deployments
Fallback Strategy: If ClamAV is unavailable, log a warning and allow the upload (fail open). This prevents blocking legitimate uploads if the ClamAV daemon crashes.
### Management Commands

#### THRILLWIKI-111: Sample Data Creation Command

Priority: P3 (Low, development utility) | Estimated Effort: 1-2 days | Dependencies: None

Context:
The `create_sample_data` management command is incomplete. This command is useful for:
- Local development with realistic data
- Demo environments
- Testing with diverse data sets
Proposed Solution: Complete the implementation with comprehensive sample data:
Sample Data to Create:
- Parks (10-15):
  - Major theme parks (Disney, Universal, Cedar Point)
  - Regional parks
  - Water parks
  - Mix of operating/closed/seasonal statuses
- Rides (50-100):
  - Roller coasters (various types)
  - Flat rides
  - Water rides
  - Dark rides
  - Mix of statuses and manufacturers
- Companies (20-30):
  - Operators (Disney, Six Flags, Cedar Fair)
  - Manufacturers (Intamin, B&M, RMC)
  - Mix of active/inactive
- Users (10):
  - Admin user
  - Regular users with various activity levels
  - Test user for authentication testing
- Reviews (100-200):
  - Park reviews with ratings
  - Ride reviews with ratings
  - Mix of helpful/unhelpful votes
- Media (50):
  - Park photos
  - Ride photos
  - Mix of approved/pending/rejected
Implementation Steps:
- Create fixtures in `backend/fixtures/sample_data.json`
- Update `create_sample_data.py` to load fixtures (see the command skeleton after this list)
- Add a `--clear` flag to delete existing data before creating
- Add a `--minimal` flag for quick setup (10 parks, 20 rides)
- Document usage in `backend/README.md`
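A hypothetical skeleton for the command, assuming the fixtures proposed above (the minimal fixture name is a placeholder):

```python
from django.core.management import call_command
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Create sample parks, rides, companies, users, reviews, and media"

    def add_arguments(self, parser):
        parser.add_argument("--clear", action="store_true",
                            help="Delete existing data before creating")
        parser.add_argument("--minimal", action="store_true",
                            help="Quick setup: 10 parks, 20 rides")

    def handle(self, *args, **options):
        if options["clear"]:
            self._clear_existing_data()
        fixture = "sample_data_minimal" if options["minimal"] else "sample_data"
        call_command("loaddata", fixture)
        self.stdout.write(self.style.SUCCESS("Sample data created"))

    def _clear_existing_data(self):
        # Delete in FK-safe order: media, reviews, rides, parks, companies
        ...
```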
Usage:
```bash
# Full sample data
python manage.py create_sample_data

# Minimal data for quick testing
python manage.py create_sample_data --minimal

# Clear existing data first
python manage.py create_sample_data --clear
```
Alternative Approach:
Use Django fixtures with the `loaddata` command:

```bash
python manage.py loaddata sample_parks sample_rides sample_users
```
## Completed Items

### THRILLWIKI-103: Admin Permission Checks

Status: COMPLETED (Already Implemented)
Context:
The `MapCacheView` `delete` and `post` methods had TODO comments for adding admin permission checks. Upon review, these checks were already implemented using `request.user.is_authenticated` and `request.user.is_staff`.
Resolution: Removed outdated TODO comments.
## Implementation Notes

### Creating GitHub Issues
Each item in this document can be converted to a GitHub issue using this template:
```markdown
## Description
[Copy from Context section]

## Implementation
[Copy from Implementation Steps section]

## Acceptance Criteria
- [ ] Feature implemented as specified
- [ ] Unit tests added with >80% coverage
- [ ] Integration tests pass
- [ ] Documentation updated
- [ ] Code reviewed and approved

## Priority
[Copy Priority value]

## Related
- THRILLWIKI issue number
- Related features or dependencies
```
### Priority Order for Implementation

Based on business value and effort, the recommended implementation order is:
1. THRILLWIKI-110: ClamAV Malware Scanning (P1, security)
2. THRILLWIKI-106: Map Clustering (P1, performance)
3. THRILLWIKI-107: Nearby Locations (P2, UX)
4. THRILLWIKI-108: Search Relevance Scoring (P2, UX)
5. THRILLWIKI-104: Full User Statistics (P2, engagement)
6. THRILLWIKI-101: Geocoding Service (P3, completeness)
7. THRILLWIKI-111: Sample Data Command (P3, development)