Future Work & Deferred Features

This document tracks features that have been deferred for future implementation. Each item includes context, implementation guidance, and priority.

Priority Levels

  • P0 (Critical): Blocks major functionality or has security implications
  • P1 (High): Significantly improves user experience or performance
  • P2 (Medium): Nice-to-have features that add value
  • P3 (Low): Optional enhancements

Feature Tracking

Map Service Enhancements

THRILLWIKI-106: Map Clustering Algorithm

Priority: P1 (High) | Estimated Effort: 3-5 days | Dependencies: None

Context: Currently, the map API returns all locations within bounds without clustering. At low zoom levels (zoomed out), this can result in hundreds of overlapping markers, degrading performance and UX.

Proposed Solution: Implement a server-side clustering algorithm using one of these approaches:

  1. Grid-based clustering (Recommended for simplicity; see the sketch after this list):

    • Divide the map into a grid based on zoom level
    • Group locations within each grid cell
    • Return cluster center and count for cells with multiple locations
  2. DBSCAN clustering (Better quality, more complex):

    • Use scikit-learn's DBSCAN algorithm
    • Cluster based on geographic distance
    • Adjust epsilon parameter based on zoom level
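
A minimal sketch of the grid-based approach, assuming each UnifiedLocation exposes latitude/longitude attributes; the cell-sizing formula and dict-based cluster shape are illustrative assumptions, not the final implementation:

from collections import defaultdict

def cluster_locations(locations, zoom_level, strategy='grid'):
    """Group locations into grid cells sized by zoom level ('grid' strategy only)."""
    cell_size = 360.0 / (2 ** zoom_level)  # degrees per cell; halves with each zoom step
    cells = defaultdict(list)
    for loc in locations:
        key = (int(loc.latitude // cell_size), int(loc.longitude // cell_size))
        cells[key].append(loc)
    clusters = []
    for members in cells.values():
        clusters.append({
            'count': len(members),
            # Cluster center = mean of member coordinates
            'coordinates': [
                sum(m.latitude for m in members) / len(members),
                sum(m.longitude for m in members) / len(members),
            ],
            'representative_location': members[0],
        })
    return clusters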

Implementation Steps:

  1. Create backend/apps/core/services/map_clustering.py with clustering logic
  2. Add cluster_locations() method that accepts:
    • List of UnifiedLocation objects
    • Zoom level (1-20)
    • Clustering strategy ('grid' or 'dbscan')
  3. Update MapLocationsAPIView._build_response() to call clustering service when params["cluster"] is True
  4. Update MapClusterSerializer to include cluster metadata
  5. Add tests in backend/tests/services/test_map_clustering.py

API Changes:

  • Response includes clusters array with cluster objects
  • Each cluster has: id, coordinates, count, bounds, representative_location
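
For illustration only (values hypothetical), a response consistent with those fields might look like:

{
  "clusters": [
    {
      "id": "cluster_0",
      "coordinates": [41.48, -82.68],
      "count": 12,
      "bounds": [[41.40, -82.75], [41.55, -82.60]],
      "representative_location": "park_123"
    }
  ]
}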

Performance Considerations:

  • Cache clustered results separately from unclustered
  • Use spatial indexes on location tables
  • Limit clustering to zoom levels 1-12 (zoomed out views)

References:


THRILLWIKI-107: Nearby Locations

Priority: P2 (Medium) | Estimated Effort: 2-3 days | Dependencies: None

Context: Location detail views currently don't show nearby parks or rides. This would help users discover attractions in the same area.

Proposed Solution: Use PostGIS spatial queries to find locations within a radius:

from django.contrib.gis.measure import D  # Distance
from django.contrib.gis.db.models.functions import Distance

def get_nearby_locations(location_obj, radius_miles=25, limit=10):
    """Get nearby locations using spatial query."""
    point = location_obj.point

    # Query parks within radius
    nearby_parks = Park.objects.filter(
        location__point__distance_lte=(point, D(mi=radius_miles))
    ).annotate(
        distance=Distance('location__point', point)
    ).exclude(
        id=location_obj.park.id  # Exclude self
    ).order_by('distance')[:limit]

    return nearby_parks

Implementation Steps:

  1. Add get_nearby_locations() method to backend/apps/core/services/location_service.py
  2. Update MapLocationDetailAPIView.get() to call this method
  3. Update MapLocationDetailSerializer.get_nearby_locations() to return actual data
  4. Add distance field to nearby location objects
  5. Add tests for spatial queries

API Response Example:

{
  "nearby_locations": [
    {
      "id": "park_123",
      "name": "Cedar Point",
      "type": "park",
      "distance_miles": 5.2,
      "coordinates": [41.4793, -82.6833]
    }
  ]
}

Performance Considerations:

  • Use spatial indexes (already present on location__point fields)
  • Cache nearby locations for 1 hour (see the sketch after this list)
  • Limit radius to 50 miles maximum
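
A minimal caching sketch for the 1-hour bullet above, wrapping Django's cache framework around the get_nearby_locations() helper; the key format is an assumption:

from django.core.cache import cache

def get_nearby_locations_cached(location_obj, radius_miles=25, limit=10):
    """Cache nearby-location results for one hour per location/radius/limit."""
    key = f"nearby:{location_obj.pk}:{radius_miles}:{limit}"
    result = cache.get(key)
    if result is None:
        result = list(get_nearby_locations(location_obj, radius_miles, limit))
        cache.set(key, result, 60 * 60)  # 1 hour TTL
    return result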

THRILLWIKI-108: Search Relevance Scoring

Priority: P2 (Medium) | Estimated Effort: 2-3 days | Dependencies: None

Context: Search results currently return a hardcoded relevance score of 1.0. Implementing proper relevance scoring would improve search result quality.

Proposed Solution: Implement a weighted scoring algorithm based on:

  1. Text Match Quality (40% weight):

    • Exact name match: 1.0
    • Name starts with query: 0.8
    • Name contains query: 0.6
    • City/state match: 0.4
  2. Popularity (30% weight):

    • Based on average_rating and ride_count/coaster_count
    • Normalize to 0-1 scale
  3. Recency (15% weight):

    • Recently opened attractions score higher
    • Normalize based on opening_date
  4. Status (15% weight):

    • Operating: 1.0
    • Seasonal: 0.8
    • Closed temporarily: 0.5
    • Closed permanently: 0.2

Implementation Steps:

  1. Create backend/apps/core/services/search_scoring.py with scoring logic
  2. Add calculate_relevance_score() method
  3. Update MapSearchAPIView.get() to calculate scores
  4. Sort results by relevance score (descending)
  5. Add tests for scoring algorithm

Example Implementation:

from datetime import date

def calculate_relevance_score(location, query):
    score = 0.0

    # Text match (40%)
    name_lower = location.name.lower()
    query_lower = query.lower()
    if name_lower == query_lower:
        score += 0.40
    elif name_lower.startswith(query_lower):
        score += 0.32
    elif query_lower in name_lower:
        score += 0.24
    elif getattr(location, 'city', None) and query_lower in location.city.lower():
        score += 0.16  # city/state match: 0.4 * 0.40

    # Popularity (30%)
    if location.average_rating:
        score += (location.average_rating / 5.0) * 0.30

    # Recency (15%): newer attractions score higher
    if getattr(location, 'opening_date', None):
        years_old = (date.today() - location.opening_date).days / 365.25
        score += max(0.0, 1.0 - years_old / 25.0) * 0.15  # 25-year window is a tunable assumption

    # Status (15%)
    status_weights = {
        'OPERATING': 1.0,
        'SEASONAL': 0.8,
        'CLOSED_TEMP': 0.5,
        'CLOSED_PERM': 0.2
    }
    score += status_weights.get(location.status, 0.5) * 0.15

    return min(score, 1.0)
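
Sorting (step 4) then follows directly, e.g.:

results.sort(key=lambda loc: calculate_relevance_score(loc, query), reverse=True)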

Performance Considerations:

  • Calculate scores in Python (not database) for flexibility
  • Cache search results with scores for 5 minutes
  • Consider using PostgreSQL full-text search for better performance

THRILLWIKI-109: Cache Statistics Tracking

Priority: P2 (Medium) | Estimated Effort: 1-2 hours | Dependencies: None | Status: IMPLEMENTED

Context: The MapStatsAPIView returns hardcoded cache statistics (0 hits, 0 misses). Implementing real cache statistics provides visibility into caching effectiveness.

Implementation: Added get_cache_statistics() method to EnhancedCacheService that retrieves Redis INFO statistics when available. The MapStatsAPIView now returns real cache hit/miss data.
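
A minimal sketch of the approach described above, assuming django-redis as the cache backend; the function name follows the description, but the actual EnhancedCacheService wiring is not shown here:

from django_redis import get_redis_connection

def get_cache_statistics():
    """Return Redis keyspace hit/miss counters, or zeros if Redis is unavailable."""
    try:
        info = get_redis_connection("default").info("stats")
    except Exception:
        return {"hits": 0, "misses": 0, "hit_rate": 0.0}
    hits = info.get("keyspace_hits", 0)
    misses = info.get("keyspace_misses", 0)
    total = hits + misses
    return {"hits": hits, "misses": misses, "hit_rate": hits / total if total else 0.0}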


User Features

THRILLWIKI-104: Full User Statistics Tracking

Priority: P2 (Medium) | Estimated Effort: 3-4 days | Dependencies: THRILLWIKI-105 (Photo counting)

Context: Current user statistics are calculated on-demand by querying multiple tables. This is inefficient and doesn't track all desired metrics.

Proposed Solution: Implement a UserStatistics model with periodic updates:

class UserStatistics(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)

    # Content statistics
    parks_visited = models.IntegerField(default=0)
    rides_ridden = models.IntegerField(default=0)
    reviews_written = models.IntegerField(default=0)
    photos_uploaded = models.IntegerField(default=0)
    top_lists_created = models.IntegerField(default=0)

    # Engagement statistics
    helpful_votes_received = models.IntegerField(default=0)
    comments_made = models.IntegerField(default=0)
    badges_earned = models.IntegerField(default=0)

    # Activity tracking
    last_review_date = models.DateTimeField(null=True, blank=True)
    last_photo_upload_date = models.DateTimeField(null=True, blank=True)
    streak_days = models.IntegerField(default=0)

    # Timestamps
    last_calculated = models.DateTimeField(auto_now=True)

    class Meta:
        verbose_name_plural = "User statistics"

Implementation Steps:

  1. Create migration for UserStatistics model in backend/apps/accounts/models.py
  2. Create Celery task update_user_statistics in backend/apps/accounts/tasks.py
  3. Update statistics on user actions using Django signals (see the sketch after this list):
    • post_save signal on ParkReview, RideReview -> increment reviews_written
    • post_save signal on ParkPhoto, RidePhoto -> increment photos_uploaded
  4. Add management command python manage.py recalculate_user_stats for bulk updates
  5. Update get_user_statistics view to read from UserStatistics model
  6. Add periodic Celery task to recalculate statistics daily
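
A hedged sketch of the signal wiring from step 3; the import paths for the review models and the F()-based increment are assumptions, not existing project code:

from apps.accounts.models import UserStatistics  # per step 1 above
from apps.parks.models import ParkReview  # import paths are assumptions
from apps.rides.models import RideReview
from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=ParkReview)
@receiver(post_save, sender=RideReview)
def increment_reviews_written(sender, instance, created, **kwargs):
    """Bump the author's counter on each newly created review (step 3)."""
    if created:
        UserStatistics.objects.filter(user=instance.user).update(
            reviews_written=F('reviews_written') + 1
        )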

Performance Benefits:

  • Reduces database queries from 5+ to 1
  • Enables leaderboards and ranking features
  • Supports gamification (badges, achievements)

Migration Strategy:

  1. Create model and migration
  2. Run recalculate_user_stats for existing users
  3. Enable signal handlers for new activity
  4. Monitor for 1 week before removing old calculation logic

THRILLWIKI-105: Photo Upload Counting

Priority: P2 (Medium) | Estimated Effort: 30 minutes | Dependencies: None | Status: IMPLEMENTED

Context: The user statistics endpoint returns photos_uploaded: 0 for all users. Photo uploads should be counted from ParkPhoto and RidePhoto models.

Implementation: Updated get_user_statistics() in backend/apps/api/v1/accounts/views.py to query ParkPhoto and RidePhoto models where uploaded_by=user.
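
The counting logic amounts to, roughly (a sketch consistent with the description above; uploaded_by is the field named there):

photos_uploaded = (
    ParkPhoto.objects.filter(uploaded_by=user).count()
    + RidePhoto.objects.filter(uploaded_by=user).count()
)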


Infrastructure

THRILLWIKI-101: Geocoding Service Integration

Priority: P3 (Low) | Estimated Effort: 2-3 days | Dependencies: None

Context: CompanyHeadquarters model has address fields but no coordinates. This prevents companies from appearing on the map.

Proposed Solution: Integrate a geocoding service to convert addresses to coordinates:

Recommended Services:

  1. Google Maps Geocoding API (Paid, high quality)
  2. Nominatim (OpenStreetMap) (Free, rate-limited)
  3. Mapbox Geocoding API (Paid, good quality)

Implementation Steps:

  1. Create backend/apps/core/services/geocoding_service.py (expanded in the sketch after this list):

    class GeocodingService:
        def geocode_address(self, address: str) -> tuple[float, float] | None:
            """Convert address to (latitude, longitude)."""
            # Implementation using chosen service
    
  2. Add geocoding to CompanyHeadquarters model:

    • Add latitude and longitude fields
    • Add geocoded_at timestamp field
    • Create migration
  3. Update CompanyLocationAdapter.to_unified_location() to use coordinates if available

  4. Add management command python manage.py geocode_companies for bulk geocoding

  5. Add Celery task for automatic geocoding on company creation/update
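
A fuller sketch of step 1 using Nominatim (the default in the configuration below); the endpoint and User-Agent string are assumptions, and production code would add the retry/backoff behavior described under Rate Limiting:

import requests

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

class GeocodingService:
    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude), or None if not found."""
        response = requests.get(
            NOMINATIM_URL,
            params={"q": address, "format": "json", "limit": 1},
            headers={"User-Agent": "thrillwiki-geocoder"},  # Nominatim requires a UA
            timeout=10,
        )
        response.raise_for_status()
        results = response.json()
        if not results:
            return None
        return float(results[0]["lat"]), float(results[0]["lon"])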

Configuration: Add to backend/config/settings/base.py:

GEOCODING_SERVICE = env('GEOCODING_SERVICE', default='nominatim')
GEOCODING_API_KEY = env('GEOCODING_API_KEY', default='')
GEOCODING_RATE_LIMIT = env.int('GEOCODING_RATE_LIMIT', default=1)  # requests per second

Rate Limiting:

  • Implement exponential backoff for API errors
  • Cache geocoding results to avoid redundant API calls
  • Use Celery for async geocoding to avoid blocking requests

Cost Considerations:

  • Nominatim: Free but limited to 1 request/second
  • Google Maps: $5 per 1000 requests (first $200/month free)
  • Mapbox: $0.50 per 1000 requests (first 100k free)

Alternative Approach: Store coordinates manually in admin interface for the ~50-100 companies in the database.


THRILLWIKI-110: ClamAV Malware Scanning Integration

Priority: P1 (High) - Security feature | Estimated Effort: 2-3 days | Dependencies: ClamAV daemon installation

Context: File uploads currently use magic number validation and PIL integrity checks, but don't scan for malware. This is a security gap for user-generated content.

Proposed Solution: Integrate ClamAV antivirus scanning for all file uploads.

Implementation Steps:

  1. Install ClamAV:

    # Docker
    docker run -d -p 3310:3310 clamav/clamav:latest
    
    # Ubuntu/Debian
    sudo apt-get install clamav clamav-daemon
    sudo freshclam  # Update virus definitions
    sudo systemctl start clamav-daemon
    
  2. Install Python client:

    uv add clamd
    
  3. Update backend/apps/core/utils/file_scanner.py:

    import clamd
    import logging
    from typing import Tuple

    from django.core.files.uploadedfile import UploadedFile

    logger = logging.getLogger(__name__)

    def scan_file_for_malware(file: UploadedFile) -> Tuple[bool, str]:
        """Scan file for malware using ClamAV."""
        try:
            # Connect to ClamAV daemon
            cd = clamd.ClamdUnixSocket()  # or ClamdNetworkSocket for remote
    
            # Scan file
            file.seek(0)
            scan_result = cd.instream(file)
            file.seek(0)
    
            # Check result
            if scan_result['stream'][0] == 'OK':
                return True, ""
            else:
                virus_name = scan_result['stream'][1]
                return False, f"Malware detected: {virus_name}"
    
        except clamd.ConnectionError:
            # ClamAV not available - log warning and allow upload
            logger.warning("ClamAV daemon not available, skipping malware scan")
            return True, ""
        except Exception as e:
            logger.error(f"Malware scan error: {e}")
            return False, "Malware scan failed"
    
  4. Configuration: Add to backend/config/settings/base.py:

    CLAMAV_ENABLED = env.bool('CLAMAV_ENABLED', default=False)
    CLAMAV_SOCKET = env('CLAMAV_SOCKET', default='/var/run/clamav/clamd.ctl')
    CLAMAV_HOST = env('CLAMAV_HOST', default='localhost')
    CLAMAV_PORT = env.int('CLAMAV_PORT', default=3310)
    
  5. Update file upload views (see the sketch after this list):

    • Call scan_file_for_malware() in avatar upload view
    • Call in media upload views
    • Log all malware detections for security monitoring
  6. Testing:

    • Use EICAR test file for testing: X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
    • Add unit tests with mocked ClamAV responses
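
A hedged sketch of the view integration from step 5; the view function, upload field name, and response shape are placeholders, not the project's actual upload code:

import logging

from rest_framework import status
from rest_framework.response import Response

logger = logging.getLogger(__name__)

def handle_avatar_upload(request):
    uploaded = request.FILES["avatar"]  # hypothetical field name
    is_clean, message = scan_file_for_malware(uploaded)
    if not is_clean:
        # Log every detection for security monitoring (step 5)
        logger.warning("Malware detected in upload from user %s: %s",
                       request.user.pk, message)
        return Response({"error": message}, status=status.HTTP_400_BAD_REQUEST)
    ...  # continue with the normal save path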

Deployment Considerations:

  • ClamAV requires ~1GB RAM for virus definitions
  • Update virus definitions daily via freshclam
  • Monitor ClamAV daemon health in production
  • Consider using cloud-based scanning service (AWS GuardDuty, VirusTotal) for serverless deployments

Fallback Strategy: If ClamAV is unavailable, log warning and allow upload (fail open). This prevents blocking legitimate uploads if ClamAV daemon crashes.


Management Commands

THRILLWIKI-111: Sample Data Creation Command

Priority: P3 (Low) - Development utility | Estimated Effort: 1-2 days | Dependencies: None

Context: The create_sample_data management command is incomplete. This command is useful for:

  • Local development with realistic data
  • Demo environments
  • Testing with diverse data sets

Proposed Solution: Complete the implementation with comprehensive sample data:

Sample Data to Create:

  1. Parks (10-15):

    • Major theme parks (Disney, Universal, Cedar Point)
    • Regional parks
    • Water parks
    • Mix of operating/closed/seasonal statuses
  2. Rides (50-100):

    • Roller coasters (various types)
    • Flat rides
    • Water rides
    • Dark rides
    • Mix of statuses and manufacturers
  3. Companies (20-30):

    • Operators (Disney, Six Flags, Cedar Fair)
    • Manufacturers (Intamin, B&M, RMC)
    • Mix of active/inactive
  4. Users (10):

    • Admin user
    • Regular users with various activity levels
    • Test user for authentication testing
  5. Reviews (100-200):

    • Park reviews with ratings
    • Ride reviews with ratings
    • Mix of helpful/unhelpful votes
  6. Media (50):

    • Park photos
    • Ride photos
    • Mix of approved/pending/rejected

Implementation Steps:

  1. Create fixtures in backend/fixtures/sample_data.json
  2. Update create_sample_data.py to load fixtures
  3. Add --clear flag to delete existing data before creating
  4. Add --minimal flag for quick setup (10 parks, 20 rides; see the command sketch after this list)
  5. Document usage in backend/README.md
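
A command skeleton consistent with steps 2-4; the fixture names and the clearing helper are placeholders:

from django.core.management import call_command
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Create sample parks, rides, companies, users, reviews, and media."

    def add_arguments(self, parser):
        parser.add_argument("--clear", action="store_true",
                            help="Delete existing sample data first")
        parser.add_argument("--minimal", action="store_true",
                            help="Create a reduced data set (10 parks, 20 rides)")

    def handle(self, *args, **options):
        if options["clear"]:
            self._clear_existing_data()  # hypothetical helper
        fixture = "sample_data_minimal" if options["minimal"] else "sample_data"
        call_command("loaddata", fixture)  # loads backend/fixtures/<fixture>.json
        self.stdout.write(self.style.SUCCESS("Sample data created."))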

Usage:

# Full sample data
python manage.py create_sample_data

# Minimal data for quick testing
python manage.py create_sample_data --minimal

# Clear existing data first
python manage.py create_sample_data --clear

Alternative Approach: Use Django fixtures with loaddata command:

python manage.py loaddata sample_parks sample_rides sample_users

Completed Items

THRILLWIKI-103: Admin Permission Checks

Status: COMPLETED (Already Implemented)

Context: The MapCacheView delete and post methods had TODO comments for adding admin permission checks. Upon review, these checks were already implemented using request.user.is_authenticated and request.user.is_staff.

Resolution: Removed outdated TODO comments.


Implementation Notes

Creating GitHub Issues

Each item in this document can be converted to a GitHub issue using this template:

## Description
[Copy from Context section]

## Implementation
[Copy from Implementation Steps section]

## Acceptance Criteria
- [ ] Feature implemented as specified
- [ ] Unit tests added with >80% coverage
- [ ] Integration tests pass
- [ ] Documentation updated
- [ ] Code reviewed and approved

## Priority
[Copy Priority value]

## Related
- THRILLWIKI issue number
- Related features or dependencies

Priority Order for Implementation

Based on business value and effort, recommended implementation order:

  1. THRILLWIKI-110: ClamAV Malware Scanning (P1, security)
  2. THRILLWIKI-106: Map Clustering (P1, performance)
  3. THRILLWIKI-107: Nearby Locations (P2, UX)
  4. THRILLWIKI-108: Search Relevance Scoring (P2, UX)
  5. THRILLWIKI-104: Full User Statistics (P2, engagement)
  6. THRILLWIKI-101: Geocoding Service (P3, completeness)
  7. THRILLWIKI-111: Sample Data Command (P3, development)