# Future Work & Deferred Features

This document tracks features that have been deferred for future implementation. Each item includes context, implementation guidance, and priority.

## Priority Levels

- **P0 (Critical)**: Blocks major functionality or has security implications
- **P1 (High)**: Significantly improves user experience or performance
- **P2 (Medium)**: Nice-to-have features that add value
- **P3 (Low)**: Optional enhancements

## Feature Tracking

### Map Service Enhancements

#### THRILLWIKI-106: Map Clustering Algorithm

**Priority**: P1 (High)
**Estimated Effort**: 3-5 days
**Dependencies**: None

**Context**: Currently, the map API returns all locations within bounds without clustering. At low zoom levels (zoomed out), this can result in hundreds of overlapping markers, degrading performance and UX.

**Proposed Solution**: Implement a server-side clustering algorithm using one of these approaches:

1. **Grid-based clustering** (recommended for simplicity):
   - Divide the map into a grid based on zoom level
   - Group locations within each grid cell
   - Return cluster center and count for cells with multiple locations

2. **DBSCAN clustering** (better quality, more complex):
   - Use scikit-learn's DBSCAN algorithm
   - Cluster based on geographic distance
   - Adjust the epsilon parameter based on zoom level

**Implementation Steps**:

1. Create `backend/apps/core/services/map_clustering.py` with the clustering logic
2. Add a `cluster_locations()` method that accepts:
   - List of `UnifiedLocation` objects
   - Zoom level (1-20)
   - Clustering strategy (`'grid'` or `'dbscan'`)
3. Update `MapLocationsAPIView._build_response()` to call the clustering service when `params["cluster"]` is True
4. Update `MapClusterSerializer` to include cluster metadata
5. Add tests in `backend/tests/services/test_map_clustering.py`

**API Changes**:

- Response includes a `clusters` array with cluster objects
- Each cluster has: `id`, `coordinates`, `count`, `bounds`, `representative_location`

**Performance Considerations**:

- Cache clustered results separately from unclustered results
- Use spatial indexes on location tables
- Limit clustering to zoom levels 1-12 (zoomed-out views)

**References**:

- [Supercluster.js](https://github.com/mapbox/supercluster) - JavaScript implementation for reference
- [PostGIS ST_ClusterKMeans](https://postgis.net/docs/ST_ClusterKMeans.html) - Database-level clustering

---

#### THRILLWIKI-107: Nearby Locations

**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None

**Context**: Location detail views currently don't show nearby parks or rides. This would help users discover attractions in the same area.

**Proposed Solution**: Use PostGIS spatial queries to find locations within a radius:

```python
from django.contrib.gis.measure import D  # Distance
from django.contrib.gis.db.models.functions import Distance


def get_nearby_locations(location_obj, radius_miles=25, limit=10):
    """Get nearby locations using a spatial query."""
    point = location_obj.point

    # Query parks within the radius (Park is assumed to be imported from the parks app's models)
    nearby_parks = Park.objects.filter(
        location__point__distance_lte=(point, D(mi=radius_miles))
    ).annotate(
        distance=Distance('location__point', point)
    ).exclude(
        id=location_obj.park.id  # Exclude self
    ).order_by('distance')[:limit]

    return nearby_parks
```

**Implementation Steps**:

1. Add a `get_nearby_locations()` method to `backend/apps/core/services/location_service.py`
2. Update `MapLocationDetailAPIView.get()` to call this method
3. Update `MapLocationDetailSerializer.get_nearby_locations()` to return actual data (see the sketch below)
4. Add a distance field to nearby location objects
5. Add tests for the spatial queries
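As a rough illustration of step 3, the serializer method could map the annotated queryset onto the response shape shown in the example that follows. This is only a sketch: the import path, the coordinate attributes on the related location record, and the serializer wiring are assumptions, not the existing implementation.

```python
from rest_framework import serializers

from apps.core.services import location_service  # assumed import path


class MapLocationDetailSerializer(serializers.Serializer):
    # ...existing fields omitted; only the nearby-locations method is sketched here
    nearby_locations = serializers.SerializerMethodField()

    def get_nearby_locations(self, obj):
        nearby_parks = location_service.get_nearby_locations(obj, radius_miles=25, limit=10)
        return [
            {
                "id": f"park_{park.id}",
                "name": park.name,
                "type": "park",
                # The Distance annotation is a GeoDjango measure object; .mi converts it to miles
                "distance_miles": round(park.distance.mi, 1),
                # Coordinate attributes on the related location record are assumed
                "coordinates": [park.location.latitude, park.location.longitude],
            }
            for park in nearby_parks
        ]
```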
**API Response Example**:

```json
{
  "nearby_locations": [
    {
      "id": "park_123",
      "name": "Cedar Point",
      "type": "park",
      "distance_miles": 5.2,
      "coordinates": [41.4793, -82.6833]
    }
  ]
}
```

**Performance Considerations**:

- Use spatial indexes (already present on `location__point` fields)
- Cache nearby locations for 1 hour
- Limit the radius to 50 miles maximum

---

#### THRILLWIKI-108: Search Relevance Scoring

**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None

**Context**: Search results currently return a hardcoded relevance score of 1.0. Implementing proper relevance scoring would improve search result quality.

**Proposed Solution**: Implement a weighted scoring algorithm based on:

1. **Text Match Quality** (40% weight):
   - Exact name match: 1.0
   - Name starts with query: 0.8
   - Name contains query: 0.6
   - City/state match: 0.4

2. **Popularity** (30% weight):
   - Based on `average_rating` and `ride_count`/`coaster_count`
   - Normalize to a 0-1 scale

3. **Recency** (15% weight):
   - Recently opened attractions score higher
   - Normalize based on `opening_date`

4. **Status** (15% weight):
   - Operating: 1.0
   - Seasonal: 0.8
   - Closed temporarily: 0.5
   - Closed permanently: 0.2

**Implementation Steps**:

1. Create `backend/apps/core/services/search_scoring.py` with the scoring logic
2. Add a `calculate_relevance_score()` method
3. Update `MapSearchAPIView.get()` to calculate scores
4. Sort results by relevance score (descending)
5. Add tests for the scoring algorithm

**Example Implementation**:

```python
def calculate_relevance_score(location, query):
    # Simplified example: the recency component (15%) is omitted here
    score = 0.0

    # Text match (40%)
    name_lower = location.name.lower()
    query_lower = query.lower()
    if name_lower == query_lower:
        score += 0.40
    elif name_lower.startswith(query_lower):
        score += 0.32
    elif query_lower in name_lower:
        score += 0.24

    # Popularity (30%)
    if location.average_rating:
        score += (location.average_rating / 5.0) * 0.30

    # Status (15%)
    status_weights = {
        'OPERATING': 1.0,
        'SEASONAL': 0.8,
        'CLOSED_TEMP': 0.5,
        'CLOSED_PERM': 0.2,
    }
    score += status_weights.get(location.status, 0.5) * 0.15

    return min(score, 1.0)
```

**Performance Considerations**:

- Calculate scores in Python (not the database) for flexibility
- Cache search results with scores for 5 minutes
- Consider using PostgreSQL full-text search for better performance

---

#### THRILLWIKI-109: Cache Statistics Tracking

**Priority**: P2 (Medium)
**Estimated Effort**: 1-2 hours
**Dependencies**: None
**Status**: IMPLEMENTED

**Context**: The `MapStatsAPIView` returns hardcoded cache statistics (0 hits, 0 misses). Implementing real cache statistics provides visibility into caching effectiveness.

**Implementation**: Added a `get_cache_statistics()` method to `EnhancedCacheService` that retrieves Redis INFO statistics when available. The `MapStatsAPIView` now returns real cache hit/miss data.
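For reference, a minimal sketch of what such a method can look like when the cache backend is django-redis; the connection alias, the returned keys, and the fallback behavior are illustrative assumptions rather than the exact implementation:

```python
from django_redis import get_redis_connection


def get_cache_statistics():
    """Sketch: read hit/miss counters from Redis INFO (illustrative, not the actual method)."""
    try:
        conn = get_redis_connection("default")  # assumes a "default" django-redis alias
        stats = conn.info("stats")
        hits = stats.get("keyspace_hits", 0)
        misses = stats.get("keyspace_misses", 0)
        total = hits + misses
        return {
            "hits": hits,
            "misses": misses,
            "hit_rate": round(hits / total, 4) if total else 0.0,
        }
    except Exception:
        # Fall back to zeros when Redis (or django-redis) is unavailable
        return {"hits": 0, "misses": 0, "hit_rate": 0.0}
```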
---

### User Features

#### THRILLWIKI-104: Full User Statistics Tracking

**Priority**: P2 (Medium)
**Estimated Effort**: 3-4 days
**Dependencies**: THRILLWIKI-105 (Photo counting)

**Context**: Current user statistics are calculated on-demand by querying multiple tables. This is inefficient and doesn't track all desired metrics.

**Proposed Solution**: Implement a `UserStatistics` model with periodic updates:

```python
from django.db import models

# User refers to the project's user model; import as appropriate in accounts/models.py


class UserStatistics(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)

    # Content statistics
    parks_visited = models.IntegerField(default=0)
    rides_ridden = models.IntegerField(default=0)
    reviews_written = models.IntegerField(default=0)
    photos_uploaded = models.IntegerField(default=0)
    top_lists_created = models.IntegerField(default=0)

    # Engagement statistics
    helpful_votes_received = models.IntegerField(default=0)
    comments_made = models.IntegerField(default=0)
    badges_earned = models.IntegerField(default=0)

    # Activity tracking
    last_review_date = models.DateTimeField(null=True, blank=True)
    last_photo_upload_date = models.DateTimeField(null=True, blank=True)
    streak_days = models.IntegerField(default=0)

    # Timestamps
    last_calculated = models.DateTimeField(auto_now=True)

    class Meta:
        verbose_name_plural = "User statistics"
```

**Implementation Steps**:

1. Create a migration for the `UserStatistics` model in `backend/apps/accounts/models.py`
2. Create a Celery task `update_user_statistics` in `backend/apps/accounts/tasks.py`
3. Update statistics on user actions using Django signals:
   - `post_save` signal on `ParkReview`, `RideReview` -> increment `reviews_written`
   - `post_save` signal on `ParkPhoto`, `RidePhoto` -> increment `photos_uploaded`
4. Add a management command `python manage.py recalculate_user_stats` for bulk updates
5. Update the `get_user_statistics` view to read from the `UserStatistics` model
6. Add a periodic Celery task to recalculate statistics daily

**Performance Benefits**:

- Reduces database queries from 5+ to 1
- Enables leaderboards and ranking features
- Supports gamification (badges, achievements)

**Migration Strategy**:

1. Create the model and migration
2. Run `recalculate_user_stats` for existing users
3. Enable signal handlers for new activity
4. Monitor for 1 week before removing the old calculation logic

---

#### THRILLWIKI-105: Photo Upload Counting

**Priority**: P2 (Medium)
**Estimated Effort**: 30 minutes
**Dependencies**: None
**Status**: IMPLEMENTED

**Context**: The user statistics endpoint returns `photos_uploaded: 0` for all users. Photo uploads should be counted from the `ParkPhoto` and `RidePhoto` models.

**Implementation**: Updated `get_user_statistics()` in `backend/apps/api/v1/accounts/views.py` to query the `ParkPhoto` and `RidePhoto` models where `uploaded_by=user`.

---

### Infrastructure

#### THRILLWIKI-101: Geocoding Service Integration

**Priority**: P3 (Low)
**Estimated Effort**: 2-3 days
**Dependencies**: None

**Context**: The `CompanyHeadquarters` model has address fields but no coordinates. This prevents companies from appearing on the map.

**Proposed Solution**: Integrate a geocoding service to convert addresses to coordinates.

**Recommended Services**:

1. **Google Maps Geocoding API** (paid, high quality)
2. **Nominatim (OpenStreetMap)** (free, rate-limited)
3. **Mapbox Geocoding API** (paid, good quality)

**Implementation Steps**:

1. Create `backend/apps/core/services/geocoding_service.py` (a fuller sketch follows these steps):

   ```python
   class GeocodingService:
       def geocode_address(self, address: str) -> tuple[float, float] | None:
           """Convert address to (latitude, longitude)."""
           # Implementation using chosen service
   ```

2. Add geocoding to the `CompanyHeadquarters` model:
   - Add `latitude` and `longitude` fields
   - Add a `geocoded_at` timestamp field
   - Create a migration
3. Update `CompanyLocationAdapter.to_unified_location()` to use the coordinates if available
4. Add a management command `python manage.py geocode_companies` for bulk geocoding
5. Add a Celery task for automatic geocoding on company creation/update
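To make step 1 more concrete, here is a minimal sketch of `geocode_address()` backed by Nominatim's public HTTP API via `requests`; the User-Agent string, timeout, and error handling are illustrative assumptions, not a final implementation:

```python
import requests


class GeocodingService:
    """Sketch of a Nominatim-backed implementation (illustrative only)."""

    NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude), or None if not found."""
        try:
            response = requests.get(
                self.NOMINATIM_URL,
                params={"q": address, "format": "json", "limit": 1},
                # Nominatim's usage policy requires an identifying User-Agent
                headers={"User-Agent": "thrillwiki-geocoder/0.1 (contact email here)"},
                timeout=10,
            )
            response.raise_for_status()
            results = response.json()
            if not results:
                return None
            return float(results[0]["lat"]), float(results[0]["lon"])
        except requests.RequestException:
            # Leave coordinates unset on network/API errors; a retry can be scheduled later
            return None
```

Whatever service is chosen, results should be cached and requests rate-limited per the notes below.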
**Configuration**: Add to `backend/config/settings/base.py`:

```python
GEOCODING_SERVICE = env('GEOCODING_SERVICE', default='nominatim')
GEOCODING_API_KEY = env('GEOCODING_API_KEY', default='')
GEOCODING_RATE_LIMIT = env.int('GEOCODING_RATE_LIMIT', default=1)  # requests per second
```

**Rate Limiting**:

- Implement exponential backoff for API errors
- Cache geocoding results to avoid redundant API calls
- Use Celery for async geocoding to avoid blocking requests

**Cost Considerations**:

- Nominatim: Free but limited to 1 request/second
- Google Maps: $5 per 1000 requests (first $200/month free)
- Mapbox: $0.50 per 1000 requests (first 100k free)

**Alternative Approach**: Store coordinates manually in the admin interface for the ~50-100 companies in the database.

---

#### THRILLWIKI-110: ClamAV Malware Scanning Integration

**Priority**: P1 (High) - Security feature
**Estimated Effort**: 2-3 days
**Dependencies**: ClamAV daemon installation

**Context**: File uploads currently use magic number validation and PIL integrity checks, but don't scan for malware. This is a security gap for user-generated content.

**Proposed Solution**: Integrate ClamAV antivirus scanning for all file uploads.

**Implementation Steps**:

1. **Install ClamAV**:

   ```bash
   # Docker
   docker run -d -p 3310:3310 clamav/clamav:latest

   # Ubuntu/Debian
   sudo apt-get install clamav clamav-daemon
   sudo freshclam  # Update virus definitions
   sudo systemctl start clamav-daemon
   ```

2. **Install the Python client**:

   ```bash
   uv add clamd
   ```

3. **Update `backend/apps/core/utils/file_scanner.py`**:

   ```python
   import logging
   from typing import Tuple

   import clamd
   from django.core.files.uploadedfile import UploadedFile

   logger = logging.getLogger(__name__)


   def scan_file_for_malware(file: UploadedFile) -> Tuple[bool, str]:
       """Scan file for malware using ClamAV."""
       try:
           # Connect to the ClamAV daemon
           cd = clamd.ClamdUnixSocket()  # or ClamdNetworkSocket for a remote daemon

           # Scan the file stream
           file.seek(0)
           scan_result = cd.instream(file)
           file.seek(0)

           # Check the result
           if scan_result['stream'][0] == 'OK':
               return True, ""
           else:
               virus_name = scan_result['stream'][1]
               return False, f"Malware detected: {virus_name}"
       except clamd.ConnectionError:
           # ClamAV not available - log a warning and allow the upload
           logger.warning("ClamAV daemon not available, skipping malware scan")
           return True, ""
       except Exception as e:
           logger.error(f"Malware scan error: {e}")
           return False, "Malware scan failed"
   ```

4. **Configuration**: Add to `backend/config/settings/base.py`:

   ```python
   CLAMAV_ENABLED = env.bool('CLAMAV_ENABLED', default=False)
   CLAMAV_SOCKET = env('CLAMAV_SOCKET', default='/var/run/clamav/clamd.ctl')
   CLAMAV_HOST = env('CLAMAV_HOST', default='localhost')
   CLAMAV_PORT = env.int('CLAMAV_PORT', default=3310)
   ```

5. **Update file upload views**:
   - Call `scan_file_for_malware()` in the avatar upload view
   - Call it in the media upload views
   - Log all malware detections for security monitoring

6. **Testing** (see the mocked test sketch at the end of this section):
   - Use the EICAR test file: `X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*`
   - Add unit tests with mocked ClamAV responses

**Deployment Considerations**:

- ClamAV requires ~1GB RAM for virus definitions
- Update virus definitions daily via `freshclam`
- Monitor ClamAV daemon health in production
- Consider a cloud-based scanning service (AWS GuardDuty, VirusTotal) for serverless deployments

**Fallback Strategy**: If ClamAV is unavailable, log a warning and allow the upload (fail open). This prevents blocking legitimate uploads if the ClamAV daemon crashes.
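A rough sketch of the mocked unit tests from step 6, using pytest-style tests and `unittest.mock`; the scanner module's import path and the mock wiring are assumptions:

```python
import io
from unittest import mock

from apps.core.utils.file_scanner import scan_file_for_malware  # assumed import path

# EICAR test string, kept as raw bytes to preserve the backslash
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


@mock.patch("apps.core.utils.file_scanner.clamd.ClamdUnixSocket")
def test_scan_rejects_eicar(mock_socket):
    # Simulate ClamAV flagging the stream as infected
    mock_socket.return_value.instream.return_value = {
        "stream": ("FOUND", "Eicar-Test-Signature")
    }
    ok, message = scan_file_for_malware(io.BytesIO(EICAR))
    assert ok is False
    assert "Eicar-Test-Signature" in message


@mock.patch("apps.core.utils.file_scanner.clamd.ClamdUnixSocket")
def test_scan_accepts_clean_file(mock_socket):
    # Simulate a clean scan result
    mock_socket.return_value.instream.return_value = {"stream": ("OK", None)}
    ok, message = scan_file_for_malware(io.BytesIO(b"plain image bytes"))
    assert ok is True
    assert message == ""
```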
---

### Management Commands

#### THRILLWIKI-111: Sample Data Creation Command

**Priority**: P3 (Low) - Development utility
**Estimated Effort**: 1-2 days
**Dependencies**: None

**Context**: The `create_sample_data` management command is incomplete. This command is useful for:

- Local development with realistic data
- Demo environments
- Testing with diverse data sets

**Proposed Solution**: Complete the implementation with comprehensive sample data.

**Sample Data to Create**:

1. **Parks** (10-15):
   - Major theme parks (Disney, Universal, Cedar Point)
   - Regional parks
   - Water parks
   - Mix of operating/closed/seasonal statuses

2. **Rides** (50-100):
   - Roller coasters (various types)
   - Flat rides
   - Water rides
   - Dark rides
   - Mix of statuses and manufacturers

3. **Companies** (20-30):
   - Operators (Disney, Six Flags, Cedar Fair)
   - Manufacturers (Intamin, B&M, RMC)
   - Mix of active/inactive

4. **Users** (10):
   - Admin user
   - Regular users with various activity levels
   - Test user for authentication testing

5. **Reviews** (100-200):
   - Park reviews with ratings
   - Ride reviews with ratings
   - Mix of helpful/unhelpful votes

6. **Media** (50):
   - Park photos
   - Ride photos
   - Mix of approved/pending/rejected

**Implementation Steps**:

1. Create fixtures in `backend/fixtures/sample_data.json`
2. Update `create_sample_data.py` to load the fixtures
3. Add a `--clear` flag to delete existing data before creating
4. Add a `--minimal` flag for quick setup (10 parks, 20 rides)
5. Document usage in `backend/README.md`

**Usage**:

```bash
# Full sample data
python manage.py create_sample_data

# Minimal data for quick testing
python manage.py create_sample_data --minimal

# Clear existing data first
python manage.py create_sample_data --clear
```

**Alternative Approach**: Use Django fixtures with the `loaddata` command:

```bash
python manage.py loaddata sample_parks sample_rides sample_users
```

---

## Completed Items

### THRILLWIKI-103: Admin Permission Checks

**Status**: COMPLETED (Already Implemented)

**Context**: The `MapCacheView` delete and post methods had TODO comments for adding admin permission checks. Upon review, these checks were already implemented using `request.user.is_authenticated and request.user.is_staff`.

**Resolution**: Removed the outdated TODO comments.

---

## Implementation Notes

### Creating GitHub Issues

Each item in this document can be converted to a GitHub issue using this template:

```markdown
## Description
[Copy from Context section]

## Implementation
[Copy from Implementation Steps section]

## Acceptance Criteria
- [ ] Feature implemented as specified
- [ ] Unit tests added with >80% coverage
- [ ] Integration tests pass
- [ ] Documentation updated
- [ ] Code reviewed and approved

## Priority
[Copy Priority value]

## Related
- THRILLWIKI issue number
- Related features or dependencies
```

### Priority Order for Implementation

Based on business value and effort, the recommended implementation order is:

1. **THRILLWIKI-110**: ClamAV Malware Scanning (P1, security)
2. **THRILLWIKI-106**: Map Clustering (P1, performance)
3. **THRILLWIKI-107**: Nearby Locations (P2, UX)
4. **THRILLWIKI-108**: Search Relevance Scoring (P2, UX)
5. **THRILLWIKI-104**: Full User Statistics (P2, engagement)
6. **THRILLWIKI-101**: Geocoding Service (P3, completeness)
7. **THRILLWIKI-111**: Sample Data Command (P3, development)