# Future Work & Deferred Features

This document tracks features that have been deferred for future implementation. Each item includes context, implementation guidance, and priority.

## Priority Levels

- **P0 (Critical)**: Blocks major functionality or has security implications
- **P1 (High)**: Significantly improves user experience or performance
- **P2 (Medium)**: Nice-to-have features that add value
- **P3 (Low)**: Optional enhancements

## Feature Tracking

### Map Service Enhancements

#### THRILLWIKI-106: Map Clustering Algorithm

**Priority**: P1 (High)
**Estimated Effort**: 3-5 days
**Dependencies**: None

**Context**:
Currently, the map API returns all locations within bounds without clustering. At low zoom levels (zoomed out), this can result in hundreds of overlapping markers, degrading performance and UX.

**Proposed Solution**:
Implement a server-side clustering algorithm using one of these approaches:

1. **Grid-based clustering** (Recommended for simplicity):
   - Divide the map into a grid based on zoom level
   - Group locations within each grid cell
   - Return cluster center and count for cells with multiple locations

2. **DBSCAN clustering** (Better quality, more complex):
   - Use scikit-learn's DBSCAN algorithm
   - Cluster based on geographic distance
   - Adjust epsilon parameter based on zoom level

**Implementation Steps**:

1. Create `backend/apps/core/services/map_clustering.py` with clustering logic
2. Add `cluster_locations()` method that accepts:
   - List of `UnifiedLocation` objects
   - Zoom level (1-20)
   - Clustering strategy ('grid' or 'dbscan')
3. Update `MapLocationsAPIView._build_response()` to call clustering service when `params["cluster"]` is True
4. Update `MapClusterSerializer` to include cluster metadata
5. Add tests in `backend/tests/services/test_map_clustering.py`

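A minimal sketch of the grid strategy behind `cluster_locations()` (step 2 above); the `UnifiedLocation` attribute names (`latitude`, `longitude`) and the cluster dictionary shape are assumptions here, not the final API:

```python
from collections import defaultdict


def cluster_locations(locations, zoom, strategy="grid"):
    """Group locations into grid cells whose size is derived from the zoom level."""
    if strategy != "grid":
        raise NotImplementedError("Only the grid strategy is sketched here")

    # Coarser cells when zoomed out: cell size halves with each zoom step.
    cell_size = 360.0 / (2 ** zoom)

    cells = defaultdict(list)
    for loc in locations:
        key = (int(loc.latitude // cell_size), int(loc.longitude // cell_size))
        cells[key].append(loc)

    clusters = []
    for key, members in cells.items():
        clusters.append({
            "count": len(members),
            # Cluster center = mean of member coordinates.
            "coordinates": [
                sum(m.latitude for m in members) / len(members),
                sum(m.longitude for m in members) / len(members),
            ],
            "representative_location": members[0],
            # `id` and `bounds` (see API Changes below) would be derived from
            # the cell key and the cell's extent.
        })
    return clusters
```
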
**API Changes**:

- Response includes `clusters` array with cluster objects
- Each cluster has: `id`, `coordinates`, `count`, `bounds`, `representative_location`

**Performance Considerations**:

- Cache clustered results separately from unclustered
- Use spatial indexes on location tables
- Limit clustering to zoom levels 1-12 (zoomed out views)

**References**:

- [Supercluster.js](https://github.com/mapbox/supercluster) - JavaScript implementation for reference
- [PostGIS ST_ClusterKMeans](https://postgis.net/docs/ST_ClusterKMeans.html) - Database-level clustering

---

#### THRILLWIKI-107: Nearby Locations

**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None

**Context**:
Location detail views currently don't show nearby parks or rides. This would help users discover attractions in the same area.

**Proposed Solution**:
Use PostGIS spatial queries to find locations within a radius:

```python
from django.contrib.gis.measure import D  # Distance
from django.contrib.gis.db.models.functions import Distance

# Park is assumed to be imported from the parks app's models.


def get_nearby_locations(location_obj, radius_miles=25, limit=10):
    """Get nearby locations using spatial query."""
    point = location_obj.point

    # Query parks within radius
    nearby_parks = Park.objects.filter(
        location__point__distance_lte=(point, D(mi=radius_miles))
    ).annotate(
        distance=Distance('location__point', point)
    ).exclude(
        id=location_obj.park.id  # Exclude self
    ).order_by('distance')[:limit]

    return nearby_parks
```

**Implementation Steps**:

1. Add `get_nearby_locations()` method to `backend/apps/core/services/location_service.py`
2. Update `MapLocationDetailAPIView.get()` to call this method
3. Update `MapLocationDetailSerializer.get_nearby_locations()` to return actual data
4. Add distance field to nearby location objects
5. Add tests for spatial queries

**API Response Example**:

```json
{
  "nearby_locations": [
    {
      "id": "park_123",
      "name": "Cedar Point",
      "type": "park",
      "distance_miles": 5.2,
      "coordinates": [41.4793, -82.6833]
    }
  ]
}
```

**Performance Considerations**:

- Use spatial indexes (already present on `location__point` fields)
- Cache nearby locations for 1 hour
- Limit radius to 50 miles maximum

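One possible shape for the one-hour caching noted above, using Django's cache framework; the cache-key scheme and the wrapper function name are illustrative assumptions:

```python
from django.core.cache import cache


def get_nearby_locations_cached(location_obj, radius_miles=25, limit=10):
    """Cache nearby-location lookups for one hour (illustrative key scheme)."""
    key = f"nearby:{location_obj.pk}:{radius_miles}:{limit}"
    results = cache.get(key)
    if results is None:
        results = list(get_nearby_locations(location_obj, radius_miles, limit))
        cache.set(key, results, timeout=60 * 60)  # 1 hour
    return results
```
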
---

#### THRILLWIKI-108: Search Relevance Scoring

**Priority**: P2 (Medium)
**Estimated Effort**: 2-3 days
**Dependencies**: None

**Context**:
Search results currently return a hardcoded relevance score of 1.0. Implementing proper relevance scoring would improve search result quality.

**Proposed Solution**:
Implement a weighted scoring algorithm based on:

1. **Text Match Quality** (40% weight):
   - Exact name match: 1.0
   - Name starts with query: 0.8
   - Name contains query: 0.6
   - City/state match: 0.4

2. **Popularity** (30% weight):
   - Based on `average_rating` and `ride_count`/`coaster_count`
   - Normalize to 0-1 scale

3. **Recency** (15% weight):
   - Recently opened attractions score higher
   - Normalize based on `opening_date`

4. **Status** (15% weight):
   - Operating: 1.0
   - Seasonal: 0.8
   - Closed temporarily: 0.5
   - Closed permanently: 0.2

**Implementation Steps**:

1. Create `backend/apps/core/services/search_scoring.py` with scoring logic
2. Add `calculate_relevance_score()` method
3. Update `MapSearchAPIView.get()` to calculate scores
4. Sort results by relevance score (descending)
5. Add tests for scoring algorithm

**Example Implementation**:

```python
def calculate_relevance_score(location, query):
    score = 0.0

    # Text match (40%)
    name_lower = location.name.lower()
    query_lower = query.lower()
    if name_lower == query_lower:
        score += 0.40
    elif name_lower.startswith(query_lower):
        score += 0.32
    elif query_lower in name_lower:
        score += 0.24

    # Popularity (30%)
    if location.average_rating:
        score += (location.average_rating / 5.0) * 0.30

    # Status (15%)
    status_weights = {
        'OPERATING': 1.0,
        'SEASONAL': 0.8,
        'CLOSED_TEMP': 0.5,
        'CLOSED_PERM': 0.2
    }
    score += status_weights.get(location.status, 0.5) * 0.15

    # Recency (15%) would be added here by normalizing opening_date,
    # per the weighting scheme above.

    return min(score, 1.0)
```

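For step 4, the view can simply order candidates by the computed score; a small illustrative helper (the name `rank_search_results` is not from the codebase):

```python
def rank_search_results(locations, query):
    """Return candidate locations ordered by relevance, highest score first."""
    return sorted(
        locations,
        key=lambda location: calculate_relevance_score(location, query),
        reverse=True,
    )
```
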
**Performance Considerations**:

- Calculate scores in Python (not in the database) for flexibility
- Cache search results with scores for 5 minutes
- Consider using PostgreSQL full-text search for better performance

---

#### THRILLWIKI-109: Cache Statistics Tracking

**Priority**: P2 (Medium)
**Estimated Effort**: 1-2 hours
**Dependencies**: None
**Status**: IMPLEMENTED

**Context**:
The `MapStatsAPIView` previously returned hardcoded cache statistics (0 hits, 0 misses). Implementing real cache statistics provides visibility into caching effectiveness.

**Implementation**:
Added a `get_cache_statistics()` method to `EnhancedCacheService` that retrieves Redis INFO statistics when available. The `MapStatsAPIView` now returns real cache hit/miss data.

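For reference, a minimal sketch of what such a method can look like, assuming django-redis as the cache backend; this is an illustration, not the actual `EnhancedCacheService` code:

```python
from django.core.cache import cache


def get_cache_statistics():
    """Read hit/miss counters from Redis INFO, if the backend exposes them."""
    try:
        # django-redis exposes the underlying Redis client via get_client().
        client = cache.client.get_client()
        info = client.info()
        return {
            "hits": info.get("keyspace_hits", 0),
            "misses": info.get("keyspace_misses", 0),
        }
    except AttributeError:
        # Non-Redis backends (e.g. LocMemCache) don't expose these counters.
        return {"hits": 0, "misses": 0}
```
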
---

### User Features

#### THRILLWIKI-104: Full User Statistics Tracking

**Priority**: P2 (Medium)
**Estimated Effort**: 3-4 days
**Dependencies**: THRILLWIKI-105 (Photo counting)

**Context**:
Current user statistics are calculated on-demand by querying multiple tables. This is inefficient and doesn't track all desired metrics.

**Proposed Solution**:
Implement a `UserStatistics` model with periodic updates:

```python
from django.db import models

# `User` is the project's user model (imported from the accounts app or via
# django.contrib.auth.get_user_model()).


class UserStatistics(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)

    # Content statistics
    parks_visited = models.IntegerField(default=0)
    rides_ridden = models.IntegerField(default=0)
    reviews_written = models.IntegerField(default=0)
    photos_uploaded = models.IntegerField(default=0)
    top_lists_created = models.IntegerField(default=0)

    # Engagement statistics
    helpful_votes_received = models.IntegerField(default=0)
    comments_made = models.IntegerField(default=0)
    badges_earned = models.IntegerField(default=0)

    # Activity tracking
    last_review_date = models.DateTimeField(null=True, blank=True)
    last_photo_upload_date = models.DateTimeField(null=True, blank=True)
    streak_days = models.IntegerField(default=0)

    # Timestamps
    last_calculated = models.DateTimeField(auto_now=True)

    class Meta:
        verbose_name_plural = "User statistics"
```

**Implementation Steps**:

1. Create migration for `UserStatistics` model in `backend/apps/accounts/models.py`
2. Create Celery task `update_user_statistics` in `backend/apps/accounts/tasks.py`
3. Update statistics on user actions using Django signals:
   - `post_save` signal on `ParkReview`, `RideReview` -> increment `reviews_written`
   - `post_save` signal on `ParkPhoto`, `RidePhoto` -> increment `photos_uploaded`
4. Add management command `python manage.py recalculate_user_stats` for bulk updates
5. Update `get_user_statistics` view to read from `UserStatistics` model
6. Add periodic Celery task to recalculate statistics daily

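A hedged sketch of the signal wiring in step 3; the `parks.ParkReview` app label, the review's `user` field name, and the `F()`-based update are assumptions about the existing schema:

```python
from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender="parks.ParkReview")  # assumed app label / model name
def increment_reviews_written(sender, instance, created, **kwargs):
    """Bump the author's counter when a new review is saved (step 3)."""
    if not created:
        return
    # Atomic increment avoids read-modify-write races; assumes the review's
    # author field is named `user` and a UserStatistics row already exists.
    UserStatistics.objects.filter(user=instance.user).update(
        reviews_written=F("reviews_written") + 1
    )
```

The photo models would be wired the same way, incrementing `photos_uploaded`.
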
**Performance Benefits**:

- Reduces database queries from 5+ to 1
- Enables leaderboards and ranking features
- Supports gamification (badges, achievements)

**Migration Strategy**:

1. Create model and migration
2. Run `recalculate_user_stats` for existing users
3. Enable signal handlers for new activity
4. Monitor for 1 week before removing old calculation logic

---

#### THRILLWIKI-105: Photo Upload Counting

**Priority**: P2 (Medium)
**Estimated Effort**: 30 minutes
**Dependencies**: None
**Status**: IMPLEMENTED

**Context**:
The user statistics endpoint previously returned `photos_uploaded: 0` for all users. Photo uploads should be counted from the `ParkPhoto` and `RidePhoto` models.

**Implementation**:
Updated `get_user_statistics()` in `backend/apps/api/v1/accounts/views.py` to query the `ParkPhoto` and `RidePhoto` models where `uploaded_by=user`.

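Roughly, the count is just two queries, along the lines of the following (the helper name is illustrative and the model import paths are omitted):

```python
def count_photos_uploaded(user):
    """Total photo uploads across parks and rides, per the fix above."""
    # ParkPhoto and RidePhoto are the existing photo models referenced above.
    return (
        ParkPhoto.objects.filter(uploaded_by=user).count()
        + RidePhoto.objects.filter(uploaded_by=user).count()
    )
```
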
---

### Infrastructure

#### THRILLWIKI-101: Geocoding Service Integration

**Priority**: P3 (Low)
**Estimated Effort**: 2-3 days
**Dependencies**: None

**Context**:
The `CompanyHeadquarters` model has address fields but no coordinates. This prevents companies from appearing on the map.

**Proposed Solution**:
Integrate a geocoding service to convert addresses to coordinates.

**Recommended Services**:

1. **Google Maps Geocoding API** (Paid, high quality)
2. **Nominatim (OpenStreetMap)** (Free, rate-limited)
3. **Mapbox Geocoding API** (Paid, good quality)

**Implementation Steps**:

1. Create `backend/apps/core/services/geocoding_service.py`:

```python
class GeocodingService:
    def geocode_address(self, address: str) -> tuple[float, float] | None:
        """Convert address to (latitude, longitude)."""
        # Implementation using chosen service
```

2. Add geocoding to `CompanyHeadquarters` model:
   - Add `latitude` and `longitude` fields
   - Add `geocoded_at` timestamp field
   - Create migration

3. Update `CompanyLocationAdapter.to_unified_location()` to use coordinates if available

4. Add management command `python manage.py geocode_companies` for bulk geocoding

5. Add Celery task for automatic geocoding on company creation/update

**Configuration**:

Add to `backend/config/settings/base.py`:

```python
GEOCODING_SERVICE = env('GEOCODING_SERVICE', default='nominatim')
GEOCODING_API_KEY = env('GEOCODING_API_KEY', default='')
GEOCODING_RATE_LIMIT = env.int('GEOCODING_RATE_LIMIT', default=1)  # requests per second
```

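As an illustration of the Nominatim default above, a minimal implementation could call the public Nominatim search endpoint with `requests`; the class name, error handling, and User-Agent value here are a sketch, not the committed design:

```python
import requests


class NominatimGeocodingService:
    """Sketch of a Nominatim-backed geocoder (subject to the 1 req/s policy)."""

    def geocode_address(self, address: str) -> tuple[float, float] | None:
        response = requests.get(
            "https://nominatim.openstreetmap.org/search",
            params={"q": address, "format": "json", "limit": 1},
            headers={"User-Agent": "thrillwiki-geocoder"},  # required by Nominatim policy
            timeout=10,
        )
        response.raise_for_status()
        results = response.json()
        if not results:
            return None
        return float(results[0]["lat"]), float(results[0]["lon"])
```
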
**Rate Limiting**:

- Implement exponential backoff for API errors
- Cache geocoding results to avoid redundant API calls
- Use Celery for async geocoding to avoid blocking requests

**Cost Considerations**:

- Nominatim: Free but limited to 1 request/second
- Google Maps: $5 per 1000 requests (first $200/month free)
- Mapbox: $0.50 per 1000 requests (first 100k free)

**Alternative Approach**:

Store coordinates manually in the admin interface for the ~50-100 companies in the database.

---

#### THRILLWIKI-110: ClamAV Malware Scanning Integration

**Priority**: P1 (High) - Security feature
**Estimated Effort**: 2-3 days
**Dependencies**: ClamAV daemon installation

**Context**:
File uploads currently use magic number validation and PIL integrity checks, but don't scan for malware. This is a security gap for user-generated content.

**Proposed Solution**:
Integrate ClamAV antivirus scanning for all file uploads.

**Implementation Steps**:

1. **Install ClamAV**:

```bash
# Docker
docker run -d -p 3310:3310 clamav/clamav:latest

# Ubuntu/Debian
sudo apt-get install clamav clamav-daemon
sudo freshclam  # Update virus definitions
sudo systemctl start clamav-daemon
```

2. **Install Python client**:

```bash
uv add clamd
```

3. **Update `backend/apps/core/utils/file_scanner.py`**:

```python
import logging
from typing import Tuple

import clamd
from django.core.files.uploadedfile import UploadedFile

logger = logging.getLogger(__name__)


def scan_file_for_malware(file: UploadedFile) -> Tuple[bool, str]:
    """Scan file for malware using ClamAV."""
    try:
        # Connect to ClamAV daemon
        cd = clamd.ClamdUnixSocket()  # or ClamdNetworkSocket for remote

        # Scan file
        file.seek(0)
        scan_result = cd.instream(file)
        file.seek(0)

        # Check result
        if scan_result['stream'][0] == 'OK':
            return True, ""
        else:
            virus_name = scan_result['stream'][1]
            return False, f"Malware detected: {virus_name}"

    except clamd.ConnectionError:
        # ClamAV not available - log warning and allow upload
        logger.warning("ClamAV daemon not available, skipping malware scan")
        return True, ""
    except Exception as e:
        logger.error(f"Malware scan error: {e}")
        return False, "Malware scan failed"
```

4. **Configuration**:

Add to `backend/config/settings/base.py`:

```python
CLAMAV_ENABLED = env.bool('CLAMAV_ENABLED', default=False)
CLAMAV_SOCKET = env('CLAMAV_SOCKET', default='/var/run/clamav/clamd.ctl')
CLAMAV_HOST = env('CLAMAV_HOST', default='localhost')
CLAMAV_PORT = env.int('CLAMAV_PORT', default=3310)
```

5. **Update file upload views**:
   - Call `scan_file_for_malware()` in avatar upload view
   - Call in media upload views
   - Log all malware detections for security monitoring

6. **Testing**:
   - Use EICAR test file for testing: `X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*`
   - Add unit tests with mocked ClamAV responses

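A rough idea of the wiring in step 5, assuming a DRF-style upload handler; the function name, field name, and response shapes are illustrative, not the project's actual views (`scan_file_for_malware` and `logger` come from the scanner module above):

```python
from rest_framework import status
from rest_framework.response import Response


def handle_uploaded_photo(request):
    """Illustrative upload handler that rejects files flagged by ClamAV."""
    uploaded = request.FILES["photo"]

    is_clean, reason = scan_file_for_malware(uploaded)
    if not is_clean:
        # Log detections for security monitoring (step 5, third bullet).
        logger.warning("Rejected upload from %s: %s", request.user, reason)
        return Response({"detail": reason}, status=status.HTTP_400_BAD_REQUEST)

    # ... continue with normal validation and saving ...
    return Response(status=status.HTTP_201_CREATED)
```
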
**Deployment Considerations**:

- ClamAV requires ~1GB RAM for virus definitions
- Update virus definitions daily via `freshclam`
- Monitor ClamAV daemon health in production
- Consider using a cloud-based scanning service (AWS GuardDuty, VirusTotal) for serverless deployments

**Fallback Strategy**:

If ClamAV is unavailable, log a warning and allow the upload (fail open). This prevents blocking legitimate uploads if the ClamAV daemon crashes.

---

### Management Commands

#### THRILLWIKI-111: Sample Data Creation Command

**Priority**: P3 (Low) - Development utility
**Estimated Effort**: 1-2 days
**Dependencies**: None

**Context**:
The `create_sample_data` management command is incomplete. This command is useful for:

- Local development with realistic data
- Demo environments
- Testing with diverse data sets

**Proposed Solution**:
Complete the implementation with comprehensive sample data:

**Sample Data to Create**:

1. **Parks** (10-15):
   - Major theme parks (Disney, Universal, Cedar Point)
   - Regional parks
   - Water parks
   - Mix of operating/closed/seasonal statuses

2. **Rides** (50-100):
   - Roller coasters (various types)
   - Flat rides
   - Water rides
   - Dark rides
   - Mix of statuses and manufacturers

3. **Companies** (20-30):
   - Operators (Disney, Six Flags, Cedar Fair)
   - Manufacturers (Intamin, B&M, RMC)
   - Mix of active/inactive

4. **Users** (10):
   - Admin user
   - Regular users with various activity levels
   - Test user for authentication testing

5. **Reviews** (100-200):
   - Park reviews with ratings
   - Ride reviews with ratings
   - Mix of helpful/unhelpful votes

6. **Media** (50):
   - Park photos
   - Ride photos
   - Mix of approved/pending/rejected

**Implementation Steps**:

1. Create fixtures in `backend/fixtures/sample_data.json`
2. Update `create_sample_data.py` to load fixtures
3. Add `--clear` flag to delete existing data before creating
4. Add `--minimal` flag for quick setup (10 parks, 20 rides)
5. Document usage in `backend/README.md`

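A possible skeleton for the command with the two flags from steps 3-4; loading via `call_command('loaddata', ...)` and the minimal-fixture name are assumptions, not the committed design:

```python
from django.core.management import call_command
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Create sample parks, rides, companies, users, reviews, and media."

    def add_arguments(self, parser):
        parser.add_argument("--clear", action="store_true",
                            help="Delete existing sample data before creating")
        parser.add_argument("--minimal", action="store_true",
                            help="Create a small data set for quick testing")

    def handle(self, *args, **options):
        if options["clear"]:
            self.stdout.write("Clearing existing sample data...")
            # Deletion of prior sample records would go here.

        # Hypothetical fixture names; only sample_data.json is specified above.
        fixture = "sample_data_minimal" if options["minimal"] else "sample_data"
        call_command("loaddata", fixture)
        self.stdout.write(self.style.SUCCESS(f"Loaded {fixture} fixtures"))
```
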
**Usage**:

```bash
# Full sample data
python manage.py create_sample_data

# Minimal data for quick testing
python manage.py create_sample_data --minimal

# Clear existing data first
python manage.py create_sample_data --clear
```

**Alternative Approach**:

Use Django fixtures with the `loaddata` command:

```bash
python manage.py loaddata sample_parks sample_rides sample_users
```

---

## Completed Items

### THRILLWIKI-103: Admin Permission Checks

**Status**: COMPLETED (Already Implemented)

**Context**:
The `MapCacheView` `delete` and `post` methods had TODO comments for adding admin permission checks. Upon review, these checks were already implemented using `request.user.is_authenticated and request.user.is_staff`.

**Resolution**: Removed outdated TODO comments.

---

## Implementation Notes

### Creating GitHub Issues

Each item in this document can be converted to a GitHub issue using this template:

```markdown
## Description
[Copy from Context section]

## Implementation
[Copy from Implementation Steps section]

## Acceptance Criteria
- [ ] Feature implemented as specified
- [ ] Unit tests added with >80% coverage
- [ ] Integration tests pass
- [ ] Documentation updated
- [ ] Code reviewed and approved

## Priority
[Copy Priority value]

## Related
- THRILLWIKI issue number
- Related features or dependencies
```

### Priority Order for Implementation

Based on business value and effort, recommended implementation order:

1. **THRILLWIKI-110**: ClamAV Malware Scanning (P1, security)
2. **THRILLWIKI-106**: Map Clustering (P1, performance)
3. **THRILLWIKI-107**: Nearby Locations (P2, UX)
4. **THRILLWIKI-108**: Search Relevance Scoring (P2, UX)
5. **THRILLWIKI-104**: Full User Statistics (P2, engagement)
6. **THRILLWIKI-101**: Geocoding Service (P3, completeness)
7. **THRILLWIKI-111**: Sample Data Command (P3, development)