thrillwiki_django_no_react/memory-bank/seed-data-analysis.md
pacnpal d31e4b4ebe Add comprehensive seed data analysis and implementation plan
- Document current schema analysis for Parks, Rides, Accounts, Moderation, Core, and Media apps
- Identify key relationships, constraints, and limitations of existing seed implementation
- Outline comprehensive seed data requirements across companies, parks, rides, users, and media
- Define phased implementation strategy for seeding data
- Create detailed technical implementation notes for command structure, data sources, and performance considerations
- Implement comprehensive seed command with phase-based execution and safety features
2025-09-24 10:28:07 -04:00


Seed Data Analysis and Implementation Plan

Current Schema Analysis

Complete Schema Analysis

Parks App Models

  • Park: Main park entity with operator (required FK to Company), property_owner (optional FK to Company), locations, areas, reviews, photos
  • ParkArea: Themed areas within parks
  • ParkLocation: Geographic data for parks with coordinates
  • ParkReview: User reviews for parks
  • ParkPhoto: Images for parks using Cloudflare Images
  • Company (aliased as Operator): Multi-role entity with roles array (OPERATOR, PROPERTY_OWNER, MANUFACTURER, DESIGNER)
  • CompanyHeadquarters: Location data for companies

Rides App Models

  • Ride: Individual ride installations at parks with park (required FK), manufacturer/designer (optional FKs to Company), ride_model (optional FK), coaster stats relationship
  • RideModel: Catalog of ride types/models with manufacturer (FK to Company), technical specs, variants
  • RideModelVariant: Specific configurations of ride models
  • RideModelPhoto: Photos for ride models
  • RideModelTechnicalSpec: Flexible technical specifications
  • RollerCoasterStats: Detailed statistics for roller coasters (OneToOne with Ride)
  • RideLocation: Geographic data for rides
  • RideReview: User reviews for rides
  • RideRanking: User rankings for rides
  • RidePairComparison: Pairwise comparisons for ranking
  • RankingSnapshot: Historical ranking data
  • RidePhoto: Images for rides

Accounts App Models

  • User: Extended AbstractUser with roles, preferences, security settings
  • UserProfile: Extended profile data with avatar, social links, ride statistics
  • EmailVerification: Email verification tokens
  • PasswordReset: Password reset tokens
  • UserDeletionRequest: Account deletion with email verification
  • UserNotification: System notifications for users
  • NotificationPreference: User notification preferences
  • TopList: User-created top lists
  • TopListItem: Items in top lists (generic foreign key)

Moderation App Models

  • EditSubmission: Original content submission and approval workflow
  • ModerationReport: User reports for content moderation
  • ModerationQueue: Workflow management for moderation tasks
  • ModerationAction: Actions taken against users/content
  • BulkOperation: Administrative bulk operations
  • PhotoSubmission: Photo submission workflow

Core App Models

  • SlugHistory: Track slug changes across all models using generic relations
  • SluggedModel: Abstract base model providing slug functionality with history tracking

Media App Models

  • Basic media handling (files already exist in shared/media)

Key Relationships and Constraints

Entity Relationship Patterns (from .clinerules)

  • Park: Must have Operator (required), may have PropertyOwner (optional), cannot reference Company directly
  • Ride: Must belong to Park, may have Manufacturer/Designer (optional), cannot reference Company directly
  • Company Roles:
    • Operators: Operate parks
    • PropertyOwners: Own park property (optional)
    • Manufacturers: Make rides
    • Designers: Design rides
    • All entities can have locations
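
The role rules above can be sketched in plain Python. This is an illustration only, not the project's Django models: the `Role`, `Company`, and `Park` classes are hypothetical stand-ins, and the check that a park's operator must actually hold the OPERATOR role is an assumption drawn from the role descriptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Role(str, Enum):
    # Role names taken from the Company roles array described in this document.
    OPERATOR = "OPERATOR"
    PROPERTY_OWNER = "PROPERTY_OWNER"
    MANUFACTURER = "MANUFACTURER"
    DESIGNER = "DESIGNER"


@dataclass
class Company:
    name: str
    roles: Set[Role] = field(default_factory=set)


@dataclass
class Park:
    name: str
    operator: Company                          # required
    property_owner: Optional[Company] = None   # optional

    def __post_init__(self) -> None:
        # Assumption: a company may only fill a slot if it holds the matching role.
        if Role.OPERATOR not in self.operator.roles:
            raise ValueError(f"{self.operator.name} is not an operator")
        owner = self.property_owner
        if owner is not None and Role.PROPERTY_OWNER not in owner.roles:
            raise ValueError(f"{owner.name} is not a property owner")


operator = Company("Cedar Fair", {Role.OPERATOR, Role.PROPERTY_OWNER})
park = Park("Cedar Point", operator=operator)
```

Seeding companies before parks keeps this kind of check cheap, because every park can be validated at the moment it is created.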

Database Constraints

  • Business Rules: Enforced via CheckConstraints for dates, ratings, dimensions, positive values
  • Unique Constraints: Parks have unique slugs globally, Rides have unique slugs within parks
  • Foreign Key Constraints: Proper CASCADE/SET_NULL behaviors for data integrity
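
The two slug scopes can be illustrated with a small standalone sketch. The `slugify` and `unique_slug` helpers and the numeric-suffix strategy are assumptions made for this example; the project's `SluggedModel` may resolve collisions differently.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def slugify(name: str) -> str:
    # Lowercase, collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def unique_slug(name: str, taken: Set[str]) -> str:
    # Append -2, -3, ... until the slug is free within the given scope.
    base = slugify(name)
    slug, suffix = base, 2
    while slug in taken:
        slug = f"{base}-{suffix}"
        suffix += 1
    taken.add(slug)
    return slug


# Parks share one global scope; rides get one scope per park.
park_slugs: Set[str] = set()
ride_slugs: Dict[str, Set[str]] = defaultdict(set)

first = unique_slug("Adventure Park", park_slugs)
second = unique_slug("Adventure Park", park_slugs)
ride_a = unique_slug("Thunderbolt", ride_slugs[first])
ride_b = unique_slug("Thunderbolt", ride_slugs[second])
```

Two parks with the same name end up with different slugs, while two rides with the same name in different parks can keep the same slug.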

Current Seed Implementation Analysis

Existing Seed Command (apps/parks/management/commands/seed_initial_data.py)

Strengths:

  • Creates major theme park companies with proper roles
  • Seeds 6 major parks with realistic data (Disney, Universal, Cedar Fair, etc.)
  • Includes park locations with coordinates
  • Creates themed areas for each park
  • Uses get_or_create for idempotency
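
To show why `get_or_create` makes the command safe to re-run, here is a minimal in-memory sketch. `InMemoryTable` and `seed_parks` are hypothetical; they only mimic the `(object, created)` return contract of Django's `QuerySet.get_or_create(defaults=None, **kwargs)` and are not the existing command's code.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryTable:
    """Stand-in for a model manager, used only to show the idempotency pattern."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def get_or_create(
        self, defaults: Optional[Dict[str, Any]] = None, **lookup: Any
    ) -> Tuple[Dict[str, Any], bool]:
        # Return the first row matching every lookup field, if one exists.
        for row in self.rows:
            if all(row.get(key) == value for key, value in lookup.items()):
                return row, False
        # Otherwise create a row from the defaults plus the lookup fields.
        row = {**(defaults or {}), **lookup}
        self.rows.append(row)
        return row, True


def seed_parks(table: InMemoryTable) -> int:
    """Seed two sample parks and return how many rows were newly created."""
    created_count = 0
    for name, country in [("Cedar Point", "USA"), ("Europa-Park", "Germany")]:
        _, created = table.get_or_create(name=name, defaults={"country": country})
        created_count += int(created)
    return created_count


parks = InMemoryTable()
first_run = seed_parks(parks)
second_run = seed_parks(parks)
```

The second run finds both parks by name and creates nothing, which is the behaviour the seed command relies on.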

Limitations:

  • Only covers Parks app models
  • No rides, ride models, or manufacturer data
  • No user accounts or reviews
  • No media/photo seeding
  • Limited to 6 parks
  • No moderation, core, or advanced features

Comprehensive Seed Data Requirements

1. Companies (Multi-Role)

The seed data needs companies that cover each of the four roles:

  • Operators: Disney, Universal, Six Flags, Cedar Fair, SeaWorld, Herschend, etc.
  • Manufacturers: B&M, Intamin, RMC, Vekoma, Arrow, Schwarzkopf, etc.
  • Designers: Sometimes same as manufacturers, sometimes separate consulting firms
  • Property Owners: Often same as operators, but can be different (land lease scenarios)

2. Parks Ecosystem

  • Parks: Expand beyond current 6 to include major parks worldwide
  • Park Areas: Themed lands/sections within parks
  • Park Locations: Geographic data with proper coordinates
  • Park Photos: Sample images using placeholder services

3. Rides Ecosystem

  • Ride Models: Catalog of manufacturer models (B&M Hyper, Intamin Giga, etc.)
  • Rides: Specific installations at parks
  • Roller Coaster Stats: Technical specifications for coasters
  • Ride Photos: Images for rides
  • Ride Reviews: Sample user reviews

4. User Ecosystem

  • Users: Sample accounts with different roles (admin, moderator, user)
  • User Profiles: Complete profiles with avatars, social links
  • Top Lists: User-created rankings
  • Notifications: Sample system notifications

5. Media Integration

  • Cloudflare Images: Use placeholder image service for realistic data
  • Avatar Generation: Use UI Avatars service for user profile images
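
Avatar generation could be as simple as building a URL per user. The sketch below assumes the public UI Avatars endpoint at `https://ui-avatars.com/api/` with `name` and `size` query parameters; `avatar_url` is a hypothetical helper, and the endpoint details should be checked against the service's documentation before use.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UI Avatars service; verify before relying on it.
UI_AVATARS_BASE = "https://ui-avatars.com/api/"


def avatar_url(display_name: str, size: int = 128) -> str:
    # urlencode escapes spaces and special characters in the display name.
    query = urlencode({"name": display_name, "size": size})
    return f"{UI_AVATARS_BASE}?{query}"


url = avatar_url("Jane Doe")
```

Because only a URL is stored, seeding avatars this way needs no file uploads.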

6. Data Volume Strategy

  • Realistic Scale: Hundreds of parks, thousands of rides, dozens of users
  • Geographic Diversity: Parks from multiple countries/continents
  • Time Periods: Historical data spanning decades of park/ride openings

Implementation Strategy

Phase 1: Foundation Data

  1. Companies with Roles: Create comprehensive company database with proper role assignments
  2. Core Parks: Expand park database to 20-30 major parks globally
  3. Basic Users: Create admin and sample user accounts

Phase 2: Rides and Models

  1. Manufacturer Models: Create ride model catalog for major manufacturers
  2. Park Rides: Populate parks with their signature rides
  3. Coaster Stats: Add technical specifications for roller coasters

Phase 3: User Content

  1. Reviews and Ratings: Generate sample reviews for parks and rides
  2. User Rankings: Create sample top lists and rankings
  3. Photos: Add placeholder images for parks and rides

Phase 4: Advanced Features

  1. Moderation: Sample submissions and moderation workflow
  2. Notifications: System notifications and preferences
  3. Media Management: Comprehensive photo/media seeding

Technical Implementation Notes

Command Structure

  • Use Django management command with options for different phases
  • Implement proper error handling and progress reporting
  • Support for selective seeding (e.g., --parks-only, --rides-only)
  • Idempotent operations using get_or_create patterns

Data Sources

  • Real park/ride data for authenticity
  • Proper geographic coordinates
  • Realistic technical specifications
  • Culturally diverse user names and preferences

Performance Considerations

  • Bulk operations for large datasets
  • Transaction management for data integrity
  • Progress indicators for long-running operations
  • Memory-efficient processing for large datasets
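
One way to combine bulk operations with memory-efficient processing is to stream records in fixed-size batches. The `batched` helper below is a generic sketch, not code from the seed command; in a Django command each batch could then be handed to something like `Model.objects.bulk_create(batch)` inside a transaction.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example: five generated records processed two at a time.
record_stream = (f"ride-{n}" for n in range(5))
batches = list(batched(record_stream, 2))
```

Because the input is consumed lazily, only one batch is held in memory at a time, and the batch count doubles as a simple progress indicator.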

Implementation Completed

Comprehensive Seed Command Created

File: apps/core/management/commands/seed_comprehensive_data.py (843 lines)

Key Features:

  • Phase-based execution: 4 phases that can be run individually or together
  • Complete reset capability: --reset flag to clear existing data in foreign-key-safe order (superuser accounts are preserved)
  • Configurable counts: --count parameter to override default entity counts
  • Proper relationship handling: Respects all FK constraints and entity relationship patterns
  • Realistic data: Uses Faker library for realistic names, locations, and content
  • Idempotent operations: Uses get_or_create to prevent duplicates
  • Comprehensive coverage: Seeds models across all apps (photo seeding via Cloudflare Images is still pending; see Next Steps)

Command Usage:

# Run all phases with full seeding
cd backend && uv run manage.py seed_comprehensive_data

# Reset all data and reseed
cd backend && uv run manage.py seed_comprehensive_data --reset

# Run specific phase only
cd backend && uv run manage.py seed_comprehensive_data --phase 2

# Override default counts
cd backend && uv run manage.py seed_comprehensive_data --count 100

# Verbose output
cd backend && uv run manage.py seed_comprehensive_data --verbose
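
The flags used above could be declared roughly as follows. This is a standalone `argparse` sketch with assumed types, defaults, and help text, not the command's actual source; in a Django management command the same `add_argument` calls would go inside `add_arguments(self, parser)`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="seed_comprehensive_data")
    # Assumption: omitting --phase means "run all four phases".
    parser.add_argument("--phase", type=int, choices=[1, 2, 3, 4], default=None,
                        help="run a single phase instead of all phases")
    parser.add_argument("--reset", action="store_true",
                        help="clear existing data before seeding")
    # Assumption: omitting --count keeps the command's built-in defaults.
    parser.add_argument("--count", type=int, default=None,
                        help="override the default entity counts")
    parser.add_argument("--verbose", action="store_true",
                        help="print detailed progress output")
    return parser


options = build_parser().parse_args(["--phase", "2", "--count", "100"])
```

Restricting `--phase` with `choices` means an invalid phase number is rejected before any data is touched.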

Data Created:

  • 10 Companies with realistic roles (operators, manufacturers, designers, property owners)
  • 6 Major Parks (Disney, Universal, Cedar Point, Six Flags, etc.) with proper operators
  • Park Areas and Locations with real geographic coordinates
  • 7 Ride Models from different manufacturers (B&M, Intamin, Mack, Vekoma)
  • 6+ Major Rides installed at parks with technical specifications
  • 50+ Users with complete profiles and preferences
  • 200+ Park Reviews and 300+ Ride Reviews with realistic ratings
  • Ride Rankings and Top Lists for user-generated content
  • Moderation Workflow with submissions, reports, queue items, and actions
  • Notifications and User Content for complete ecosystem

Safety Features:

  • Proper deletion order to respect foreign key constraints
  • Preserves superuser accounts during reset
  • Transaction safety for all operations
  • Comprehensive error handling and logging
  • Maintains data integrity throughout process
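
A deletion order that respects foreign keys can be derived from the dependency graph instead of being maintained by hand. The sketch below uses Python's standard `graphlib` module (3.9+); the `DEPENDS_ON` mapping is a simplified, partly assumed subset of the models described earlier, and the real command may simply hard-code its order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each model maps to the models it references (simplified subset).
DEPENDS_ON: Dict[str, Set[str]] = {
    "Company": set(),
    "User": set(),
    "Park": {"Company"},
    "ParkArea": {"Park"},
    "Ride": {"Park", "Company"},
    "RollerCoasterStats": {"Ride"},
    "RideReview": {"Ride", "User"},
}


def creation_order(deps: Dict[str, Set[str]]) -> List[str]:
    # static_order() yields referenced models before the models that use them.
    return list(TopologicalSorter(deps).static_order())


def deletion_order(deps: Dict[str, Set[str]]) -> List[str]:
    # Deleting in reverse creation order removes dependents first.
    return list(reversed(creation_order(deps)))


order = deletion_order(DEPENDS_ON)
```

With this ordering, `RollerCoasterStats` and `RideReview` rows are removed before `Ride`, and `Ride` before `Park` and `Company`.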

Phase Breakdown:

  1. Phase 1 (Foundation): Companies, parks, areas, locations
  2. Phase 2 (Rides): Ride models, installations, statistics
  3. Phase 3 (Users & Community): Users, reviews, rankings, top lists
  4. Phase 4 (Moderation): Submissions, reports, queue management
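
The phase breakdown maps naturally onto a small dispatch table. The sketch below is hypothetical: the `seed_*` functions are placeholders that only return a description, and `run_phases` stands in for whatever the command does with `--phase`.

```python
from typing import Callable, Dict, List, Optional


def seed_foundation() -> str:
    return "companies, parks, areas, locations"


def seed_rides() -> str:
    return "ride models, installations, statistics"


def seed_community() -> str:
    return "users, reviews, rankings, top lists"


def seed_moderation() -> str:
    return "submissions, reports, queue management"


# Phase number -> placeholder seeding function, in dependency order.
PHASES: Dict[int, Callable[[], str]] = {
    1: seed_foundation,
    2: seed_rides,
    3: seed_community,
    4: seed_moderation,
}


def run_phases(phase: Optional[int] = None) -> List[str]:
    """Run one phase, or all phases in ascending order when phase is None."""
    if phase is not None and phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    selected = [phase] if phase is not None else sorted(PHASES)
    return [f"Phase {number}: {PHASES[number]()}" for number in selected]


all_phases = run_phases()
only_rides = run_phases(2)
```

Running a later phase on its own assumes the earlier phases have already populated the data it depends on, for example parks before rides.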

Next Steps:

  • Test the command: cd backend && uv run manage.py seed_comprehensive_data --verbose
  • Verify data integrity and relationships
  • Add photo seeding integration with Cloudflare Images
  • Profile the command on larger counts and optimize (for example with bulk operations) if seeding is slow