# Seed Data Analysis and Implementation Plan

## Current Schema Analysis

### Complete Schema Analysis

#### Parks App Models

- **Park**: Main park entity with operator (required FK to Company), property_owner (optional FK to Company), locations, areas, reviews, photos
- **ParkArea**: Themed areas within parks
- **ParkLocation**: Geographic data for parks with coordinates
- **ParkReview**: User reviews for parks
- **ParkPhoto**: Images for parks using Cloudflare Images
- **Company** (aliased as Operator): Multi-role entity with roles array (OPERATOR, PROPERTY_OWNER, MANUFACTURER, DESIGNER)
- **CompanyHeadquarters**: Location data for companies

#### Rides App Models

- **Ride**: Individual ride installations at parks with park (required FK), manufacturer/designer (optional FKs to Company), ride_model (optional FK), coaster stats relationship
- **RideModel**: Catalog of ride types/models with manufacturer (FK to Company), technical specs, variants
- **RideModelVariant**: Specific configurations of ride models
- **RideModelPhoto**: Photos for ride models
- **RideModelTechnicalSpec**: Flexible technical specifications
- **RollerCoasterStats**: Detailed statistics for roller coasters (OneToOne with Ride)
- **RideLocation**: Geographic data for rides
- **RideReview**: User reviews for rides
- **RideRanking**: User rankings for rides
- **RidePairComparison**: Pairwise comparisons for ranking
- **RankingSnapshot**: Historical ranking data
- **RidePhoto**: Images for rides

#### Accounts App Models

- **User**: Extended AbstractUser with roles, preferences, security settings
- **UserProfile**: Extended profile data with avatar, social links, ride statistics
- **EmailVerification**: Email verification tokens
- **PasswordReset**: Password reset tokens
- **UserDeletionRequest**: Account deletion with email verification
- **UserNotification**: System notifications for users
- **NotificationPreference**: User notification preferences
- **TopList**: User-created top lists
- **TopListItem**: Items in top lists (generic foreign key)

#### Moderation App Models

- **EditSubmission**: Original content submission and approval workflow
- **ModerationReport**: User reports for content moderation
- **ModerationQueue**: Workflow management for moderation tasks
- **ModerationAction**: Actions taken against users/content
- **BulkOperation**: Administrative bulk operations
- **PhotoSubmission**: Photo submission workflow

#### Core App Models

- **SlugHistory**: Track slug changes across all models using generic relations
- **SluggedModel**: Abstract base model providing slug functionality with history tracking

#### Media App Models

- Basic media handling (files already exist in shared/media)

### Key Relationships and Constraints

#### Entity Relationship Patterns (from .clinerules)

- **Park**: Must have Operator (required), may have PropertyOwner (optional), cannot reference Company directly
- **Ride**: Must belong to Park, may have Manufacturer/Designer (optional), cannot reference Company directly
- **Company Roles**:
  - Operators: Operate parks
  - PropertyOwners: Own park property (optional)
  - Manufacturers: Make rides
  - Designers: Design rides
- All entities can have locations

#### Database Constraints

- **Business Rules**: Enforced via CheckConstraints for dates, ratings, dimensions, positive values
- **Unique Constraints**: Parks have unique slugs globally, Rides have unique slugs within parks
- **Foreign Key Constraints**: Proper CASCADE/SET_NULL behaviors for data integrity

### Current Seed Implementation Analysis

#### Existing Seed Command (`apps/parks/management/commands/seed_initial_data.py`)

**Strengths:**

- Creates major theme park companies with proper roles
- Seeds 6 major parks with realistic data (Disney, Universal, Cedar Fair, etc.)
- Includes park locations with coordinates
- Creates themed areas for each park
- Uses get_or_create for idempotency

**Limitations:**

- Only covers Parks app models
- No rides, ride models, or manufacturer data
- No user accounts or reviews
- No media/photo seeding
- Limited to 6 parks
- No moderation, core, or advanced features

## Comprehensive Seed Data Requirements

### 1. Companies (Multi-Role)

Need companies serving different roles:

- **Operators**: Disney, Universal, Six Flags, Cedar Fair, SeaWorld, Herschend, etc.
- **Manufacturers**: B&M, Intamin, RMC, Vekoma, Arrow, Schwarzkopf, etc.
- **Designers**: Sometimes same as manufacturers, sometimes separate consulting firms
- **Property Owners**: Often same as operators, but can be different (land lease scenarios)

### 2. Parks Ecosystem

- **Parks**: Expand beyond current 6 to include major parks worldwide
- **Park Areas**: Themed lands/sections within parks
- **Park Locations**: Geographic data with proper coordinates
- **Park Photos**: Sample images using placeholder services

### 3. Rides Ecosystem

- **Ride Models**: Catalog of manufacturer models (B&M Hyper, Intamin Giga, etc.)
- **Rides**: Specific installations at parks
- **Roller Coaster Stats**: Technical specifications for coasters
- **Ride Photos**: Images for rides
- **Ride Reviews**: Sample user reviews

### 4. User Ecosystem

- **Users**: Sample accounts with different roles (admin, moderator, user)
- **User Profiles**: Complete profiles with avatars, social links
- **Top Lists**: User-created rankings
- **Notifications**: Sample system notifications

### 5. Media Integration

- **Cloudflare Images**: Use placeholder image service for realistic data
- **Avatar Generation**: Use UI Avatars service for user profile images

### 6. Data Volume Strategy

- **Realistic Scale**: Hundreds of parks, thousands of rides, dozens of users
- **Geographic Diversity**: Parks from multiple countries/continents
- **Time Periods**: Historical data spanning decades of park/ride openings

## Implementation Strategy

### Phase 1: Foundation Data

1. **Companies with Roles**: Create comprehensive company database with proper role assignments
2. **Core Parks**: Expand park database to 20-30 major parks globally
3. **Basic Users**: Create admin and sample user accounts

### Phase 2: Rides and Models

1. **Manufacturer Models**: Create ride model catalog for major manufacturers
2. **Park Rides**: Populate parks with their signature rides
3. **Coaster Stats**: Add technical specifications for roller coasters

### Phase 3: User Content

1. **Reviews and Ratings**: Generate sample reviews for parks and rides
2. **User Rankings**: Create sample top lists and rankings
3. **Photos**: Add placeholder images for parks and rides

### Phase 4: Advanced Features

1. **Moderation**: Sample submissions and moderation workflow
2. **Notifications**: System notifications and preferences
3. **Media Management**: Comprehensive photo/media seeding

## Technical Implementation Notes

### Command Structure

- Use Django management command with options for different phases
- Implement proper error handling and progress reporting
- Support for selective seeding (e.g., --parks-only, --rides-only)
- Idempotent operations using get_or_create patterns

### Data Sources

- Real park/ride data for authenticity
- Proper geographic coordinates
- Realistic technical specifications
- Culturally diverse user names and preferences

### Performance Considerations

- Bulk operations for large datasets
- Transaction management for data integrity
- Progress indicators for long-running operations
- Memory-efficient processing for large datasets

## Implementation Completed ✅

### Comprehensive Seed Command Created

**File**: `apps/core/management/commands/seed_comprehensive_data.py` (843 lines)

**Key Features**:

- **Phase-based execution**: 4 phases that can be run individually or together
- **Complete reset capability**: `--reset` flag to clear all data safely
- **Configurable counts**: `--count` parameter to override default entity counts
- **Proper relationship handling**: Respects all FK constraints and entity relationship patterns
- **Realistic data**: Uses Faker library for realistic names, locations, and content
- **Idempotent operations**: Uses get_or_create to prevent duplicates
- **Comprehensive coverage**: Seeds ALL models across ALL apps

**Command Usage**:

```bash
# Run all phases with full seeding
cd backend && uv run manage.py seed_comprehensive_data

# Reset all data and reseed
cd backend && uv run manage.py seed_comprehensive_data --reset

# Run specific phase only
cd backend && uv run manage.py seed_comprehensive_data --phase 2

# Override default counts
cd backend && uv run manage.py seed_comprehensive_data --count 100

# Verbose output
cd backend && uv run manage.py seed_comprehensive_data --verbose
```

**Data Created**:

- **10 Companies** with realistic roles (operators, manufacturers, designers, property owners)
- **6 Major Parks** (Disney, Universal, Cedar Point, Six Flags, etc.) with proper operators
- **Park Areas** and **Locations** with real geographic coordinates
- **7 Ride Models** from different manufacturers (B&M, Intamin, Mack, Vekoma)
- **6+ Major Rides** installed at parks with technical specifications
- **50+ Users** with complete profiles and preferences
- **200+ Park Reviews** and **300+ Ride Reviews** with realistic ratings
- **Ride Rankings** and **Top Lists** for user-generated content
- **Moderation Workflow** with submissions, reports, queue items, and actions
- **Notifications** and **User Content** for complete ecosystem

**Safety Features**:

- Proper deletion order to respect foreign key constraints
- Preserves superuser accounts during reset
- Transaction safety for all operations
- Comprehensive error handling and logging
- Maintains data integrity throughout process

**Phase Breakdown**:

1. **Phase 1 (Foundation)**: Companies, parks, areas, locations
2. **Phase 2 (Rides)**: Ride models, installations, statistics
3. **Phase 3 (Users & Community)**: Users, reviews, rankings, top lists
4. **Phase 4 (Moderation)**: Submissions, reports, queue management

**Next Steps**:

- Test the command: `cd backend && uv run manage.py seed_comprehensive_data --verbose`
- Verify data integrity and relationships
- Add photo seeding integration with Cloudflare Images
- Performance optimization if needed
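
**Option Handling Sketch**:

The flags listed under Command Usage and the four phases in the Phase Breakdown could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the contents of `seed_comprehensive_data.py`: it uses plain `argparse` so it runs without Django (in a management command the same `add_argument` calls would typically live in `add_arguments`), and the names `PHASES`, `build_parser`, `phases_to_run`, and `plan` are hypothetical.

```python
import argparse

# Phase numbers and labels mirror the Phase Breakdown list; the dict itself
# is a hypothetical helper, not taken from the real command.
PHASES = {
    1: "Foundation",
    2: "Rides",
    3: "Users & Community",
    4: "Moderation",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four flags shown under Command Usage."""
    parser = argparse.ArgumentParser(prog="seed_comprehensive_data")
    parser.add_argument(
        "--phase",
        type=int,
        choices=sorted(PHASES),
        default=None,
        help="Run a single phase (default: run all phases)",
    )
    parser.add_argument(
        "--reset",
        action="store_true",
        help="Clear existing data before seeding",
    )
    parser.add_argument(
        "--count",
        type=int,
        default=None,
        help="Override the default entity counts",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Print detailed progress output",
    )
    return parser


def phases_to_run(options: argparse.Namespace) -> list:
    """Return the phase numbers to execute, in ascending order."""
    if options.phase is None:
        return sorted(PHASES)
    return [options.phase]


def plan(argv: list) -> list:
    """Describe which phases a given argument list would run."""
    options = build_parser().parse_args(argv)
    return [f"Phase {n}: {PHASES[n]}" for n in phases_to_run(options)]
```

For example, `plan(["--phase", "2"])` yields a single entry for the Rides phase, while `plan([])` lists all four. Running phases in ascending order is a natural fit here because later phases depend on rows created earlier (rides belong to parks, reviews belong to users).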