Add comprehensive seed data analysis and implementation plan

- Document current schema analysis for Parks, Rides, Accounts, Moderation, Core, and Media apps
- Identify key relationships, constraints, and limitations of existing seed implementation
- Outline comprehensive seed data requirements across companies, parks, rides, users, and media
- Define phased implementation strategy for seeding data
- Create detailed technical implementation notes for command structure, data sources, and performance considerations
- Implement comprehensive seed command with phase-based execution and safety features
pacnpal
2025-09-24 10:28:07 -04:00
parent 0dd3f04137
commit d31e4b4ebe
2 changed files with 1331 additions and 0 deletions


@@ -0,0 +1,231 @@
# Seed Data Analysis and Implementation Plan
## Current Schema Analysis
### Complete Schema Analysis
#### Parks App Models
- **Park**: Main park entity with operator (required FK to Company), property_owner (optional FK to Company), locations, areas, reviews, photos
- **ParkArea**: Themed areas within parks
- **ParkLocation**: Geographic data for parks with coordinates
- **ParkReview**: User reviews for parks
- **ParkPhoto**: Images for parks using Cloudflare Images
- **Company** (aliased as Operator): Multi-role entity with roles array (OPERATOR, PROPERTY_OWNER, MANUFACTURER, DESIGNER)
- **CompanyHeadquarters**: Location data for companies
#### Rides App Models
- **Ride**: Individual ride installations at parks with park (required FK), manufacturer/designer (optional FKs to Company), ride_model (optional FK), coaster stats relationship
- **RideModel**: Catalog of ride types/models with manufacturer (FK to Company), technical specs, variants
- **RideModelVariant**: Specific configurations of ride models
- **RideModelPhoto**: Photos for ride models
- **RideModelTechnicalSpec**: Flexible technical specifications
- **RollerCoasterStats**: Detailed statistics for roller coasters (OneToOne with Ride)
- **RideLocation**: Geographic data for rides
- **RideReview**: User reviews for rides
- **RideRanking**: User rankings for rides
- **RidePairComparison**: Pairwise comparisons for ranking
- **RankingSnapshot**: Historical ranking data
- **RidePhoto**: Images for rides
#### Accounts App Models
- **User**: Extended AbstractUser with roles, preferences, security settings
- **UserProfile**: Extended profile data with avatar, social links, ride statistics
- **EmailVerification**: Email verification tokens
- **PasswordReset**: Password reset tokens
- **UserDeletionRequest**: Account deletion with email verification
- **UserNotification**: System notifications for users
- **NotificationPreference**: User notification preferences
- **TopList**: User-created top lists
- **TopListItem**: Items in top lists (generic foreign key)
#### Moderation App Models
- **EditSubmission**: Original content submission and approval workflow
- **ModerationReport**: User reports for content moderation
- **ModerationQueue**: Workflow management for moderation tasks
- **ModerationAction**: Actions taken against users/content
- **BulkOperation**: Administrative bulk operations
- **PhotoSubmission**: Photo submission workflow
#### Core App Models
- **SlugHistory**: Track slug changes across all models using generic relations
- **SluggedModel**: Abstract base model providing slug functionality with history tracking
#### Media App Models
- Basic media handling (files already exist in shared/media)
### Key Relationships and Constraints
#### Entity Relationship Patterns (from .clinerules)
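- **Park**: Must have Operator (required), may have PropertyOwner (optional); links to Company only through these role-specific FKs, never through a generic company field
- **Ride**: Must belong to Park, may have Manufacturer/Designer (optional); links to Company only through these role-specific FKs, never through a generic company field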
- **Company Roles**:
  - Operators: Operate parks
  - PropertyOwners: Own park property (optional)
  - Manufacturers: Make rides
  - Designers: Design rides
- All entities can have locations
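
The role rules above can be checked before any row is written. The following is a minimal, framework-free sketch, not the project's actual validation code: the `Company` dataclass and the `require_role`, `validate_park`, and `validate_ride` helpers are hypothetical names introduced only for illustration; the four role names come from the schema analysis.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Role names taken from the Company model description above.
VALID_ROLES = {"OPERATOR", "PROPERTY_OWNER", "MANUFACTURER", "DESIGNER"}


@dataclass
class Company:
    """Hypothetical stand-in for the multi-role Company model."""
    name: str
    roles: Set[str] = field(default_factory=set)

    def __post_init__(self):
        unknown = set(self.roles) - VALID_ROLES
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")


def require_role(company: Optional[Company], role: str, *, optional: bool) -> None:
    """Raise ValueError unless the company may fill the given role."""
    if company is None:
        if optional:
            return
        raise ValueError(f"a company with role {role} is required")
    if role not in company.roles:
        raise ValueError(f"{company.name} does not have role {role}")


def validate_park(operator: Optional[Company],
                  property_owner: Optional[Company] = None) -> None:
    # Operator is required; property owner is optional.
    require_role(operator, "OPERATOR", optional=False)
    require_role(property_owner, "PROPERTY_OWNER", optional=True)


def validate_ride(manufacturer: Optional[Company] = None,
                  designer: Optional[Company] = None) -> None:
    # Both ride-side company links are optional.
    require_role(manufacturer, "MANUFACTURER", optional=True)
    require_role(designer, "DESIGNER", optional=True)
```

A seed script could call `validate_park` for each park definition before creating it, so that a manufacturer accidentally listed as an operator fails fast instead of producing inconsistent data.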
#### Database Constraints
- **Business Rules**: Enforced via CheckConstraints for dates, ratings, dimensions, positive values
- **Unique Constraints**: Parks have unique slugs globally, Rides have unique slugs within parks
- **Foreign Key Constraints**: Proper CASCADE/SET_NULL behaviors for data integrity
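
To make the effect of these constraints concrete, here is a small sketch using the standard-library `sqlite3` module rather than Django. Django's `CheckConstraint` and `UniqueConstraint` are emitted as SQL `CHECK` and `UNIQUE` clauses of this kind, but the table layout, column names, and the 1 to 10 rating range below are assumptions for illustration, not the project's real schema.

```python
import sqlite3

# Illustrative schema only: column names and the rating range are assumed.
SCHEMA = """
CREATE TABLE park (
    id INTEGER PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE,                      -- park slugs are globally unique
    average_rating REAL
        CHECK (average_rating IS NULL OR average_rating BETWEEN 1 AND 10),
    opening_date TEXT,                              -- ISO 8601 strings sort chronologically
    closing_date TEXT,
    CHECK (closing_date IS NULL OR opening_date IS NULL
           OR closing_date >= opening_date)
);
CREATE TABLE ride (
    id INTEGER PRIMARY KEY,
    park_id INTEGER NOT NULL REFERENCES park(id) ON DELETE CASCADE,
    slug TEXT NOT NULL,
    height_ft REAL CHECK (height_ft IS NULL OR height_ft > 0),
    UNIQUE (park_id, slug)                          -- ride slugs are unique per park
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn
```

With this schema, two parks may each have a ride with the slug `the-beast`, while a second `the-beast` in the same park, a negative height, or a closing date before the opening date raises `sqlite3.IntegrityError`. Seed data has to satisfy the same kinds of rules, so generated values should be clamped to valid ranges before insertion.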
### Current Seed Implementation Analysis
#### Existing Seed Command (`apps/parks/management/commands/seed_initial_data.py`)
**Strengths:**
- Creates major theme park companies with proper roles
- Seeds 6 major parks with realistic data (Disney, Universal, Cedar Fair, etc.)
- Includes park locations with coordinates
- Creates themed areas for each park
- Uses get_or_create for idempotency
**Limitations:**
- Only covers Parks app models
- No rides, ride models, or manufacturer data
- No user accounts or reviews
- No media/photo seeding
- Limited to 6 parks
- No moderation, core, or advanced features
## Comprehensive Seed Data Requirements
### 1. Companies (Multi-Role)
Need companies serving different roles:
- **Operators**: Disney, Universal, Six Flags, Cedar Fair, SeaWorld, Herschend, etc.
- **Manufacturers**: B&M, Intamin, RMC, Vekoma, Arrow, Schwarzkopf, etc.
- **Designers**: Sometimes same as manufacturers, sometimes separate consulting firms
- **Property Owners**: Often same as operators, but can be different (land lease scenarios)
### 2. Parks Ecosystem
- **Parks**: Expand beyond current 6 to include major parks worldwide
- **Park Areas**: Themed lands/sections within parks
- **Park Locations**: Geographic data with proper coordinates
- **Park Photos**: Sample images using placeholder services
### 3. Rides Ecosystem
- **Ride Models**: Catalog of manufacturer models (B&M Hyper, Intamin Giga, etc.)
- **Rides**: Specific installations at parks
- **Roller Coaster Stats**: Technical specifications for coasters
- **Ride Photos**: Images for rides
- **Ride Reviews**: Sample user reviews
### 4. User Ecosystem
- **Users**: Sample accounts with different roles (admin, moderator, user)
- **User Profiles**: Complete profiles with avatars, social links
- **Top Lists**: User-created rankings
- **Notifications**: Sample system notifications
### 5. Media Integration
- **Cloudflare Images**: Photo models store their images in Cloudflare Images; seed them with images from a placeholder service for realistic data
- **Avatar Generation**: Use UI Avatars service for user profile images
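
As a sketch of the avatar idea, the helper below builds a UI Avatars URL from a display name. The function name `avatar_url` is hypothetical, and the query parameters (`name`, `size`, `background`) reflect the service's public URL format as commonly documented; they should be confirmed against the service's own documentation before use.

```python
from urllib.parse import urlencode


def avatar_url(display_name: str, size: int = 128) -> str:
    """Build a UI Avatars URL for a seeded user (parameters are assumptions)."""
    query = urlencode({
        "name": display_name,    # initials are derived from this value
        "size": size,            # square edge length in pixels
        "background": "random",  # ask the service to pick a background colour
    })
    return f"https://ui-avatars.com/api/?{query}"


print(avatar_url("Ada Lovelace"))
```

Because the URL is derived purely from the name, re-running the seed produces the same avatar link for the same user, which keeps the seeding idempotent.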
### 6. Data Volume Strategy
- **Realistic Scale**: Hundreds of parks, thousands of rides, dozens of users
- **Geographic Diversity**: Parks from multiple countries/continents
- **Time Periods**: Historical data spanning decades of park/ride openings
## Implementation Strategy
### Phase 1: Foundation Data
1. **Companies with Roles**: Create comprehensive company database with proper role assignments
2. **Core Parks**: Expand park database to 20-30 major parks globally
3. **Basic Users**: Create admin and sample user accounts
### Phase 2: Rides and Models
1. **Manufacturer Models**: Create ride model catalog for major manufacturers
2. **Park Rides**: Populate parks with their signature rides
3. **Coaster Stats**: Add technical specifications for roller coasters
### Phase 3: User Content
1. **Reviews and Ratings**: Generate sample reviews for parks and rides
2. **User Rankings**: Create sample top lists and rankings
3. **Photos**: Add placeholder images for parks and rides
### Phase 4: Advanced Features
1. **Moderation**: Sample submissions and moderation workflow
2. **Notifications**: System notifications and preferences
3. **Media Management**: Comprehensive photo/media seeding
## Technical Implementation Notes
### Command Structure
- Use Django management command with options for different phases
- Implement proper error handling and progress reporting
- Support for selective seeding (e.g., `--parks-only`, `--rides-only`)
- Idempotent operations using get_or_create patterns
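
The idempotency point is easiest to see in a small example. Django's `get_or_create` returns an `(object, created)` pair; the sketch below imitates that contract with the standard-library `sqlite3` module so it runs without a Django project. The `company` table, the `SEED_COMPANIES` list, and both helper functions are hypothetical and exist only to show the pattern.

```python
import sqlite3

# Hypothetical seed definitions: (name, roles).
SEED_COMPANIES = [
    ("Cedar Fair", {"OPERATOR", "PROPERTY_OWNER"}),
    ("Intamin", {"MANUFACTURER", "DESIGNER"}),
]


def get_or_create_company(conn, name, roles):
    """Return (row_id, created), mirroring the shape of Django's get_or_create."""
    row = conn.execute("SELECT id FROM company WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0], False
    cur = conn.execute(
        "INSERT INTO company (name, roles) VALUES (?, ?)",
        (name, ",".join(sorted(roles))),
    )
    return cur.lastrowid, True


def seed_companies(conn):
    """Seed all companies and return how many rows were newly created."""
    created = 0
    for name, roles in SEED_COMPANIES:
        _, was_created = get_or_create_company(conn, name, roles)
        created += was_created  # True counts as 1, False as 0
    return created


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT UNIQUE, roles TEXT)")
print(seed_companies(conn))  # first run: rows are created
print(seed_companies(conn))  # second run: existing rows are reused
```

The lookup key matters: matching on a stable natural key such as the company name (or a slug) is what makes a second run a no-op, whereas matching on randomly generated fields would create duplicates.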
### Data Sources
- Real park/ride data for authenticity
- Proper geographic coordinates
- Realistic technical specifications
- Culturally diverse user names and preferences
### Performance Considerations
- Bulk operations for large datasets
- Transaction management for data integrity
- Progress indicators for long-running operations
- Memory-efficient processing for large datasets
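
These four points combine naturally: consume the data lazily, write it in fixed-size batches, wrap the whole run in one transaction, and report progress per batch. The sketch below shows that shape with the standard-library `sqlite3` module; in Django the rough equivalents would be `bulk_create(..., batch_size=...)` inside `transaction.atomic()`. The `review` table and the function names are assumptions for illustration.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items without materialising the input."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_insert_reviews(conn, reviews, batch_size=500, log=print):
    """Insert (ride_id, rating) pairs in batches inside a single transaction."""
    inserted = 0
    with conn:  # commits on success, rolls back if any batch raises
        for chunk in batched(reviews, batch_size):
            conn.executemany(
                "INSERT INTO review (ride_id, rating) VALUES (?, ?)", chunk
            )
            inserted += len(chunk)
            log(f"inserted {inserted} reviews")  # simple progress indicator
    return inserted
```

Passing a generator as `reviews` keeps memory use bounded by `batch_size`, and the single surrounding transaction means a failure in a late batch leaves no partially seeded table behind.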
## Implementation Completed ✅
### Comprehensive Seed Command Created
**File**: `apps/core/management/commands/seed_comprehensive_data.py` (843 lines)
**Key Features**:
- **Phase-based execution**: 4 phases that can be run individually or together
- **Complete reset capability**: `--reset` flag to clear all data safely
- **Configurable counts**: `--count` parameter to override default entity counts
- **Proper relationship handling**: Respects all FK constraints and entity relationship patterns
- **Realistic data**: Uses Faker library for realistic names, locations, and content
- **Idempotent operations**: Uses get_or_create to prevent duplicates
- **Comprehensive coverage**: Seeds models across all apps, except photos (photo seeding is listed under Next Steps)
**Command Usage**:
```bash
# Run all phases with full seeding
cd backend && uv run manage.py seed_comprehensive_data
# Reset all data and reseed
cd backend && uv run manage.py seed_comprehensive_data --reset
# Run specific phase only
cd backend && uv run manage.py seed_comprehensive_data --phase 2
# Override default counts
cd backend && uv run manage.py seed_comprehensive_data --count 100
# Verbose output
cd backend && uv run manage.py seed_comprehensive_data --verbose
```
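
For orientation, the flags shown above could be declared roughly as follows. Django passes an `argparse` parser to a command's `add_arguments` method, so this standalone `argparse` sketch uses the same calls; the flag names come from the usage examples, while the types, defaults, and help texts are assumptions rather than the contents of the actual 843-line command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; types and defaults here are assumed."""
    parser = argparse.ArgumentParser(prog="seed_comprehensive_data")
    parser.add_argument("--reset", action="store_true",
                        help="clear existing seed data before seeding")
    parser.add_argument("--phase", type=int, choices=[1, 2, 3, 4], default=None,
                        help="run a single phase; omit to run all four")
    parser.add_argument("--count", type=int, default=None,
                        help="override the default entity counts")
    parser.add_argument("--verbose", action="store_true",
                        help="print detailed progress output")
    return parser


if __name__ == "__main__":
    options = build_parser().parse_args()
    print(options)
```

Restricting `--phase` with `choices` makes an invalid phase number fail at argument parsing time, before any data is touched.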
**Data Created**:
- **10 Companies** with realistic roles (operators, manufacturers, designers, property owners)
- **6 Major Parks** (Disney, Universal, Cedar Point, Six Flags, etc.) with proper operators
- **Park Areas** and **Locations** with real geographic coordinates
- **7 Ride Models** from different manufacturers (B&M, Intamin, Mack, Vekoma)
- **6+ Major Rides** installed at parks with technical specifications
- **50+ Users** with complete profiles and preferences
- **200+ Park Reviews** and **300+ Ride Reviews** with realistic ratings
- **Ride Rankings** and **Top Lists** for user-generated content
- **Moderation Workflow** with submissions, reports, queue items, and actions
- **Notifications** and **User Content** for complete ecosystem
**Safety Features**:
- Proper deletion order to respect foreign key constraints
- Preserves superuser accounts during reset
- Transaction safety for all operations
- Comprehensive error handling and logging
- Maintains data integrity throughout process
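
One way to derive a safe deletion order, sketched here under assumptions, is to topologically sort the foreign-key graph and delete in the reverse of creation order. The dependency map below is a partial, hand-written reading of the schema analysis earlier in this document, not something extracted from the real models, and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# model -> models it holds a foreign key to (partial, assumed from the analysis above)
DEPENDS_ON = {
    "Company": set(),
    "User": set(),
    "Park": {"Company"},
    "ParkArea": {"Park"},
    "ParkReview": {"Park", "User"},
    "RideModel": {"Company"},
    "Ride": {"Park", "Company", "RideModel"},
    "RollerCoasterStats": {"Ride"},
    "RideReview": {"Ride", "User"},
}


def creation_order(graph):
    """Referenced models come first, so FK targets exist when rows are created."""
    return list(TopologicalSorter(graph).static_order())


def deletion_order(graph):
    """Reverse of creation order: dependents are deleted before their targets."""
    return list(reversed(creation_order(graph)))


def users_to_delete(users):
    """Keep superuser accounts during a reset; `users` is a list of dicts here."""
    return [user for user in users if not user["is_superuser"]]


print(deletion_order(DEPENDS_ON))
```

`TopologicalSorter` raises `graphlib.CycleError` if the map contains a cycle, which is a useful early warning that a reset would need nullable foreign keys or deferred constraints.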
**Phase Breakdown**:
1. **Phase 1 (Foundation)**: Companies, parks, areas, locations
2. **Phase 2 (Rides)**: Ride models, installations, statistics
3. **Phase 3 (Users & Community)**: Users, reviews, rankings, top lists
4. **Phase 4 (Moderation)**: Submissions, reports, queue management
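
A phase table like the one above maps directly onto a small dispatcher. The sketch below is illustrative only: the phase labels and the items each phase covers come from the breakdown, but `PHASES`, `run_phases`, and the placeholder seed functions are hypothetical and simply return the names of what they would seed.

```python
def seed_foundation():
    return ["companies", "parks", "areas", "locations"]


def seed_rides():
    return ["ride models", "installations", "statistics"]


def seed_users_and_community():
    return ["users", "reviews", "rankings", "top lists"]


def seed_moderation():
    return ["submissions", "reports", "queue management"]


# Phase number -> (label, placeholder seed function).
PHASES = {
    1: ("Foundation", seed_foundation),
    2: ("Rides", seed_rides),
    3: ("Users & Community", seed_users_and_community),
    4: ("Moderation", seed_moderation),
}


def run_phases(phase=None, log=print):
    """Run one phase, or all phases in numeric order when `phase` is None."""
    if phase is not None and phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    selected = [phase] if phase is not None else sorted(PHASES)
    completed = []
    for number in selected:
        label, seed = PHASES[number]
        log(f"Phase {number} ({label}): seeding {', '.join(seed())}")
        completed.append(label)
    return completed
```

Running a later phase on its own presumably relies on earlier phases having been run already, since rides need parks and reviews need users; a real command may want to check for that and report a clear error.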
**Next Steps**:
- Test the command: `cd backend && uv run manage.py seed_comprehensive_data --verbose`
- Verify data integrity and relationships
- Add photo seeding integration with Cloudflare Images
- Performance optimization if needed