# ThrillWiki Data Seeding Script

## Overview

The `seed_data.py` management command provides comprehensive test data seeding for the ThrillWiki application. It creates realistic data across all models in the system for testing and development purposes.

## Usage

### Basic Usage

```bash
# Seed with default counts
uv run manage.py seed_data

# Clear existing data and seed fresh
uv run manage.py seed_data --clear

# Custom counts
uv run manage.py seed_data --users 50 --parks 20 --rides 100 --reviews 200
```

### Command Options

- `--clear`: Clear existing data before seeding
- `--users N`: Number of users to create (default: 25)
- `--companies N`: Number of companies to create (default: 15)
- `--parks N`: Number of parks to create (default: 10)
- `--rides N`: Number of rides to create (default: 50)
- `--ride-models N`: Number of ride models to create (default: 20)
- `--reviews N`: Number of reviews to create (default: 100)

## What Gets Created

### Users & Accounts

- **Admin User**: `admin` / `admin123` (superuser)
- **Moderator User**: `moderator` / `mod123` (staff)
- **Regular Users**: Random realistic users with profiles
- **User Profiles**: Complete with ride credits, social links, preferences
- **Notifications**: Sample notifications for users
- **Top Lists**: User-created top lists for parks and rides

### Companies

- **Park Operators**: Disney, Universal, Six Flags, Cedar Fair, etc.
- **Ride Manufacturers**: B&M, Intamin, Vekoma, RMC, etc.
- **Ride Designers**: Werner Stengel, Alan Schilke, John Wardley
- **Company Headquarters**: Realistic address data

### Parks & Locations

- **Famous Parks**: Magic Kingdom, Disneyland, Cedar Point, etc.
- **Park Locations**: Geographic coordinates and addresses
- **Park Areas**: Themed areas within parks
- **Park Photos**: Sample photo records

### Rides & Models

- **Famous Coasters**: Steel Vengeance, Millennium Force, etc.
- **Ride Models**: B&M Dive Coaster, Intamin Accelerator, etc.
- **Roller Coaster Stats**: Height, speed, inversions, etc.
- **Ride Photos**: Sample photo records
- **Technical Specs**: Detailed specifications for ride models

### Content & Reviews

- **Park Reviews**: User reviews with ratings and visit dates
- **Ride Reviews**: Detailed ride experiences
- **Review Content**: Realistic review text and ratings

## Data Quality Features

### Realistic Data

- **Names**: Diverse, realistic user names
- **Locations**: Accurate geographic coordinates
- **Relationships**: Proper company-park-ride relationships
- **Statistics**: Realistic ride statistics and ratings

### Comprehensive Coverage

- **All Models**: Seeds data for every model in the system
- **Relationships**: Maintains proper foreign key relationships
- **Optional Models**: Gracefully handles models that may not exist

### Data Integrity

- **Unique Constraints**: Uses `get_or_create` to avoid duplicates
- **Validation**: Respects model constraints and validation rules
- **Dependencies**: Creates data in proper dependency order

## Technical Implementation

### Architecture

- **Modular Design**: Separate methods for each model type
- **Transaction Safety**: All operations are wrapped in a database transaction
- **Error Handling**: Graceful handling of missing optional models
- **Progress Reporting**: Clear console output with emojis and counts

### Model Handling

- **Dual Company Models**: Properly handles separate Park and Ride company models
- **Optional Models**: Checks for existence before using optional models
- **Type Safety**: Proper type hints and error handling

### Data Generation

- **Random but Realistic**: Uses curated lists for realistic data
- **Configurable Counts**: All counts are configurable via command line
- **Relationship Integrity**: Maintains proper relationships between models

## Troubleshooting

### Common Issues

1. **Database Schema Mismatch**: If you see timezone constraint errors, run migrations first:

   ```bash
   uv run manage.py migrate
   ```

2. **Permission Errors**: Ensure the database user has proper permissions for all operations
3. **Memory Issues**: For large datasets, consider running with smaller batches

### Known Limitations

- **Database Schema Compatibility**: May encounter issues with database schemas that have additional required fields not present in the current models (e.g., a timezone field)
- **pghistory Compatibility**: May have issues with some pghistory configurations
- **Cloudflare Images**: Creates placeholder records without actual images
- **Geographic Data**: Requires PostGIS for location features
- **Transaction Management**: Uses atomic transactions, so the entire run may fail and roll back if any model creation fails

## Development Notes

### Adding New Models

1. Import the model at the top of the file
2. Add it to the `models_to_clear` list if needed
3. Create a new `create_*` method
4. Call the method in `handle()` in proper dependency order
5. Add its count to `print_summary()`

### Customizing Data

- Modify the data lists (e.g., `first_names`, `famous_parks`) to customize generated data
- Adjust probability weights for different scenarios
- Add new relationship patterns as needed

## Performance

### Optimization Tips

- Use `--clear` sparingly in production-like environments
- Consider smaller batch sizes for very large datasets
- Monitor database performance during seeding

### Typical Performance

- 25 users, 15 companies, 10 parks, 50 rides: ~30 seconds
- 100 users, 50 companies, 25 parks, 200 rides: ~2-3 minutes

## Security Notes

- **Default Passwords**: All seeded users have simple passwords for development only
- **Admin Access**: Creates an admin user with known credentials
- **Production Warning**: Never run with `--clear` in production environments

## Future Enhancements

- **Bulk Operations**: Use `bulk_create` for better performance
- **Custom Scenarios**: Add preset scenarios (small, medium, large)
- **Data Export**: Add ability to export seeded data
- **Incremental Updates**: Support for updating existing data
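
### Sketch: Preset Scenarios

The "Custom Scenarios" idea above could be prototyped roughly as follows. This is a minimal standalone sketch, not the command's actual code: the `--scenario` flag, the `PRESETS` table, and the `resolve_counts` helper are hypothetical names, and the `small` and `large` values are illustrative. Only the `medium` row is taken from the documented defaults. It uses plain `argparse`; in a Django management command the same `add_argument` calls would sit inside `add_arguments(self, parser)`.

```python
import argparse

# Hypothetical preset table. "medium" mirrors the documented defaults;
# "small" and "large" are illustrative values, not taken from seed_data.py.
PRESETS = {
    "small": {"users": 5, "companies": 5, "parks": 3,
              "rides": 10, "ride_models": 5, "reviews": 20},
    "medium": {"users": 25, "companies": 15, "parks": 10,
               "rides": 50, "ride_models": 20, "reviews": 100},
    "large": {"users": 100, "companies": 50, "parks": 25,
              "rides": 200, "ride_models": 50, "reviews": 500},
}


def build_parser():
    """Build a parser with the documented flags plus a hypothetical --scenario."""
    parser = argparse.ArgumentParser(prog="seed_data")
    parser.add_argument("--clear", action="store_true",
                        help="Clear existing data before seeding")
    parser.add_argument("--scenario", choices=sorted(PRESETS), default="medium",
                        help="Preset size (hypothetical flag)")
    for name in PRESETS["medium"]:
        # Default of None lets us tell "not given" apart from an explicit count.
        parser.add_argument("--" + name.replace("_", "-"), dest=name,
                            type=int, default=None)
    return parser


def resolve_counts(argv):
    """Start from the chosen preset, then apply any explicit per-model counts."""
    args = build_parser().parse_args(argv)
    counts = dict(PRESETS[args.scenario])
    for name in counts:
        explicit = getattr(args, name)
        if explicit is not None:
            counts[name] = explicit
    return counts


if __name__ == "__main__":
    print(resolve_counts(["--scenario", "small", "--rides", "7"]))
```

Letting explicit flags override the preset keeps the existing `--users`, `--parks`, and related options working unchanged, while a bare invocation still falls back to the documented defaults.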