# ThrillWiki Data Seeding Script

## Overview

The `seed_data.py` management command seeds the ThrillWiki application with realistic test data across all models in the system, for use in testing and development.

## Usage

### Basic Usage
```bash
# Seed with default counts
uv run manage.py seed_data

# Clear existing data and seed fresh
uv run manage.py seed_data --clear

# Custom counts
uv run manage.py seed_data --users 50 --parks 20 --rides 100 --reviews 200
```

### Command Options

- `--clear`: Clear existing data before seeding
- `--users N`: Number of users to create (default: 25)
- `--companies N`: Number of companies to create (default: 15)
- `--parks N`: Number of parks to create (default: 10)
- `--rides N`: Number of rides to create (default: 50)
- `--ride-models N`: Number of ride models to create (default: 20)
- `--reviews N`: Number of reviews to create (default: 100)
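
As a rough illustration of how these options and defaults fit together, here is a minimal standalone sketch using `argparse`. It is not the command's actual code: `DEFAULT_COUNTS` and `build_parser` are hypothetical names, and in a Django management command the same `add_argument` calls would normally live in the command's `add_arguments` method.

```python
import argparse

# Defaults copied from the option list above.
DEFAULT_COUNTS = {
    "users": 25,
    "companies": 15,
    "parks": 10,
    "rides": 50,
    "ride-models": 20,
    "reviews": 100,
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented seed_data options."""
    parser = argparse.ArgumentParser(prog="seed_data")
    parser.add_argument(
        "--clear", action="store_true", help="Clear existing data before seeding"
    )
    for name, default in DEFAULT_COUNTS.items():
        label = name.replace("-", " ")
        parser.add_argument(
            f"--{name}", type=int, default=default,
            help=f"Number of {label} to create",
        )
    return parser


# argparse exposes --ride-models as the attribute `ride_models`.
options = build_parser().parse_args(["--users", "50", "--clear"])
print(options.users, options.ride_models, options.clear)
```

Any count that is not passed on the command line falls back to its documented default.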

## What Gets Created

### Users & Accounts
- **Admin User**: `admin` / `admin123` (superuser)
- **Moderator User**: `moderator` / `mod123` (staff)
- **Regular Users**: Random realistic users with profiles
- **User Profiles**: Complete with ride credits, social links, preferences
- **Notifications**: Sample notifications for users
- **Top Lists**: User-created top lists for parks and rides

### Companies
- **Park Operators**: Disney, Universal, Six Flags, Cedar Fair, etc.
- **Ride Manufacturers**: B&M, Intamin, Vekoma, RMC, etc.
- **Ride Designers**: Werner Stengel, Alan Schilke, John Wardley
- **Company Headquarters**: Realistic address data

### Parks & Locations
- **Famous Parks**: Magic Kingdom, Disneyland, Cedar Point, etc.
- **Park Locations**: Geographic coordinates and addresses
- **Park Areas**: Themed areas within parks
- **Park Photos**: Sample photo records

### Rides & Models
- **Famous Coasters**: Steel Vengeance, Millennium Force, etc.
- **Ride Models**: B&M Dive Coaster, Intamin Accelerator, etc.
- **Roller Coaster Stats**: Height, speed, inversions, etc.
- **Ride Photos**: Sample photo records
- **Technical Specs**: Detailed specifications for ride models

### Content & Reviews
- **Park Reviews**: User reviews with ratings and visit dates
- **Ride Reviews**: Detailed ride experiences
- **Review Content**: Realistic review text and ratings

## Data Quality Features

### Realistic Data
- **Names**: Diverse, realistic user names
- **Locations**: Accurate geographic coordinates
- **Relationships**: Proper company-park-ride relationships
- **Statistics**: Realistic ride statistics and ratings

### Comprehensive Coverage
- **All Models**: Seeds data for every model in the system
- **Relationships**: Maintains proper foreign key relationships
- **Optional Models**: Handles models that may not exist gracefully

### Data Integrity
- **Unique Constraints**: Uses `get_or_create` to avoid duplicates
- **Validation**: Respects model constraints and validation rules
- **Dependencies**: Creates data in proper dependency order
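
One way to reason about "proper dependency order" is as a topological sort over the model groups. The sketch below uses the standard-library `graphlib` module; the dependency map is an assumption inferred from the relationships described in this document, not taken from the script, and `seeding_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key must be seeded after the names in its set.
DEPENDENCIES = {
    "users": set(),
    "companies": set(),
    "parks": {"companies"},
    "ride_models": {"companies"},
    "rides": {"parks", "ride_models"},
    "reviews": {"users", "parks", "rides"},
}


def seeding_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every group comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = seeding_order(DEPENDENCIES)
print(order)
```

Several valid orders exist; the only guarantee is that, for example, `companies` precedes `parks` and `parks` precedes `rides`.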

## Technical Implementation

### Architecture
- **Modular Design**: Separate methods for each model type
- **Transaction Safety**: All operations are wrapped in a database transaction
- **Error Handling**: Graceful handling of missing optional models
- **Progress Reporting**: Clear console output with emojis and counts
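
To show what wrapping the work in a transaction means in practice, here is a small self-contained sketch using the standard-library `sqlite3` module. It only illustrates the all-or-nothing behavior; the real command presumably relies on Django's transaction handling (for example `transaction.atomic`), and the `park` table and `seed_atomically` helper are hypothetical.

```python
import sqlite3


def seed_atomically(conn: sqlite3.Connection, park_names: list[str]) -> bool:
    """Insert all parks or none; return True if the transaction committed."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if the block raises.
        with conn:
            for name in park_names:
                conn.execute("INSERT INTO park (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE park (name TEXT NOT NULL UNIQUE)")
conn.commit()

first = seed_atomically(conn, ["Cedar Point", "Magic Kingdom"])  # commits
# The duplicate "Cedar Point" fails, so "Disneyland" is rolled back as well.
second = seed_atomically(conn, ["Disneyland", "Cedar Point"])
count = conn.execute("SELECT COUNT(*) FROM park").fetchone()[0]
print(first, second, count)
```

The second call leaves the table unchanged, which is the same trade-off noted under Known Limitations: one failure discards the whole run.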

### Model Handling
- **Dual Company Models**: Properly handles separate Park and Ride company models
- **Optional Models**: Checks for existence before using optional models
- **Type Safety**: Proper type hints and error handling
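
The "check before use" pattern for optional models can be sketched in plain Python as below. This is an assumption about the general approach, not the script's code: `load_optional` is a hypothetical helper. In Django, a comparable check could catch the `LookupError` raised by `apps.get_model` when an app or model is not installed.

```python
import importlib
from typing import Any, Optional


def load_optional(module_name: str, attr: str) -> Optional[Any]:
    """Return module.attr, or None when the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


# Standard-library modules stand in for optional app models here.
available = load_optional("json", "dumps")
missing = load_optional("no_such_module_for_demo", "Model")

if missing is None:
    print("Optional model not available, skipping")
```

Seeding code can then skip the related `create_*` step when the helper returns `None` instead of failing.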

### Data Generation
- **Random but Realistic**: Uses curated lists for realistic data
- **Configurable Counts**: All counts are configurable via command line
- **Relationship Integrity**: Maintains proper relationships between models
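
The "random but realistic" approach amounts to sampling from curated lists. The following is a minimal sketch under stated assumptions: the list contents are made-up samples (the script's own lists, such as `first_names`, will differ), and `make_users` is a hypothetical helper.

```python
import random

# Hypothetical sample values; the script's own lists will differ.
FIRST_NAMES = ["Alex", "Jordan", "Priya", "Mateo", "Sofia"]
LAST_NAMES = ["Nguyen", "Garcia", "Okafor", "Smith", "Tanaka"]


def make_users(count: int, rng: random.Random) -> list[dict[str, str]]:
    """Build `count` user records with unique usernames from the curated lists."""
    users = []
    for index in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        users.append({
            # The index suffix keeps usernames unique even when names repeat.
            "username": f"{first}{last}{index}".lower(),
            "first_name": first,
            "last_name": last,
        })
    return users


# A fixed seed makes the generated data repeatable between runs.
users = make_users(3, random.Random(42))
for user in users:
    print(user["username"])
```

Passing an explicit `random.Random` instance is a design choice that keeps test runs reproducible; whether the actual script seeds its generator is not stated here.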

## Troubleshooting

### Common Issues

1. **Database Schema Mismatch**: If you see timezone constraint errors, run migrations first:
   ```bash
   uv run manage.py migrate
   ```

2. **Permission Errors**: Ensure the database user has proper permissions for all operations

3. **Memory Issues**: For large datasets, consider lowering the count options or seeding in smaller batches
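
For the memory point, a generic batching helper looks roughly like this. It is a sketch, not part of the script: `batched` is a hypothetical helper, and how each batch gets written to the database is left open.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


for batch in batched(range(5), 2):
    print(batch)
```

Because the helper consumes a lazy iterable, only one batch of records needs to be held in memory at a time.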

### Known Limitations

- **Database Schema Compatibility**: May fail against database schemas that have required fields not present in the current models (e.g., a timezone field)
- **pghistory Compatibility**: May have issues with some pghistory configurations
- **Cloudflare Images**: Creates placeholder records without actual images
- **Geographic Data**: Requires PostGIS for location features
- **Transaction Management**: Seeding runs inside an atomic transaction, so a failure in any model creation rolls back the entire run

## Development Notes

### Adding New Models
1. Import the model at the top of the file
2. Add it to the `models_to_clear` list if needed
3. Create a new `create_*` method
4. Call the method in `handle()` in proper dependency order
5. Add count to `print_summary()`
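
The steps above can be pictured with a plain-Python stand-in for the command class. Everything here is hypothetical: `SeedCommandSketch` is not the real command, `park_events` is an invented model name, and the method bodies only record counts instead of touching a database.

```python
class SeedCommandSketch:
    """Plain-Python stand-in for the command class; not the real implementation."""

    def __init__(self) -> None:
        self.created: dict[str, int] = {}

    def create_parks(self, count: int) -> None:
        self.created["parks"] = count

    # Step 3: a new create_* method for the hypothetical "park_events" model.
    def create_park_events(self, count: int) -> None:
        if "parks" not in self.created:
            raise RuntimeError("parks must be seeded before park events")
        self.created["park_events"] = count

    def handle(self, parks: int = 10, park_events: int = 5) -> None:
        # Step 4: call the new method after the models it depends on.
        self.create_parks(parks)
        self.create_park_events(park_events)
        self.print_summary()

    def print_summary(self) -> None:
        # Step 5: the new model's count is reported with the others.
        for name, count in self.created.items():
            print(f"{name}: {count}")


command = SeedCommandSketch()
command.handle(parks=2, park_events=3)
```

Calling `create_park_events` before `create_parks` raises an error, which mirrors why step 4 insists on dependency order.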

### Customizing Data
- Modify the data lists (e.g., `first_names`, `famous_parks`) to customize generated data
- Adjust probability weights for different scenarios
- Add new relationship patterns as needed

## Performance

### Optimization Tips
- Use `--clear` sparingly in production-like environments
- Consider smaller batch sizes for very large datasets
- Monitor database performance during seeding

### Typical Performance
- 25 users, 15 companies, 10 parks, 50 rides: ~30 seconds
- 100 users, 50 companies, 25 parks, 200 rides: ~2-3 minutes

## Security Notes

- **Default Passwords**: All seeded users have simple passwords for development only
- **Admin Access**: Creates admin user with known credentials
- **Production Warning**: Never run with `--clear` in production environments
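
A guard along the following lines could enforce the production warning in code. This is a suggestion sketched under assumptions, not existing behavior: `check_clear_allowed` is a hypothetical helper, and `debug` stands in for something like Django's `settings.DEBUG`.

```python
def check_clear_allowed(clear: bool, debug: bool) -> None:
    """Raise if --clear is requested while debug mode is off."""
    if clear and not debug:
        raise RuntimeError("Refusing to run --clear when DEBUG is off")


check_clear_allowed(clear=True, debug=True)    # allowed in development
check_clear_allowed(clear=False, debug=False)  # seeding without --clear passes

try:
    check_clear_allowed(clear=True, debug=False)
except RuntimeError as error:
    print(error)
```

In a real management command, raising Django's `CommandError` instead of `RuntimeError` would produce a cleaner message on the console.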

## Future Enhancements

- **Bulk Operations**: Use `bulk_create` for better performance
- **Custom Scenarios**: Add preset scenarios (small, medium, large)
- **Data Export**: Add ability to export seeded data
- **Incremental Updates**: Support for updating existing data