ThrillWiki Data Seeding Script
Overview
The seed_data.py management command provides comprehensive test data seeding for the ThrillWiki application. It creates realistic data across all models in the system for testing and development purposes.
Usage
Basic Usage
# Seed with default counts
uv run manage.py seed_data
# Clear existing data and seed fresh
uv run manage.py seed_data --clear
# Custom counts
uv run manage.py seed_data --users 50 --parks 20 --rides 100 --reviews 200
Command Options
- --clear: Clear existing data before seeding
- --users N: Number of users to create (default: 25)
- --companies N: Number of companies to create (default: 15)
- --parks N: Number of parks to create (default: 10)
- --rides N: Number of rides to create (default: 50)
- --ride-models N: Number of ride models to create (default: 20)
- --reviews N: Number of reviews to create (default: 100)
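Django management commands declare their options on an argparse parser, so the option list above can be sketched with plain argparse. This is a standalone approximation, not the command's actual code: build_parser and DEFAULT_COUNTS are illustrative names, and only the flag names and defaults come from the list.

```python
import argparse

# Defaults copied from the option list; the parser is a stand-in for
# whatever seed_data.py registers in its own argument setup.
DEFAULT_COUNTS = {
    "users": 25,
    "companies": 15,
    "parks": 10,
    "rides": 50,
    "ride-models": 20,
    "reviews": 100,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="seed_data")
    parser.add_argument(
        "--clear",
        action="store_true",
        help="Clear existing data before seeding",
    )
    for name, default in DEFAULT_COUNTS.items():
        parser.add_argument(
            f"--{name}",
            type=int,
            default=default,
            help=f"Number of {name.replace('-', ' ')} to create (default: {default})",
        )
    return parser


# Unspecified counts fall back to their defaults.
options = build_parser().parse_args(["--users", "50", "--clear"])
print(options.users, options.ride_models, options.clear)
```

Note that argparse exposes --ride-models as options.ride_models, with the hyphen converted to an underscore.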
What Gets Created
Users & Accounts
- Admin User: admin/admin123 (superuser)
- Moderator User: moderator/mod123 (staff)
- Regular Users: Random realistic users with profiles
- User Profiles: Complete with ride credits, social links, preferences
- Notifications: Sample notifications for users
- Top Lists: User-created top lists for parks and rides
Companies
- Park Operators: Disney, Universal, Six Flags, Cedar Fair, etc.
- Ride Manufacturers: B&M, Intamin, Vekoma, RMC, etc.
- Ride Designers: Werner Stengel, Alan Schilke, John Wardley
- Company Headquarters: Realistic address data
Parks & Locations
- Famous Parks: Magic Kingdom, Disneyland, Cedar Point, etc.
- Park Locations: Geographic coordinates and addresses
- Park Areas: Themed areas within parks
- Park Photos: Sample photo records
Rides & Models
- Famous Coasters: Steel Vengeance, Millennium Force, etc.
- Ride Models: B&M Dive Coaster, Intamin Accelerator, etc.
- Roller Coaster Stats: Height, speed, inversions, etc.
- Ride Photos: Sample photo records
- Technical Specs: Detailed specifications for ride models
Content & Reviews
- Park Reviews: User reviews with ratings and visit dates
- Ride Reviews: Detailed ride experiences
- Review Content: Realistic review text and ratings
Data Quality Features
Realistic Data
- Names: Diverse, realistic user names
- Locations: Accurate geographic coordinates
- Relationships: Proper company-park-ride relationships
- Statistics: Realistic ride statistics and ratings
Comprehensive Coverage
- All Models: Seeds data for every model in the system
- Relationships: Maintains proper foreign key relationships
- Optional Models: Handles models that may not exist gracefully
Data Integrity
- Unique Constraints: Uses get_or_create to avoid duplicates
- Validation: Respects model constraints and validation rules
- Dependencies: Creates data in proper dependency order
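To illustrate why get_or_create keeps repeated seed runs free of duplicates, here is a minimal in-memory sketch of its semantics. InMemoryTable is a hypothetical stand-in for a Django model manager, and the "role" field is invented for the example; only the (object, created) return shape and the defaults argument mirror Django's method.

```python
from typing import Any, Dict, Optional, Tuple


class InMemoryTable:
    """Tiny stand-in for a model manager, keyed on one unique field."""

    def __init__(self, unique_field: str) -> None:
        self.unique_field = unique_field
        self.rows: Dict[Any, Dict[str, Any]] = {}

    def get_or_create(
        self, defaults: Optional[Dict[str, Any]] = None, **lookup: Any
    ) -> Tuple[Dict[str, Any], bool]:
        key = lookup[self.unique_field]
        if key in self.rows:
            # An existing row is returned unchanged; defaults are ignored.
            return self.rows[key], False
        row = {**lookup, **(defaults or {})}
        self.rows[key] = row
        return row, True


companies = InMemoryTable(unique_field="name")
first, created_first = companies.get_or_create(
    name="Intamin", defaults={"role": "manufacturer"}
)
# A second seed run with the same name finds the existing row.
again, created_again = companies.get_or_create(
    name="Intamin", defaults={"role": "operator"}
)
```

The second call returns the original row with created set to False, so running the seeder twice leaves one Intamin record.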
Technical Implementation
Architecture
- Modular Design: Separate methods for each model type
- Transaction Safety: All operations are wrapped in a database transaction
- Error Handling: Graceful handling of missing optional models
- Progress Reporting: Clear console output with emojis and counts
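The all-or-nothing behaviour of a transaction-wrapped seed run can be demonstrated with the standard library's sqlite3 module. This is an analogy under stated assumptions, not the command's code: in a Django project the equivalent wrapper would be transaction.atomic(), and the park table and seed_all_or_nothing function are invented for the example.

```python
import sqlite3
from typing import List


def seed_all_or_nothing(conn: sqlite3.Connection, park_names: List[str]) -> bool:
    """Insert every park in one transaction; return False if it rolled back."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if the block raises.
        with conn:
            for name in park_names:
                conn.execute("INSERT INTO park (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE park (name TEXT NOT NULL UNIQUE)")

ok = seed_all_or_nothing(conn, ["Cedar Point", "Disneyland"])
# The duplicate at the end fails, so "Magic Kingdom" is rolled back too.
failed = seed_all_or_nothing(conn, ["Magic Kingdom", "Cedar Point"])
count = conn.execute("SELECT COUNT(*) FROM park").fetchone()[0]
```

After both calls the table holds only the two parks from the successful run, which is the trade-off noted under Known Limitations: one failing record discards the whole batch.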
Model Handling
- Dual Company Models: Properly handles separate Park and Ride company models
- Optional Models: Checks for existence before using optional models
- Type Safety: Proper type hints and error handling
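One way to check for an optional model before using it is to attempt the import and fall back to None. The sketch below is an assumption about the approach rather than the command's actual mechanism; load_optional and the module paths are hypothetical, with the standard library's json module standing in for an installed app.

```python
import importlib
from typing import Any, Optional


def load_optional(module_path: str, attr: str) -> Optional[Any]:
    """Return module_path.attr, or None when the module or name is missing."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None
    return getattr(module, attr, None)


# "json" stands in for an installed app; the second module does not exist.
Present = load_optional("json", "JSONDecoder")
Missing = load_optional("thrillwiki_no_such_app.models", "TopList")

seeded = []
for label, model in (("present", Present), ("missing", Missing)):
    if model is None:
        continue  # skip seeding for models that are not available
    seeded.append(label)
```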
Data Generation
- Random but Realistic: Uses curated lists for realistic data
- Configurable Counts: All counts are configurable via command line
- Relationship Integrity: Maintains proper relationships between models
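Drawing from curated lists with a configurable count might look like the following sketch. The name lists are short placeholders, and both make_usernames and its seed parameter are assumptions added for reproducibility; the source does not say whether the real command seeds its random generator.

```python
import random
from typing import List

# Short illustrative lists; these values are placeholders, not the
# entries used by the real command.
FIRST_NAMES = ["Alex", "Priya", "Mateo", "Yuki"]
LAST_NAMES = ["Nguyen", "Okafor", "Schmidt", "Rivera"]


def make_usernames(count: int, seed: int = 0) -> List[str]:
    """Build `count` unique usernames from the curated lists."""
    rng = random.Random(seed)  # seeded so repeated runs give the same names
    seen = set()
    usernames: List[str] = []
    while len(usernames) < count:
        name = f"{rng.choice(FIRST_NAMES)}.{rng.choice(LAST_NAMES)}".lower()
        if name in seen:
            # Disambiguate repeats with a numeric suffix.
            name = f"{name}{len(usernames)}"
        seen.add(name)
        usernames.append(name)
    return usernames


usernames = make_usernames(25)
```

The numeric suffix keeps usernames unique even when the requested count exceeds the number of name combinations.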
Troubleshooting
Common Issues
- Database Schema Mismatch: If you see timezone constraint errors, run migrations first:
  uv run manage.py migrate
- Permission Errors: Ensure the database user has proper permissions for all operations
- Memory Issues: For large datasets, consider running with smaller batches
Known Limitations
- Database Schema Compatibility: May fail against database schemas that contain required fields absent from the current models (e.g., a timezone field)
- pghistory Compatibility: May have issues with some pghistory configurations
- Cloudflare Images: Creates placeholder records without actual images
- Geographic Data: Requires PostGIS for location features
- Transaction Management: Runs inside an atomic transaction, so a failure in any model creation rolls back the entire seed run
Development Notes
Adding New Models
- Import the model at the top of the file
- Add to models_to_clear list if needed
- Create a new create_* method
- Call the method in handle() in proper dependency order
- Add count to print_summary()
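The steps above can be sketched as a plain-Python outline. It is not a Django BaseCommand and does not reproduce the real file: SeedCommandSketch, the events model, and its default count are hypothetical, while the create_*, handle(), and print_summary() names follow the steps listed.

```python
from typing import Dict, List


class SeedCommandSketch:
    """Plain-Python outline of the seeding flow; not a Django BaseCommand."""

    def __init__(self) -> None:
        self.created: Dict[str, int] = {}
        self.call_order: List[str] = []

    def _record(self, label: str, count: int) -> None:
        self.call_order.append(label)
        self.created[label] = count

    def create_parks(self, count: int) -> None:
        self._record("parks", count)

    def create_rides(self, count: int) -> None:
        self._record("rides", count)

    def create_events(self, count: int) -> None:
        # Hypothetical new model that references parks.
        self._record("events", count)

    def handle(self, **options: int) -> List[str]:
        # Parents before children: parks first, then models pointing at them.
        self.create_parks(options.get("parks", 10))
        self.create_rides(options.get("rides", 50))
        self.create_events(options.get("events", 5))
        return self.print_summary()

    def print_summary(self) -> List[str]:
        lines = [f"{label}: {count}" for label, count in self.created.items()]
        for line in lines:
            print(line)
        return lines


command = SeedCommandSketch()
summary = command.handle(parks=3)
```

Placing the new create_events call after create_parks reflects the dependency-order step: a child model can only be seeded once the rows it references exist.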
Customizing Data
- Modify the data lists (e.g., first_names, famous_parks) to customize generated data
- Adjust probability weights for different scenarios
- Add new relationship patterns as needed
Performance
Optimization Tips
- Use --clear sparingly in production-like environments
- Consider smaller batch sizes for very large datasets
- Monitor database performance during seeding
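Working in smaller batches can be as simple as slicing the pending records into fixed-size chunks before writing them. The helper below is an illustrative sketch; chunked and the batch size of 64 are assumptions, not part of the command. In a Django project, bulk_create also accepts a batch_size argument that serves a similar purpose.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 200 pending records split into batches of up to 64.
batch_sizes = [len(batch) for batch in chunked(range(200), 64)]
print(batch_sizes)  # [64, 64, 64, 8]
```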
Typical Performance
- 25 users, 15 companies, 10 parks, 50 rides: ~30 seconds
- 100 users, 50 companies, 25 parks, 200 rides: ~2-3 minutes
Security Notes
- Default Passwords: All seeded users have simple passwords for development only
- Admin Access: Creates admin user with known credentials
- Production Warning: Never run with --clear in production environments
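A project that wants to enforce the production warning in code could add a guard along these lines. This is a hypothetical safeguard, not something the source says the command implements; may_clear is an invented name, and in a Django project the debug flag would typically come from settings.DEBUG.

```python
def may_clear(clear_requested: bool, debug: bool) -> bool:
    """Allow --clear only when the project runs in debug mode."""
    if not clear_requested:
        return False
    if not debug:
        # Hypothetical policy: treat DEBUG off as a production-like setup.
        raise RuntimeError("Refusing to run --clear with DEBUG disabled")
    return True
```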
Future Enhancements
- Bulk Operations: Use bulk_create for better performance
- Custom Scenarios: Add preset scenarios (small, medium, large)
- Data Export: Add ability to export seeded data
- Incremental Updates: Support for updating existing data