ThrillWiki Data Seeding Script

Overview

The seed_data.py management command provides comprehensive test data seeding for the ThrillWiki application. It creates realistic data across all models in the system for testing and development purposes.

Usage

Basic Usage

# Seed with default counts
uv run manage.py seed_data

# Clear existing data and seed fresh
uv run manage.py seed_data --clear

# Custom counts
uv run manage.py seed_data --users 50 --parks 20 --rides 100 --reviews 200

Command Options

  • --clear: Clear existing data before seeding
  • --users N: Number of users to create (default: 25)
  • --companies N: Number of companies to create (default: 15)
  • --parks N: Number of parks to create (default: 10)
  • --rides N: Number of rides to create (default: 50)
  • --ride-models N: Number of ride models to create (default: 20)
  • --reviews N: Number of reviews to create (default: 100)
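
As an illustration of how these options and defaults fit together, here is a minimal sketch using the standard-library argparse module. It is a stand-in, not the command's actual add_arguments() code; the function name build_parser is hypothetical, and only the flag names and defaults are taken from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the command's argument setup."""
    parser = argparse.ArgumentParser(prog="seed_data")
    parser.add_argument("--clear", action="store_true",
                        help="Clear existing data before seeding")
    # Flag names and defaults mirror the documented option list.
    counts = {
        "--users": 25,
        "--companies": 15,
        "--parks": 10,
        "--rides": 50,
        "--ride-models": 20,
        "--reviews": 100,
    }
    for flag, default in counts.items():
        parser.add_argument(flag, type=int, default=default,
                            help=f"Number to create (default: {default})")
    return parser


args = build_parser().parse_args(["--users", "50", "--clear"])
# argparse exposes --ride-models as the attribute ride_models
print(args.users, args.clear, args.ride_models)  # 50 True 20
```

Any count that is not passed on the command line falls back to its documented default.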

What Gets Created

Users & Accounts

  • Admin User: admin / admin123 (superuser)
  • Moderator User: moderator / mod123 (staff)
  • Regular Users: Random realistic users with profiles
  • User Profiles: Complete with ride credits, social links, preferences
  • Notifications: Sample notifications for users
  • Top Lists: User-created top lists for parks and rides

Companies

  • Park Operators: Disney, Universal, Six Flags, Cedar Fair, etc.
  • Ride Manufacturers: B&M, Intamin, Vekoma, RMC, etc.
  • Ride Designers: Werner Stengel, Alan Schilke, John Wardley
  • Company Headquarters: Realistic address data

Parks & Locations

  • Famous Parks: Magic Kingdom, Disneyland, Cedar Point, etc.
  • Park Locations: Geographic coordinates and addresses
  • Park Areas: Themed areas within parks
  • Park Photos: Sample photo records

Rides & Models

  • Famous Coasters: Steel Vengeance, Millennium Force, etc.
  • Ride Models: B&M Dive Coaster, Intamin Accelerator, etc.
  • Roller Coaster Stats: Height, speed, inversions, etc.
  • Ride Photos: Sample photo records
  • Technical Specs: Detailed specifications for ride models

Content & Reviews

  • Park Reviews: User reviews with ratings and visit dates
  • Ride Reviews: Detailed ride experiences
  • Review Content: Realistic review text and ratings

Data Quality Features

Realistic Data

  • Names: Diverse, realistic user names
  • Locations: Accurate geographic coordinates
  • Relationships: Proper company-park-ride relationships
  • Statistics: Realistic ride statistics and ratings

Comprehensive Coverage

  • All Models: Seeds data for every model in the system
  • Relationships: Maintains proper foreign key relationships
  • Optional Models: Handles models that may not exist gracefully

Data Integrity

  • Unique Constraints: Uses get_or_create to avoid duplicates
  • Validation: Respects model constraints and validation rules
  • Dependencies: Creates data in proper dependency order
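
The duplicate-avoidance behaviour can be sketched without a database. The class below is a hypothetical in-memory stand-in that mimics the (object, created) return value of Django's get_or_create; the field names and the seed_companies helper are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryTable:
    """Hypothetical stand-in for a model manager, keyed on a unique name."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def get_or_create(self, name: str,
                      defaults: Optional[dict] = None) -> Tuple[dict, bool]:
        # Return the existing row untouched if the unique key is present.
        if name in self.rows:
            return self.rows[name], False
        row = {"name": name, **(defaults or {})}
        self.rows[name] = row
        return row, True


def seed_companies(table: InMemoryTable, names: List[str]) -> int:
    """Create any missing companies and return how many were new."""
    created = 0
    for name in names:
        _, was_created = table.get_or_create(name, defaults={"seeded": True})
        if was_created:
            created += 1
    return created


table = InMemoryTable()
print(seed_companies(table, ["Intamin", "Vekoma"]))  # 2
print(seed_companies(table, ["Intamin", "Vekoma"]))  # 0
```

Running the seeding step twice leaves the same two rows in place, which is the property that makes repeated runs safe.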

Technical Implementation

Architecture

  • Modular Design: Separate methods for each model type
  • Transaction Safety: All operations are wrapped in a single database transaction
  • Error Handling: Graceful handling of missing optional models
  • Progress Reporting: Clear console output with emojis and counts
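
The all-or-nothing effect of wrapping seeding in one transaction can be demonstrated in isolation. A Django command would typically use transaction.atomic() for this; the sketch below uses the standard-library sqlite3 module instead so that it runs without a project. The park table and the seed_all function are hypothetical.

```python
import sqlite3


def seed_all(conn: sqlite3.Connection, park_names: list) -> None:
    # "with conn" commits if the block succeeds and rolls back
    # if an exception escapes it.
    with conn:
        for name in park_names:
            conn.execute("INSERT INTO park (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE park (name TEXT NOT NULL UNIQUE)")
conn.commit()

try:
    # The duplicate third entry violates the UNIQUE constraint.
    seed_all(conn, ["Cedar Point", "Magic Kingdom", "Cedar Point"])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM park").fetchone()[0]
print(count)  # 0: the two valid inserts were rolled back as well
```

This is the trade-off of a single transaction: the database is never left half-seeded, but one failing record discards the whole run.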

Model Handling

  • Dual Company Models: Properly handles separate Park and Ride company models
  • Optional Models: Checks for existence before using optional models
  • Type Safety: Proper type hints and error handling
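
One common way to check for an optional model before using it is to attempt the import and fall back to None. The helper below is a sketch of that pattern and may differ from what the command actually does; optional_attr and the module path in the comment are hypothetical.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_path: str, attr_name: str) -> Optional[Any]:
    """Return module_path.attr_name, or None if either is missing."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None
    return getattr(module, attr_name, None)


# Hypothetical usage inside the command:
#   TopList = optional_attr("apps.accounts.models", "TopList")
#   if TopList is not None:
#       ...create top lists...
print(optional_attr("json", "dumps") is not None)          # True
print(optional_attr("no_such_package_for_demo", "Thing"))  # None
```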

Data Generation

  • Random but Realistic: Uses curated lists for realistic data
  • Configurable Counts: All counts are configurable via command line
  • Relationship Integrity: Maintains proper relationships between models
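
The "random but realistic" approach amounts to sampling from hand-picked lists. The sketch below illustrates the idea under stated assumptions: the list contents, the make_users function, and the username format are all hypothetical and not taken from seed_data.py.

```python
import random
from typing import Dict, List, Optional

# Hypothetical curated lists; the real command keeps its own.
FIRST_NAMES = ["Alex", "Priya", "Mateo", "Yuki", "Amara"]
LAST_NAMES = ["Nguyen", "Okafor", "Schmidt", "Garcia", "Kowalski"]


def make_users(count: int, seed: Optional[int] = None) -> List[Dict[str, str]]:
    """Build `count` user dicts by sampling from the curated lists."""
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    users = []
    for index in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        users.append({
            # The index suffix keeps usernames unique even when
            # the same name pair is drawn twice.
            "username": f"{first.lower()}{last.lower()}{index}",
            "first_name": first,
            "last_name": last,
        })
    return users


for user in make_users(3, seed=42):
    print(user["username"])
```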

Troubleshooting

Common Issues

  1. Database Schema Mismatch: If you see timezone constraint errors, run migrations first:

    uv run manage.py migrate
    
  2. Permission Errors: Ensure the database user has the permissions required for all operations

  3. Memory Issues: For large datasets, consider seeding with lower counts (e.g., smaller --rides and --reviews values)

Known Limitations

  • Database Schema Compatibility: Seeding may fail against database schemas that have required fields missing from the current models (e.g., a timezone field)
  • pghistory Compatibility: May have issues with some pghistory configurations
  • Cloudflare Images: Creates placeholder records without actual images
  • Geographic Data: Requires PostGIS for location features
  • Transaction Management: Uses an atomic transaction, so a failure in any model creation rolls back the entire seeding run

Development Notes

Adding New Models

  1. Import the model at the top of the file
  2. Add to models_to_clear list if needed
  3. Create a new create_* method
  4. Call the method in handle() in proper dependency order
  5. Add count to print_summary()
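
The "proper dependency order" in step 4 can be worked out mechanically when the list of steps grows. The sketch below uses the standard-library graphlib module (Python 3.9+). The step names and the dependency map are assumptions for illustration, not the actual method names in seed_data.py.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: step -> steps that must run before it.
DEPENDENCIES: Dict[str, Set[str]] = {
    "create_users": set(),
    "create_companies": set(),
    "create_parks": {"create_companies"},
    "create_ride_models": {"create_companies"},
    "create_rides": {"create_parks", "create_ride_models"},
    "create_reviews": {"create_users", "create_parks", "create_rides"},
}


def seeding_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return the steps so that every prerequisite comes first."""
    return list(TopologicalSorter(deps).static_order())


for step in seeding_order(DEPENDENCIES):
    print(step)
```

A new create_* method only needs an entry naming its prerequisites. A circular dependency raises graphlib.CycleError instead of silently producing a bad order.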

Customizing Data

  • Modify the data lists (e.g., first_names, famous_parks) to customize generated data
  • Adjust probability weights for different scenarios
  • Add new relationship patterns as needed

Performance

Optimization Tips

  • Use --clear sparingly in production-like environments
  • Consider smaller batch sizes for very large datasets
  • Monitor database performance during seeding

Typical Performance

  • 25 users, 15 companies, 10 parks, 50 rides: ~30 seconds
  • 100 users, 50 companies, 25 parks, 200 rides: ~2-3 minutes

Security Notes

  • Default Passwords: All seeded users have simple passwords intended for development only
  • Admin Access: Creates an admin user with known credentials (admin / admin123)
  • Production Warning: Never run with --clear in production environments
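
A guard of the following shape could enforce the production warning in code rather than by convention. This is only a sketch of a possible safeguard, not something the command is documented to do. The function and exception names are hypothetical, and it assumes that a disabled DEBUG setting is a reasonable proxy for a production environment.

```python
class SeedingRefused(RuntimeError):
    """Raised when a destructive option is used outside development."""


def check_clear_allowed(clear: bool, debug: bool) -> None:
    # Assumption: debug=False means a production-like environment.
    if clear and not debug:
        raise SeedingRefused(
            "--clear is only allowed when DEBUG is enabled"
        )


check_clear_allowed(clear=True, debug=True)    # allowed in development
check_clear_allowed(clear=False, debug=False)  # nothing destructive requested

try:
    check_clear_allowed(clear=True, debug=False)
except SeedingRefused as exc:
    print(f"Refused: {exc}")
```

Inside a real management command, the conventional exception to raise would be django.core.management.base.CommandError.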

Future Enhancements

  • Bulk Operations: Use bulk_create for better performance
  • Custom Scenarios: Add preset scenarios (small, medium, large)
  • Data Export: Add ability to export seeded data
  • Incremental Updates: Support for updating existing data