mirror of https://github.com/pacnpal/thrillwiki_django_no_react.git synced 2025-12-20 10:51:09 -05:00


pacnpal 35f8d0ef8f Implement hybrid filtering strategy for parks and rides

- Added comprehensive documentation for hybrid filtering implementation, including architecture, API endpoints, performance characteristics, and usage examples.
- Developed a hybrid pagination and client-side filtering recommendation, detailing server-side responsibilities and client-side logic.
- Created a test script for hybrid filtering endpoints, covering various test cases including basic filtering, search functionality, pagination, and edge cases.

2025-09-14 21:07:17 -04:00

5.8 KiB


ThrillWiki Data Seeding Script

Overview

The seed_data.py management command provides comprehensive test data seeding for the ThrillWiki application. It creates realistic data across all models in the system for testing and development purposes.

Usage

Basic Usage

# Seed with default counts
uv run manage.py seed_data

# Clear existing data and seed fresh
uv run manage.py seed_data --clear

# Custom counts
uv run manage.py seed_data --users 50 --parks 20 --rides 100 --reviews 200

Command Options

--clear: Clear existing data before seeding
--users N: Number of users to create (default: 25)
--companies N: Number of companies to create (default: 15)
--parks N: Number of parks to create (default: 10)
--rides N: Number of rides to create (default: 50)
--ride-models N: Number of ride models to create (default: 20)
--reviews N: Number of reviews to create (default: 100)
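
The options above map naturally onto an argparse parser, which is also the kind of parser a Django management command receives in its add_arguments() hook. The following is a minimal standalone sketch, not the script's actual code: the add_arguments function here is a plain module-level function written for illustration, and only the flag names and defaults are taken from the list above.

```python
import argparse


def add_arguments(parser: argparse.ArgumentParser) -> None:
    """Register the documented seed_data options with their defaults."""
    parser.add_argument("--clear", action="store_true",
                        help="Clear existing data before seeding")
    # (flag, default) pairs copied from the option list above.
    for flag, default in [
        ("--users", 25),
        ("--companies", 15),
        ("--parks", 10),
        ("--rides", 50),
        ("--ride-models", 20),
        ("--reviews", 100),
    ]:
        parser.add_argument(flag, type=int, default=default, metavar="N")


parser = argparse.ArgumentParser(prog="seed_data")
add_arguments(parser)
options = parser.parse_args(["--users", "50", "--ride-models", "5"])
print(options.users, options.ride_models, options.parks, options.clear)
# prints: 50 5 10 False
```

Note that argparse exposes --ride-models as options.ride_models, and any count that is not passed falls back to its default.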

What Gets Created

Users & Accounts

Admin User: admin / admin123 (superuser)
Moderator User: moderator / mod123 (staff)
Regular Users: Random realistic users with profiles
User Profiles: Complete with ride credits, social links, preferences
Notifications: Sample notifications for users
Top Lists: User-created top lists for parks and rides

Companies

Park Operators: Disney, Universal, Six Flags, Cedar Fair, etc.
Ride Manufacturers: B&M, Intamin, Vekoma, RMC, etc.
Ride Designers: Werner Stengel, Alan Schilke, John Wardley
Company Headquarters: Realistic address data

Parks & Locations

Famous Parks: Magic Kingdom, Disneyland, Cedar Point, etc.
Park Locations: Geographic coordinates and addresses
Park Areas: Themed areas within parks
Park Photos: Sample photo records

Rides & Models

Famous Coasters: Steel Vengeance, Millennium Force, etc.
Ride Models: B&M Dive Coaster, Intamin Accelerator, etc.
Roller Coaster Stats: Height, speed, inversions, etc.
Ride Photos: Sample photo records
Technical Specs: Detailed specifications for ride models

Content & Reviews

Park Reviews: User reviews with ratings and visit dates
Ride Reviews: Detailed ride experiences
Review Content: Realistic review text and ratings

Data Quality Features

Realistic Data

Names: Diverse, realistic user names
Locations: Accurate geographic coordinates
Relationships: Proper company-park-ride relationships
Statistics: Realistic ride statistics and ratings

Comprehensive Coverage

All Models: Seeds data for every model in the system
Relationships: Maintains proper foreign key relationships
Optional Models: Handles models that may not exist gracefully

Data Integrity

Unique Constraints: Uses get_or_create to avoid duplicates
Validation: Respects model constraints and validation rules
Dependencies: Creates data in proper dependency order
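
To illustrate why get_or_create avoids duplicates on repeated runs, here is a rough sketch of the same lookup-then-insert idea using the standard-library sqlite3 module instead of the Django ORM. The company table and the helper function are hypothetical stand-ins; Django's own get_or_create likewise returns an (object, created) pair.

```python
import sqlite3


def get_or_create(conn: sqlite3.Connection, name: str) -> tuple[int, bool]:
    """Return (row id, created) for a company name, inserting only if absent."""
    row = conn.execute("SELECT id FROM company WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0], False
    cursor = conn.execute("INSERT INTO company (name) VALUES (?)", (name,))
    return cursor.lastrowid, True


conn = sqlite3.connect(":memory:")
# Hypothetical table; the UNIQUE constraint mirrors a unique model field.
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

first = get_or_create(conn, "Intamin")
second = get_or_create(conn, "Intamin")  # second call finds the existing row
company_count = conn.execute("SELECT COUNT(*) FROM company").fetchone()[0]
print(first, second, company_count)
```

Calling the helper twice with the same name leaves a single row, which is the property that makes reseeding without --clear safe for uniquely named records.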

Technical Implementation

Architecture

Modular Design: Separate methods for each model type
Transaction Safety: All operations are wrapped in a single database transaction
Error Handling: Graceful handling of missing optional models
Progress Reporting: Clear console output with emojis and counts
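
The all-or-nothing behaviour of a wrapping transaction can be demonstrated with a small sketch. It uses the standard-library sqlite3 connection as a context manager rather than Django's transaction handling, and the park table is a made-up example: when any statement inside the block fails, the earlier inserts are rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE park (name TEXT NOT NULL)")  # hypothetical table

try:
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("INSERT INTO park (name) VALUES (?)", ("Cedar Point",))
        # This insert violates NOT NULL and aborts the whole block.
        conn.execute("INSERT INTO park (name) VALUES (?)", (None,))
except sqlite3.IntegrityError:
    pass

remaining = conn.execute("SELECT COUNT(*) FROM park").fetchone()[0]
print(remaining)  # prints: 0
```

The first insert succeeded on its own, yet no row survives, which is the trade-off of running an entire seeding pass atomically.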

Model Handling

Dual Company Models: Properly handles separate Park and Ride company models
Optional Models: Checks for existence before using optional models
Type Safety: Proper type hints and error handling
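
How the script detects optional models is not shown here. One common approach in Django is to look the model up in the app registry, where apps.get_model raises LookupError for an unknown model. The sketch below imitates that pattern with a plain dictionary; REGISTRY, get_optional_model, and seed_optional are all hypothetical names used only for illustration.

```python
from typing import Optional

# Hypothetical registry standing in for Django's app registry.
REGISTRY: dict[str, type] = {"Park": type("Park", (), {})}


def get_optional_model(name: str) -> Optional[type]:
    """Return the model class, or None when the model is not installed."""
    try:
        return REGISTRY[name]
    except LookupError:  # KeyError is a subclass of LookupError
        return None


def seed_optional(name: str) -> str:
    """Seed the model if it exists; otherwise report that it was skipped."""
    model = get_optional_model(name)
    if model is None:
        return f"skipped {name} (model not available)"
    return f"seeded {name}"


print(seed_optional("Park"))
print(seed_optional("TopList"))
```

Returning None instead of raising keeps the calling code simple: each create_* step can check once and skip cleanly.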

Data Generation

Random but Realistic: Uses curated lists for realistic data
Configurable Counts: All counts are configurable via command line
Relationship Integrity: Maintains proper relationships between models
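
As an illustration of "random but realistic" generation, the following sketch draws usernames from small curated lists and guarantees uniqueness with a numeric suffix. The list contents and the make_usernames helper are assumptions made for this example; the script's real lists and naming scheme may differ.

```python
import random

# Hypothetical curated lists; the real script's lists are not shown here.
first_names = ["Alex", "Jordan", "Priya", "Mateo"]
last_names = ["Nguyen", "Okafor", "Schmidt", "Rossi"]


def make_usernames(count: int, rng: random.Random) -> list[str]:
    """Build `count` distinct usernames from the curated name lists."""
    usernames: list[str] = []
    seen: set[str] = set()
    while len(usernames) < count:
        base = f"{rng.choice(first_names)}{rng.choice(last_names)}".lower()
        # Append a numeric suffix so repeated name pairs stay unique.
        candidate, suffix = base, 1
        while candidate in seen:
            suffix += 1
            candidate = f"{base}{suffix}"
        seen.add(candidate)
        usernames.append(candidate)
    return usernames


users = make_usernames(25, random.Random(42))
print(len(users), len(set(users)))
```

Passing an explicitly seeded random.Random instance makes a run reproducible, which can help when a test depends on specific seeded records.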

Troubleshooting

Common Issues

Database Schema Mismatch: If you see timezone constraint errors, run migrations first:
```
uv run manage.py migrate
```
Permission Errors: Ensure database user has proper permissions for all operations
Memory Issues: For large datasets, reduce the counts passed to the command (for example, lower --rides and --reviews values)

Known Limitations

Database Schema Compatibility: Seeding may fail against a database schema that has required fields missing from the current models (e.g., a timezone field)
pghistory Compatibility: May have issues with some pghistory configurations
Cloudflare Images: Creates placeholder records without actual images
Geographic Data: Requires PostGIS for location features
Transaction Management: Seeding runs inside an atomic transaction, so a failure in any single model creation rolls back the entire run

Development Notes

Adding New Models

Import the model at the top of the file
Add to models_to_clear list if needed
Create a new create_* method
Call the method in handle() in proper dependency order
Add count to print_summary()
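
When deciding where a new create_* call belongs in handle(), it can help to write the dependencies down and derive an order from them. The sketch below does this with the standard-library graphlib module (Python 3.9+). The dependency map is an assumption inferred from the model descriptions in this document, not taken from the script itself.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each step lists the steps it needs first.
dependencies = {
    "users": set(),
    "companies": set(),
    "parks": {"companies"},
    "ride_models": {"companies"},
    "rides": {"parks", "ride_models"},
    "reviews": {"users", "parks", "rides"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders may exist; any of them satisfies the constraint that a step runs only after everything it depends on.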

Customizing Data

Modify the data lists (e.g., first_names, famous_parks) to customize generated data
Adjust probability weights for different scenarios
Add new relationship patterns as needed
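
Adjusting probability weights can be sketched with random.choices from the standard library. The ratings scale and the weights below are invented for illustration and are not the script's actual values; they skew generated review ratings towards the positive end.

```python
import random
from collections import Counter

rng = random.Random(7)
ratings = [1, 2, 3, 4, 5]
weights = [1, 2, 5, 10, 8]  # hypothetical weights, skewed towards positive reviews

# Draw 1000 ratings; higher weights make a rating proportionally more likely.
sample = rng.choices(ratings, weights=weights, k=1000)
counts = Counter(sample)
print(sorted(counts.items()))
```

Changing a single weight shifts the whole distribution, so a different scenario (for example, a poorly reviewed park) only needs a different weights list.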

Performance

Optimization Tips

Use --clear sparingly in production-like environments
Consider smaller batch sizes for very large datasets
Monitor database performance during seeding

Typical Performance

25 users, 15 companies, 10 parks, 50 rides: ~30 seconds
100 users, 50 companies, 25 parks, 200 rides: ~2-3 minutes

Security Notes

Default Passwords: All seeded users have simple passwords intended for development only
Admin Access: Creates admin user with known credentials
Production Warning: Never run with --clear in production environments

Future Enhancements

Bulk Operations: Use bulk_create for better performance
Custom Scenarios: Add preset scenarios (small, medium, large)
Data Export: Add ability to export seeded data
Incremental Updates: Support for updating existing data
