# Database Optimization Guide This document describes the database optimization strategies implemented in ThrillWiki. ## Overview The application uses several optimization techniques to ensure fast query performance: 1. **Indexing Strategy** - Strategic use of B-tree, GIN, and composite indexes 2. **Query Optimization** - Proper use of `select_related` and `prefetch_related` 3. **Computed Fields** - Pre-computed values for common aggregations 4. **Manager Methods** - Optimized query patterns encapsulated in managers ## Indexing Strategy ### B-tree Indexes (Standard) Standard B-tree indexes are used for fields that are frequently filtered or sorted: | Model | Field | Index Type | Purpose | |-------|-------|------------|---------| | User | `is_banned` | B-tree | Fast filtering of banned users | | User | `role` | B-tree | Fast filtering by user role | | User | `(is_banned, role)` | Composite | Common query pattern | | Park | `status` | B-tree | Filter by park status | | Park | `search_text` | GIN trigram | Full-text search | | Ride | `status` | B-tree | Filter by ride status | | Ride | `search_text` | GIN trigram | Full-text search | ### GIN Indexes GIN (Generalized Inverted Index) indexes are used for array fields and full-text search: | Model | Field | Purpose | |-------|-------|---------| | Company | `roles` | Fast array containment queries (`roles__contains=["MANUFACTURER"]`) | | Park | `search_text` | Full-text search with trigram similarity | | Ride | `search_text` | Full-text search with trigram similarity | ### Creating GIN Indexes ```sql -- Array containment index CREATE INDEX IF NOT EXISTS parks_company_roles_gin_idx ON parks_company USING gin(roles); -- Full-text search index (if using tsvector) CREATE INDEX IF NOT EXISTS parks_park_search_idx ON parks_park USING gin(search_text gin_trgm_ops); ``` ## Query Optimization Patterns ### Manager Methods The application uses custom managers with optimized query methods: #### Park Queries ```python # List view - includes prefetched relations and stats parks = Park.objects.optimized_for_list() # Detail view - deep prefetching for all related data park = Park.objects.optimized_for_detail().get(slug='magic-kingdom') # Map display - minimal fields for markers parks = Park.objects.for_map_display() # Autocomplete - limited fields, fast lookup results = Park.objects.get_queryset().search_autocomplete(query='disney', limit=10) ``` #### Ride Queries ```python # List view with related objects rides = Ride.objects.optimized_for_list() # Detail view with stats ride = Ride.objects.optimized_for_detail().get(slug='space-mountain') # With coaster statistics rides = Ride.objects.with_coaster_stats().filter(category='RC') ``` #### Company Queries ```python # Manufacturers with ride counts manufacturers = Company.objects.manufacturers_with_ride_count() # Designers with ride counts designers = Company.objects.designers_with_ride_count() # Operators with park counts operators = Company.objects.operators_with_park_count() ``` ### Avoiding N+1 Queries Always use the optimized manager methods instead of raw queries: ```python # BAD - causes N+1 queries for park in Park.objects.all(): print(park.operator.name) # Each access hits DB # GOOD - single query with prefetch for park in Park.objects.optimized_for_list(): print(park.operator.name) # Already loaded ``` ### Using only() for Minimal Data When you only need specific fields, use `only()`: ```python # Only fetch necessary fields companies = Company.objects.filter(roles__contains=["MANUFACTURER"]).only( 'id', 'name', 'slug', 'roles' ) ``` ## Computed Fields ### Park Computed Fields | Field | Description | Updated When | |-------|-------------|--------------| | `ride_count` | Number of operating rides | Ride created/deleted/status changed | | `coaster_count` | Number of operating coasters | Ride created/deleted/status changed | | `opening_year` | Year extracted from opening_date | Park saved with opening_date | | `search_text` | Combined searchable text | Park/Location/Company name changes | ### Ride Computed Fields | Field | Description | Updated When | |-------|-------------|--------------| | `opening_year` | Year extracted from opening_date | Ride saved with opening_date | | `search_text` | Combined searchable text | Ride/Park/Company/RideModel changes | ### Signal Handlers Signals automatically update computed fields: ```python # When a park location changes, update search_text @receiver(post_save, sender='parks.ParkLocation') def update_park_search_text_on_location_change(sender, instance, **kwargs): if hasattr(instance, 'park') and instance.park: instance.park._populate_computed_fields() instance.park.save(update_fields=['search_text']) ``` ## CheckConstraints Database-level constraints ensure data integrity: ### User Constraints ```python # Banned users must have a ban_date models.CheckConstraint( name='user_ban_consistency', check=models.Q(is_banned=False) | models.Q(ban_date__isnull=False), ) ``` ### RideModel Constraints ```python # Unique name per manufacturer models.UniqueConstraint( fields=['manufacturer', 'name'], name='ridemodel_manufacturer_name_unique', ) # Installation year range must be valid models.CheckConstraint( name="ride_model_installation_years_logical", condition=models.Q(first_installation_year__isnull=True) | models.Q(last_installation_year__isnull=True) | models.Q(first_installation_year__lte=models.F("last_installation_year")), ) ``` ## Performance Benchmarking Use the benchmark script to measure query performance: ```bash # Run benchmarks python manage.py shell < scripts/benchmark_queries.py ``` Key metrics to monitor: - Average query time (< 100ms for list views, < 50ms for detail views) - Number of queries per operation (avoid N+1 patterns) - Index usage (check query plans) ## Migration Best Practices ### Adding Indexes ```python # Use RunSQL for GIN indexes (not natively supported by Django) migrations.RunSQL( sql="CREATE INDEX IF NOT EXISTS ... USING gin(...)", reverse_sql="DROP INDEX IF EXISTS ..." ) ``` ### Adding Constraints ```python # Use AddConstraint for proper dependency handling migrations.AddConstraint( model_name='user', constraint=models.CheckConstraint(...) ) ``` ### Rollback Procedures Each migration should be reversible: ```bash # Rollback specific migration python manage.py migrate accounts 0012 # Check migration plan before applying python manage.py migrate --plan ``` ## Monitoring ### Query Analysis Enable query logging in development: ```python LOGGING = { 'handlers': { 'console': {'class': 'logging.StreamHandler'}, }, 'loggers': { 'django.db.backends': { 'level': 'DEBUG', 'handlers': ['console'], } } } ``` ### Index Usage Check if indexes are being used: ```sql EXPLAIN ANALYZE SELECT * FROM parks_park WHERE status = 'OPERATING'; ``` ## Quick Reference ### Common Query Patterns | Operation | Method | |-----------|--------| | Park list page | `Park.objects.optimized_for_list()` | | Park detail page | `Park.objects.optimized_for_detail()` | | Map markers | `Park.objects.for_map_display()` | | Search autocomplete | `Park.objects.get_queryset().search_autocomplete()` | | Ride list page | `Ride.objects.optimized_for_list()` | | Ride detail page | `Ride.objects.optimized_for_detail()` | | Manufacturer list | `Company.objects.manufacturers_with_ride_count()` | | Operator list | `Company.objects.operators_with_park_count()` | ### Index Commands ```sql -- List all indexes for a table \di+ parks_park* -- Check index usage statistics SELECT * FROM pg_stat_user_indexes WHERE relname = 'parks_park'; -- Rebuild an index REINDEX INDEX parks_company_roles_gin_idx; ```