Files
thrillwiki_django_no_react/docs/database-optimization.md
pacnpal 2e35f8c5d9 feat: Refactor rides app with unique constraints, mixins, and enhanced documentation
- Added migration to convert unique_together constraints to UniqueConstraint for RideModel.
- Introduced RideFormMixin for handling entity suggestions in ride forms.
- Created comprehensive code standards documentation outlining formatting, docstring requirements, complexity guidelines, and testing requirements.
- Established error handling guidelines with a structured exception hierarchy and best practices for API and view error handling.
- Documented view pattern guidelines, emphasizing the use of CBVs, FBVs, and ViewSets with examples.
- Implemented a benchmarking script for query performance analysis and optimization.
- Developed security documentation detailing measures, configurations, and a security checklist.
- Compiled a database optimization guide covering indexing strategies, query optimization patterns, and computed fields.
2025-12-22 11:17:31 -05:00

7.7 KiB

Database Optimization Guide

This document describes the database optimization strategies implemented in ThrillWiki.

Overview

The application uses several optimization techniques to ensure fast query performance:

  1. Indexing Strategy - Strategic use of B-tree, GIN, and composite indexes
  2. Query Optimization - Proper use of select_related and prefetch_related
  3. Computed Fields - Pre-computed values for common aggregations
  4. Manager Methods - Optimized query patterns encapsulated in managers

Indexing Strategy

B-tree Indexes (Standard)

Standard B-tree indexes are used for fields that are frequently filtered or sorted:

Model Field Index Type Purpose
User is_banned B-tree Fast filtering of banned users
User role B-tree Fast filtering by user role
User (is_banned, role) Composite Common query pattern
Park status B-tree Filter by park status
Park search_text GIN trigram Full-text search
Ride status B-tree Filter by ride status
Ride search_text GIN trigram Full-text search

GIN Indexes

GIN (Generalized Inverted Index) indexes are used for array fields and full-text search:

Model Field Purpose
Company roles Fast array containment queries (roles__contains=["MANUFACTURER"])
Park search_text Full-text search with trigram similarity
Ride search_text Full-text search with trigram similarity

Creating GIN Indexes

-- Array containment index
CREATE INDEX IF NOT EXISTS parks_company_roles_gin_idx
    ON parks_company USING gin(roles);

-- Full-text search index (if using tsvector)
CREATE INDEX IF NOT EXISTS parks_park_search_idx
    ON parks_park USING gin(search_text gin_trgm_ops);

Query Optimization Patterns

Manager Methods

The application uses custom managers with optimized query methods:

Park Queries

# List view - includes prefetched relations and stats
parks = Park.objects.optimized_for_list()

# Detail view - deep prefetching for all related data
park = Park.objects.optimized_for_detail().get(slug='magic-kingdom')

# Map display - minimal fields for markers
parks = Park.objects.for_map_display()

# Autocomplete - limited fields, fast lookup
results = Park.objects.get_queryset().search_autocomplete(query='disney', limit=10)

Ride Queries

# List view with related objects
rides = Ride.objects.optimized_for_list()

# Detail view with stats
ride = Ride.objects.optimized_for_detail().get(slug='space-mountain')

# With coaster statistics
rides = Ride.objects.with_coaster_stats().filter(category='RC')

Company Queries

# Manufacturers with ride counts
manufacturers = Company.objects.manufacturers_with_ride_count()

# Designers with ride counts
designers = Company.objects.designers_with_ride_count()

# Operators with park counts
operators = Company.objects.operators_with_park_count()

Avoiding N+1 Queries

Always use the optimized manager methods instead of raw queries:

# BAD - causes N+1 queries
for park in Park.objects.all():
    print(park.operator.name)  # Each access hits DB

# GOOD - single query with prefetch
for park in Park.objects.optimized_for_list():
    print(park.operator.name)  # Already loaded

Using only() for Minimal Data

When you only need specific fields, use only():

# Only fetch necessary fields
companies = Company.objects.filter(roles__contains=["MANUFACTURER"]).only(
    'id', 'name', 'slug', 'roles'
)

Computed Fields

Park Computed Fields

Field Description Updated When
ride_count Number of operating rides Ride created/deleted/status changed
coaster_count Number of operating coasters Ride created/deleted/status changed
opening_year Year extracted from opening_date Park saved with opening_date
search_text Combined searchable text Park/Location/Company name changes

Ride Computed Fields

Field Description Updated When
opening_year Year extracted from opening_date Ride saved with opening_date
search_text Combined searchable text Ride/Park/Company/RideModel changes

Signal Handlers

Signals automatically update computed fields:

# When a park location changes, update search_text
@receiver(post_save, sender='parks.ParkLocation')
def update_park_search_text_on_location_change(sender, instance, **kwargs):
    if hasattr(instance, 'park') and instance.park:
        instance.park._populate_computed_fields()
        instance.park.save(update_fields=['search_text'])

CheckConstraints

Database-level constraints ensure data integrity:

User Constraints

# Banned users must have a ban_date
models.CheckConstraint(
    name='user_ban_consistency',
    check=models.Q(is_banned=False) | models.Q(ban_date__isnull=False),
)

RideModel Constraints

# Unique name per manufacturer
models.UniqueConstraint(
    fields=['manufacturer', 'name'],
    name='ridemodel_manufacturer_name_unique',
)

# Installation year range must be valid
models.CheckConstraint(
    name="ride_model_installation_years_logical",
    condition=models.Q(first_installation_year__isnull=True) |
              models.Q(last_installation_year__isnull=True) |
              models.Q(first_installation_year__lte=models.F("last_installation_year")),
)

Performance Benchmarking

Use the benchmark script to measure query performance:

# Run benchmarks
python manage.py shell < scripts/benchmark_queries.py

Key metrics to monitor:

  • Average query time (< 100ms for list views, < 50ms for detail views)
  • Number of queries per operation (avoid N+1 patterns)
  • Index usage (check query plans)

Migration Best Practices

Adding Indexes

# Use RunSQL for GIN indexes (not natively supported by Django)
migrations.RunSQL(
    sql="CREATE INDEX IF NOT EXISTS ... USING gin(...)",
    reverse_sql="DROP INDEX IF EXISTS ..."
)

Adding Constraints

# Use AddConstraint for proper dependency handling
migrations.AddConstraint(
    model_name='user',
    constraint=models.CheckConstraint(...)
)

Rollback Procedures

Each migration should be reversible:

# Rollback specific migration
python manage.py migrate accounts 0012

# Check migration plan before applying
python manage.py migrate --plan

Monitoring

Query Analysis

Enable query logging in development:

LOGGING = {
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        }
    }
}

Index Usage

Check if indexes are being used:

EXPLAIN ANALYZE SELECT * FROM parks_park WHERE status = 'OPERATING';

Quick Reference

Common Query Patterns

Operation Method
Park list page Park.objects.optimized_for_list()
Park detail page Park.objects.optimized_for_detail()
Map markers Park.objects.for_map_display()
Search autocomplete Park.objects.get_queryset().search_autocomplete()
Ride list page Ride.objects.optimized_for_list()
Ride detail page Ride.objects.optimized_for_detail()
Manufacturer list Company.objects.manufacturers_with_ride_count()
Operator list Company.objects.operators_with_park_count()

Index Commands

-- List all indexes for a table
\di+ parks_park*

-- Check index usage statistics
SELECT * FROM pg_stat_user_indexes WHERE relname = 'parks_park';

-- Rebuild an index
REINDEX INDEX parks_company_roles_gin_idx;