thrilltrack-explorer/docs/ATOMIC_APPROVAL_TRANSACTIONS.md
2025-11-06 20:15:14 +00:00

Atomic Approval Transactions

Overview

Phase 1 of the atomic transaction RPC implementation is complete. The new process_approval_transaction() RPC replaces the error-prone manual rollback logic in the process-selective-approval edge function with a single PostgreSQL ACID transaction.

Architecture

OLD Flow (process-selective-approval)

Edge Function (2,759 lines) ──┐
  ├─ Create entity 1         ├─ Manual rollback on error
  ├─ Create entity 2         ├─ Network failure = orphaned data
  ├─ Create entity 3         ├─ Edge function crash = partial state
  └─ Manual rollback if error─┘

NEW Flow (process-selective-approval-v2)

Edge Function (~200 lines)
  │
  └──> RPC: process_approval_transaction()
       │
       └──> PostgreSQL Transaction ───────────┐
            ├─ Create entity 1              │
            ├─ Create entity 2              │ ATOMIC
            ├─ Create entity 3              │ (all-or-nothing)
            └─ Commit OR Rollback ──────────┘
               (any error = auto rollback)
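
The all-or-nothing behavior in the diagram can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not the PostgreSQL implementation: `runAtomically` and the array-based `store` are hypothetical names introduced for illustration.

```javascript
// Toy model of all-or-nothing semantics (NOT the real RPC).
// `store` is a plain array standing in for a table; each operation is a
// function that mutates the working copy or throws.
function runAtomically(store, operations) {
  const working = [...store]; // stage every change on a copy
  try {
    for (const op of operations) {
      op(working);
    }
  } catch (err) {
    // "Rollback": discard the working copy and keep the original untouched.
    return { committed: false, store, error: err.message };
  }
  // "Commit": every operation succeeded, so publish the staged copy.
  return { committed: true, store: working, error: null };
}

const initial = ['park-1'];

const ok = runAtomically(initial, [
  (rows) => rows.push('ride-1'),
  (rows) => rows.push('ride-2'),
]);

const failed = runAtomically(initial, [
  (rows) => rows.push('ride-3'),
  () => { throw new Error('validation failed'); },
]);

console.log(ok.committed, ok.store.length);
console.log(failed.committed, failed.store.length, failed.error);
```

In the failing run, the first push never becomes visible because the staged copy is discarded, which is the property the old manual-rollback flow could not guarantee.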

Key Benefits

True ACID Transactions: All operations succeed or fail together
Automatic Rollback: ANY error triggers immediate rollback
Network Resilient: An edge function crash or network failure cannot leave partial state; an uncommitted transaction is rolled back automatically
Zero Orphaned Entities: Impossible by design
Simpler Code: Edge function reduced from 2,759 to ~200 lines

Database Functions Created

Main Transaction Function

process_approval_transaction(
  p_submission_id UUID,
  p_item_ids UUID[],
  p_moderator_id UUID,
  p_submitter_id UUID,
  p_request_id TEXT DEFAULT NULL
) RETURNS JSONB
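
As a rough sketch of how a caller might assemble the arguments for this signature, the helper below maps camelCase inputs onto the `p_`-prefixed parameter names and rejects malformed UUIDs before any request is made. `buildApprovalParams` is a hypothetical helper, and the rule that `itemIds` must be non-empty is an assumption, not something the signature enforces.

```javascript
// Loose UUID shape check (8-4-4-4-12 hex digits); the database remains the
// final authority on validity.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: builds the named-argument object for
// process_approval_transaction().
function buildApprovalParams({ submissionId, itemIds, moderatorId, submitterId, requestId = null }) {
  const scalarIds = { submissionId, moderatorId, submitterId };
  for (const [name, value] of Object.entries(scalarIds)) {
    if (!UUID_RE.test(String(value))) {
      throw new TypeError(`${name} must be a UUID`);
    }
  }
  // Assumption: approving zero items is treated as a caller error.
  if (!Array.isArray(itemIds) || itemIds.length === 0) {
    throw new TypeError('itemIds must be a non-empty array');
  }
  for (const id of itemIds) {
    if (!UUID_RE.test(String(id))) {
      throw new TypeError('itemIds must contain only UUIDs');
    }
  }
  return {
    p_submission_id: submissionId,
    p_item_ids: itemIds,
    p_moderator_id: moderatorId,
    p_submitter_id: submitterId,
    p_request_id: requestId, // matches the DEFAULT NULL in the signature
  };
}

const params = buildApprovalParams({
  submissionId: '11111111-1111-4111-8111-111111111111',
  itemIds: ['22222222-2222-4222-8222-222222222222'],
  moderatorId: '33333333-3333-4333-8333-333333333333',
  submitterId: '44444444-4444-4444-8444-444444444444',
});
console.log(params);
```

Assuming a Supabase-style client, the resulting object would be passed as the second argument of `client.rpc('process_approval_transaction', params)`.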

Helper Functions

  • create_entity_from_submission() - Creates entities (parks, rides, companies, etc.)
  • update_entity_from_submission() - Updates existing entities
  • delete_entity_from_submission() - Soft/hard deletes entities

Monitoring Table

  • approval_transaction_metrics - Tracks performance, success rate, and rollbacks

Feature Flag

The new flow is disabled by default to allow gradual rollout and testing.

Enabling the New Flow

For Moderators (via Admin UI)

  1. Navigate to Admin Settings
  2. Find "Approval Transaction Mode" card
  3. Toggle "Use Atomic Transaction RPC" to ON
  4. The page reloads automatically

Programmatically

// Enable
localStorage.setItem('use_rpc_approval', 'true');

// Disable
localStorage.setItem('use_rpc_approval', 'false');

// Check status
const isEnabled = localStorage.getItem('use_rpc_approval') === 'true';
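
A minimal sketch of how the frontend might turn this flag into a choice of edge function. `selectApprovalFunction` and `createMemoryStorage` are hypothetical names; the storage object is passed in so the logic can be exercised outside a browser, and in the app it would be `window.localStorage`.

```javascript
const FLAG_KEY = 'use_rpc_approval';

// Returns the edge function name to invoke. Anything other than the exact
// string 'true' (including a missing key) falls back to the old flow,
// which matches the "disabled by default" behavior.
function selectApprovalFunction(storage) {
  const enabled = storage.getItem(FLAG_KEY) === 'true';
  return enabled ? 'process-selective-approval-v2' : 'process-selective-approval';
}

// Minimal stand-in for window.localStorage, for tests and non-browser runs.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
    removeItem: (key) => { data.delete(key); },
  };
}

const storage = createMemoryStorage();
const beforeEnable = selectApprovalFunction(storage);
storage.setItem(FLAG_KEY, 'true');
const afterEnable = selectApprovalFunction(storage);
console.log(beforeEnable, afterEnable);
```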

Testing Checklist

Basic Functionality ✓

  • Enable feature flag via admin UI
  • Approve a simple submission (1-2 items)
  • Verify entities created correctly
  • Check console logs for "Using edge function: process-selective-approval-v2"
  • Verify version history shows correct attribution

Error Scenarios ✓

  • Submit invalid data → verify full rollback
  • Trigger validation error → verify no partial state
  • Kill edge function mid-execution → verify auto rollback
  • Check logs for "Transaction failed, rolling back" messages

Concurrent Operations ✓

  • Two moderators approve the same submission → one succeeds, the other receives a lock error
  • Verify only one set of entities created (no duplicates)

Data Integrity ✓

  • Run orphaned entity check (see SQL query below)
  • Verify session variables cleared after transaction
  • Check approval_transaction_metrics for success rate

Monitoring Queries

Check for Orphaned Entities

-- Always returns one row per table; orphaned_count should be 0 in each row after migration
SELECT 
  'parks' as table_name,
  COUNT(*) as orphaned_count
FROM parks p
WHERE NOT EXISTS (
  SELECT 1 FROM park_versions pv
  WHERE pv.park_id = p.id
)
AND p.created_at > NOW() - INTERVAL '24 hours'

UNION ALL

SELECT 
  'rides' as table_name,
  COUNT(*) as orphaned_count
FROM rides r
WHERE NOT EXISTS (
  SELECT 1 FROM ride_versions rv
  WHERE rv.ride_id = r.id
)
AND r.created_at > NOW() - INTERVAL '24 hours';

Transaction Success Rate

SELECT 
  DATE_TRUNC('hour', created_at) as hour,
  COUNT(*) as total_transactions,
  COUNT(*) FILTER (WHERE success) as successful,
  COUNT(*) FILTER (WHERE rollback_triggered) as rollbacks,
  ROUND(AVG(duration_ms), 2) as avg_duration_ms,
  ROUND(100.0 * COUNT(*) FILTER (WHERE success) / COUNT(*), 2) as success_rate
FROM approval_transaction_metrics
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour DESC;
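
The same aggregation can be mirrored client-side, for example to render a dashboard from rows already fetched from approval_transaction_metrics. The sketch below assumes each row carries the columns used in the query (`created_at`, `success`, `rollback_triggered`, `duration_ms`) and buckets by UTC hour; `summarizeByHour` is a hypothetical helper.

```javascript
function round2(x) {
  return Math.round(x * 100) / 100;
}

// Groups metric rows by UTC hour and computes the same columns as the SQL
// query: totals, successes, rollbacks, average duration and success rate.
function summarizeByHour(rows) {
  const buckets = new Map();
  for (const row of rows) {
    // '2025-11-06T19:05:00.000Z' -> '2025-11-06T19:00:00Z'
    const hour = new Date(row.created_at).toISOString().slice(0, 13) + ':00:00Z';
    const bucket = buckets.get(hour) ||
      { hour, total: 0, successful: 0, rollbacks: 0, durationSum: 0 };
    bucket.total += 1;
    if (row.success) bucket.successful += 1;
    if (row.rollback_triggered) bucket.rollbacks += 1;
    bucket.durationSum += row.duration_ms;
    buckets.set(hour, bucket);
  }
  return [...buckets.values()]
    .map(({ durationSum, ...b }) => ({
      ...b,
      avg_duration_ms: round2(durationSum / b.total),
      success_rate: round2((100 * b.successful) / b.total),
    }))
    // Newest hour first, like ORDER BY hour DESC.
    .sort((a, b) => (a.hour < b.hour ? 1 : a.hour > b.hour ? -1 : 0));
}

const summary = summarizeByHour([
  { created_at: '2025-11-06T19:05:00Z', success: true, rollback_triggered: false, duration_ms: 100 },
  { created_at: '2025-11-06T19:30:00Z', success: false, rollback_triggered: true, duration_ms: 300 },
  { created_at: '2025-11-06T20:10:00Z', success: true, rollback_triggered: false, duration_ms: 50 },
]);
console.log(summary);
```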

Rollback Rate Alert

-- Returns a row only if at least one rollback occurred in the last hour; alert if rollback_rate > 5%
SELECT 
  COUNT(*) FILTER (WHERE rollback_triggered) as rollbacks,
  COUNT(*) as total_attempts,
  ROUND(100.0 * COUNT(*) FILTER (WHERE rollback_triggered) / COUNT(*), 2) as rollback_rate
FROM approval_transaction_metrics
WHERE created_at > NOW() - INTERVAL '1 hour'
HAVING COUNT(*) FILTER (WHERE rollback_triggered) > 0;
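
Because the query reports the rate without applying the 5% threshold itself, the comparison has to happen wherever the result is consumed. A minimal sketch of that check follows, assuming rows with a boolean `rollback_triggered` column; `rollbackAlert` is a hypothetical helper and the default threshold is taken from the comment in the query.

```javascript
// Computes the rollback rate (percent, two decimals) over a set of metric
// rows and flags it when it exceeds the threshold.
function rollbackAlert(rows, thresholdPercent = 5) {
  const total = rows.length;
  const rollbacks = rows.filter((row) => row.rollback_triggered).length;
  // Guard against division by zero when there were no attempts at all.
  const rate = total === 0 ? 0 : Math.round((10000 * rollbacks) / total) / 100;
  return { rollbacks, total, rate, alert: rate > thresholdPercent };
}

// Helper for the demo: n rows, the first k of which rolled back.
function makeRows(n, k) {
  return Array.from({ length: n }, (_, i) => ({ rollback_triggered: i < k }));
}

const noisy = rollbackAlert(makeRows(10, 1)); // 1 of 10 attempts rolled back
const quiet = rollbackAlert(makeRows(25, 1)); // 1 of 25 attempts rolled back
const idle = rollbackAlert([]);               // no attempts in the window
console.log(noisy, quiet, idle);
```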

Rollback Plan

If issues are detected after enabling the new flow:

Immediate Rollback (< 5 minutes)

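// Disable the feature flag in this browser. The flag lives in localStorage and is
// per-browser, so each affected moderator must run this or toggle it off in the Admin UI.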
localStorage.setItem('use_rpc_approval', 'false');
window.location.reload();

Data Recovery (if needed)

-- Identify submissions processed with v2 during problem window
SELECT 
  atm.submission_id,
  atm.created_at,
  atm.success,
  atm.error_message
FROM approval_transaction_metrics atm
WHERE atm.created_at BETWEEN '2025-11-06 19:00:00' AND '2025-11-06 20:00:00'
  AND atm.success = false
  AND atm.rollback_triggered = true;

-- Check for orphaned entities (if any exist)
-- Use the orphaned entity query above

Success Metrics

After full rollout, these metrics should be achieved:

Metric                      Target   Current
Zero orphaned entities      0        ✓ TBD
Zero manual rollback logs   0        ✓ TBD
Transaction success rate    >99%     ✓ TBD
Avg transaction time        <500ms   ✓ TBD
Rollback rate               <1%      ✓ TBD
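
These targets can be encoded as a simple pass/fail check once measurements are available. The sketch below is illustrative only: `checkTargets` and the property names of the `measured` object are hypothetical, while the thresholds are the ones listed above.

```javascript
// One predicate per metric; a missing measurement fails its predicate.
const TARGETS = {
  orphanedEntities: (value) => value === 0,
  manualRollbackLogs: (value) => value === 0,
  successRatePercent: (value) => value > 99,
  avgTransactionMs: (value) => value < 500,
  rollbackRatePercent: (value) => value < 1,
};

// Returns whether every target is met, plus the names of any that are not.
function checkTargets(measured) {
  const failures = Object.keys(TARGETS).filter((name) => !TARGETS[name](measured[name]));
  return { pass: failures.length === 0, failures };
}

const healthy = checkTargets({
  orphanedEntities: 0,
  manualRollbackLogs: 0,
  successRatePercent: 99.7,
  avgTransactionMs: 180,
  rollbackRatePercent: 0.3,
});

const degraded = checkTargets({
  orphanedEntities: 0,
  manualRollbackLogs: 0,
  successRatePercent: 97.5,
  avgTransactionMs: 620,
  rollbackRatePercent: 0.3,
});
console.log(healthy, degraded);
```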

Deployment Phases

Phase 1: COMPLETE

  • Create RPC functions (helper + main transaction)
  • Create new edge function v2
  • Add feature flag support to frontend
  • Create admin UI toggle
  • Add monitoring table + RLS policies

Phase 2: 🟡 IN PROGRESS

  • Test with single moderator account
  • Monitor metrics for 24 hours
  • Verify zero orphaned entities
  • Collect feedback from test moderator

Phase 3: 🔲 PENDING

  • Enable for 10% of requests (weighted sampling)
  • Monitor for 24 hours
  • Check rollback rate < 1%
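
The document does not specify how requests will be sampled. One simple possibility is a per-request random draw against the rollout percentage, sketched below. `shouldUseV2` is a hypothetical helper, and the random source is injectable so the decision can be tested deterministically.

```javascript
// Decides whether a single request should go through the v2 flow.
// rolloutPercent: 0..100; random: function returning a number in [0, 1).
function shouldUseV2(rolloutPercent, random = Math.random) {
  if (!Number.isFinite(rolloutPercent) || rolloutPercent < 0 || rolloutPercent > 100) {
    throw new RangeError('rolloutPercent must be between 0 and 100');
  }
  return random() * 100 < rolloutPercent;
}

// With a 10% rollout, draws below 0.10 select v2 and the rest keep v1.
const lowDraw = shouldUseV2(10, () => 0.05);
const highDraw = shouldUseV2(10, () => 0.5);
console.log(lowDraw, highDraw);
```

A per-request draw means the same moderator can see both flows within a session. Hashing a stable identifier instead would keep each moderator on one flow, at the cost of a less even split for small groups.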

Phase 4: 🔲 PENDING

  • Enable for 50% of requests
  • Monitor for 48 hours
  • Compare performance metrics with old flow

Phase 5: 🔲 PENDING

  • Enable for 100% of requests
  • Monitor for 1 week
  • Mark old edge function as deprecated

Phase 6: 🔲 PENDING

  • Remove old edge function
  • Archive manual rollback code
  • Update all documentation

Troubleshooting

Issue: Feature flag not working

Symptom: Logs still show "process-selective-approval" even with flag enabled
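Solution: Clear localStorage and reload: localStorage.clear(); window.location.reload(). Because clear() removes every stored key, including use_rpc_approval, re-enable the flag afterwards.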

Issue: "RPC function not found" error

Symptom: Edge function fails with "process_approval_transaction not found"
Solution: Run the migration again, or check that the function exists:

SELECT proname FROM pg_proc WHERE proname = 'process_approval_transaction';

Issue: High rollback rate (>5%)

Symptom: Many transactions rolling back in metrics
Solution:

  1. Check error messages in approval_transaction_metrics.error_message
  2. Disable feature flag immediately
  3. Investigate root cause (validation issues, data integrity, etc.)

Issue: Orphaned entities detected

Symptom: Entities exist without corresponding versions
Solution:

  1. Disable feature flag immediately
  2. Run orphaned entity query to identify affected entities
  3. Investigate cause (likely edge function crash during v1 flow)
  4. Consider data cleanup (manual deletion or version creation)

FAQ

Q: Can I switch back to the old flow without data loss?
A: Yes. Simply toggle off the feature flag. All data remains intact.

Q: What happens if the edge function crashes mid-transaction?
A: PostgreSQL rolls back any transaction that has not committed, so the approval is applied in full or not at all. No orphaned data.

Q: How do I know which flow approved a submission?
A: Check approval_transaction_metrics table. If a row exists, v2 was used.

Q: Can I use both flows simultaneously?
A: Yes. The feature flag is per-browser, so different moderators can use different flows.

Q: When will the old flow be removed?
A: After 30 days of stable operation at 100% rollout (Phase 6).
