

ThrillWiki Remote Deployment System

🚀 Bulletproof remote deployment with integrated GitHub authentication and automatic pull scheduling

Overview

The ThrillWiki Remote Deployment System provides a complete solution for deploying the ThrillWiki automation infrastructure to remote VMs via SSH/SCP. It includes integrated GitHub authentication setup and automatic pull scheduling configured as systemd services.

🎯 Key Features

  • 🔄 Bulletproof Remote Deployment - SSH/SCP-based deployment with connection testing and retry logic
  • 🔐 Integrated GitHub Authentication - Seamless PAT setup during deployment process
  • Automatic Pull Scheduling - Configurable intervals (default: 5 minutes) with systemd integration
  • 🛡️ Comprehensive Error Handling - Rollback capabilities and health validation
  • 📊 Multi-Host Support - Deploy to multiple VMs in parallel or sequentially
  • Health Validation - Real-time status reporting and post-deployment testing
  • 🔧 Multiple Deployment Presets - Dev, prod, demo, and testing configurations
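
The connection testing and retry logic mentioned above can be pictured with a minimal sketch like the following. It is not taken from remote-deploy.sh: the function names retry and test_ssh, and the attempt count and delay in the usage comment, are assumptions made for illustration.

```shell
# Hypothetical helper: run a command until it succeeds, up to max_attempts tries.
retry() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical connectivity probe. BatchMode=yes makes ssh fail instead of
# prompting for a password; ConnectTimeout bounds each attempt.
test_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Example (not executed here): three attempts, five seconds apart
#   retry 3 5 test_ssh ubuntu@192.168.1.100
```

Keeping the retry loop separate from the probe means the same helper could wrap an scp transfer or a remote health check.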

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Local Development Machine                    │
├─────────────────────────────────────────────────────────────────┤
│  deploy-complete.sh (Orchestrator)                              │
│  ├── GitHub Authentication Setup                                │
│  ├── Multi-host Connectivity Testing                            │
│  └── Deployment Coordination                                    │
│                                                                 │
│  remote-deploy.sh (Core Deployment)                             │
│  ├── SSH/SCP File Transfer                                      │
│  ├── Remote Environment Setup                                   │
│  ├── Service Configuration                                      │
│  └── Health Validation                                          │
└─────────────────────────────────────────────────────────────────┘
                              │ SSH/SCP
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Remote VM(s)                               │
├─────────────────────────────────────────────────────────────────┤
│  ThrillWiki Project Files                                       │
│  ├── bulletproof-automation.sh (5-min pull scheduling)          │
│  ├── GitHub PAT Authentication                                  │
│  └── UV Package Management                                      │
│                                                                 │
│  systemd Service                                                │
│  ├── thrillwiki-automation.service                              │
│  ├── Auto-start on boot                                         │
│  ├── Health monitoring                                          │
│  └── Automatic restart on failure                               │
└─────────────────────────────────────────────────────────────────┘

📁 File Structure

scripts/vm/
├── deploy-complete.sh              # 🎯 One-command complete deployment
├── remote-deploy.sh               # 🚀 Core remote deployment engine
├── bulletproof-automation.sh      # 🔄 Main automation with 5-min pulls
├── setup-automation.sh           # ⚙️ Interactive setup script
├── automation-config.sh          # 📋 Configuration management
├── github-setup.py               # 🔐 GitHub PAT authentication
├── quick-start.sh                # ⚡ Rapid setup with defaults
└── README.md                     # 📚 This documentation

scripts/systemd/
├── thrillwiki-automation.service  # 🛡️ systemd service definition
└── thrillwiki-automation***REMOVED***.example  # 📝 Environment template

🚀 Quick Start

1. One-Command Complete Deployment

Deploy the complete automation system to a remote VM:

# Basic deployment with interactive setup
./scripts/vm/deploy-complete.sh 192.168.1.100

# Production deployment with GitHub token
./scripts/vm/deploy-complete.sh --preset prod --token ghp_xxxxx production-server

# Multi-host parallel deployment
./scripts/vm/deploy-complete.sh --parallel host1 host2 host3

2. Preview Deployment (Dry Run)

See what would be deployed without making changes:

./scripts/vm/deploy-complete.sh --dry-run --preset prod 192.168.1.100
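
A common way for shell scripts to offer a dry-run mode is a small wrapper that prints each command instead of running it. The sketch below shows that pattern only. The run function and the DRY_RUN variable are hypothetical names, and deploy-complete.sh may implement its preview differently.

```shell
# Hypothetical switch; a real script would set it while parsing --dry-run.
DRY_RUN="${DRY_RUN:-false}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = true ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}

# Example (not executed here): every side-effecting step goes through run
#   run scp -r scripts/ ubuntu@192.168.1.100:thrillwiki/
#   run ssh ubuntu@192.168.1.100 'sudo systemctl restart thrillwiki-automation'
```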

3. Development Environment Setup

Quick development deployment with frequent pulls:

./scripts/vm/deploy-complete.sh --preset dev --pull-interval 60 dev-server

🎛️ Deployment Options

Deployment Presets

Preset    Pull Interval    Use Case       Features
dev       60s (1 min)      Development    Debug enabled, frequent updates
prod      300s (5 min)     Production     Security hardened, stable intervals
demo      120s (2 min)     Demos          Feature showcase, moderate updates
testing   180s (3 min)     Testing        Comprehensive monitoring
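
A preset lookup of this kind is often written as a case statement. The following is a sketch under assumptions: the function names are hypothetical, the interval values are the ones listed above, and the rule that an explicit --pull-interval value overrides the preset is assumed rather than confirmed by the scripts.

```shell
# Map a preset name to its pull interval in seconds (values from the table).
preset_interval() {
    case "$1" in
        dev)     echo 60 ;;
        prod)    echo 300 ;;
        demo)    echo 120 ;;
        testing) echo 180 ;;
        *)       echo "unknown preset: $1" >&2; return 1 ;;
    esac
}

# Assumed precedence: an explicit interval, when given, wins over the preset.
resolve_interval() {
    local preset="$1" override="${2:-}"
    if [ -n "$override" ]; then
        echo "$override"
    else
        preset_interval "$preset"
    fi
}

# Example:
#   resolve_interval dev        # prints 60
#   resolve_interval prod 600   # prints 600
```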

Command Options

deploy-complete.sh (Orchestrator)

./scripts/vm/deploy-complete.sh [OPTIONS] <host1> [host2] [host3]...

OPTIONS:
  -u, --user USER      Remote username (default: ubuntu)
  -p, --port PORT      SSH port (default: 22)
  -k, --key PATH       SSH private key file
  -t, --token TOKEN    GitHub Personal Access Token
  --preset PRESET      Deployment preset (dev/prod/demo/testing)
  --pull-interval SEC  Custom pull interval in seconds
  --skip-github        Skip GitHub authentication setup
  --parallel           Deploy to multiple hosts in parallel
  --dry-run            Preview deployment without executing
  --force              Force deployment even if target exists
  --debug              Enable debug logging

remote-deploy.sh (Core Engine)

./scripts/vm/remote-deploy.sh [OPTIONS] <remote_host>

OPTIONS:
  -u, --user USER      Remote username
  -p, --port PORT      SSH port
  -k, --key PATH       SSH private key file
  -d, --dest PATH      Remote destination path
  --github-token TOK   GitHub token for authentication
  --skip-github        Skip GitHub setup
  --skip-service       Skip systemd service setup
  --force              Force deployment
  --dry-run            Preview mode

🔐 GitHub Authentication

Automatic Setup

The deployment system automatically configures GitHub authentication:

  1. Interactive Setup - Guides you through PAT creation
  2. Token Validation - Tests API access and permissions
  3. Secure Storage - Stores tokens with proper file permissions
  4. Repository Access - Validates access to your ThrillWiki repository
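
Steps 2 and 3 can be illustrated with a short sketch. It is not the code in github-setup.py: the function names and the token file path in the usage comment are hypothetical. The only external fact relied on is that the GitHub REST API endpoint https://api.github.com/user answers with HTTP 200 for a valid token and 401 for an invalid one.

```shell
# Store a token so that only the owning user can read it (mode 600).
# The restrictive umask covers the moment the file is created; chmod covers
# the case where the file already existed with looser permissions.
store_token() {
    local token="$1" file="$2"
    ( umask 077 && printf '%s\n' "$token" > "$file" ) && chmod 600 "$file"
}

# Print the HTTP status code GitHub returns for this token.
token_status() {
    curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bearer $1" https://api.github.com/user
}

# Example (not executed here; the path is an assumption):
#   [ "$(token_status "$GITHUB_TOKEN")" = "200" ] && \
#       store_token "$GITHUB_TOKEN" "$HOME/.thrillwiki-github-token"
```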

Manual GitHub Token Setup

If you prefer to set up GitHub authentication manually:

# Create GitHub PAT at: https://github.com/settings/tokens
# Required scopes: repo (for private repos) or public_repo (for public repos)

# Use token during deployment
./scripts/vm/deploy-complete.sh --token ghp_your_token_here 192.168.1.100

# Or set as environment variable
export GITHUB_TOKEN=ghp_your_token_here
./scripts/vm/deploy-complete.sh 192.168.1.100

Automatic Pull Scheduling

Default Configuration

  • Pull Interval: 5 minutes (300 seconds)
  • Health Checks: Every 60 seconds
  • Auto-restart: On failure with 10-second delay
  • Systemd Integration: Auto-start on boot
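
To show how these defaults map onto systemd directives, here is a sketch that prints an illustrative unit file. It is not the shipped thrillwiki-automation.service: the render_unit function, the PULL_INTERVAL variable name, and the ExecStart line are assumptions. Restart=on-failure, RestartSec=10, and WantedBy=multi-user.target are the standard directives for restart-on-failure with a 10-second delay and start on boot.

```shell
# Print an illustrative unit file for the given project directory and interval.
render_unit() {
    local project_dir="$1" interval="${2:-300}"
    cat <<EOF
[Unit]
Description=ThrillWiki automation (illustrative sketch)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
WorkingDirectory=${project_dir}
# PULL_INTERVAL is a hypothetical variable name
Environment=PULL_INTERVAL=${interval}
ExecStart=/bin/bash ${project_dir}/scripts/vm/bulletproof-automation.sh
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}

# Example: render_unit /home/ubuntu/thrillwiki 300
```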

Customization

# Custom pull intervals
./scripts/vm/deploy-complete.sh --pull-interval 120 192.168.1.100  # 2 minutes

# Development with frequent pulls
./scripts/vm/deploy-complete.sh --preset dev 192.168.1.100  # 1 minute

# Production with stable intervals
./scripts/vm/deploy-complete.sh --preset prod 192.168.1.100  # 5 minutes

Monitoring

# Monitor automation in real-time
ssh ubuntu@192.168.1.100 'sudo journalctl -u thrillwiki-automation -f'

# Check service status
ssh ubuntu@192.168.1.100 'sudo systemctl status thrillwiki-automation'

# View automation logs
ssh ubuntu@192.168.1.100 'tail -f [AWS-SECRET-REMOVED]-automation.log'

🛠️ Advanced Usage

Multi-Host Deployment

Deploy to multiple hosts, either one after another or in parallel:

# Sequential deployment
./scripts/vm/deploy-complete.sh host1 host2 host3

# Parallel deployment (faster)
./scripts/vm/deploy-complete.sh --parallel host1 host2 host3

# Same preset applied to several hosts
./scripts/vm/deploy-complete.sh --preset prod prod1 prod2 prod3
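
The difference between the two modes can be sketched with background jobs and wait. This is an illustration under assumptions, not the orchestrator's code: deploy_one is a hypothetical stand-in for the per-host deployment, and deploy_all is a hypothetical name.

```shell
# Hypothetical stand-in for deploying to a single host.
deploy_one() {
    echo "deploying to $1"
}

# Deploy to every host, sequentially or in parallel, and fail if any host failed.
deploy_all() {
    local mode="$1" failed=0 pids="" pid host
    shift
    for host in "$@"; do
        if [ "$mode" = parallel ]; then
            deploy_one "$host" &          # start in the background
            pids="$pids $!"               # remember its PID
        else
            deploy_one "$host" || failed=$((failed + 1))
        fi
    done
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))   # collect each exit status
    done
    if [ "$failed" -gt 0 ]; then
        echo "$failed host(s) failed" >&2
        return 1
    fi
}

# Example: deploy_all parallel host1 host2 host3
```

Waiting on each PID individually, rather than calling a bare wait, is what lets the sketch notice which background deployments failed.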

Custom SSH Configuration

# Custom SSH key and user
./scripts/vm/deploy-complete.sh -u admin -k ~/.ssh/custom_key -p 2222 remote-host

# SSH config file support
# Add to ~/.ssh/config:
# Host thrillwiki-prod
#   HostName 192.168.1.100
#   User ubuntu
#   IdentityFile ~/.ssh/thrillwiki_key
#   Port 22

./scripts/vm/deploy-complete.sh thrillwiki-prod

Environment-Specific Deployment

# Development environment
./scripts/vm/deploy-complete.sh --preset dev --debug dev-server

# Production environment with security
./scripts/vm/deploy-complete.sh --preset prod --token "$GITHUB_TOKEN" prod-server

# Testing environment with monitoring
./scripts/vm/deploy-complete.sh --preset testing test-server

🔧 Troubleshooting

Common Issues

SSH Connection Failed

# Test SSH connectivity
ssh -o ConnectTimeout=10 ubuntu@192.168.1.100 'echo "Connection test"'

# Check SSH key permissions
chmod 600 ~/.ssh/your_key
ssh-add ~/.ssh/your_key

# Verify host accessibility
ping 192.168.1.100

GitHub Authentication Issues

# Validate GitHub token
python3 scripts/vm/github-setup.py validate

# Test repository access
curl -H "Authorization: Bearer $GITHUB_TOKEN" \
     https://api.github.com/repos/your-username/thrillwiki

# Re-setup GitHub authentication
python3 scripts/vm/github-setup.py setup

Service Not Starting

# Check service status
ssh ubuntu@host 'sudo systemctl status thrillwiki-automation'

# View service logs
ssh ubuntu@host 'sudo journalctl -u thrillwiki-automation --since "1 hour ago"'

# Manual service restart
ssh ubuntu@host 'sudo systemctl restart thrillwiki-automation'

Deployment Validation Failed

# Check project files
ssh ubuntu@host 'ls -la /home/ubuntu/thrillwiki/scripts/vm/'

# Test automation script manually
ssh ubuntu@host 'cd /home/ubuntu/thrillwiki && bash scripts/vm/bulletproof-automation.sh --test'

# Verify GitHub access
ssh ubuntu@host 'cd /home/ubuntu/thrillwiki && python3 scripts/vm/github-setup.py validate'

Debug Mode

Enable detailed logging for troubleshooting:

# Enable debug mode
export COMPLETE_DEBUG=true
export DEPLOY_DEBUG=true

./scripts/vm/deploy-complete.sh --debug 192.168.1.100

Rollback Deployment

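If a deployment fails, the system rolls back automatically. If a manual rollback is still needed, stop and disable the service, then remove the deployed files: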

# Manual rollback (if needed)
ssh ubuntu@host 'sudo systemctl stop thrillwiki-automation'
ssh ubuntu@host 'sudo systemctl disable thrillwiki-automation'
ssh ubuntu@host 'rm -rf /home/ubuntu/thrillwiki'
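
The idea of rolling back on failure can be sketched as a loop that runs deployment steps in order and calls a cleanup function as soon as one fails. The names below are hypothetical and the rollback body is a placeholder; the real scripts may structure this differently.

```shell
# Hypothetical cleanup. Real commands would mirror the manual steps above
# (stop and disable the service, remove the deployed files).
rollback() {
    echo "rolling back" >&2
}

# Run each step (a function or command name) in order.
# On the first failure, roll back and skip the remaining steps.
deploy_with_rollback() {
    local step
    for step in "$@"; do
        if ! "$step"; then
            echo "step '$step' failed" >&2
            rollback
            return 1
        fi
    done
}

# Example (step names are placeholders):
#   deploy_with_rollback transfer_files setup_environment install_service
```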

📊 Monitoring and Maintenance

Health Monitoring

The deployed system includes comprehensive health monitoring:

  • Service Health: systemd monitors the automation service
  • Repository Health: Regular GitHub connectivity tests
  • Server Health: Django server monitoring and auto-restart
  • Resource Health: Memory and CPU monitoring
  • Log Health: Automatic log rotation and cleanup

Regular Maintenance

# Update automation system
ssh ubuntu@host 'cd /home/ubuntu/thrillwiki && git pull'
ssh ubuntu@host 'sudo systemctl restart thrillwiki-automation'

# View recent logs
ssh ubuntu@host 'sudo journalctl -u thrillwiki-automation --since "24 hours ago"'

# Check disk usage
ssh ubuntu@host 'df -h /home/ubuntu/thrillwiki'

# Rotate logs manually
ssh ubuntu@host 'cd /home/ubuntu/thrillwiki && find logs/ -name "*.log" -size +10M -exec mv {} {}.old \;'

Performance Tuning

# Adjust pull intervals for performance
./scripts/vm/deploy-complete.sh --pull-interval 600 192.168.1.100  # 10 minutes

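# Monitor resource usage (-t gives top a terminal; the [b] pattern keeps
# pgrep from matching the remote shell's own command line)
ssh -t ubuntu@host 'top -p "$(pgrep -d, -f "[b]ulletproof-automation")"'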

# Check automation performance
ssh ubuntu@host 'tail -100 [AWS-SECRET-REMOVED]-automation.log | grep -E "(SUCCESS|ERROR)"'

🔒 Security Considerations

SSH Security

  • Use SSH keys instead of passwords
  • Restrict SSH access with firewall rules
  • Use non-standard SSH ports when possible
  • Regularly rotate SSH keys

GitHub Token Security

  • Use tokens with minimal required permissions
  • Set reasonable expiration dates
  • Store tokens securely with 600 permissions
  • Regularly rotate GitHub PATs

System Security

  • Keep remote systems updated
  • Use systemd security features
  • Monitor automation logs for suspicious activity
  • Restrict network access to automation services

📚 Integration Guide

CI/CD Integration

Integrate with your CI/CD pipeline:

# GitHub Actions example
- name: Deploy to Production
  run: |
    ./scripts/vm/deploy-complete.sh \
      --preset prod \
      --token ${{ secrets.GITHUB_TOKEN }} \
      --parallel \
      prod1.example.com prod2.example.com

# GitLab CI example
deploy_production:
  script:
    - ./scripts/vm/deploy-complete.sh --preset prod --token $GITHUB_TOKEN $PROD_SERVERS

Infrastructure as Code

Use with Terraform or similar tools:

# Terraform example
resource "null_resource" "thrillwiki_deployment" {
  provisioner "local-exec" {
    command = "./scripts/vm/deploy-complete.sh --preset prod ${aws_instance.app.public_ip}"
  }
  
  depends_on = [aws_instance.app]
}

🆘 Support

Getting Help

  1. Check the logs - Most issues are logged in detail
  2. Use debug mode - Enable debug logging for troubleshooting
  3. Test connectivity - Verify SSH and GitHub access
  4. Validate environment - Check dependencies and permissions

Log Locations

  • Local Deployment Logs: logs/deploy-complete.log, logs/remote-deploy.log
  • Remote Automation Logs: [AWS-SECRET-REMOVED]-automation.log
  • System Service Logs: journalctl -u thrillwiki-automation

Common Solutions

Issue                    Solution
SSH timeout              Check network connectivity and SSH service
Permission denied        Verify SSH key permissions and user access
GitHub API rate limit    Authenticate requests with a GitHub PAT; authenticated requests get a higher rate limit
Service won't start      Check systemd service configuration and logs
Automation not pulling   Verify GitHub access and repository permissions

🎉 Success!

Your ThrillWiki automation system is now deployed with:

  • Automatic repository pulls every 5 minutes (the default interval)
  • GitHub authentication configured
  • systemd service for reliability
  • Health monitoring and logging
  • Django server automation with UV

The system will automatically:

  1. Pull latest changes from your repository
  2. Run Django migrations when needed
  3. Update dependencies with UV
  4. Restart the Django server
  5. Monitor and recover from failures
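
The cycle above can be sketched as follows. This is a simplified illustration, not the contents of bulletproof-automation.sh. The function names are hypothetical, restart_server is a placeholder, and the sketch assumes dependencies are synced before migrations run, since a migration can depend on a newly added package. The git and uv invocations themselves (git pull --ff-only, uv sync, uv run) are standard commands.

```shell
# Each step is a small function so it can be replaced or tested on its own.
current_rev()    { git rev-parse HEAD; }
pull_changes()   { git pull --ff-only --quiet; }
sync_deps()      { uv sync; }
run_migrations() { uv run python manage.py migrate --noinput; }
restart_server() { echo "restart the Django server here"; }  # placeholder

# One pass: pull, and only do the expensive steps if the revision changed.
update_cycle() {
    local before after
    before=$(current_rev) || return 1
    pull_changes || return 1
    after=$(current_rev) || return 1
    if [ "$before" = "$after" ]; then
        echo "no changes"
        return 0
    fi
    sync_deps && run_migrations && restart_server
}

# Example scheduling loop (not executed here; 300 seconds is the default interval):
#   while true; do update_cycle; sleep 300; done
```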

Enjoy your fully automated ThrillWiki deployment! 🚀