# [MEMORY BANK: ACTIVE] Optimization Plan - Further Performance Work

Date: 2025-09-25

Purpose: Reduce response latency and improve training throughput beyond the existing optimizations.

Context: Builds on [`memory-bank/performance-analysis.md`](memory-bank/performance-analysis.md:1) and the changes already implemented in [`src/train.ts`](src/train.ts:1) and [`src/index.ts`](src/index.ts:1).

## Goals

- Target: end-to-end response generation < 500ms for typical queries.
- Training throughput: process 1M messages/hour on dev hardware.
- Memory: keep the max heap < 2GB during training on a 16GB host.

## Measurement & Profiling (first actions)

1. Capture baseline metrics:
   - Run workload A (100k messages) and record CPU, memory, and latency histograms.
   - Tools: Clinic.js flame graphs, `node --prof`, and pprof.
2. Add short-term tracing: export traces for the top code paths in [`src/index.ts`](src/index.ts:1) and [`src/train.ts`](src/train.ts:1).
3. Create benchmark scripts: `bench/trace.sh` and `bench/load_test.ts` (synthetic); a sketch follows this list.
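
For `bench/load_test.ts`, something along these lines would do as a first pass; `generateResponse` is a hypothetical stand-in for whatever generation entry point [`src/index.ts`](src/index.ts:1) exposes:

```ts
// bench/load_test.ts — synthetic latency benchmark (sketch).
import { performance } from 'node:perf_hooks';

// Hypothetical stand-in for the bot's real generation entry point.
type GenerateFn = () => Promise<string>;

export async function loadTest(generateResponse: GenerateFn, iterations = 1000): Promise<void> {
  const latencies: number[] = [];
  for (let i = 0; i < iterations; i += 1) {
    const start = performance.now();
    await generateResponse();
    latencies.push(performance.now() - start);
  }
  // Sort ascending and read off the 95th percentile.
  latencies.sort((a, b) => a - b);
  const p95 = latencies[Math.floor(latencies.length * 0.95)];
  console.log(`p95 latency: ${p95.toFixed(1)}ms over ${iterations} runs`);
}
```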

## High Priority (implement immediately)

1. Persist precomputed Markov chains per channel/guild:
   - Add a serialized chain store: `src/markov-store.ts` (new).
   - On training, update the chain incrementally instead of rebuilding it.
   - Benefit: chain lookup during response generation becomes O(1).
2. Use an optimized sampling structure (alias method):
   - Replace repeated weighted selection with alias tables built per prefix (see the sketch after this list).
   - File changes: [`src/index.ts`](src/index.ts:1), [`src/markov-store.ts`](src/markov-store.ts:1).
3. Offload CPU-bound work to worker threads:
   - Move chain building and heavy sampling into Node `worker_threads`.
   - Add a worker pool (4 threads by default) with backpressure (see the pool sketch after this list).
   - Files: [`src/train.ts`](src/train.ts:1), [`src/workers/markov-worker.ts`](src/workers/markov-worker.ts:1).
4. Use an in-memory LRU cache for active chains:
   - Keep hot channels' chains in RAM; evict the least recently used (see the cache sketch after this list).
   - Implement a TTL and a memory cap.
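
For item 2, the alias method (Vose's variant) builds a table per prefix in O(n) and then samples in O(1). A minimal sketch of what `src/markov-store.ts` could hold per prefix — the names and shapes here are illustrative, not the store's actual API:

```ts
// Alias table for O(1) weighted sampling over a prefix's suffix counts.
interface AliasTable {
  prob: number[];  // probability of keeping column i
  alias: number[]; // fallback column when the coin flip fails
}

export function buildAliasTable(weights: number[]): AliasTable {
  const n = weights.length;
  const total = weights.reduce((sum, w) => sum + w, 0);
  // Scale so the average bucket is exactly 1.
  const scaled = weights.map((w) => (w * n) / total);
  const prob = new Array<number>(n);
  const alias = new Array<number>(n);
  const small: number[] = [];
  const large: number[] = [];
  scaled.forEach((w, i) => (w < 1 ? small : large).push(i));
  while (small.length > 0 && large.length > 0) {
    const s = small.pop()!;
    const l = large.pop()!;
    prob[s] = scaled[s];
    alias[s] = l;
    // The large bucket donates exactly what the small bucket was missing.
    scaled[l] += scaled[s] - 1;
    (scaled[l] < 1 ? small : large).push(l);
  }
  // Whatever remains is (up to rounding) exactly full.
  for (const i of [...small, ...large]) {
    prob[i] = 1;
    alias[i] = i;
  }
  return { prob, alias };
}

export function sampleAlias(table: AliasTable): number {
  const i = Math.floor(Math.random() * table.prob.length);
  return Math.random() < table.prob[i] ? i : table.alias[i];
}
```

Since the build is O(n) per prefix, the store should rebuild a prefix's table only when that prefix's counts change (or lazily on first lookup after an update), which keeps incremental training cheap.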
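
For item 3, a minimal fixed-size pool over `node:worker_threads` with a FIFO queue for backpressure; the message protocol for `src/workers/markov-worker.ts` is not specified here, so the payloads are placeholders:

```ts
import { Worker } from 'node:worker_threads';

type Job = { payload: unknown; resolve: (v: unknown) => void; reject: (e: Error) => void };

export class WorkerPool {
  private idle: Worker[] = [];
  private queue: Job[] = []; // pending jobs wait here: natural backpressure

  constructor(script: string, size = 4) {
    for (let i = 0; i < size; i += 1) this.idle.push(new Worker(script));
  }

  run(payload: unknown): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.queue.push({ payload, resolve, reject });
      this.dispatch();
    });
  }

  private dispatch(): void {
    if (this.idle.length === 0 || this.queue.length === 0) return;
    const worker = this.idle.pop()!;
    const job = this.queue.shift()!;
    const settle = (fn: (v: any) => void) => (value: any) => {
      worker.removeAllListeners('message');
      worker.removeAllListeners('error');
      this.idle.push(worker);
      fn(value);
      this.dispatch(); // pick up the next queued job, if any
    };
    worker.once('message', settle(job.resolve));
    worker.once('error', settle(job.reject));
    worker.postMessage(job.payload);
  }
}
```

The supervisor mentioned under Risks & Mitigations below would hang off each worker's `'exit'` event and replace the dead thread with backoff.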
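
For item 4, a package like `lru-cache` would also work; below is a dependency-free sketch that leans on `Map`'s insertion ordering, with an entry-count cap standing in for a real byte-size cap:

```ts
// Minimal LRU keyed by channel ID. A production memory cap would estimate
// serialized chain size per entry rather than counting entries.
export class LruCache<V> {
  private map = new Map<string, { value: V; expires: number }>();

  constructor(private maxEntries: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.map.delete(key);
      return undefined;
    }
    // Re-insert to mark as most recently used (Map preserves insertion order).
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expires: Date.now() + this.ttlMs });
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```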

## Medium Priority

1. Optimize SQLite for runtime:
   - Enable WAL mode (`PRAGMA journal_mode = WAL`) and set `PRAGMA synchronous = NORMAL`.
   - Use prepared statements and transactions for bulk writes (see the SQLite sketch after this list).
   - Temporarily disable non-essential indexes during major bulk imports.
   - File: [`src/migration/1704067200000-AddPerformanceIndexes.ts`](src/migration/1704067200000-AddPerformanceIndexes.ts:1).
2. Move heavy random-access data into a K/V store:
   - Consider LevelDB/LMDB or RocksDB for prefix → suffix lists to speed up reads.
3. Incremental training API:
   - Add an HTTP or IPC endpoint to submit new messages and update the chain incrementally (a minimal endpoint sketch also follows this list).
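
For item 1, the pragmas and bulk-write pattern look like this with `better-sqlite3`; the project's actual driver may differ (the migration file suggests TypeORM) and the table and columns below are placeholders, but the same `PRAGMA`s can be issued through any driver:

```ts
import Database from 'better-sqlite3';

const db = new Database('markov.db');
db.pragma('journal_mode = WAL');   // readers no longer block the writer
db.pragma('synchronous = NORMAL'); // fewer fsyncs; a safe pairing with WAL

// Hypothetical schema, for illustration only.
const insert = db.prepare('INSERT INTO messages (channel_id, content) VALUES (?, ?)');

// One transaction per batch avoids paying a commit per row.
const insertMany = db.transaction((rows: [string, string][]) => {
  for (const row of rows) insert.run(...row);
});

insertMany([
  ['123', 'hello'],
  ['123', 'world'],
]);
```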
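
For item 3, a bare `node:http` version of the ingest endpoint; `updateChain` is a hypothetical stand-in for the store's incremental update:

```ts
import http from 'node:http';

export function startIngestServer(updateChain: (channelId: string, message: string) => void) {
  return http
    .createServer((req, res) => {
      if (req.method !== 'POST' || req.url !== '/ingest') {
        res.writeHead(404);
        res.end();
        return;
      }
      let body = '';
      req.on('data', (chunk) => { body += chunk; });
      req.on('end', () => {
        try {
          const { channelId, message } = JSON.parse(body);
          updateChain(channelId, message); // O(new tokens): no full rebuild
          res.writeHead(204);
        } catch {
          res.writeHead(400); // malformed JSON
        }
        res.end();
      });
    })
    .listen(8080);
}
```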

## Low Priority / Long Term

1. Reimplement core hot loops in Rust via Neon or FFI for max throughput.
2. Shard storage by guild and run independent workers per shard.
3. Replace SQLite with a server DB (Postgres) only if concurrency demands it.

## Implementation steps (concrete)

1. Add profiling scripts and run the baseline (1-2 days).
2. Implement `src/markov-store.ts` with serialization and the alias-table builder (1-2 days).
3. Wire up the worker pool and move chain building into workers (1-2 days).
4. Add the LRU cache around the store and integrate it with the response path (0.5-1 day).
5. Apply SQLite runtime tuning and test bulk-import patterns (0.5 day).
6. Add metrics and dashboards (Prometheus + Grafana, or simple histograms) (1 day).
7. Run load tests and iterate on bottlenecks (1-3 days).

## Benchmarks to run

- Baseline: 100k messages; measure 95th-percentile response latency.
- After the chain store: expect >5x faster generation.
- After workers + alias sampling: expect ~10x faster generation in CPU-heavy scenarios.

## Rollout & Validation

- Feature-flag the new chain store and worker pool behind config toggles in [`config/config.json`](config/config.json:1).
- Canary rollout to a single guild for 24h with load-test traffic.
- Compare metrics and enable globally only after the thresholds above are verified.

## Observability & Metrics

- Instrument: response-latency histogram, chain-build time, cache hit ratio, and DB query durations (a sketch follows this list).
- Log slow queries (> 50ms) with context.
- Add alerts for cache thrashing and worker-queue saturation.
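
For the latency histogram, a sketch assuming the `prom-client` package (any histogram implementation follows the same shape):

```ts
import client from 'prom-client';

const responseLatency = new client.Histogram({
  name: 'markov_response_latency_seconds',
  help: 'End-to-end response generation latency',
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2], // the 500ms goal sits mid-range
});

// Wrap the generation path; startTimer() returns a function that
// records the elapsed time in seconds when called.
export async function generateWithMetrics(generate: () => Promise<string>): Promise<string> {
  const endTimer = responseLatency.startTimer();
  try {
    return await generate();
  } finally {
    endTimer();
  }
}
```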

## Risks & Mitigations

- Serialization format changes: include versioning and migration utilities.
- Worker crashes: add a supervisor with restart/backoff.
- Memory blowup from caching: enforce strict memory caps and expose cache stats.

## Next actions for Code mode

- Create `src/markov-store.ts` and `src/workers/markov-worker.ts`, add the bench scripts, and update the `config/config.json` toggles.
- I will implement the highest-priority changes in Code mode once you approve.

End.