Mirror of https://github.com/pacnpal/markov-discord.git (synced 2025-12-19 18:51:05 -05:00)
Update file permissions and add documentation for LLM integration
11 .snapshots/readme.md Normal file
@@ -0,0 +1,11 @@
# Snapshots Directory

This directory contains snapshots of your code for AI interactions. Each snapshot is a markdown file that includes relevant code context and project structure information.

## What's included in snapshots?

- Selected code files and their contents
- Project structure (if enabled)
- Your prompt/question for the AI

## Configuration

You can customize snapshot behavior in `config.json`.
44 .snapshots/sponsors.md Normal file
@@ -0,0 +1,44 @@
# Thank you for using Snapshots for AI

Thanks for using Snapshots for AI. We hope this tool has helped you solve a problem or two.

If you would like to support our work, please consider the following offers and requests:

## Ways to Support

### Join the GBTI Network! 🙏🙏🙏

The GBTI Network is a community of developers who are passionate about open source and community-driven development. Members enjoy access to exclusive tools, resources, a private Minecraft server, a listing in our members directory, co-op opportunities, and more.

- Support our work by becoming a [GBTI Network member](https://gbti.network/membership/).

### Try out BugHerd 🐛

BugHerd is a visual feedback and bug-tracking tool designed to streamline website development by letting users pin feedback directly onto web pages. This approach facilitates clear communication among clients, designers, developers, and project managers.

- Start your free trial with [BugHerd](https://partners.bugherd.com/55z6c8az8rvr) today.

### Hire Developers from Codeable 👥

Codeable connects you with top-tier professionals skilled in frameworks and technologies such as Laravel, React, Django, Vue.js, Angular, Ruby on Rails, and Node.js. Don't let the WordPress focus discourage you: Codeable experts do it all.

- Visit [Codeable](https://www.codeable.io/developers/?ref=z8h3e) to hire your next team member.

### Leave positive reviews on our marketplace listings ⭐⭐⭐⭐⭐

- Rate us on the [VS Code marketplace](https://marketplace.visualstudio.com/items?itemName=GBTI.snapshots-for-ai)
- Review us on the [Open VSX registry](https://open-vsx.org/extension/GBTI/snapshots-for-ai) (used by Cursor)

### Star Our GitHub Repository ⭐

- Star and watch our [repository](https://github.com/gbti-network/vscode-snapshots-for-ai)

### 📡 Stay Connected

Follow us on your favorite platforms for updates, news, and community discussions:

- **[Twitter/X](https://twitter.com/gbti_network)**
- **[GitHub](https://github.com/gbti-network)**
- **[YouTube](https://www.youtube.com/channel/UCh4FjB6r4oWQW-QFiwqv-UA)**
- **[Dev.to](https://dev.to/gbti)**
- **[Daily.dev](https://dly.to/zfCriM6JfRF)**
- **[Hashnode](https://gbti.hashnode.dev/)**
- **[Discord Community](https://gbti.network)**
- **[Reddit Community](https://www.reddit.com/r/GBTI_network)**

---

Thank you for supporting open source software! 🙏
116 cline_docs/activeContext.md Executable file
@@ -0,0 +1,116 @@
# Active Context

Last Updated: 2024-12-27

## Current Focus

Integrating LLM capabilities into the existing Discord bot while maintaining the unique "personality" of each server's Markov-based responses.

### Active Issues

1. Response Generation
   - Need to implement hybrid Markov-LLM response system (see the sketch after this list)
   - Must maintain response speed within acceptable limits
   - Need to handle API rate limiting gracefully

2. Data Management
   - Implement efficient storage for embeddings
   - Design context window management
   - Handle conversation threading

3. Integration Points
   - Modify generateResponse function to support LLM
   - Add embedding generation pipeline
   - Implement context tracking
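A minimal sketch of the hybrid path, assuming a `generateMarkov` helper and an `llmComplete` client call (both placeholder names, not existing code): the LLM gets a bounded time slice, and the Markov chain answers whenever it fails or runs long.

```typescript
// Hypothetical hybrid wrapper: prefer the LLM, fall back to Markov on
// rate limits, errors, or timeouts so response speed stays within budget.
type Generator = (prompt: string) => Promise<string>;

export function makeHybridGenerator(
  generateMarkov: Generator, // existing markov-strings-db path (assumed)
  llmComplete: Generator,    // planned src/llm/provider.ts entry point (assumed)
  timeoutMs = 2000,
): Generator {
  return async (prompt) => {
    try {
      // Give the LLM a bounded time slice; reject if it runs long.
      return await Promise.race([
        llmComplete(prompt),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error('llm timeout')), timeoutMs),
        ),
      ]);
    } catch {
      // Rate limit, quota, or timeout: the Markov chain always answers.
      return generateMarkov(prompt);
    }
  };
}
```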
## Recent Changes

- Analyzed current codebase structure
- Identified integration points for LLM
- Documented system architecture
- Created implementation plan

## Active Files

### Core Implementation

- src/index.ts
  - Main bot logic
  - Message handling
  - Command processing

- src/entity/
  - Database schema
  - Need to add embedding and context tables

- src/train.ts
  - Training pipeline
  - Need to add embedding generation

### New Files Needed

- src/llm/
  - provider.ts (LLM service integration)
  - embedding.ts (embedding generation)
  - context.ts (context management)

- src/entity/
  - MessageEmbedding.ts
  - ConversationContext.ts

## Next Steps

### Immediate Tasks

1. Create database migrations (a sketch of the embedding entity follows this list)
   - Add embedding table
   - Add context table
   - Update existing message schema

2. Implement LLM integration
   - Set up OpenAI client
   - Create response generation service
   - Add fallback mechanisms

3. Add embedding pipeline
   - Implement background processing
   - Set up batch operations
   - Add storage management
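Since the schema is managed with TypeORM, the planned `MessageEmbedding.ts` could look roughly like this; column names mirror the SQL draft in techContext.md, while the decorator choices are assumptions to be settled during the migration work.

```typescript
import {
  Entity,
  PrimaryGeneratedColumn,
  Column,
  CreateDateColumn,
  Index,
} from 'typeorm';

// Sketch of the planned src/entity/MessageEmbedding.ts.
// Mirrors the message_embeddings table drafted in techContext.md.
@Entity({ name: 'message_embeddings' })
export class MessageEmbedding {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  // References messages(id); a proper relation could replace this later.
  @Index()
  @Column({ type: 'text' })
  messageId: string;

  // Serialized Float32Array (see the Buffer helpers in techContext.md).
  @Column({ type: 'blob' })
  embedding: Buffer;

  @CreateDateColumn()
  createdAt: Date;
}
```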
### Short-term Goals

1. Test hybrid response system
   - Benchmark response times
   - Measure coherence
   - Validate context usage

2. Optimize performance
   - Implement caching
   - Add rate limiting
   - Tune batch sizes

3. Update documentation
   - Add LLM configuration guide
   - Update deployment instructions
   - Document new commands

### Dependencies

- OpenAI API access
- Additional storage capacity
- Updated environment configuration

## Implementation Strategy

### Phase 1: Foundation

1. Database schema updates
2. Basic LLM integration
3. Simple context tracking

### Phase 2: Enhancement

1. Hybrid response system
2. Advanced context management
3. Performance optimization

### Phase 3: Refinement

1. User feedback integration
2. Response quality metrics
3. Fine-tuning capabilities

## Notes

- Keep existing Markov system as fallback
- Monitor API usage and costs
- Consider implementing local LLM option
- Need to update help documentation
- Consider adding configuration commands
50 cline_docs/productContext.md Executable file
@@ -0,0 +1,50 @@
# Product Context

Last Updated: 2024-12-27

## Why we're building this

- To create an engaging Discord bot that learns from and interacts with server conversations
- To provide natural, contextually relevant responses using both Markov chains and LLM capabilities
- To maintain conversation history and generate responses that feel authentic to each server's culture

## Core user problems/solutions

Problems:

- Current Markov responses can be incoherent or lack context
- No semantic understanding of conversation context
- Limited ability to generate coherent long-form responses

Solutions:

- Integrate an LLM to enhance response quality while maintaining each server's voice
- Use the existing message database for both Markov and LLM training
- Combine Markov's randomness with the LLM's coherence

## Key workflows

1. Message Collection
   - Listen to channels
   - Store messages in SQLite
   - Track message context and metadata

2. Response Generation
   - Current: Markov chain generation
   - Proposed: hybrid Markov-LLM generation
   - Context-aware responses

3. Training
   - Batch processing of channel history
   - JSON import support
   - Continuous learning from new messages

## Product direction and priorities

1. Short term
   - Implement LLM integration for response generation
   - Maintain existing Markov functionality as fallback
   - Add context window for more relevant responses

2. Medium term
   - Fine-tune LLM on server-specific data
   - Implement response quality metrics
   - Add conversation memory

3. Long term
   - Advanced context understanding
   - Personality adaptation per server
   - Multi-modal response capabilities
130 cline_docs/systemPatterns.md Executable file
@@ -0,0 +1,130 @@
# System Patterns

Last Updated: 2024-12-27

## High-level Architecture

### Current System

```
Discord Events -> Message Processing -> SQLite Storage
                                     -> Markov Generation
```

### Proposed LLM Integration

```
Discord Events -> Message Processing -> SQLite Storage
                                     -> Response Generator
                                        ├─ Markov Chain
                                        ├─ LLM
                                        └─ Response Selector
```

## Core Technical Patterns

### Data Storage

- SQLite database using TypeORM
- Entity structure:
  - Guild (server)
  - Channel (per-server channels)
  - Messages (training data)

### Message Processing

1. Current Flow:
   - Message received
   - Filtered for human authorship
   - Stored in database with metadata
   - Used for Markov chain training

2. Enhanced Flow:
   - Add message embedding generation
   - Store context window
   - Track conversation threads

### Response Generation

#### Current (Markov)

```typescript
interface MarkovGenerateOptions {
  filter: (result: unknown) => boolean;
  maxTries: number;
  startSeed?: string;
}
```

#### Proposed (Hybrid)

```typescript
interface ResponseGenerateOptions {
  contextWindow: Message[];
  temperature: number;
  maxTokens: number;
  startSeed?: string;
  forceProvider?: 'markov' | 'llm' | 'hybrid';
}
```

## Data Flow

### Training Pipeline

1. Message Collection
   - Discord channel history
   - JSON imports
   - Real-time messages

2. Processing
   - Text cleaning
   - Metadata extraction
   - Embedding generation

3. Storage
   - Raw messages
   - Processed embeddings
   - Context relationships

### Response Pipeline

1. Context Gathering
   - Recent messages
   - Channel history
   - User interaction history

2. Generation Strategy (a selector sketch follows this list)
   - Short responses: Markov chain
   - Complex responses: LLM
   - Hybrid: LLM-guided Markov chain

3. Post-processing
   - Response filtering
   - Token limit enforcement
   - Attachment handling
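One way to express the strategy split as a selector, layered on the `ResponseGenerateOptions` draft above; the 80-character cutoff is an illustrative assumption, not a tuned value.

```typescript
type Provider = 'markov' | 'llm' | 'hybrid';

// Sketch: pick a provider per request. Short, seed-style prompts stay on
// the cheap Markov path; anything needing coherence goes to the LLM.
function selectProvider(prompt: string, forceProvider?: Provider): Provider {
  if (forceProvider) return forceProvider; // explicit override wins
  if (prompt.length < 80) return 'markov'; // short responses: Markov chain
  return 'llm';                            // complex responses: LLM
}
```

The hybrid mode (LLM-guided Markov) is reached here only via the explicit override; how to detect when it should be chosen automatically is still an open question.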
## Key Technical Decisions

### LLM Integration

1. Local Embedding Model
   - Use sentence-transformers for message embedding
   - Store embeddings in SQLite
   - Enable semantic search (see the sketch after this list)

2. Response Generation
   - Primary: OpenAI API
   - Fallback: local LLM
   - Hybrid: combine with Markov output

3. Context Management
   - Rolling window of recent messages
   - Semantic clustering of related content
   - Thread-aware context tracking
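Semantic search can start as a brute-force cosine scan over the stored embeddings before reaching for sqlite-vss; a sketch, assuming rows have already been decoded to Float32Array:

```typescript
// Cosine similarity between two embeddings of equal length.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Brute-force top-k semantic search; fine for small guilds, and a baseline
// to compare against sqlite-vss once vector indexing lands.
function topK(
  query: Float32Array,
  rows: { messageId: string; embedding: Float32Array }[],
  k = 5,
) {
  return rows
    .map((r) => ({ messageId: r.messageId, score: cosine(query, r.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```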
### Performance Requirements

1. Response Time
   - Markov: < 100 ms
   - LLM: < 2000 ms
   - Hybrid: < 2500 ms

2. Memory Usage
   - Max 1 GB per guild
   - Batch processing for large imports
   - Regular cleanup of old contexts

3. Rate Limiting (a token-bucket sketch follows this list)
   - Discord API compliance
   - LLM API quota management
   - Fallback mechanisms
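For quota management, a small token bucket in front of the LLM client would keep calls inside budget and trigger the Markov fallback when tokens run out; a sketch with illustrative numbers:

```typescript
// Minimal token bucket: `capacity` requests, refilled at `refillPerSec`.
class TokenBucket {
  private tokens: number;
  private last = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  tryTake(): boolean {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec,
    );
    this.last = now;
    if (this.tokens < 1) return false; // caller should fall back to Markov
    this.tokens -= 1;
    return true;
  }
}

// e.g. at most ~30 queued LLM calls, refilling one every two seconds
// (illustrative numbers, to be tuned per guild).
const llmBucket = new TokenBucket(30, 0.5);
```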
189 cline_docs/techContext.md Executable file
@@ -0,0 +1,189 @@
# Technical Context

Last Updated: 2024-12-27

## Core Technologies

### Current Stack

- Node.js/TypeScript
- Discord.js for the Discord API
- TypeORM for database management
- SQLite for data storage
- markov-strings-db for response generation

### LLM Integration Stack

- OpenAI API for primary LLM capabilities
- sentence-transformers for embeddings
- Vector extensions for SQLite
- Redis (optional) for context caching

## Integration Patterns

### Database Schema Extensions

```sql
-- New tables for LLM integration

-- Store message embeddings
CREATE TABLE message_embeddings (
    id TEXT PRIMARY KEY,
    message_id TEXT NOT NULL,
    embedding BLOB NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (message_id) REFERENCES messages(id)
);

-- Store conversation contexts
CREATE TABLE conversation_contexts (
    id TEXT PRIMARY KEY,
    channel_id TEXT NOT NULL,
    context_window TEXT NOT NULL,
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (channel_id) REFERENCES channels(id)
);
```
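Since the embedding column is a BLOB, the Node side needs an explicit Float32Array round trip; a small sketch (helper names are placeholders, not existing code):

```typescript
// Encode an embedding for the message_embeddings.embedding BLOB column.
function embeddingToBuffer(embedding: Float32Array): Buffer {
  return Buffer.from(embedding.buffer, embedding.byteOffset, embedding.byteLength);
}

// Decode a BLOB read back from the database.
function bufferToEmbedding(blob: Buffer): Float32Array {
  // Copy into a fresh ArrayBuffer so the float view starts at offset 0.
  const bytes = new Uint8Array(blob);
  return new Float32Array(bytes.buffer);
}
```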
### API Integration

```typescript
interface LLMConfig {
  provider: 'openai' | 'local';
  model: string;
  apiKey?: string;
  maxTokens: number;
  temperature: number;
  contextWindow: number;
}

interface ResponseGenerator {
  generateResponse(options: {
    prompt: string;
    context: Message[];
    guildId: string;
    channelId: string;
  }): Promise<string>;
}
```
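One possible LLM-backed implementation of `ResponseGenerator`, sketched against the openai v4 SDK; the system-prompt framing and the way context messages are flattened are assumptions, not settled design.

```typescript
import OpenAI from 'openai';

// Sketch: LLM-backed implementation of the ResponseGenerator interface
// above. Rate limiting and the Markov fallback are left to the caller.
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const llmGenerator: ResponseGenerator = {
  async generateResponse({ prompt, context, guildId }) {
    const completion = await client.chat.completions.create({
      model: process.env.LLM_MODEL ?? 'gpt-3.5-turbo',
      max_tokens: Number(process.env.LLM_MAX_TOKENS ?? 150),
      temperature: Number(process.env.LLM_TEMPERATURE ?? 0.7),
      messages: [
        // Assumed framing: keep the server's voice, mimic recent chat.
        { role: 'system', content: `Reply in the voice of guild ${guildId}.` },
        ...context.map((m) => ({ role: 'user' as const, content: m.content })),
        { role: 'user', content: prompt },
      ],
    });
    return completion.choices[0]?.message?.content ?? '';
  },
};
```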
### Message Processing Pipeline

```typescript
interface MessageProcessor {
  processMessage(message: Discord.Message): Promise<void>;
  generateEmbedding(text: string): Promise<Float32Array>;
  updateContext(channelId: string, message: Discord.Message): Promise<void>;
}
```

## Key Libraries/Frameworks

### Current Dependencies

- discord.js: ^14.x
- typeorm: ^0.x
- markov-strings-db: custom fork
- sqlite3: ^5.x

### New Dependencies

```json
{
  "dependencies": {
    "openai": "^4.x",
    "onnxruntime-node": "^1.x",
    "sentence-transformers": "^2.x",
    "sqlite-vss": "^0.1.x",
    "redis": "^4.x"
  }
}
```

## Infrastructure Choices

### Deployment

- Continue with current deployment pattern
- Add environment variables for LLM configuration
- Optional Redis for high-traffic servers

### Scaling Considerations

1. Message Processing
   - Batch embedding generation
   - Background processing queue
   - Rate limiting for API calls

2. Response Generation
   - Caching frequent responses
   - Fallback to Markov when rate limited
   - Load balancing between providers

3. Storage
   - Regular embedding pruning
   - Context window management
   - Backup strategy for embeddings

## Technical Constraints

### API Limitations

1. OpenAI
   - Rate limits
   - Token quotas
   - Cost considerations

2. Discord
   - Message rate limits
   - Response time requirements
   - Attachment handling

### Resource Usage

1. Memory
   - Embedding model size
   - Context window storage
   - Cache management

2. Storage
   - Embedding data size
   - Context history retention
   - Backup requirements

3. Processing
   - Embedding generation load
   - Response generation time
   - Background task management

## Development Environment

### Setup Requirements

```bash
# Core dependencies
npm install

# LLM integration
npm install openai onnxruntime-node sentence-transformers sqlite-vss

# Optional caching
npm install redis
```

### Environment Variables

```env
# LLM Configuration
OPENAI_API_KEY=sk-...
LLM_PROVIDER=openai
LLM_MODEL=gpt-3.5-turbo
LLM_MAX_TOKENS=150
LLM_TEMPERATURE=0.7
CONTEXT_WINDOW_SIZE=10

# Optional Redis
REDIS_URL=redis://localhost:6379
```
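Tying these variables to the `LLMConfig` interface above could look like the following sketch; the defaults simply mirror the sample values.

```typescript
// Sketch: build an LLMConfig (see API Integration above) from process.env.
function loadLLMConfig(): LLMConfig {
  return {
    provider: (process.env.LLM_PROVIDER as 'openai' | 'local') ?? 'openai',
    model: process.env.LLM_MODEL ?? 'gpt-3.5-turbo',
    apiKey: process.env.OPENAI_API_KEY,
    maxTokens: Number(process.env.LLM_MAX_TOKENS ?? 150),
    temperature: Number(process.env.LLM_TEMPERATURE ?? 0.7),
    contextWindow: Number(process.env.CONTEXT_WINDOW_SIZE ?? 10),
  };
}
```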
### Testing Strategy

1. Unit Tests
   - Message processing
   - Embedding generation
   - Context management

2. Integration Tests
   - LLM API interaction
   - Database operations
   - Discord event handling

3. Performance Tests
   - Response time benchmarks
   - Memory usage monitoring
   - Rate limit compliance
0 database.sqlite Executable file