# Production Deployment
Deploy ThePopeBot to production with Docker, implement security hardening, set up monitoring and logging, and learn scaling strategies.
## Docker Production Config
While the development setup uses npm run dev, production deployments should use Docker for consistency, isolation, and reproducibility.
### Production Dockerfile

Create a multi-stage Dockerfile for optimized builds. Note that the build stage installs all dependencies (the build step needs dev tooling such as the TypeScript compiler) and prunes dev dependencies afterward, so the final image ships only production modules:

```dockerfile
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; dev deps are required by `npm run build`
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies before node_modules is copied into the final image
RUN npm prune --omit=dev

# Stage 2: Production
FROM node:18-alpine AS production
WORKDIR /app

# Security: run as non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
### Docker Compose for Production

```yaml
version: "3.8"

services:
  thepopebot:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: production
      LOG_LEVEL: info
      LOG_FORMAT: json
    env_file:
      - .env.production
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "0.5"
          memory: 512M
    networks:
      - app-network

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    restart: unless-stopped
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

volumes:
  redis-data:
```
### Building and Running

```bash
# Build the production image
docker compose build

# Start in detached mode
docker compose up -d

# View logs
docker compose logs -f thepopebot

# Check health
docker compose ps
```
## Security Hardening
Running an AI agent in production requires careful attention to security.
### API Key Protection

Never expose API keys in logs, responses, or error messages:

```typescript
// Middleware to strip sensitive data from responses
app.use((req, res, next) => {
  const originalJson = res.json.bind(res);
  res.json = (data: any) => {
    return originalJson(sanitizeResponse(data));
  };
  next();
});

function sanitizeResponse(data: any): any {
  const sensitive = ['apiKey', 'token', 'secret', 'password'];
  if (typeof data === 'object' && data !== null) {
    for (const key of Object.keys(data)) {
      if (sensitive.some(s => key.toLowerCase().includes(s))) {
        data[key] = '[REDACTED]';
      } else if (typeof data[key] === 'object') {
        data[key] = sanitizeResponse(data[key]);
      }
    }
  }
  return data;
}
```
### Input Validation and Sanitization

Validate all incoming requests before they reach the agent:

```typescript
import { z } from 'zod';

const inputSchema = z.object({
  agent: z.string().max(50).regex(/^[a-zA-Z0-9-_]+$/),
  message: z.string().max(10000).trim(),
  metadata: z.record(z.string()).optional(),
});

app.post('/api/chat', (req, res) => {
  const result = inputSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ error: 'Invalid input', details: result.error.issues });
  }
  // Process validated input
});
```
### Rate Limiting

Protect against abuse with multi-tier rate limiting:

```typescript
import rateLimit from 'express-rate-limit';

// Global rate limit
const globalLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute globally
  standardHeaders: true,
});

// Per-user rate limit
const userLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 20, // 20 requests per user per minute
  keyGenerator: (req) => req.user?.id || req.ip,
});

app.use('/api/', globalLimiter);
app.use('/api/chat', userLimiter);
```
### Tool Execution Sandboxing

Restrict what tools can do in production:

```yaml
security:
  tools:
    filesystem:
      allowedPaths:
        - "/app/workspace"
        - "/tmp/agent-work"
      blockedPatterns:
        - "**/.env*"
        - "**/node_modules/**"
        - "**/*.key"
      maxFileSize: 10485760 # 10MB
    git:
      allowedOperations:
        - clone
        - diff
        - log
      blockedOperations:
        - push
        - force-push
        - reset
```
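As a concrete illustration, the `allowedPaths` rule above can be enforced with a resolve-then-prefix check before any file tool runs. This is a hypothetical sketch, not ThePopeBot's actual enforcement code; a real implementation would also evaluate `blockedPatterns` and `maxFileSize`:

```typescript
import * as path from "path";

// Roots taken from the allowedPaths config above (POSIX paths assumed)
const allowedPaths = ["/app/workspace", "/tmp/agent-work"];

// Resolve the candidate first so "../" tricks cannot escape an allowed root,
// then require it to be the root itself or sit strictly inside it.
function isPathAllowed(candidate: string): boolean {
  const resolved = path.resolve(candidate);
  return allowedPaths.some(
    (root) => resolved === root || resolved.startsWith(root + path.sep)
  );
}
```

For example, `isPathAllowed("/app/workspace/repo/file.ts")` passes, while both `/etc/passwd` and `/app/workspace/../../etc/passwd` are rejected.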
## Monitoring and Logging
Production systems need observability to detect issues early and respond quickly.
### Structured Logging

Use structured JSON logging for machine-readable output:

```typescript
import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  serializers: {
    req: pino.stdSerializers.req,
    res: pino.stdSerializers.res,
    err: pino.stdSerializers.err,
  },
});

// Log agent events with context
logger.info({
  event: 'agent_request',
  agent: 'coder',
  userId: 'user-123',
  channel: 'telegram',
  tokenUsage: 1523,
  duration: 4200,
}, 'Agent request completed');
```
### Health Check Endpoint

Implement a comprehensive health check:

```typescript
app.get('/health', async (req, res) => {
  const checks = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    services: {
      redis: await checkRedis(),
      llm: await checkLLMProvider(),
      telegram: await checkTelegram(),
    },
    resources: {
      memory: process.memoryUsage(),
      cpu: process.cpuUsage(),
    },
  };

  const allHealthy = Object.values(checks.services)
    .every(s => s.status === 'ok');

  res.status(allHealthy ? 200 : 503).json(checks);
});
```
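The `checkRedis`, `checkLLMProvider`, and `checkTelegram` helpers are assumed above. Each can be built from a shared probe wrapper like this sketch, which races the probe against a timeout and maps any failure to the `{ status }` shape the handler expects (the helper name and timeout are illustrative, not part of ThePopeBot's API):

```typescript
// Run a probe with a deadline; report ok/error plus observed latency.
async function timedCheck(
  probe: () => Promise<void>,
  timeoutMs = 2000
): Promise<{ status: string; latencyMs?: number }> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    await Promise.race([
      probe(),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
      }),
    ]);
    return { status: "ok", latencyMs: Date.now() - start };
  } catch {
    return { status: "error" };
  } finally {
    clearTimeout(timer); // don't leave the deadline timer running on success
  }
}

// e.g. const checkRedis = () => timedCheck(async () => { await redis.ping(); });
```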
### Metrics Collection

Track key metrics for performance monitoring (using prom-client, whose metric constructors take a configuration object):

```typescript
import { Counter, Gauge, Histogram, register } from 'prom-client';

const metrics = {
  requestCount: new Counter({ name: 'requests_total', help: 'Total requests' }),
  requestDuration: new Histogram({ name: 'request_duration_ms', help: 'Request duration' }),
  tokenUsage: new Counter({ name: 'tokens_total', help: 'Total LLM tokens used' }),
  toolCalls: new Counter({ name: 'tool_calls_total', help: 'Total tool calls' }),
  errorCount: new Counter({ name: 'errors_total', help: 'Total errors' }),
  activeAgents: new Gauge({ name: 'active_agents', help: 'Currently active agents' }),
};

// Expose metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});
```
### Alerting Rules

Set up alerts for critical conditions:

```yaml
alerts:
  high-error-rate:
    condition: "error_rate > 5%"
    window: "5m"
    severity: critical
    notify: ["slack", "pagerduty"]
  high-latency:
    condition: "p95_latency > 10s"
    window: "10m"
    severity: warning
    notify: ["slack"]
  token-budget-exceeded:
    condition: "daily_tokens > 900000"
    window: "1d"
    severity: warning
    notify: ["email"]
```
## Scaling Strategies
As usage grows, you need strategies to scale your agent infrastructure.
### Horizontal Scaling

Run multiple instances behind a load balancer. When nginx fronts the replicas, remove the `"3000:3000"` host port mapping from the base service definition, since three replicas cannot all bind the same host port; nginx becomes the only published entry point:

```yaml
# docker-compose.scale.yml
services:
  thepopebot:
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 30s
        order: start-first

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - thepopebot
```
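The mounted `nginx.conf` is not shown above; a minimal sketch could look like the following, assuming Compose's internal DNS resolves the `thepopebot` service name across its replicas:

```nginx
events {}

http {
  upstream thepopebot {
    # Docker's embedded DNS balances the service name across replicas.
    # Note: nginx resolves this name once at startup; use the `resolver`
    # directive with a variable if replicas change at runtime.
    server thepopebot:3000;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://thepopebot;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}
```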
### Queue-Based Architecture

For high-throughput scenarios, use a message queue to decouple request ingestion from processing:

```text
User Request ──▶ API Server ──▶ Message Queue ──▶ Worker Pool ──▶ Response
                                       │
                         ┌─────────────┼─────────────┐
                         ▼             ▼             ▼
                      Worker        Worker        Worker
```
```typescript
// Producer: API server enqueues requests
await queue.add(
  'agent-task',
  {
    agent: 'coder',
    message: userMessage,
    userId: user.id,
  },
  // Priority belongs in the job options, not the payload, so the queue
  // (e.g. a Bull-style implementation) can actually order by it
  { priority: calculatePriority(user) }
);

// Consumer: Workers process from the queue
queue.process('agent-task', async (job) => {
  const result = await engine.run(job.data);
  await notifyUser(job.data.userId, result);
});
```
### Caching Strategy

Cache frequently used data to reduce LLM calls:

```typescript
import { LRUCache } from 'lru-cache';

const cache = {
  // Cache tool results for identical inputs
  toolResults: new LRUCache({ max: 1000, ttl: 300000 }), // 5 minutes
  // Cache agent context assembly
  contextCache: new LRUCache({ max: 100, ttl: 60000 }), // 1 minute
  // Cache common LLM responses
  responseCache: new LRUCache({ max: 500, ttl: 600000 }), // 10 minutes
};
```
## Best Practices Checklist
Before going to production, verify the following:
### Security
- All API keys are stored in environment variables, not in code
- Input validation is implemented on all endpoints
- Rate limiting is configured for all public endpoints
- Tool execution is sandboxed with explicit permission lists
- HTTPS is enforced for all external communication
- Authentication is required for the Web UI and API
- Webhook signatures are validated
### Reliability
- Health check endpoint is implemented and monitored
- Graceful shutdown handles in-flight requests
- Retry logic is implemented for transient failures
- Circuit breakers protect against cascading failures
- Resource limits are set for CPU, memory, and tokens
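Of the items above, the circuit breaker is the easiest to get wrong, so here is a minimal illustrative state machine (a sketch, not ThePopeBot's actual code): after `threshold` consecutive failures the breaker opens and fails fast, then admits one trial call once `cooldownMs` elapses.

```typescript
// Minimal circuit breaker: closed -> open after `threshold` consecutive
// failures; half-open (one trial call allowed) after `cooldownMs`.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: failing fast");
      }
      // cool-down elapsed: fall through and allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Wrap each LLM or downstream service call in `breaker.call(...)` so a failing dependency sheds load instead of queueing up timed-out requests.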
### Observability
- Structured logging is configured with appropriate log levels
- Metrics are collected and exposed for monitoring
- Alerts are set up for critical conditions
- Request tracing allows debugging individual requests
- Token usage is tracked and budgeted
### Operations
- Docker images are built with multi-stage builds
- Container runs as non-root user
- Automated backups are configured for persistent data
- Rolling update strategy is defined
- Rollback procedure is documented and tested
- Scaling thresholds are defined and automated
Completing this checklist means your ThePopeBot deployment is production-ready. Congratulations on making it through all 7 days of the tutorial!