WebSockets in Production: Scaling Nightmares and Solutions
🎯 The Challenge
I built a real-time chat system with WebSockets. At 100 users, it worked perfectly. Messages were instant, everything was smooth.
Then we hit 1,000 users. Then 10,000. Everything broke.
The problems: connection drops, memory leaks, message delays, and outright server crashes. Here's how I fixed them.
💥 Problems I Faced
1. Single Server Bottleneck
All connections were on one server. At 5,000 connections, the server couldn't handle it.
2. Memory Leaks
Connections weren't being cleaned up properly. Memory usage grew until the server crashed.
3. Message Broadcasting
Broadcasting to 10,000 connections was slow. Messages were delayed by seconds.
4. Connection Drops
Connections were dropping randomly. Users had to reconnect constantly.
✅ Solutions That Worked
1. Horizontal Scaling with Redis
Used Redis pub/sub to relay events across multiple servers, so each server only has to manage its own connections while messages still reach everyone:
```php
// Laravel Broadcasting with Redis
// config/broadcasting.php
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
    ],
],

// Broadcast events
broadcast(new MessageSent($message))->toOthers();
```
Impact: Could scale to multiple servers. Each server handled a subset of connections.
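To make the pattern concrete, here is a minimal in-process sketch of the pub/sub fan-out (hypothetical names throughout; a real deployment would use the Redis client library instead of the toy broker):

```javascript
// Toy broker standing in for Redis pub/sub. Each "server" holds only
// its own sockets but subscribes to a shared channel, so a message
// published by any server reaches clients on every server.
class FakeBroker {
  constructor() { this.subscribers = []; }
  subscribe(handler) { this.subscribers.push(handler); }
  publish(channel, payload) {
    for (const handler of this.subscribers) handler(channel, payload);
  }
}

class ChatServer {
  constructor(broker) {
    this.localClients = [];
    this.broker = broker;
    // Every server instance subscribes to the shared channel.
    broker.subscribe((channel, payload) => {
      if (channel === 'chat') this.deliverLocally(payload);
    });
  }
  addClient(client) { this.localClients.push(client); }
  deliverLocally(payload) {
    for (const client of this.localClients) client.received.push(payload);
  }
  // Broadcasts go through the broker, not directly to sockets,
  // so clients connected to *other* servers get the message too.
  broadcast(payload) { this.broker.publish('chat', payload); }
}

const broker = new FakeBroker();
const serverA = new ChatServer(broker);
const serverB = new ChatServer(broker);

const alice = { received: [] };
const bob = { received: [] };
serverA.addClient(alice);
serverB.addClient(bob);

serverA.broadcast('hello from A');
console.log(bob.received); // bob is on server B but still gets the message
```

(Unlike the Laravel snippet's `toOthers()`, this sketch delivers to the sender as well; the point is only the cross-server fan-out.)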
2. Connection Pooling
Ran multiple Laravel Echo Server instances (one per port) against the shared Redis backend, each with a config like this:
```json
{
    "authHost": "https://ameylokare.com",
    "authEndpoint": "/broadcasting/auth",
    "clients": [],
    "database": "redis",
    "databaseConfig": {
        "redis": {
            "host": "127.0.0.1",
            "port": 6379
        }
    },
    "devMode": false,
    "host": "0.0.0.0",
    "port": 6001,
    "protocol": "http",
    "socketio": {
        "transports": ["websocket", "polling"]
    }
}
```
3. Message Queuing
Queued message broadcasting to prevent blocking:
```php
// app/Jobs/BroadcastMessage.php
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\SerializesModels;

class BroadcastMessage implements ShouldQueue
{
    use Dispatchable, Queueable, SerializesModels;

    public function __construct(public $message) {}

    public function handle(): void
    {
        broadcast(new MessageSent($this->message));
    }
}

// Dispatch to queue: returns immediately, a worker runs handle() later
BroadcastMessage::dispatch($message);
```
Impact: Non-blocking broadcasts. Server stayed responsive.
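The mechanics of why queuing keeps the server responsive can be sketched in a few lines (toy in-memory queue with hypothetical names; Laravel's real queue workers run in separate processes):

```javascript
// "Queue the broadcast instead of doing it inline": the request handler
// only enqueues a job description and returns, while a worker drains
// the queue off the request path.
const jobQueue = [];

// Dispatch = push a job; the caller returns immediately.
function dispatchBroadcast(message) {
  jobQueue.push({ type: 'broadcast', message });
}

// A worker processes jobs later, outside the request/response cycle.
function runWorker(deliver) {
  while (jobQueue.length > 0) {
    const job = jobQueue.shift();
    deliver(job.message);
  }
}

// Simulated request handler: stays fast because it never touches sockets.
function handleIncomingMessage(message) {
  dispatchBroadcast(message);
  return 'ok'; // respond to the sender without waiting on 10,000 sockets
}

const delivered = [];
handleIncomingMessage('hi all');
runWorker((msg) => delivered.push(msg));
console.log(delivered); // ['hi all']
```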
4. Connection Management
Properly cleaned up connections:
```php
// Clean up on disconnect
public function disconnect($connectionId)
{
    // Remove the connection from all channels
    $this->channels->remove($connectionId);

    // Release the reference so PHP can free the memory
    unset($this->connections[$connectionId]);

    // Log the disconnect for monitoring
    Log::info("Connection {$connectionId} disconnected");
}
```
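The same cleanup pattern in plain JavaScript, as a sketch with hypothetical names: the leak in my original code came from removing the socket but leaving stale references in channel sets, so disconnect has to remove *every* reference.

```javascript
// Connection registry where disconnect removes every reference;
// any surviving reference keeps a closed socket alive and memory grows.
class ConnectionRegistry {
  constructor() {
    this.connections = new Map();    // id -> socket
    this.channelMembers = new Map(); // channel -> Set of connection ids
  }
  connect(id, socket, channel) {
    this.connections.set(id, socket);
    if (!this.channelMembers.has(channel)) {
      this.channelMembers.set(channel, new Set());
    }
    this.channelMembers.get(channel).add(id);
  }
  disconnect(id) {
    // Remove from every channel first, then drop the socket itself.
    for (const members of this.channelMembers.values()) members.delete(id);
    this.connections.delete(id);
  }
  size() { return this.connections.size; }
}

const registry = new ConnectionRegistry();
// Simulate 1,000 clients that connect and then disconnect.
for (let i = 0; i < 1000; i++) {
  registry.connect(i, { fakeSocket: i }, 'room-1');
  registry.disconnect(i);
}
console.log(registry.size()); // 0, nothing leaked
```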
5. Load Balancing
Used sticky sessions for WebSocket connections:
```nginx
# Nginx sticky sessions
upstream websocket {
    ip_hash;  # same client IP always routes to the same Echo server
    server 127.0.0.1:6001;
    server 127.0.0.1:6002;
    server 127.0.0.1:6003;
}

server {
    location /socket.io {
        proxy_pass http://websocket;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```
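Why `ip_hash` gives "sticky" routing can be shown with a toy model: the chosen upstream is a pure function of the client IP, so a reconnecting client lands on the same Echo server. (Illustrative sketch only; nginx's actual ip_hash algorithm differs.)

```javascript
// Deterministic IP -> upstream mapping, the essence of sticky sessions.
const upstreams = ['127.0.0.1:6001', '127.0.0.1:6002', '127.0.0.1:6003'];

function pickUpstream(clientIp) {
  // Simple deterministic hash of the IP string (toy, not nginx's).
  let hash = 0;
  for (const ch of clientIp) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return upstreams[hash % upstreams.length];
}

const first = pickUpstream('203.0.113.42');
const second = pickUpstream('203.0.113.42'); // same client reconnects
console.log(first === second); // true: routed to the same server
```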
📊 Performance Improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| Max Connections | 5,000 | 50,000+ | 10x |
| Message Latency | 2-5 seconds | < 100ms | 20-50x faster |
| Memory Usage | Growing | Stable | Fixed leak |
| Connection Drops | Frequent | Rare | Much better |
💡 Key Lessons
- Horizontal scaling is essential: Can't scale vertically forever
- Redis pub/sub enables multi-server: Critical for scaling
- Queue everything: Non-blocking operations keep servers responsive
- Connection management matters: Clean up properly or face memory leaks
- Load balancing needs sticky sessions: WebSockets require persistent connections
🎯 Architecture
Final architecture:
Clients → Load Balancer (Nginx) → Laravel Echo Servers (3x) → Redis Pub/Sub → Database
This architecture scales horizontally. Add more Echo servers as needed.
Scaling WebSockets is challenging, but with the right architecture, it's doable. The key is horizontal scaling with Redis and proper connection management.