Web Development

WebSockets in Production: Scaling Nightmares and Solutions

December 30, 2024 · 3 min read · By Amey Lokare

🎯 The Challenge

I built a real-time chat system with WebSockets. At 100 users, it worked perfectly: messages were instant and everything was smooth.

Then we hit 1,000 users. Then 10,000. Everything broke.

The problems: connection drops, memory leaks, message delays, and server crashes. Here's how I fixed them.

💥 Problems I Faced

1. Single Server Bottleneck

All connections were on one server. At 5,000 concurrent connections, it could no longer keep up.

2. Memory Leaks

Connections weren't being cleaned up properly. Memory usage grew until the server crashed.

3. Message Broadcasting

Broadcasting to 10,000 connections was slow. Messages were delayed by seconds.

4. Connection Drops

Connections were dropping randomly. Users had to reconnect constantly.

✅ Solutions That Worked

1. Horizontal Scaling with Redis

Used Redis pub/sub so events published on one server reach clients connected to any server:

// Laravel Broadcasting with Redis
// config/broadcasting.php
'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
    ],
],

// Broadcast events
broadcast(new MessageSent($message))->toOthers();

Impact: Could scale to multiple servers. Each server handled a subset of connections.
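
The broadcast() call above assumes an event class that implements ShouldBroadcast. A minimal sketch of what MessageSent might look like (the channel name and payload are my assumptions, not the original code):

// app/Events/MessageSent.php — sketch only; channel name and payload are illustrative
use Illuminate\Broadcasting\Channel;
use Illuminate\Broadcasting\InteractsWithSockets;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;

class MessageSent implements ShouldBroadcast
{
    use InteractsWithSockets; // required for ->toOthers()

    public function __construct(public $message) {}

    public function broadcastOn(): array
    {
        // Published to Redis; every Echo server subscribed to this channel
        // pushes the event to its own connected clients.
        return [new Channel('chat')];
    }
}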

2. Connection Pooling

Used Laravel Echo Server with connection pooling; the laravel-echo-server.json configuration:

{
  "authHost": "https://ameylokare.com",
  "authEndpoint": "/broadcasting/auth",
  "clients": [],
  "database": "redis",
  "databaseConfig": {
    "redis": {
      "host": "127.0.0.1",
      "port": 6379
    }
  },
  "devMode": false,
  "host": "0.0.0.0",
  "port": 6001,
  "protocol": "http",
  "socketio": {
    "transports": ["websocket", "polling"]
  }
}
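
The authEndpoint above points at Laravel's standard /broadcasting/auth route. If you broadcast on private channels, Echo Server forwards subscription requests there, and the authorization callback lives in routes/channels.php. A hedged sketch (the channel name and the canJoinRoom() check are assumptions):

// routes/channels.php — sketch only; channel name and authorization check are assumptions
use Illuminate\Support\Facades\Broadcast;

Broadcast::channel('chat.{roomId}', function ($user, int $roomId) {
    // Echo Server hits /broadcasting/auth with the user's credentials;
    // return true to allow the subscription, false to reject it.
    return $user->canJoinRoom($roomId);
});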

3. Message Queuing

Queued message broadcasting to prevent blocking:

// Queue broadcasting (app/Jobs/BroadcastMessage.php)
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class BroadcastMessage implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(public $message) {}

    public function handle(): void
    {
        // Runs on a queue worker, so the request that triggered it isn't blocked
        broadcast(new MessageSent($this->message));
    }
}

// Dispatch to queue
BroadcastMessage::dispatch($message);

Impact: Non-blocking broadcasts. Server stayed responsive.
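
For the dispatch to actually be asynchronous, the queue connection can't be sync: the job needs a real queue backend and a running worker (php artisan queue:work redis). An excerpt of what config/queue.php would look like, assuming the same Redis instance backs the queue:

// config/queue.php (excerpt) — assumes Redis also backs the queue
'default' => env('QUEUE_CONNECTION', 'redis'),

'connections' => [
    'redis' => [
        'driver' => 'redis',
        'connection' => 'default',
        'queue' => env('REDIS_QUEUE', 'default'),
        'retry_after' => 90,
    ],
],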

4. Connection Management

Properly cleaned up connections:

// Clean up on disconnect
public function disconnect($connectionId)
{
    // Remove from channels
    $this->channels->remove($connectionId);
    
    // Clean up memory
    unset($this->connections[$connectionId]);
    
    // Log disconnect
    Log::info("Connection {$connectionId} disconnected");
}
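
Clean disconnects are only half of it; connections that drop without closing (the "random drops" above) leave stale entries behind. A periodic sweep helps — a rough sketch, assuming each connection tracks a lastSeen timestamp (the property and the collection name are illustrative):

// Prune entries for clients that stopped responding; lastSeen is an assumed field
public function pruneStaleConnections(int $timeoutSeconds = 60): void
{
    $now = time();

    foreach ($this->connections as $connectionId => $connection) {
        if ($now - $connection->lastSeen > $timeoutSeconds) {
            $this->disconnect($connectionId);
        }
    }
}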

5. Load Balancing

Used sticky sessions for WebSocket connections:

# Nginx sticky sessions
upstream websocket {
    ip_hash;  # Sticky sessions: the same client IP always reaches the same Echo server
    server 127.0.0.1:6001;
    server 127.0.0.1:6002;
    server 127.0.0.1:6003;
}

server {
    location /socket.io {
        proxy_pass http://websocket;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        # nginx closes idle proxied connections after 60s by default;
        # a longer read timeout keeps quiet WebSockets from being dropped
        proxy_read_timeout 300s;
    }
}

📊 Performance Improvements

Metric            | Before      | After   | Improvement
------------------+-------------+---------+--------------
Max Connections   | 5,000       | 50,000+ | 10x
Message Latency   | 2-5 seconds | < 100ms | 20-50x faster
Memory Usage      | Growing     | Stable  | Fixed leak
Connection Drops  | Frequent    | Rare    | Much better

💡 Key Lessons

  • Horizontal scaling is essential: you can't scale vertically forever
  • Redis pub/sub enables multi-server setups: critical for scaling
  • Queue everything: non-blocking operations keep servers responsive
  • Connection management matters: clean up properly or face memory leaks
  • Load balancing needs sticky sessions: WebSockets require persistent connections

🎯 Architecture

Final architecture:

Clients → Load Balancer (Nginx) → Laravel Echo Servers (3x) → Redis Pub/Sub → Database

This architecture scales horizontally. Add more Echo servers as needed.

💡 Key Takeaways

  • WebSockets need horizontal scaling for production
  • Redis pub/sub is essential for multi-server setups
  • Queue message broadcasting to prevent blocking
  • Proper connection cleanup prevents memory leaks
  • Load balancing requires sticky sessions

Scaling WebSockets is challenging, but with the right architecture, it's doable. The key is horizontal scaling with Redis and proper connection management.
