Friday, August 1, 2025

[Netflix Tech Blog] Behind the Scenes of Pushy - Netflix’s Real-Time WebSocket Proxy (Part 1)

In today’s connected world, user experiences are expected to be instant, interactive, and deeply personalized. Whether it’s a mobile notification about a new episode, a voice command to your smart TV, or a prompt to resume your last show, the ability to push messages in real time has become critical.

At Netflix’s scale, serving hundreds of millions of devices across the globe, building and maintaining a real-time communication layer isn’t just desirable; it’s mission-critical.

That’s where Pushy comes in: Netflix’s in-house WebSocket proxy, designed to reliably deliver low-latency, bidirectional communication between client devices (TVs, phones, browsers, game consoles) and backend services. Originally created to support use cases like voice search, remote control pairing, casting, and other interactive UI elements, Pushy quickly became a foundational service for Netflix.

But as the streaming giant continued evolving its product and infrastructure, cracks began to appear in Pushy’s original design: scalability, observability, and operational resilience all needed a rethink.

In this two-part breakdown series, I’ll walk you through:
  • What Pushy is and why Netflix built it
  • The original architecture and its role in powering interactive features
  • What worked well and where it started to show strain
  • And in Part 2: how Netflix rearchitected Pushy to meet the future head-on

Let’s start by understanding the need for Pushy in the first place.


💡 Why Netflix Needed Pushy in the First Place

As Netflix expanded from being a content streaming service to an interactive entertainment platform, the demands on its device communication infrastructure grew rapidly. Simple HTTP-based polling or long-polling mechanisms were no longer sufficient for the level of interactivity users expected.

Here are the emerging use cases that demanded real-time, low-latency communication between Netflix devices and backend services:
  1. Voice Search & Navigation: Smart TVs and streaming devices increasingly support voice commands. Users expect their spoken input to be recognized, interpreted, and acted upon in real time, which requires a persistent, fast channel between device and service.
  2. Second-Screen Experiences & Remote Control Pairing: Netflix enables pairing between mobile devices and TVs, allowing the phone to act as a remote or companion device. This requires devices to discover each other and exchange information instantly.
  3. Casting: When users cast Netflix content from their phones to another device (like Chromecast or a Smart TV), there’s a need for near-instant coordination between devices.
  4. Interactive UI Elements: Think of countdowns, real-time error reporting, or UI nudges (e.g., "Resume from where you left off"); these need to be delivered to the client immediately and reliably.
To power these experiences, Netflix needed a solution that:
  • Could maintain persistent, bidirectional connections at scale.
  • Worked across millions of heterogeneous devices.
  • Offered low-latency delivery for time-sensitive events.
  • Could be centrally managed, observed, and extended to future use cases.
That solution was Pushy, a custom-built WebSocket proxy service that sits between client devices and backend systems, managing long-lived connections and ensuring seamless real-time communication.
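
To make the idea of a persistent, bidirectional channel concrete, here is a minimal client-side sketch using Java’s standard java.net.http.WebSocket API: one long-lived connection carries both client-initiated messages and server-initiated pushes. The endpoint URL and message format are placeholders, not Netflix’s actual Pushy protocol.

  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.WebSocket;
  import java.util.concurrent.CompletionStage;

  // Illustrative only: wss://example.com/pushy and the JSON payload are assumptions.
  public class PushyStyleClient {
      public static void main(String[] args) throws Exception {
          HttpClient http = HttpClient.newHttpClient();

          WebSocket ws = http.newWebSocketBuilder()
                  .buildAsync(URI.create("wss://example.com/pushy"), new WebSocket.Listener() {
                      @Override
                      public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
                          // Server-initiated pushes arrive on the same long-lived connection.
                          System.out.println("push received: " + data);
                          webSocket.request(1); // ask for the next frame
                          return null;
                      }
                  })
                  .join();

          // Client-initiated message on the same connection (e.g., a voice-search event).
          ws.sendText("{\"type\":\"voice_search\",\"query\":\"stranger things\"}", true);

          Thread.sleep(5_000); // keep the connection open briefly for the demo
          ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
      }
  }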

Here is one example, Alexa voice commands, where Pushy played a vital role:

[Figure: Alexa voice command flow through Pushy]

📡 Pushy’s Original Architecture — End-to-End Overview

Pushy was Netflix’s custom-built WebSocket proxy designed to maintain persistent, low-latency, bidirectional communication between millions of client devices and backend services.

At a high level, Pushy accepted WebSocket connections from clients (TVs, mobile devices, etc.), maintained their sessions, and routed messages to the appropriate backend services over gRPC. However, under the hood, it contained several sophisticated components that enabled it to operate at Netflix scale.

Here’s how it worked, step by step (a minimal code sketch of the registry, queue, and processor flow appears after the list):

  1. Client Connection
    • A client (e.g., a smart TV) initiates a WebSocket connection to Pushy.
    • Pushy authenticates the client and establishes a session.
    • It stores routing metadata (e.g., region, app version, device ID) for the session.
  2. Session Registration in Push Registry
    • Pushy registers the client session in a Push Registry, which acts as an in-memory mapping of client sessions to backend service instances or logical partitions.
    • This registry is consulted for routing incoming and outgoing messages.
    • Initially, Netflix used Dynomite (a high-availability Redis wrapper) to back this registry.
  3. Message Ingestion via Queue & Processor
    • The client sends a message over the WebSocket connection.
    • The message is queued in an internal Message Queue to absorb bursts and prevent backpressure from backend systems.
  4. Processing via Message Processor
    • The Message Processor consumes messages from the queue.
    • It applies session-specific logic, enrichment, or transformation.
    • Messages are then serialized into gRPC-compatible formats.
  5. Routing to Backend Services
    • Based on the Push Registry’s mapping and routing rules from the Control Plane, the message is routed to the correct backend service over gRPC.
  6. Outbound Messaging
    • Backend services can also push messages to clients.
    • Pushy uses the same registry to identify the client session and deliver the message over the existing WebSocket connection.
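
The post describes these pieces conceptually rather than in code, so here is a heavily simplified, purely illustrative sketch of how a registry map, an internal queue, and a processor thread could fit together. The class, record, and method names are my own assumptions; in the real system the registry is an external store (Dynomite, covered below) and forwarding happens over gRPC, which the println placeholder stands in for.

  import java.util.Map;
  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;
  import java.util.concurrent.ConcurrentHashMap;

  // Hypothetical sketch of the registry + queue + processor flow outlined above.
  public class MiniPushy {
      record Session(String deviceId, String region, String appVersion) {}
      record Message(String deviceId, String payload) {}

      // Step 2: Push Registry, mapping a device to its live session metadata.
      private final Map<String, Session> pushRegistry = new ConcurrentHashMap<>();

      // Step 3: internal queue that absorbs bursts before any backend call is made.
      private final BlockingQueue<Message> messageQueue = new ArrayBlockingQueue<>(10_000);

      public void register(Session s) { pushRegistry.put(s.deviceId(), s); }

      public void enqueue(Message m) throws InterruptedException { messageQueue.put(m); }

      // Steps 4 and 5: the Message Processor consumes, enriches, and routes messages.
      public void startProcessor() {
          Thread processor = new Thread(() -> {
              while (!Thread.currentThread().isInterrupted()) {
                  try {
                      Message m = messageQueue.take();
                      Session s = pushRegistry.get(m.deviceId());
                      if (s == null) continue;                  // stale or disconnected session
                      String enriched = m.payload() + " [region=" + s.region() + "]";
                      forwardToBackend(s, enriched);            // placeholder for the real gRPC call
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }
          });
          processor.setDaemon(true);
          processor.start();
      }

      private void forwardToBackend(Session s, String payload) {
          System.out.println("routing message for " + s.deviceId() + ": " + payload);
      }
  }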
๐Ÿ” Supporting Control Plane (External)

Pushy constantly syncs with an external Control Plane, which provides:

  • Dynamic routing rules (an illustrative sketch follows this list)
  • Backend service discovery
  • Deployment topology
  • Traffic shaping configs
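
The post does not show what these routing rules look like, so the following is a hypothetical illustration only: a small rule type plus a resolver that picks a backend target for a message. Every name, field, and address here is invented.

  import java.util.List;

  // Hypothetical routing-rule shape; the real control-plane schema is not described in the post.
  public class RoutingRules {
      record Rule(String messageType, String deviceType, String backendTarget) {}

      // In practice these would be refreshed periodically from the control plane (not shown).
      private volatile List<Rule> rules = List.of(
              new Rule("voice_search", "*", "voice-service.example:8443"),
              new Rule("cast_request", "chromecast", "casting-service.example:8443"));

      String resolveBackend(String messageType, String deviceType) {
          return rules.stream()
                  .filter(r -> r.messageType().equals(messageType))
                  .filter(r -> r.deviceType().equals("*") || r.deviceType().equals(deviceType))
                  .map(Rule::backendTarget)
                  .findFirst()
                  .orElse("default-backend.example:8443");
      }
  }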




🛡️ What Is Dynomite?

Dynomite is a plugin layer built by Netflix to transform existing, standalone storage engines like Redis or Memcached into a fully distributed, highly available, and multi‑data‑center system. Designed around concepts from Amazon’s Dynamo architecture, it enables:
  • Active‑active replication across data centers
  • Auto‑sharding, linear horizontal scalability
  • Configurable consistency levels, node‑warmup, and backup/restore support
Netflix has used Dynomite in production extensively, up to 1,000+ nodes, serving over a million operations per second at peak, with clusters reaching multiple terabytes in size.

A standard Redis or Memcached instance is a single‑server cache. Dynomite allows Netflix to:
  • Seamlessly scale to many servers and data centers
  • Replicate data across regions for high availability
  • Avoid a single point of failure while maintaining low latency and consistent behavior under heavy load.
There is also a management tool called Dynomite Manager that helps automate tasks like node replacement and bootstrap, especially useful when deploying via AWS Auto Scaling. 
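
Because Dynomite exposes a Redis-compatible API, a Push Registry entry can be written with an ordinary Redis client. The sketch below uses the open-source Jedis client; the key name, TTL, host, and port are illustrative assumptions, not Netflix’s actual schema.

  import redis.clients.jedis.Jedis;

  // Illustrative only: the key layout, TTL, and endpoint are assumptions.
  public class RegistryEntryExample {
      public static void main(String[] args) {
          try (Jedis jedis = new Jedis("dynomite-node.example.internal", 8102)) {
              // Map a device's session to the Pushy node holding its WebSocket,
              // with a TTL so entries for dead connections eventually expire.
              jedis.setex("session:device-12345", 3600, "pushy-node-07:8080");

              // On outbound delivery, look up which node owns the connection.
              String owningNode = jedis.get("session:device-12345");
              System.out.println("deliver via " + owningNode);
          }
      }
  }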

๐Ÿ” Scalability and Auto-Panning

To deal with global scale, Pushy’s initial strategy was to scale out WebSocket servers horizontally across availability zones. But this wasn’t enough: some zones still ended up with more load than others due to device churn and reconnect storms.

Netflix introduced auto-panning, a strategy where incoming connections would be balanced based on current server load and connection counts, not just simple round-robin DNS-based routing.

This was a precursor to what would become a more intelligent, connection-aware load balancing strategy in the evolved architecture.
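
The post does not spell out the algorithm behind this connection-aware balancing, so the sketch below only illustrates the general idea: send each new connection to the server with the lowest utilization rather than rotating blindly. Server names, connection counts, and capacities are made up.

  import java.util.Comparator;
  import java.util.List;

  // Illustrative connection-aware selection in the spirit of the "auto-panning" described above.
  public class ConnectionAwareBalancer {
      record Server(String host, int activeConnections, int capacity) {
          double load() { return (double) activeConnections / capacity; }
      }

      // Round-robin DNS ignores load; this picks the least-utilized server instead.
      static Server pickLeastLoaded(List<Server> servers) {
          return servers.stream()
                  .min(Comparator.comparingDouble(Server::load))
                  .orElseThrow();
      }

      public static void main(String[] args) {
          List<Server> zoneServers = List.of(
                  new Server("ws-1a", 48_000, 60_000),
                  new Server("ws-1b", 31_000, 60_000),
                  new Server("ws-1c", 55_000, 60_000));
          System.out.println("route new connection to " + pickLeastLoaded(zoneServers).host());
      }
  }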


✅ Summary

Pushy wasn’t just a simple proxy; it was a session-aware, fault-tolerant, scalable message router for real-time device communication. By combining message queuing, smart session routing, and dynamic discovery, it helped Netflix deliver seamless user experiences across millions of connected devices.


⚠️ Challenges with the Original Pushy Architecture

While Pushy’s original design served Netflix well at first, it began to strain under the weight of scale, evolving use cases, and operational demands. As the number of connected devices grew into the millions, and as Pushy became central to real-time features across multiple Netflix applications, a range of technical and architectural issues surfaced.

Let’s break them down:

1. Frequent Reconnect Storms

Netflix devices tend to reconnect all at once:
  • Power outages, Wi-Fi changes, or Netflix app updates often trigger mass reconnects.
  • WebSocket servers had to handle a spike of TLS handshakes, registry updates, and message backlog replays, all within seconds.
This led to:
  • Thundering herd problems
  • Sudden CPU and memory pressure on specific nodes
  • Occasional partial outages during large-scale reconnects
2. Push Registry Memory Pressure and Latency

The Push Registry, backed by Dynomite, became a bottleneck:
  • Every message delivery required low-latency lookups in the registry to identify active sessions.
  • With millions of devices, memory usage ballooned.
  • Dynomite’s replication overhead and Redis-like architecture couldn’t scale linearly, resulting in lookup slowdowns and stale data.
3. Limited Fault Isolation

A major issue was the lack of strong isolation between customers or device types:
  • A bug or reconnection surge in one type of client (say, Android TVs) could affect Pushy performance globally.
  • Without sharding or tenant separation, a misbehaving device model could overwhelm shared infrastructure.
4. Insufficient Observability and Debuggability
  • The original message processor had limited metrics and retry intelligence.
  • Operators lacked visibility into where messages got stuck or which sessions were stale.
  • Diagnosing delivery failures required deep dives across multiple logs and systems.
This made it hard to:
  • Guarantee message delivery SLAs
  • Track end-to-end latency
  • Quickly respond to production issues
5. Network Topology Challenges

Since WebSocket servers needed to stay globally distributed and session-aware:
  • Load balancing at the TCP level was tricky (WebSockets aren’t natively friendly to CDN-style routing).
  • Auto-scaling had to preserve session stickiness.
  • Edge nodes required careful management of memory, state, and connection churn.
6. New Use Cases Were Outgrowing the Architecture

Pushy started with basic notifications, but:
  • Features like real-time device sync, voice input responsiveness, and cross-device experiences demanded ultra-low latency and higher delivery guarantees.
  • The original system wasn’t optimized for delivery retry, QoS policies, or global message routing.

🔚 Wrapping Up Part 1: The Rise and Limits of Pushy

In this first part, we explored why Netflix needed a solution like Pushy in the first place and how it enabled seamless, real-time interactivity across devices, from second-screen controls to voice-based commands. We broke down the architecture that powered this WebSocket-based infrastructure, including its registry, message queues, and processors.

Pushy served Netflix exceptionally well for years, supporting a growing range of use cases and hundreds of millions of concurrent connections. But as we saw, scale and diversity came with operational trade-offs. From resource fragmentation to uneven scaling, the cracks began to show, not because Pushy was inadequate, but because Netflix’s needs had fundamentally evolved.

In Part 2, we’ll dive into how Netflix re-architected Pushy to meet these new demands: the innovations they introduced, how they tackled each bottleneck systematically, and what benefits the evolved system now unlocks—including hints of what might come next.
