WebSockets Explained: A Journey into Real-Time Communication


WebSocket is a full-duplex protocol for two-way communication between a client and a server. It is stateful: once established, the connection stays open until either the client or the server closes it, and a close from either side ends the connection for both.

Sample WebSocket URI

A WebSocket URI has the following format:

ws://hostname:port/path?query

The port is optional and defaults to 80 for ws. For secure WebSocket connections running over TLS, use the wss scheme, whose default port is 443:

wss://hostname:port/path?query

Here’s an example of a WebSocket URI:

ws://example.com:8000/chat

In this example, the client opens a WebSocket connection to the /chat endpoint on example.com, port 8000.
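To see how the parts of such a URI break down, here is a small TypeScript sketch. It assumes a Node.js or browser runtime, where the WHATWG URL class is built in and treats ws and wss as schemes with default ports of 80 and 443; describeWebSocketUri is a hypothetical helper name.

```typescript
// Hypothetical helper: splits a WebSocket URI into its components using the
// built-in WHATWG URL class (available in Node.js and browsers).
function describeWebSocketUri(uri: string) {
  const url = new URL(uri);
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    throw new Error(`Not a WebSocket URI: ${uri}`);
  }
  const secure = url.protocol === "wss:";
  return {
    secure,
    host: url.hostname,
    // URL reports an empty port when it equals the scheme default (80 for ws, 443 for wss).
    port: url.port === "" ? (secure ? 443 : 80) : Number(url.port),
    path: url.pathname,
    query: url.search,
  };
}

console.log(describeWebSocketUri("ws://example.com:8000/chat"));
console.log(describeWebSocketUri("wss://example.com/chat?room=general"));
```

Because URL leaves the port empty when it matches the scheme's default, the sketch fills in 80 or 443 explicitly.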

Client and server communication


A WebSocket connection starts as an ordinary HTTP request. The client sends an HTTP GET request with the headers Upgrade: websocket and Connection: Upgrade, asking the server to switch the connection to the WebSocket protocol. If the server supports WebSockets and accepts the request, it responds with status code 101 (Switching Protocols); otherwise, it replies with a regular HTTP response such as 400 (Bad Request).

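Any status code other than 101 means the handshake has failed and no WebSocket connection exists. The client then treats the reply as an ordinary HTTP response (for example, following a redirect or answering an authentication challenge) or abandons the connection attempt.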

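WebSockets are efficient because, once the handshake completes, the same connection stays open and carries every message in both directions. Each message travels as a small frame rather than as a new HTTP request with its own headers, and the server can push data without waiting for the client to ask.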

WebSocket handshake request from client to server

Here is an example of a WebSocket handshake request header from a client to a server:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
Origin: http://example.com

In this example, the client asks to upgrade its connection to the /chat endpoint on server.example.com to a WebSocket connection. Sec-WebSocket-Key is a base64-encoded random 16-byte value that the server uses to build its handshake response. Sec-WebSocket-Protocol is optional; it lists the subprotocols the client supports, in order of preference, and the server selects at most one of them. Sec-WebSocket-Version indicates the protocol version the client wants to use; 13 is the version defined by RFC 6455. Browsers send the Origin header to identify the origin of the page that opened the connection, which lets the server reject cross-origin requests it doesn't trust.
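A client library normally produces this request for you, but the pieces are simple enough to sketch. The TypeScript below assumes Node.js (for randomBytes from node:crypto); generateWebSocketKey and buildHandshakeRequest are hypothetical names, and the host, path, subprotocols, and origin are plain parameters.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: a random 16-byte nonce, base64-encoded (always 24 characters).
function generateWebSocketKey(): string {
  return randomBytes(16).toString("base64");
}

// Hypothetical helper: builds the text of an opening-handshake request.
function buildHandshakeRequest(
  host: string,
  path: string,
  key: string,
  protocols: string[] = [],
  origin?: string,
): string {
  const lines = [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Key: ${key}`,
    "Sec-WebSocket-Version: 13",
  ];
  if (protocols.length > 0) lines.push(`Sec-WebSocket-Protocol: ${protocols.join(", ")}`);
  if (origin) lines.push(`Origin: ${origin}`);
  // Header lines end with CRLF, and an empty line terminates the header block.
  return lines.join("\r\n") + "\r\n\r\n";
}

const key = generateWebSocketKey();
console.log(buildHandshakeRequest("server.example.com", "/chat", key, ["chat", "superchat"], "http://example.com"));
```

Header order does not matter in HTTP, so this sketch places Sec-WebSocket-Version before the optional headers.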

WebSocket handshake response from server to client

Here is an example of a WebSocket handshake response header from a server to a client:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

In this example, the server responds with status code 101, indicating that it is switching protocols as the client requested. The Upgrade and Connection headers confirm the switch to WebSocket. The Sec-WebSocket-Accept value is derived from the client's Sec-WebSocket-Key: the server appends the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the key, hashes the result with SHA-1, and base64-encodes the hash. The client performs the same computation to confirm that the server actually understood its WebSocket handshake request.
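The accept computation takes only a few lines. This sketch assumes Node.js (createHash from node:crypto); computeAcceptValue and isHandshakeAccepted are hypothetical helpers, and the header map is assumed to come from some HTTP parser that lowercases header names. Applied to dGhlIHNhbXBsZSBub25jZQ==, the sample key from RFC 6455, it produces s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, the value shown in the response above.

```typescript
import { createHash } from "node:crypto";

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Hypothetical helper: Sec-WebSocket-Accept = base64(SHA-1(key + GUID)).
function computeAcceptValue(secWebSocketKey: string): string {
  return createHash("sha1").update(secWebSocketKey + WEBSOCKET_GUID).digest("base64");
}

// Hypothetical helper: the checks a client might run on the handshake response.
// `headers` is assumed to map lowercased header names to their values.
function isHandshakeAccepted(statusCode: number, headers: Record<string, string>, sentKey: string): boolean {
  return (
    statusCode === 101 &&
    headers["upgrade"]?.toLowerCase() === "websocket" &&
    headers["connection"]?.toLowerCase().split(",").map((t) => t.trim()).includes("upgrade") === true &&
    headers["sec-websocket-accept"] === computeAcceptValue(sentKey)
  );
}

console.log(computeAcceptValue("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Because the GUID is fixed and public, the accept value proves only that the server speaks the WebSocket protocol, not who the server is; authentication is left to TLS and the application.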

WebSocket frames

A WebSocket frame is the smallest unit of communication in the WebSocket protocol. Each frame consists of a header and a payload. The header contains information about the frame like its length, type (opcode), whether it’s the final frame in a message, and whether the payload is masked. The payload is the actual data being transmitted.

Here’s a simplified representation of a WebSocket frame structure:


  • The FIN bit indicates whether this is the final frame in a message.

  • The opcode indicates the type of the frame: continuation (0x0), text (0x1), binary (0x2), close (0x8), ping (0x9), or pong (0xA).

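  • The MASK bit indicates whether the payload is masked. Every frame a client sends to a server must be masked, and frames from a server to a client must not be.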

  • The payload length field is 7 bits long. Values from 0 to 125 are the length itself; 126 means the next 2 bytes hold the length, and 127 means the next 8 bytes do.

  • The masking key is a 4-byte value present only when the MASK bit is set. The receiver XORs it with the payload to recover the original data.

  • The payload data is the actual data being transmitted.
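The field layout above can be turned into a small parser. This is a sketch rather than a complete implementation: it assumes the entire frame is already in a Uint8Array, ignores the three RSV bits and any extensions, and uses the hypothetical names parseFrame and ParsedFrame. The sample input is the masked Hello frame that RFC 6455 gives as an example.

```typescript
// Hypothetical shape of one parsed frame; RSV bits and extensions are ignored.
interface ParsedFrame {
  fin: boolean;
  opcode: number; // 0x0 continuation, 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
  masked: boolean;
  payload: Uint8Array; // payload after unmasking
  frameLength: number; // total bytes the frame occupied in the input
}

// Hypothetical helper: parses one frame from the start of `bytes`,
// returning null if the frame is not complete yet.
function parseFrame(bytes: Uint8Array): ParsedFrame | null {
  if (bytes.length < 2) return null;
  const fin = (bytes[0] & 0x80) !== 0;
  const opcode = bytes[0] & 0x0f;
  const masked = (bytes[1] & 0x80) !== 0;
  let payloadLength = bytes[1] & 0x7f;
  let offset = 2;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  if (payloadLength === 126) {
    // 16-bit extended length, big-endian (network byte order).
    if (bytes.length < offset + 2) return null;
    payloadLength = view.getUint16(offset);
    offset += 2;
  } else if (payloadLength === 127) {
    // 64-bit extended length read as two 32-bit halves; assumes it fits in a JS number.
    if (bytes.length < offset + 8) return null;
    payloadLength = view.getUint32(offset) * 0x100000000 + view.getUint32(offset + 4);
    offset += 8;
  }

  if (masked && bytes.length < offset + 4) return null;
  const maskingKey = masked ? bytes.subarray(offset, offset + 4) : null;
  if (masked) offset += 4;

  if (bytes.length < offset + payloadLength) return null;
  // Copy the payload so unmasking does not modify the caller's buffer.
  const payload = bytes.slice(offset, offset + payloadLength);
  if (maskingKey) {
    // Unmask by XOR-ing each byte with the key, cycling through its 4 bytes.
    for (let i = 0; i < payload.length; i++) payload[i] ^= maskingKey[i % 4];
  }
  return { fin, opcode, masked, payload, frameLength: offset + payloadLength };
}

// Masked single-frame text message from RFC 6455, section 5.7.
const frame = parseFrame(Uint8Array.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]));
console.log(frame && new TextDecoder().decode(frame.payload)); // Hello
```

Returning null for incomplete input lets a caller keep appending bytes from the socket and retry, since TCP can split a frame across several reads.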

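In the WebSocket protocol, a message can be split into several smaller units called “fragments”, each sent in its own WebSocket frame. This is useful for large messages or for streaming data whose total size isn't known in advance. Control frames (close, ping, pong) cannot be fragmented, although they may be sent between the fragments of a data message.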

A fragment consists of a portion of the message payload and a header. The header includes the FIN bit, which indicates whether this is the last fragment in the message: 1 means this is the last (or only) fragment, and 0 means more fragments follow. The first fragment carries the message's opcode (text or binary); every later fragment uses opcode 0x0 (continuation).
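To make the fragmentation rules concrete, here is a sketch of an encoder that splits a text message into frames. It builds only unmasked frames, as a server would send them, and frameHeader and fragmentTextMessage are hypothetical helpers. Splitting Hello into 3-byte chunks reproduces the fragmented example from RFC 6455: a frame carrying Hel with FIN clear and the text opcode, followed by a frame carrying lo with FIN set and the continuation opcode.

```typescript
const OPCODE_CONTINUATION = 0x0;
const OPCODE_TEXT = 0x1;

// Hypothetical helper: header of an unmasked (server-to-client) frame.
function frameHeader(fin: boolean, opcode: number, length: number): Uint8Array {
  const first = (fin ? 0x80 : 0x00) | opcode;
  if (length < 126) return Uint8Array.from([first, length]);
  if (length <= 0xffff) return Uint8Array.from([first, 126, length >> 8, length & 0xff]);
  const header = new Uint8Array(10);
  header[0] = first;
  header[1] = 127;
  const view = new DataView(header.buffer);
  view.setUint32(2, Math.floor(length / 0x100000000)); // high 32 bits
  view.setUint32(6, length >>> 0); // low 32 bits
  return header;
}

// Hypothetical helper: splits a text message into frames of at most `chunkSize`
// payload bytes. The first frame uses the text opcode, later frames the
// continuation opcode, and only the final frame has FIN set. A UTF-8 character
// may straddle two fragments; only the reassembled message must be valid UTF-8.
function fragmentTextMessage(message: string, chunkSize: number): Uint8Array[] {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  const payload = new TextEncoder().encode(message);
  const frames: Uint8Array[] = [];
  let start = 0;
  do {
    const chunk = payload.subarray(start, start + chunkSize);
    const isLast = start + chunkSize >= payload.length;
    const header = frameHeader(isLast, start === 0 ? OPCODE_TEXT : OPCODE_CONTINUATION, chunk.length);
    const frame = new Uint8Array(header.length + chunk.length);
    frame.set(header);
    frame.set(chunk, header.length);
    frames.push(frame);
    start += chunkSize;
  } while (start < payload.length);
  return frames;
}

// Prints "01 03 48 65 6c" (Hel) and then "80 02 6c 6f" (lo).
for (const f of fragmentTextMessage("Hello", 3)) {
  console.log(Array.from(f, (b) => ("0" + b.toString(16)).slice(-2)).join(" "));
}
```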

WebSockets vs HTTP

Both HTTP (Hypertext Transfer Protocol) and WebSockets are internet communication protocols, but they differ significantly in several ways:

  1. HTTP is a stateless protocol: each request-response pair is handled independently, with no connection state retained between requests. WebSockets, by contrast, keep a persistent, bidirectional channel open between a client and a server.
  2. HTTP uses a request-response style in which the client sends a request and waits for the server's response. WebSockets are full-duplex: the client and the server can each send messages at any time, independently of each other.
  3. Every HTTP request and response carries a full set of headers, which adds noticeable overhead when many small messages are exchanged. Once a WebSocket connection is established, each message needs only a frame header of 2 to 14 bytes.
  4. HTTP is commonly used for loading web pages and making API calls, where a client fetches data from or sends data to a server. WebSockets suit cases that need real-time, bidirectional communication, such as chat applications, live updates, and interactive gaming.
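The per-message cost on the WebSocket side follows directly from the frame layout described earlier. The sketch below uses a hypothetical frameOverhead helper to count the header bytes a frame adds: 2 base bytes, plus 2 or 8 bytes of extended length for payloads of 126 bytes or more, plus a 4-byte masking key on frames sent by the client.

```typescript
// Hypothetical helper: bytes of framing overhead for one WebSocket frame.
function frameOverhead(payloadLength: number, masked: boolean): number {
  let size = 2; // FIN/RSV/opcode byte + MASK/length byte
  if (payloadLength > 0xffff) size += 8; // 64-bit extended length
  else if (payloadLength >= 126) size += 2; // 16-bit extended length
  if (masked) size += 4; // masking key on client-to-server frames
  return size;
}

for (const n of [5, 200, 70000]) {
  console.log(`${n}-byte payload: ${frameOverhead(n, false)} bytes from server, ${frameOverhead(n, true)} bytes from client`);
}
```

An HTTP exchange, by contrast, repeats the request line and the full header set on every request and response.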

Server example

The example contains a simple WebSocket server implementation in C#. It uses the HttpListener class to listen for HTTP requests and upgrades them to WebSocket connections if possible.

using System;
using System.IO;
using System.Net;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class WebsocketServer
{
    static async Task Main()
    {
        await StartServer();
    }

    static async Task StartServer()
    {
        var httpListener = new HttpListener();
        httpListener.Prefixes.Add("http://localhost:8000/chat/");
        httpListener.Start();
        Console.WriteLine("Server started.");

        while (true)
        {
            var context = await httpListener.GetContextAsync();

            if (context.Request.IsWebSocketRequest)
            {
                // Completes the handshake by sending 101 Switching Protocols (no subprotocol).
                var webSocketContext = await context.AcceptWebSocketAsync(subProtocol: null);
                var webSocket = webSocketContext.WebSocket;

                // Run each connection on its own task so the loop can accept other clients.
                _ = Task.Run(() => EchoLoop(webSocket));
            }
            else
            {
                // Plain HTTP request: reject it with 400 Bad Request.
                context.Response.StatusCode = 400;
                context.Response.Close();
            }
        }
    }

    internal static async Task EchoLoop(WebSocket webSocket)
    {
        var buffer = new byte[1024];

        try
        {
            while (webSocket.State == WebSocketState.Open)
            {
                // A message larger than the buffer arrives in several chunks,
                // so keep reading until EndOfMessage is set.
                using var message = new MemoryStream();
                WebSocketReceiveResult result;
                do
                {
                    result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
                    message.Write(buffer, 0, result.Count);
                } while (!result.EndOfMessage);

                if (result.MessageType == WebSocketMessageType.Close)
                {
                    // Reply to the client's close frame to finish the closing handshake.
                    await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "", CancellationToken.None);
                    break;
                }

                var payload = message.ToArray();
                Console.WriteLine($"Received: {Encoding.UTF8.GetString(payload)}");

                // Echo the message back with the same type (text or binary).
                await webSocket.SendAsync(new ArraySegment<byte>(payload), result.MessageType, true, CancellationToken.None);
            }
        }
        catch (WebSocketException ex)
        {
            // For example, the client disconnected without a closing handshake.
            Console.WriteLine($"Connection error: {ex.Message}");
        }
        finally
        {
            webSocket.Dispose();
        }
    }
}

Here’s a breakdown of the code:

  1. It creates an HttpListener that listens for HTTP requests on http://localhost:8000/chat/, then enters an infinite loop that waits for incoming requests.
  2. When a request is received, it checks if the request is a WebSocket upgrade request using the IsWebSocketRequest property. If it is, it accepts the WebSocket connection and starts an echo loop with the EchoLoop method. If it’s not a WebSocket request, it responds with a 400 status code and closes the connection.
  3. The EchoLoop method is where the WebSocket communication happens. It creates a buffer to store incoming messages and enters a loop that continues as long as the WebSocket connection is open.
  4. Inside the loop, it waits for a message from the client. If the message type is Close, it closes the WebSocket connection. Otherwise, it reads the message, logs it to the console, and sends it back to the client. This is why it’s called an echo loop: it echoes back any message it receives.
  5. The ArraySegment<byte> instances are used to interact with the WebSocket’s SendAsync and ReceiveAsync methods. These methods perform asynchronous read and write operations on the WebSocket connection.
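In EchoLoop, each ReceiveAsync call fills at most the 1024-byte buffer, and the returned WebSocketReceiveResult sets EndOfMessage only once the final chunk of a message has been read, so a longer message needs several calls. The accumulation logic is language-independent; here it is sketched in TypeScript with a hypothetical MessageAssembler class that receives each chunk together with an end-of-message flag.

```typescript
// Hypothetical helper (not part of any WebSocket library): collects the chunks
// of one message until the end-of-message flag arrives.
class MessageAssembler {
  private chunks: Uint8Array[] = [];
  private size = 0;

  // Returns the complete message once `endOfMessage` is true, otherwise null.
  push(chunk: Uint8Array, endOfMessage: boolean): Uint8Array | null {
    this.chunks.push(chunk.slice()); // copy, since a receive buffer is usually reused
    this.size += chunk.length;
    if (!endOfMessage) return null;

    const message = new Uint8Array(this.size);
    let offset = 0;
    for (const c of this.chunks) {
      message.set(c, offset);
      offset += c.length;
    }
    // Reset for the next message.
    this.chunks = [];
    this.size = 0;
    return message;
  }
}

const assembler = new MessageAssembler();
const encoder = new TextEncoder();
assembler.push(encoder.encode("Hello, "), false);
const whole = assembler.push(encoder.encode("world"), true);
console.log(whole && new TextDecoder().decode(whole)); // Hello, world
```

Copying each chunk matters for the same reason it does in the C# loop: the receive buffer is overwritten on the next read.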

Client example

This TypeScript code is a simple WebSocket client implementation for Node.js using the websocket library.

import { w3cwebsocket as W3CWebSocket } from "websocket";

const client = new W3CWebSocket("ws://localhost:8000/chat");

client.onerror = () => {
  console.log("Connection Error");
};

client.onopen = () => {
  console.log("WebSocket Client Connected");

  // Send a random number once per second while the connection stays open.
  const sendNumber = () => {
    if (client.readyState === client.OPEN) {
      const number = Math.round(Math.random() * 0xffffff);
      client.send(number.toString());
      setTimeout(sendNumber, 1000);
    }
  };
  sendNumber();
};

client.onclose = () => {
  console.log("Client Closed");
};

client.onmessage = (e) => {
  if (typeof e.data === "string") {
    console.log("Received: '" + e.data + "'");
  } else {
    // Binary messages are assumed to contain UTF-8 encoded JSON.
    const parsed = JSON.parse(new TextDecoder().decode(e.data as ArrayBuffer));
    console.log(parsed);
  }
};

Here’s a breakdown of what it does:

  1. It imports the w3cwebsocket class from the websocket library.
  2. It creates a WebSocket client that connects to the server at ws://localhost:8000/chat.
  3. client.onerror logs a message if there's an error with the WebSocket connection.
  4. client.onopen runs when the connection opens: it logs a message and then sends a random number to the server every second for as long as the connection stays open.
  5. client.onclose logs a message when the WebSocket connection is closed.
  6. client.onmessage handles incoming messages from the server. If the message data is a string, it logs the string. If the data is an ArrayBuffer (binary data), it decodes the bytes as UTF-8 text, parses that text as JSON, and logs the resulting object. Because the example server replies with text frames, the string branch is the one that runs against it.
