Building a Multiplayer Infinite Canvas with React and WebSockets

Rob Pruzan 6/30/24

Estimated Reading Time: 12.9m

Motivation

I'm now building my third app which needs an infinite canvas- pannable and zoomable. I like implementing it myself over using an abstraction to avoid all of the bloat that comes with graphics libraries. You can get pretty far with not a lot of canvas code.

Id like to document:

How my implementation of an infinite canvas has changed
How I'd currently implement an infinite canvas,
How to integrate multiplayer into an infinite canvas.

Pre-reqs/ dependencies

React
Typescript

As long as you understand the basic fundamentals of React & Typescript, the code snippets should make sense to you.

Note! All of the demos shown in this artice are intended for desktop use

Introduction

The first time I built an infinite canvas was for my algorithm visualization website AlgoViz. This was also the first time I built anything canvas related, so I think it's useful to point out naive solutions that someone might think of, and how they compare to a more refined solution.

Rendering geometry

The thing all infinite canvas applications will have in common is the need to render some geometry on the canvas.

This is a simple task since the browser gives us all the API's we need to draw different geometry to the canvas. Our responsibility is to handle its movement.

Panning

When I first implemented panning my goal was simply to move the geometry on the screen with my trackpad. So the most straight forward solution (to me) was updating the location of my geometry in state.

To implement this, all thats needed ontop of the prior example is to:

Listen for trackpad events on the canvas ("wheel")
Update the geometry state by how much movement was detected during the wheel event (event.deltaX, event.deltaY)

Note! All the demo components can be reset by hitting the undo icon in the top left

This gets the job done if all we want is the allusion of panning.

But, this implementation quickly falls apart if we start thinking about using the canvas implementation in a multiplayer setting. By multiplayer, I mean the canvas geometry state is shared between users.

Note Here I simulated multiple players and shared state with a react component that passes down state to 2 canvas instances. In reality you would be using something like a web-socket connection to dispatch state updates to other players.

What the above components show you is that panning one canvas instance also pans the other. Currently our local coordinate system is tightly coupled to the geometry we want to render.

Instead of modifying the actual position of the geometry to implement panning, we can just maintain state that holds how far we are translated in the x and y direction. After we can apply that translation when we draw to the canvas.

Now panning one canvas will successfully not affect the other user's geometry state.

Zooming

Zooming is a similar concept to panning, but instead of adding to our coordinates by some amount, we will scale.

We can do this in 2 ways:

Update the geometry's coordinates every time the user zooms
Maintain state representing how much the user is zoomed, where we apply the transformation to the geometry when drawing to canvas

Because we already encountered the pitfalls of updating the actual geometry's coordinates, we will jump straight to maintaining a separate piece of state that represents how much the user is zoomed.

To do this, we will re-use the wheel event we setup on the canvas- the browser exposes zooming through a wheel event, where the event.ctrlKey == true. We can then read the event.deltaY property to know how much the user is zooming in either direction. We can continuously update our zoom value every time we receive this event.

Once we have our tracked zoom value, we can apply the zoom on the (x, y, radius) of the circle before drawing it to the canvas:

This works, but we have some annoying behavior, it always zooms to the origin.

This makes sense when you think about it- 0 * <anything> is 0, and that will never change. So when we zoom, everything at the origin won't move from our perspective, while everything not at the origin will be scaled to a coordinate that is further away from (0,0). Giving the appearance that the zoom is fixed around the origin, and everything else is moving away.

The desirable behavior here would be to lock screen around the mouse, not (0,0). So that when a user zooms, the geometry near the users mouse position will grow in size, rather than move away. Otherwise the user would have to zoom, then translate back to the place they to go, zoom, translate, zoom...

Instead of the user performing these uncomfortable set of actions, we could do this for them. Every time the user zooms, keep the camera fixed around the geometry its hovered over by translating the screen the opposite amount it would move from the zoom.

To do this, we would need to know where our mouse is in our coordinate system (which I will refer to as world coordinates), while the browser can only give us information about where our mouse is relative to the screen (which I will refer to as screen coordinates). The mouse will only be moving in terms of world coordinates when zooming, not screen coordinates.

So lets make a function that converts screen coordinate's to world coordinates

Note Going forward we will consolidate the translation + zoom state into a single state variable named camera

export const toWorld = (
  screenCoordinate: {
    x: number;
    y: number;
  },
  camera:  {
    zoom: number
    x: number;
    y: number;
  }
) => {
  return {
    x: camera.x + screenCoordinate.x / camera.zoom,
    y: camera.y + screenCoordinate.y / camera.zoom,
  };
};

export const toWorld = (
  screenCoordinate: {
    x: number;
    y: number;
  },
  camera:  {
    zoom: number
    x: number;
    y: number;
  }
) => {
  return {
    x: camera.x + screenCoordinate.x / camera.zoom,
    y: camera.y + screenCoordinate.y / camera.zoom,
  };
};

Now using this we can calculate how much our mouse has moved, in world coordinates, between zooms. Then we update our translation state so that the (x,y) movement caused by the zoom would be counter-acted by this translation.

const zoomFactor = Math.pow(0.99, e.deltaY);
const newZoom = camera.zoom * zoomFactor;
const mouseWorldBefore = toWorld({ x: mouseX, y: mouseY }, camera);
const newCamera = { ...camera, zoom: newZoom };
const mouseWorldAfter = toWorld({ x: mouseX, y: mouseY }, newCamera);

setCamera({
  x: camera.x + (mouseWorldBefore.x - mouseWorldAfter.x),
  y: camera.y + (mouseWorldBefore.y - mouseWorldAfter.y),
  zoom: newZoom,
});

const zoomFactor = Math.pow(0.99, e.deltaY);
const newZoom = camera.zoom * zoomFactor;
const mouseWorldBefore = toWorld({ x: mouseX, y: mouseY }, camera);
const newCamera = { ...camera, zoom: newZoom };
const mouseWorldAfter = toWorld({ x: mouseX, y: mouseY }, newCamera);

setCamera({
  x: camera.x + (mouseWorldBefore.x - mouseWorldAfter.x),
  y: camera.y + (mouseWorldBefore.y - mouseWorldAfter.y),
  zoom: newZoom,
});

We now also must be conscious when we make translations to move smaller amounts when we are zoomed in. Otherwise the user would be flying all over the place when panning while zoomed in.

setCamera((prev) => ({
  ...prev,
  x: prev.x + e.deltaX / prev.zoom,
  y: prev.y + e.deltaY / prev.zoom,
}));

setCamera((prev) => ({
  ...prev,
  x: prev.x + e.deltaX / prev.zoom,
  y: prev.y + e.deltaY / prev.zoom,
}));

Now that we have a working implementation, lets clean up some things.

The canvas may be a little blurry depending on your device resolution. We can make the canvas use more pixels when rendering objects on devices with higher device pixel ratios (dpr) with:

const ctx = canvas.getContext("2d"); const dpr = window.devicePixelRatio; const
rect = canvas.getBoundingClientRect();

canvas.width = rect.width * dpr;
canvas.height = rect.height * dpr;

canvas.style.width = `${rect.width}px`;
canvas.style.height = `${rect.height}px`;

ctx.scale(dpr, dpr);

const ctx = canvas.getContext("2d"); const dpr = window.devicePixelRatio; const
rect = canvas.getBoundingClientRect();

canvas.width = rect.width * dpr;
canvas.height = rect.height * dpr;

canvas.style.width = `${rect.width}px`;
canvas.style.height = `${rect.height}px`;

ctx.scale(dpr, dpr);

And we don't need to manually translate and zoom the coordinates of every object we draw, the canvas can handle this automatically:

ctx.scale(zoom, zoom);
ctx.translate(-x, -y);

ctx.scale(zoom, zoom);
ctx.translate(-x, -y);

We can also make this implementation reusable by extracting the logic to a hook:

Multiplayer

To make our canvas multiplayer I will use a bun webserver with no dependencies.

Lets first setup a simple websocket (WS) server for our future multiplayer canvas:

const server = Bun.serve({
  fetch(req, server) {
    server.upgrade(req);
  },
  websocket: {
    open: (ws) => {
      ws.subscribe("canvas");
    },
    message: async (ws, message) => {
      ws.publish("canvas", message);
    },
  },
  port: 8080,
});

console.log(`Listening on ${server.hostname}:${server.port}`);

const server = Bun.serve({
  fetch(req, server) {
    server.upgrade(req);
  },
  websocket: {
    open: (ws) => {
      ws.subscribe("canvas");
    },
    message: async (ws, message) => {
      ws.publish("canvas", message);
    },
  },
  port: 8080,
});

console.log(`Listening on ${server.hostname}:${server.port}`);

Bun will automatically send the 101 Upgrade response when we call server.upgrade(req). It then allows you to interact with the created websocket connection through open, close and message events, passing you the websocket connection that triggered it.

We can run the server with:

bun run --hot <filename>

Note! --hot tells bun to reload the server anytime a change is detected in the script

Because bun can transpile typescript internally, we only need the single typescript file to run the server. But if you want autocomplete through typescript in your IDE on the Bun global variable, setup a bun project with:

bun init

And move the websocket code to index.ts. This is enough to give your IDE the information about bun's types.

Now, lets setup some interaction on our local canvas before we start receiving interactions from other players. We will start with drawing a circle onClick of the canvas.

We added a function coordinatesFromMouseEvent. This is a convenience function to calculate the screen coordinates the browser provided, then translate the screen coordinates to world coordinates:

export const coordinatesFromMouseEvent = ({
  event,
  camera,
  canvasRef,
}: {
  event: { clientX: number; clientY: number };
  canvasRef: MutableRefObject<HTMLCanvasElement | null>;
  camera: Camera;
}) => {
  const canvas = canvasRef.current;
  if (!canvas) return;

  const rect = canvas.getBoundingClientRect();
  const mouseX = event.clientX - rect.left;
  const mouseY = event.clientY - rect.top;
  return toWorld({ x: mouseX, y: mouseY }, camera);
};

export const coordinatesFromMouseEvent = ({
  event,
  camera,
  canvasRef,
}: {
  event: { clientX: number; clientY: number };
  canvasRef: MutableRefObject<HTMLCanvasElement | null>;
  camera: Camera;
}) => {
  const canvas = canvasRef.current;
  if (!canvas) return;

  const rect = canvas.getBoundingClientRect();
  const mouseX = event.clientX - rect.left;
  const mouseY = event.clientY - rect.top;
  return toWorld({ x: mouseX, y: mouseY }, camera);
};

Now we are going to update our web-socket server to handle receiving and sending messages from connected clients.

The main task will be maintaining the socket connections within an array, so when users send messages to our WS server we can broadcast the message to everyone who is connected to our WS server (excluding ourselves):

import type { ServerWebSocket } from "bun";

let sockets: Array<ServerWebSocket<unknown>> = [];

const server = Bun.serve({
  fetch(req, server) {
    server.upgrade(req);
  },
  websocket: {
    open: (ws) => {
      sockets.push(ws);
    },
    close: (ws) => {
      sockets = sockets.filter((s) => s !== ws);
    },
    message: async (ws, message) => {
      sockets.forEach((s) => ws !== s && s.send(message));
    },
  },
  port: 8080,
});

import type { ServerWebSocket } from "bun";

let sockets: Array<ServerWebSocket<unknown>> = [];

const server = Bun.serve({
  fetch(req, server) {
    server.upgrade(req);
  },
  websocket: {
    open: (ws) => {
      sockets.push(ws);
    },
    close: (ws) => {
      sockets = sockets.filter((s) => s !== ws);
    },
    message: async (ws, message) => {
      sockets.forEach((s) => ws !== s && s.send(message));
    },
  },
  port: 8080,
});

Next we need to make the connection on the client.

The steps we will go to accomplish this is:

Setup state in react to hold the socket object returned by WebSocket (the browsers API for making WebSocket connections)
Setup event listeners on the socket so we appropriately react to different states the socket will be in

It's useful to setup WS event listeners in their own useEffects. It can be bug prone when complexity increases to have several WS listener setups in a single useEffect.

Cleanup anything we setup in the useEffect using the cleanup function

This is a very important step. When our component unmounts the WS connection will not close automatically. And if the dependencies in our event listener useEffect's change, we will be adding more event listeners, without ever removing the previously added listeners. This can be very problematic because the functions used in the old event listeners may no longer be correct- due to the captured values being stale- potentially breaking our application. And even if they are still valid, it can very quickly lead to a performance-impacting-memory-leak

Note! Here we assume the websocket server is available at http://localhost:8080

const [socket, setSocket] = useState<null | WebSocket>(null);

// makes the WS connection, sets the socket in state when its opened
useEffect(() => {
  const newSocket = new WebSocket("http://localhost:8080");
  const handleOpen = () => {
    setSocket(newSocket);
  };
  newSocket.addEventListener("open", handleOpen);

  return () => {
    newSocket.removeEventListener("open", handleOpen);
    newSocket.close();
  };
}, []);

// makes sure to clean up the socket state incase the server closes the connection
useEffect(() => {
  const handleClose = () => {
    setSocket(null);
  };
  socket?.addEventListener("close", handleClose);

  return () => {
    socket?.removeEventListener("close", handleClose);
  };
}, [socket]);

// handles incoming messages being sent from the server
useEffect(() => {
  if (!socket) {
    return;
  }

  const handleMessage = (e: MessageEvent<string>) => {
    const json = JSON.parse(e.data) as Geometry;
    setGeometry((prev) => [...prev, json]);
  };
  socket.addEventListener("message", handleMessage);

  return () => {
    socket.removeEventListener("message", handleMessage);
  };
}, [socket]);

const [socket, setSocket] = useState<null | WebSocket>(null);

// makes the WS connection, sets the socket in state when its opened
useEffect(() => {
  const newSocket = new WebSocket("http://localhost:8080");
  const handleOpen = () => {
    setSocket(newSocket);
  };
  newSocket.addEventListener("open", handleOpen);

  return () => {
    newSocket.removeEventListener("open", handleOpen);
    newSocket.close();
  };
}, []);

// makes sure to clean up the socket state incase the server closes the connection
useEffect(() => {
  const handleClose = () => {
    setSocket(null);
  };
  socket?.addEventListener("close", handleClose);

  return () => {
    socket?.removeEventListener("close", handleClose);
  };
}, [socket]);

// handles incoming messages being sent from the server
useEffect(() => {
  if (!socket) {
    return;
  }

  const handleMessage = (e: MessageEvent<string>) => {
    const json = JSON.parse(e.data) as Geometry;
    setGeometry((prev) => [...prev, json]);
  };
  socket.addEventListener("message", handleMessage);

  return () => {
    socket.removeEventListener("message", handleMessage);
  };
}, [socket]);

The message event listener is what's interesting here. This will listen for any messages that the server sends to us, and then calls our handleMessage function- passing us the websocket frame payload in e.data. We can safely run JSON.parse on the payload because we know in our application we are only sending JSON encoded strings. There is nothing stopping us from not using JSON, or sending raw bytes- incase we wanted to send multi-media,

Now that we are properly listening for messages, it makes sense to start sending messages to our WS server, which will in-turn relay the sent message to all connected clients.

We can do this using the socket state we have available. This socket object exposes a method send. It allows us to pass in a payload, and it will automatically build a valid websocket frame, and then send it through the open WS connection we made with our bun server.

But, most of this logic is pretty general, and distracts us from what we are really doing inside our infinite canvas component. Lets generalize it and move the logic to a separate hook:

import { useState, useEffect } from "react";

export const useWebSocket = <T>({
  onClose,
  onError,
  onMessage,
  onOpen,
  url,
}: {
  onOpen?: (event: Event) => void;
  onMessage?: (message: MessageEvent<unknown> & { parsedJson: T }) => void;
  onError?: (error: Event) => void;
  onClose?: (e: Event) => void;
  url: string;
}) => {
  const [socket, setSocket] = useState<null | WebSocket>(null);
  const [status, setStatus] = useState<
    "connecting" | "open" | "error" | "closed"
  >("closed");

  // makes the WS connection, sets the socket in state when its opened
  useEffect(() => {
    const newSocket = new WebSocket(url);
    const handleOpen = (e: Event) => {
      setStatus("open");
      setSocket(newSocket);
      onOpen?.(e);
    };
    newSocket.addEventListener("open", handleOpen);

    return () => {
      newSocket.removeEventListener("open", handleOpen);
      newSocket.close();
      setSocket(null);
    };
  }, []);

  useEffect(() => {
    if (!socket) {
      return;
    }
    const handleError = (e: Event) => {
      setStatus("error");
      onError?.(e);
    };

    socket.addEventListener("error", handleError);

    return () => {
      socket.removeEventListener("error", handleError);
      setSocket(null);
    };
  }, [socket, onError]);

  // makes sure to clean up the socket state incase the server closes the connection
  useEffect(() => {
    const handleClose = (e: Event) => {
      setStatus("closed");
      onClose?.(e);
    };
    socket?.addEventListener("close", handleClose);

    return () => {
      socket?.removeEventListener("close", handleClose);
      setSocket(null);
    };
  }, [socket, onClose]);

  // handles incoming messages being sent from the server
  useEffect(() => {
    if (!socket) {
      return;
    }

    const handleMessage = (e: MessageEvent<string>) => {
      if (!onMessage) {
        return;
      }
      const json = JSON.parse(e.data) as T;
      onMessage({ ...e, parsedJson: json });
    };
    socket.addEventListener("message", handleMessage);

    return () => {
      socket.removeEventListener("message", handleMessage);
    };
  }, [socket, onMessage]);

  return {
    socket,
    status,
  };
};

import { useState, useEffect } from "react";

export const useWebSocket = <T>({
  onClose,
  onError,
  onMessage,
  onOpen,
  url,
}: {
  onOpen?: (event: Event) => void;
  onMessage?: (message: MessageEvent<unknown> & { parsedJson: T }) => void;
  onError?: (error: Event) => void;
  onClose?: (e: Event) => void;
  url: string;
}) => {
  const [socket, setSocket] = useState<null | WebSocket>(null);
  const [status, setStatus] = useState<
    "connecting" | "open" | "error" | "closed"
  >("closed");

  // makes the WS connection, sets the socket in state when its opened
  useEffect(() => {
    const newSocket = new WebSocket(url);
    const handleOpen = (e: Event) => {
      setStatus("open");
      setSocket(newSocket);
      onOpen?.(e);
    };
    newSocket.addEventListener("open", handleOpen);

    return () => {
      newSocket.removeEventListener("open", handleOpen);
      newSocket.close();
      setSocket(null);
    };
  }, []);

  useEffect(() => {
    if (!socket) {
      return;
    }
    const handleError = (e: Event) => {
      setStatus("error");
      onError?.(e);
    };

    socket.addEventListener("error", handleError);

    return () => {
      socket.removeEventListener("error", handleError);
      setSocket(null);
    };
  }, [socket, onError]);

  // makes sure to clean up the socket state incase the server closes the connection
  useEffect(() => {
    const handleClose = (e: Event) => {
      setStatus("closed");
      onClose?.(e);
    };
    socket?.addEventListener("close", handleClose);

    return () => {
      socket?.removeEventListener("close", handleClose);
      setSocket(null);
    };
  }, [socket, onClose]);

  // handles incoming messages being sent from the server
  useEffect(() => {
    if (!socket) {
      return;
    }

    const handleMessage = (e: MessageEvent<string>) => {
      if (!onMessage) {
        return;
      }
      const json = JSON.parse(e.data) as T;
      onMessage({ ...e, parsedJson: json });
    };
    socket.addEventListener("message", handleMessage);

    return () => {
      socket.removeEventListener("message", handleMessage);
    };
  }, [socket, onMessage]);

  return {
    socket,
    status,
  };
};

Some key points in this new hook:

we are putting the event handlers, that we accept as arguments in useWebSocket, in the useEffect dependency arrays. This is the only way we will be able to detect if the callback the user passed is stale, so it will be the consumers responsibility to keep the function reference stable between renders to avoid excessive computation.
we track the status of the websocket manually. The socket returned from WebSocket does expose a status, but the status it exposed is mutated and does not alert react of this change. If we were to read the status of the socket we return from the hook during render, it would be impure.

Next, we want to share all the actions we perform on our geometry with other players, lets send everything we set to our geometry state to the WS server:

<canvas
  onClick={(event) => {
    const worldCoordinates = Canvas.coordinatesFromMouseEvent({
      camera,
      canvasRef,
      event,
    });
    if (!worldCoordinates) {
      return;
    }
    const newItem = {
      kind: "circle" as const,
      radius: 20,
      x: worldCoordinates.x,
      y: worldCoordinates.y,
    };
    socket?.send(JSON.stringify(newItem));
    setGeometry((prev) => [...prev, newItem]);
  }}
/>;

<canvas
  onClick={(event) => {
    const worldCoordinates = Canvas.coordinatesFromMouseEvent({
      camera,
      canvasRef,
      event,
    });
    if (!worldCoordinates) {
      return;
    }
    const newItem = {
      kind: "circle" as const,
      radius: 20,
      x: worldCoordinates.x,
      y: worldCoordinates.y,
    };
    socket?.send(JSON.stringify(newItem));
    setGeometry((prev) => [...prev, newItem]);
  }}
/>;

Now that we are sending WS messages, we can use our useWebSocket.ts hook to implement our multiplayer infinite canvas component in a very simple manner

All we need to give the useWebsocket hook to have multiplayer working is a URL to our WS server, and a handler for incoming WS messages.

Now in <100 lines of code, after our reusable abstractions, we have a multiplayer infinite canvas that also supports panning and zooming for free.

Note! The below demo uses a real deployed websocket server, with the same code shown in the tabs!

Bonus

If we were to turn on the react compiler, we can write this component in ~60 lines of code (after abstractions), while still being performant! Thats because we were using useCallback to get a stable function reference on our event handlers to avoid excessive re-draws/event listener setup/teardowns. The react compiler can stabilize our function references for us, letting us not have to think about our function reference changing between renders even if the capture group of the function did not change.

We now have a working multiplayer infinite canvas, along with reusable hooks for any implementation you need.

Though, there are several optimizations we can make to this implementation to improve it, for example:

Render the canvas off the main thread using OffscreenCanvas
Use conflict-free replicated data types (CRDT) for consistent state between players
Virtualize the canvas component to avoid drawing & updating when not on screen

Which I will try to explore and write about at some point :)

Rob Pruzan