Deterministic Lockstep Games With Rollback For The Web

typescript
wasm
rust
physics
engine
17 Jul 20234 min read

QuickGame

When deciding on the multiplayer structure for QuickGame, we had to find a solution that balanced the following

  • Performance
  • Ease of implementation
  • Developer experience

In the end, we chose a solution that was incredibly painful for the first two, but produced a developer experience that is unmatched in the industry. Here is the journey of the how, and the why.

Multiplayer Games

Needless to say, there are a handful of different ways to build multiplayer games. Let's briefly talk about some of them, and their differences:

Dumb Connection

You can connect two players directly together, sending things like your position, velocity, animations, etc. This method is incredibly easy to implement.

Every tick or so you send your data to one another, update your version of the game with the newest data, and continue playing. This is trivial to implement but is filled with obvious issues, most importantly performance and security.

Authoritative server

Instead of direct connections you can choose to have an authoritative server that you proxy your updates to. The server will validate your updates, apply them to its current game state, and update all applicable players with the latest data.

While this is a safer approach, it may not be ideal for all scenarios, including amount of bandwidth required to keep the simulation faithful across all clients.

Granted these explanations are oversimplified, but incredibly common. Let's look at a different way to approach this problem.

Deterministic Lockstep

A different approach would be to have the game play out exactly the same on every player's device, deterministically, and only syncing the player input between each other. If you can pull this off you will mitigate the bandwidth and security issues, since you are only sending keystrokes instead of positional information.

This approach has many caveats for multiplayer, mostly getting inputs long after the tick has occurred. To solve this you must employ a rollback system.

Rolling back

Our game state is simple. It is the representation of the game at one particular instant (or tick), as we know it.

If this was a single player game it would be easy to reason about, since time only marches forward and our game state can only be affected by one person. Once you add multiplayer to the mix, things get a fair bit more complicated.

Given that all our players are executing roughly the same tick at the same time, and the latency between them could be hundreds of milliseconds, its easy to see that by the time Player 1 receives the input from Player 2, Player 1 is already a handful of ticks ahead.

Player 1 cannot simply execute the inputs as this would break the determinism for our game. They must first roll back their game state to the tick in which the input was executed and resimulate the game with the new inputs applied!

This may seem like a lot of unnecessary work, but it is required to keep our game state in good working order, and of course, deterministic.

Of the three approaches this one is probably the most difficult to get right, and requires you to make hard decisions about what your game can do.

In this article I will detail my journey into creating a deterministic game engine!

Determinism

The biggest issue with this type of engine is keeping everyone in sync, all the time. Every tick has to execute the same way, at ideally the same time, across a dozen different devices. For a given input into the system the same exact output must occur, otherwise your peers will quickly slip out of sync with you causing their game to behave unpredictably when they receive your inputs.

Physics Engine

The hardest part of assuring determinism is your physics engine. Even though floats are by definition deterministic (although seemingly chaotic), the precision involved to maintain determinism is incredibly high.

32.00000001 * 32.00000001 = 1024.00000064

32.00000001 * 32.00000002 = 1024.00000096

While the end result feels infinitesimally small between these two statements (and it is), compounded over many thousands of game ticks, this will result in a deviation that is not only noticeable to the player, but also breaks the determinism of our game.

Most JavaScript based physics engines struggle with maintaining determinism. There are simply too many variables to keep track of, and too much variability in execution to faithfully reproduce the same results for the same inputs over long periods of time.

This was a major problem for QuickGame. We went through every major JS physics engine, from Cannon, to Ammo, to Oimo, and they all fell short on maintaining determinism.

Eventually we discovered the incredible Rapier.JS. This is a Web Assembly physics engine with a lot of cool features, the most important of which is that it is completely deterministic in its execution. For a given input, say five bouncy balls with a certain mass, rotation, position, velocity, it will always play out the exact same way.

Random

Random numbers are another important feature of gamedev, but random numbers are supposed to be just that, random, which is ideally not deterministic.

Since JavaScript's Math.random() is especially not deterministic, we needed to use a seed based random number generator. This will produce the exact same result for a given number of calls. For this we use Prando which uses an xorshift to produce its next random number for a given seed.

info

Calling this random number generator the same number of times is critically important. One extra, non deterministic call, and our game state will fall out of sync.

Game State

Now that our physics engine is deterministic we can move on to the other hard problem, keeping our game state in sync.

Our game state in QuickGame contains things like our Game Objects, Heads Up Display information, timers, player details, etc. It is one big serializable object that represents our game at a single tick in time.

Given the nature of our rollback system, we must be able to roll back to any tick in the past, and re-run our game logic. Because of this, we must store in memory all the previous Game States we have executed, going back to the earliest player input we have available.

info

Said differently, if we have inputs from all players going back to at least tick 1000, we can safely remove all Game States prior to that, as we will never need to roll back further than that point.

It is critically important to keep track of what goes into your game state, and assuring you are only affecting it deterministically. One non deterministic Math.random(), out of sync id generation, or logic misstep, and your game will drift off irrecoverably between players.

Serializing

Serializing your game state is tough in JavaScript. The naive solution is simply JSON.stringify, JSON.parse, but this will cut your floats in relatively unpredictable ways, and does not play nice with things like Map and Set. Because of this, we opted to store the objects in memory when handled locally, and serializing to a byte array using Safe-Schema when being sent over the wire, or stored in the database.

This can be especially painful when making changes to your game object schema, as any previously stored states will need to be migrated. Luckily this happens infrequently.

Using the Rapier physics engine we can serialize and deserialize our entire physics state into a single Uint8Array which we can store verbatim. This is another invaluable feature of that incredible engine.

User Inputs

In QuickGame we send only the players' inputs over the wire to one another, and the number of inputs available to the developer are arrows plus 3 buttons. This neatly fits into a single byte.

A socket message will be sent to the server and other players only when a button is pressed or unpressed, which is generally only a few times a second. This means that the bandwidth is a handful of bytes up, and a few dozen bytes down, per second.

This is unprecedented for most multiplayer games, but trivially implemented when using this system.

Once the input is received, it will replay the tick that it occurred on, and every subsequent tick until the current moment in time, and store our game states accordingly.

info

You read that right. We're going to execute multiple ticks in the past when we receive new inputs.

The biggest obvious problem with this approach is performance. Needing to execute so many extra ticks means our ticks need to be fast. Here's how we managed this problem.

Performance

Executing so much to assure perfect determinism is a chore.

There is no quick solution to this problem, just a handful of clever hacks and painful decisions. I'll break it down into a few different sections.

Serializing

To serialize our Game State we have to deep clone a really complicated JavaScript object. We went down so many different paths to solve this problem, including JSON.parse(JSON.stringify()) (which breaks determinism), structuredClone (which is ideal, but does not work in web workers on safari), serializing to a byte array (too much of a perf hit), but we landed on simply recreating the object by hand every time. It's by far the most tedious, and the least scalable, but that is typical when solving performance problems.

Game Logic

This problem was solved simply due to the nature of games made with QuickGame. Because they are relatively simple, at least by game dev standards, the amount of game logic code needing to be executed every tick is not typically enough to break the bank.

Since QuickGame handles so much for you, even with a thousand game objects existing and ticking, it is still easily handled within our window. Your mileage may vary!

Physics

This is the tough one. Physics, especially 3d physics, is complicated, and takes a long time to calculate. For a typical 60fps game that means you only have 16ms to execute all of your physics, game logic, and serialization. This is typically more than enough time to do so, but you have to remember, we may need to execute MANY of these every tick.

To that end we come to the first painful decision: the game logic loop is 50ms long. This was chosen to give game developers the most amount of time to execute even the most complicated game logic multiple times without hitting the 50ms window.

There are concerns around sponginess of the physics only executing 20 times per second, but we felt this number was the right balance between too much and too little, while giving an experience that is not perceived by the general player. Again, your mileage may vary.

Let's do some math!

We are typically seeing full game loops that take 5-6ms for decently complicated games on relatively low powered devices.

info

Remember, this number does not scale up with the number of players! Simply adding a few more physics bodies for each player to our simulation is trivial to handle!

This means we have a window of up to 9-10 rollback ticks at any given time. Since our game tick duration is 50ms, that means we can tolerate up to 500ms of sustained latency at any given time.

In reality, we are seeing a big fluctuation from 100ms to 800ms of latency, with the average being on the far lower end. This is more than enough window to execute everything that is necessary to keep the game alive for all players!

Why we chose this method

programmed like a single player game with bots

The important thing we haven't discussed yet is how easy this method makes developing games for.

In QuickGame, you don't have to be an expert engineer. You don't have to know if your code is being executed on the server or the client, or what state to serialize to send over the wire.

The game plays out effectively as a single player game with bots, just those bots are other human beings. You get their inputs and they are applied as if they were pressed by your buddy sitting next to you.

This delivers an unparalleled developer experience that engines like Roblox cannot compete with.

For all the pain involved in getting it right, and there is surely more to do, it was resoundingly the right decision for us.

Everything else

There are a lot of things that were outside the scope of this article, such as:

  • Things that don't need to be deterministic since they are purely visual
  • Audio in a rollback world
  • LERPing the player visuals so things don't look so jumpy during aggressive rollbacks
  • Forcing a sync from the server when a player falls too far behind
  • Generating game videos one frame at a time based on an initial game state and inputs using ffmpeg and puppeteer

I intend to cover at least some of them in the future since they are all independently interesting, and near and dear to my heart.

I hope this was helpful in shedding some light on a little talked about multiplayer solution, and if you are an aspiring (or professional) gamedev I hope you give QuickGame a try to see the magic in action!

© 2024 QuickGame World, Inc. All rights reserved.