## May 23, 2012

### Multithreading Problems In Game Design

A couple years ago, when I first started designing a game engine to unify Box2D and my graphics engine, I thought this was a superb opportunity to join all the cool kids and multithread it. I mean all the other game developers were talking about having a thread for graphics, a thread for physics, a thread for audio, etc. etc. etc. So I spent a lot of time teaching myself various lockless threading techniques and building quite a few iterations of various multithreading structures. Almost all of them failed spectacularly for various reasons, but in the end they were all too complicated.

I eventually settled on a single worker thread that was sent off to start working on the physics at the beginning of a frame render. Then at the beginning of each subsequent frame I would check to see if the physics were done, and if so sync the physics and graphics and start up another physics render iteration. It was a very clean solution, but fundamentally flawed. For one, it introduces an inescapable frame of input lag.

Single Thread Low Load

      FRAME 1   +----+
                |    |
      Input1 -> |    |
                |[__]| Physics
                |[__]| Render
      FRAME 2   +----+ INPUT 1 ON BACKBUFFER
      Input2 -> |    |
      Process ->|    |
                |[__]| Physics
      Input3 -> |[__]| Render
      FRAME 3   +----+ INPUT 2 ON BACKBUFFER, INPUT 1 VISIBLE
                |    |
                |    |
      Process ->|[__]| Physics
                |[__]| Render
      FRAME 4   +----+ INPUT 3 ON BACKBUFFER, INPUT 2 VISIBLE

Multi Thread Low Load

      FRAME 1   +----+
                |    |
                |    |
      Input1 -> |    |
                |[__]| Render/Physics START
      FRAME 2   +----+
      Input2 -> |____| Physics END
                |    |
                |    |
      Input3 -> |[__]| Render/Physics START
      FRAME 3   +----+ INPUT 1 ON BACKBUFFER
                |____|
                |    | PHYSICS END
                |    |
                |____| Render/Physics START
      FRAME 4   +----+ INPUT 2 ON BACKBUFFER, INPUT 1 VISIBLE

This kind of multithreading, by definition, results in any given physics update only being reflected in the next rendered frame, because the entire point of it is to immediately start rendering the current frame as soon as you start calculating physics. This causes a number of latency issues, but in addition it requires that one introduce a separate "physics update" function to be executed only during the physics/graphics sync. As I soon found out, this is a massive architectural complication, especially when you try to put in scripting or let other languages use your engine.

There is another, more subtle problem with dedicated threads for graphics/physics/audio/AI/anything. They don't scale. Let's say you have a platformer - AI will be contained inside the game logic, and the vast majority of your CPU time will be spent in graphics or physics, or possibly both. That means your game effectively only has two threads that are doing any meaningful amount of work. Modern processors have 8 logical cores[1], and the best one currently available has 12. You're using two of them. You introduced all this complexity and input lag just so you could use 2 of those 12 logical cores (16.7% of the processor) instead of 1 (8.3%).

Instead of trying to create a thread for each individual component, you need to go deeper. You have to parallelize each individual component separately, then tie them together in a single-threaded design. This has the added bonus of being vastly more friendly to single-core CPUs that can't run threads in parallel (like certain phones), because the parallelization goes on at a lower level and is invisible to the overall architecture of the library. So instead of having a graphics thread and a physics thread, you simply call the physics update, then call the graphics update, and inside each physics and graphics update you spawn enough worker threads to match the number of cores you have to work with and concurrently process as much stuff as possible. This eliminates latency problems and complicated library designs, and it scales with the number of cores. Even if your initial implementation of concurrency won't handle 32 cores, because the concurrency is encapsulated inside the engine, you can just go back and change it later without ever having to modify any programs that use the graphics engine.
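To make the shape of that design concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in rather than code from the engine: `Engine`, `step_body`, and `build_draw_call` are illustrative names, and the "physics" is a single line of arithmetic. Note also that CPython threads do not run CPU-bound work in parallel, so this only illustrates the structure: the caller sees two plain sequential calls, and the fan-out lives inside each one.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def step_body(body, dt):
    """Hypothetical physics work for one body: (position, velocity)."""
    position, velocity = body
    return (position + velocity * dt, velocity)


def build_draw_call(body):
    """Hypothetical graphics work for one body: report where to draw it."""
    position, _ = body
    return position


class Engine:
    def __init__(self, bodies, workers=None):
        self.bodies = list(bodies)
        # One worker per logical core; a single-core device simply gets one.
        self.pool = ThreadPoolExecutor(max_workers=workers or os.cpu_count() or 1)

    def update_physics(self, dt):
        # Fan out across the pool, then block until every body is stepped.
        self.bodies = list(self.pool.map(lambda b: step_body(b, dt), self.bodies))

    def update_graphics(self):
        # map() returns results in submission order, so draw order is kept.
        return list(self.pool.map(build_draw_call, self.bodies))

    def frame(self, dt):
        # From the caller's side this is an ordinary single-threaded sequence.
        self.update_physics(dt)
        return self.update_graphics()

    def close(self):
        self.pool.shutdown()


engine = Engine([(0.0, 1.0), (10.0, -2.0)])
draw_list = engine.frame(0.5)
engine.close()
print(draw_list)  # [0.5, 9.0]
```

Because the pool is created and used entirely inside `Engine`, changing the worker count or swapping the pool for something smarter would not touch any code that calls `frame()`.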

Consequently, don't try to multithread your games. It isn't worth it. Separately parallelize each individual component instead and write your game engine single-threaded; only use additional threads for asynchronous activities like resource loading.

[1] The processors actually only have 4 or 6 physical cores, but use hyperthreading techniques so that 8 or 12 logical cores are presented to the operating system. From a software point of view, however, this is immaterial.

#### 27 comments:

1. Well-written and well-reasoned. Thank you for this post.

2. Interesting experience. I've personally found that multi-threading is a lot more stable as long as you manage the latency. As you've said, it isn't easy, but it can be done. I normally use threads for physics and rendering, and then also sub-divide the work into individual threads where possible - e.g. using 2 or 4 threads for a particle simulation update, or splitting the world up and working in different areas in the physics loop on different threads. Additionally, networking and audio are easier to manage on their own threads and then feed events back into the main physics loop. The physics loop can be tightly timed (e.g. running at 120 Hz or more, tightly synchronised with other sub-systems or over the network) and I've also been investigating running a physics loop with double buffering to avoid locking for rendering. There are many interesting possibilities to utilise 8 hardware-backed threads or more!

1. Audio and networking are different - networking especially lends itself to a deferred, multithreaded design. Audio can be multithreaded, but I have found this to be of limited use. The thing is that trying to effectively manage that kind of threading is extremely difficult due to complex interdependencies between graphics and physics, hence why I feel a separated encapsulated concurrency model is more appropriate.

3. Modern gaming design would be well served making use of the vastly superior floating point operations of the GPU. I know many of the newer video cards are increasing in cores, memory, and speed all the time. How much work have you done leveraging the GPU vs CPU?

1. That is a question of graphics engine design, and is somewhat dependent on one's choice of API. The issue is that the GPU isn't the CPU and thus there is a memory barrier between information on the GPU and information on the CPU, which causes all sorts of strange bottlenecks. I personally am working on building a GPU-based particle engine and using DX10's geometry shader for speeding up certain sections of rendering, such as shadow calculation. However I have also found that some things are best left to the CPU, because the CPU already has access to the information instead of trying to shuttle it all the way to the GPU to perform the calculations.

4. How do you manage rendering, when OpenGL for example can only handle being rendered to from one thread?

1. DirectX actually has facilities for multithreading, but they are inefficient and so it is in a similar situation. This is something I have been musing on, but I haven't actually tried anything. My approach to parallelization in my engine will likely be focused in two areas - the main culling performed using a kd-tree that then dumps images into a red-black tree for z-order sorting, and the actual rendering stage that has to make the transformation calculations before sending the image to the GPU. The problem with a 2D engine is that I can't just throw things to the GPU whenever they happen to be finished prerendering, I have to call the GPU in a very specific order so the images get rendered correctly. You can sidestep this by isolating which images actually overlap by hijacking the kd-tree that's used for culling, but doing so is complicated and difficult to make effective.

One method for the actual rendering would be to spawn a number of worker threads which power through the linked list of images and, after finishing each one, push a finished render packet onto a stack and link the image position to it. A master thread then works through the finished images as they are prerendered by simply walking through the linked list, waiting until the next image is ready. Unfortunately we then run into additional problems with the driver state and the general restriction that the graphics card can only be in one state at a time. This requires that any state-based manipulations of the graphics card be deferred to the master thread, since they must be done in order in a single-threaded context.
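A rough sketch of that worker/master split, in Python for brevity. It is an assumption-laden illustration, not the engine's code: `prerender` and `render_in_order` are hypothetical names, a sorted list stands in for the linked list, and appending to a Python list stands in for submitting to the GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def prerender(image):
    """Hypothetical CPU-side work: turn an image into a render packet."""
    name, z_order = image
    return f"{name}@z{z_order}"


def render_in_order(images, workers=4):
    ordered = sorted(images, key=lambda image: image[1])  # back to front
    submitted = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers prerender in whatever order they happen to finish.
        packets = [pool.submit(prerender, image) for image in ordered]
        # The master thread walks the list in z-order and waits on each
        # packet, so submissions always happen in the correct order.
        for packet in packets:
            submitted.append(packet.result())
    return submitted


frame = render_in_order([("tree", 2), ("sky", 0), ("player", 1)])
print(frame)  # ['sky@z0', 'player@z1', 'tree@z2']
```

Only the master thread touches `submitted`, which mirrors the restriction that state changes on the graphics card have to happen in order from a single thread.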

Still, this kind of multithreading is probably unnecessary, because the graphics engine offloads a significant amount of calculations to the GPU, such that it is not usually stressful on the CPU.

5. Hi, I've also created multithreading (separate update and render thread), but I don't come to the conclusion that it isn't worth it. True, it is very difficult to implement decently, but it is worth it for games that require lots of CPU. The goal is to separate the render-state from the world-state so you can triple buffer the render state. The delay you talk about can be 'solved' using partial synchronization and extrapolation.

For an explanation of the problem please check out:
http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part1/
http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part2/

The website discusses the problems and shows a decent (though difficult) solution. Both source and binaries can be downloaded.

1. Your solution is unsuitable for a number of reasons. First, it's complicated as hell, and second, because of this complexity, it is extraordinarily difficult to properly merge with an existing physics engine.

The primary issue, however, is that the most benefit does not come from separating CPU and GPU intensive tasks, but rather from processing as much information as possible concurrently. This is why it is more valuable to have a vastly simpler single-threaded pipeline where physics and graphics are individually parallelized to maximize CPU usage. This scales way better because now each added core will speed up the program, instead of the set increase in speed that a complex multithreading solution will involve.

For that matter, if you are going to multithread something, doing something complicated like this is a bad way of doing it. If you require a custom physics engine in the first place, then you should package physics along with graphics and use a task-based package model that simply concurrently processes everything at the same time. Your method isn't worth the complexity because there are superior ways of handling it. A good example is Civ V's multithreaded pipeline that packages render calls up into a task manager as needed, which achieves truly astounding performance without a super complicated world state.

Also, spinning on Sleep(0) for a thread is not recommended. It is usually a better idea, whenever possible, to simply put a thread to signaled sleep using SleepEx(INFINITE,true) and then use QueueUserAPC to wake it up. Windows can usually activate the thread in under 37 microseconds. Sleep(0) lets it wake up faster, but is not an effective way to relieve stress on the CPU.
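The same idea can be shown with a portable analog. The sketch below does not use the Win32 calls named above; it swaps in Python's `threading.Event` as a stand-in for the signaled sleep, and `worker`, `jobs`, and `results` are hypothetical names. The point is only that the worker blocks until it is signaled instead of polling in a loop.

```python
import threading


def worker(wake, jobs, results):
    wake.wait()  # the thread sleeps here and uses no CPU until signaled
    results.extend(job() for job in jobs)


wake = threading.Event()
jobs, results = [], []
thread = threading.Thread(target=worker, args=(wake, jobs, results))
thread.start()

jobs.append(lambda: 6 * 7)  # queue work while the worker is asleep
wake.set()                  # signal it, playing the role of the wake-up call
thread.join()
print(results)  # [42]
```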

2. It is complicated as hell, I'll give you that. But I think you're misunderstanding what the proposed solution is trying to solve.

The render loop should be as quick as possible, but multithreading adds a lot of overhead. You don't want to perform a lot of mutex checks in the render loop. The render loop should be as mutex-free as possible.

The game/simulation must be deterministic (one of the goals from slapware.eu). So "position.X = position.X * elapsedGameTime" is not allowed, because depending on render speed for motion calculation makes it non-deterministic. You might think this is overkill, but determinism is really great for multiplayer games. If you shoot a grenade you often see some sort of latency because the server sends lots and lots of position updates for the grenade. With determinism you don't need to spam your internet connection with position updates. Having a multithreaded implementation that allows for FPS drops (and absurd FPS like over 9000) without slowing down the game and still being deterministic is a difficult problem to solve, hence the difficult solution. If you know of any other solution I'd really like to know about it, I find this stuff intriguingly interesting!

The solution does not spin on Sleep(0), that would be a seriously bad implementation (imho). The site discusses how thread-switching may cause stuttering, and the two proposed solutions are partial thread synchronization and motion extrapolation.

If you are using a custom physics engine, you'll just plug it into the update-thread. It is as easy to integrate as in any other single threaded method.

Using a 'task-manager' is the obvious way of multithreading and the solution from http://blog.slapware.eu/ still allows for such an implementation. Usually you'll create a producer-consumer queue, and generate tasks that your background threads will handle. If you are adding this to the solution from slapware then the update-thread will be the 'producer' and it will spawn a couple of background threads (consumers) to calculate physics, AI and other CPU related stuff.
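For readers unfamiliar with the pattern, a producer-consumer queue of that kind might be sketched as follows. This is a generic Python illustration under stated assumptions, not the slapware code: `run_update`, `worker`, and the two sample jobs are hypothetical, and `queue.Queue` uses locks internally, so it is not the lockless variant argued for elsewhere in this thread.

```python
import queue
import threading


def worker(tasks, results):
    """Consumer: pull tasks until the sentinel (None) arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        name, func, arg = task
        results.put((name, func(arg)))
        tasks.task_done()


def run_update(jobs, num_workers=2):
    """Producer: the update step hands jobs to background consumers."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per consumer
    tasks.join()         # single sync point: wait for all work to finish
    for thread in threads:
        thread.join()
    finished = {}
    while not results.empty():
        name, value = results.get()
        finished[name] = value
    return finished


jobs = [("physics", lambda dt: dt * 9.8, 0.5), ("ai", len, "patrol")]
update_results = run_update(jobs)
print(update_results == {"physics": 4.9, "ai": 6})  # True
```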

At least you should have a separate producer-consumer queue for GPU-related stuff and CPU related stuff. If you optimally want to use your GPU you need 1 thread for all the drawing (heavy on GPU, light on the bus) and one thread for updating models and textures (light on GPU, heavy on the bus). Assuming you have only one GPU. And not to subtract from the achievements of CIV4, a great game, but it is not one of the most complicated games for an engine to handle. (semi-)top-down is relatively easy especially if the game is semi-2D, even if you support LOD-algorithms for awesome infinite zoom-out.

Luckily, the complicated stuff from the proposed solution is only in the engine and not in the game. It is still possible to do everything you want without writing absurdly complicated code when creating a game.

3. You appear to misunderstand what reality is.

1. If you are using mutexes at all in your game engine, you have already fucked yourself beyond all hope of redemption. You should NEVER implement any algorithm that doesn't use some form of lockless or at least obstruction-free implementation.

2. Determinism is impossible. Even if floating point errors were not an issue, *networking itself* guarantees that you will have consistency issues no matter what you do, because packet travel times are random, erratic, and sometimes they don't bother showing up at all. Even if they were perfectly reliable, lag ensures the world state will always be inconsistent with itself no matter what you do. The potential inconsistencies that can arise in a threaded actor-model concurrent system are extremely minor when handled properly, and as a result can simply be dealt with through the same correction algorithms that must deal with floating point inconsistencies and errant packets.

I can make a concurrent game engine that is highly efficient and won't suffer from any significant FPS drops, yet is vastly simpler than what you propose here, because you solve the problem from the wrong end. The future of games will be in engines that use actor-based concurrency models by packaging object information into immutable packets and processing them all using a swath of worker threads using multiple lockless queues that have virtually zero overhead and a single sync point per frame. This is simple, elegant, and is vastly more robust than the hellish complexity you propose. I can even do it for a 2D engine, where the rendering queue is strictly ordered. A 3D game is even easier to pull off, because you can render things wildly out of order using the z-buffer and clipping spaces.

This problem is not answered by creating separate threads that do specific things, it's done by evaluating everything at the same time in small bursts. This will keep both the CPU and the GPU saturated with tasks without significant complexity. I was speaking of Civ V, not Civ IV, and Civ V pulled more out of the GPU than anyone else EVER DID. So unless you are saying your engine is literally faster than every other engine in the entire industry, because Civ V's engine is, you really shouldn't be talking.

And yes, that includes Unreal Engine 4, which simply does more fancy things; its raw rendering speed can't beat Civ V.

6. I sense you too have a lot of knowledge about the subject, which makes this an interesting discussion. I'll try not to make it into some sort of flame-war between two techniques, because I really like to read what you have to say :)

1. Mutexes
I totally agree with you. Mutexes should be avoided wherever possible.

2. Determinism is possible.
Floating point errors occur, true, but if the floating point errors are consistent then there isn't any problem. This is also possible cross-platform because floating-point operations are standardized. Most compilers even have flags to allow you to change the way that floating point is handled.
Network issues are no problem either, unless you use UDP (or some other protocol). For TCP/IP the order of the packets is guaranteed. This does not mean that packets arrive at your computer at the same time, but TCP uses a buffer and automatically asks for re-transmissions if needed. Your application will receive all packets in order. If there are any problems (some packet will never be received, even after asking for re-transmissions) the connection will just be dropped. Packet corruption is handled by checksums (in both the IP packet header and the TCP packet header).

The complexity of a solution is never a real issue for me because I enjoy figuring out what the best approach is (for my situation). The technique I used is not as far-fetched as you might think. It is a technique called triple buffering (not to be confused with triple buffering on the video card). CryENGINE 3 is another one of those great triple-A game engines. I'm not going to compare these engines, let's just say that most of these engines are pretty darn good. I'm certainly not going to compare something I have written in 1~2 weeks with a triple-A engine. Especially because I have written this implementation in C#, lol. It is very difficult to look at a certain engine and determine what techniques they apply. They often limit themselves to saying things like "we are supporting volumetric clouds" or "we have a taskmanager to optimally use all cores in your CPU" without diving into the specifics of 'how' it is implemented. I wouldn't be surprised if these engines are also able to apply a form of triple buffering on render states.

I also agree with you that you'll need some sort of task manager to optimally use cores in your CPU/GPU. And the technique you describe seems pretty solid. We actually agree on most topics except for the one where I prefer a separation of 'task-manager' for GPU and CPU, while you just say that just one task-manager for both is good enough. I also use triple render-state buffering, but I probably wouldn't have implemented that if I didn't need determinism.

A task-manager with 50 workers for a dual-core CPU is silly, as you will probably agree. There should be some balance between the number of cores and the number of threads. The GPU and CPU are intrinsically different. They contain a different number of cores and they are connected through a bus that only allows one 'message' to be sent to it at a time (protected by mutexes in the video card driver). Since the bus is protected with mutexes it should also seem silly to have more than 1 thread trying to send a message to the video card. Because they are so different it makes more sense to separate the queues so we are able to fine tune these queues (perhaps even setting thread-priorities).

1. Oh, P.S. Determinism in games is not outrageously naive. It is useful for both network games and replay files (for example, StarCraft II). If you'd like to read about it, http://www.aorensoftware.com/blog/2011/01/28/determinism-in-games/ seems to explain the basics pretty well.

2. In my implementation, just like Civ V, I queue rendering calls into a single thread that sends them to the GPU as fast as possible. You'd do this no matter what technique you were using. The precalculations necessary are what the worker threads do. There are always as many worker threads as there are logical cores - for an i7 you'd have 8.

Triple buffering is a standard technique, but my technique implements something akin to it on a more distributed level. Each actor is, by itself, considered immutable, so a copy of it is made upon which calculations are done, and when these calculations are finished, the result is committed by flipping a pointer to ensure atomic usage of information across all actors. I feel like I should point out that your implementation is a particularly horrendous complication of triple buffering. Everyone uses something like that, it's the only way to get anything done.
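The copy-then-commit idea described here could look roughly like the sketch below. It is a simplified Python illustration, not the engine's implementation: `Actor`, `read`, `update`, and `move` are hypothetical names, and rebinding one attribute stands in for the pointer flip (a native engine would presumably use an atomic pointer exchange).

```python
import copy


class Actor:
    def __init__(self, state):
        self._committed = state  # readers only ever see this reference

    def read(self):
        return self._committed

    def update(self, change):
        draft = copy.deepcopy(self._committed)  # work on a private copy
        change(draft)
        self._committed = draft  # commit: a single reference swap


def move(state):
    state["x"] += state["vx"] * 0.5


actor = Actor({"x": 0.0, "vx": 2.0})
before = actor.read()
actor.update(move)
after = actor.read()
print(before["x"], after["x"])  # 0.0 1.0
```

A reader that grabbed `before` keeps a complete, unmodified snapshot even after the update commits, which is what makes the committed state safe to treat as immutable.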

You completely missed that last point I made on determinism - even if all floating point calculations are executed perfectly and all packets arrive as expected, lag ensures that inconsistent situations will always crop up, and those inconsistencies will be several orders of magnitude more severe than any inconsistency that would occur due to out-of-order processing of objects within a single frame. You are trying to solve a problem that isn't a problem. You seem to think that my approach significantly sacrifices determinism when it really doesn't. You should be more worried about what happens when one player shoots another as he blows himself up in an explosion large enough to kill the first, which is exactly what that article you linked to points out, and in turn this implies that any indeterminacy created by my technique is manageable! So by definition the insanity you have created here isn't even necessary!

3. You keep referring to the CIV V engine so you probably have read a lot about it. Could you refer me to the website/pdf that explains how they have implemented their rendering loop and worker queues? I can't seem to find it.

I am well aware of possible latency issues (and solutions) between client and server, but all those issues can be solved by letting the clients extrapolate game states and make a prediction of the changes caused by local actions. The difficulty lies in restoring game states on wrongful predictions, which for some games can be solved by sending a complete game state (on conflict only). For non-deterministic game engines this is way more difficult because each client's game state is by definition almost always wrong. To solve this for a non-deterministic game engine you'll need to send position information pretty much non-stop to prevent out-of-sync weirdness. So really, I do understand. I'm just saying that latency issues are not a good excuse to drop determinism.

The "problem" I tried to "solve" is creating a non-blocking multithreaded deterministic game loop. You probably agree with me that this isn't easy to accomplish; just a couple of hours ago you claimed that determinism is impossible. You keep saying that I "missed the point" or that I "misunderstand" and calling my implementation "horrendous". Why are you saying such provocative things? Most of them are not even true:
- the sleep(0) loop
- impossibility of non-determinism
- floating point errors make it non-deterministic
- network packets arriving in the wrong order
- very difficult to integrate external physics module

I've shown you a different method of applying multithreading and it 'does' have advantages, even if you don't agree with me. This is the pseudo-code of my render loop. The only mutex is used when switching states.

    while (true) {
        RenderState latestUpdatedRenderState = world.GetLatestUpdatedRenderState(); /* Quick thread-safe pointer switch */
        Draw(latestUpdatedRenderState);
        FlipFrontAndBackBuffer();
    }

I don't have to wait for physics calculations to complete, I can just render a new frame. No waiting, ever! Can't get much faster than that, right? It is even possible to render at a higher FPS than the update-FPS because all positions are automatically extrapolated and inconsistencies are solved automatically. Not so "horrendous" now, is it? Even if the game is rendering at a meager 10 FPS, the game speed remains steady while still being deterministic.

Maybe we just have different goals for our engines. Especially for game engines, there are seldom situations where a "one solution fits all" can be applied.

4. You just said you tried specifically to solve the problem of determinism, when I argue that it is not a significant problem in the first place, because it needs to be solved elsewhere anyway, and thus you are needlessly solving a problem that doesn't exist.

Non determinism is STILL impossible! All you have done to defend against this point is say that it can be managed, which is exactly what I pointed out.

I call your method horrendous because the render thread itself is attempting to extrapolate out positions, which is a big no-no, and you use a complicated object hierarchy, plus the internal design of the engine is complex and therefore susceptible to incredibly bizarre bugs. In fact, everything you do causes the exact problems that made me throw out my own multithreaded implementation due to potentially inconsistent physics states and lag. So if you don't consider those things to be a problem, then why would you offer your engine as an alternative to mine? Mine does nearly the same thing and has the same problems.

You basically did the same thing I did except in a much more roundabout, complicated way. You solved nothing.

5. I think you meant "determinism is impossible". It is not about 'managing determinism', it either is, or isn't deterministic. Simple as that. It is not something you 'solve'. The best way of being deterministic is by creating a fixed time step for updating the world (e.g. applying physics 60 times a second, even if the render-FPS is different). I find it really strange that you talk about the impossibility of determinism, we are talking about the same thing, right? Just download the sample from my website and see that at update-frame #1000 the positions of the balls are always the same (no matter what FPS you're running at). Determinism is possible. If you claim it isn't, please give me an example.
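The fixed time step claim can be checked with a small experiment. The sketch below is a generic accumulator loop in Python, not the code from the linked sample: `FIXED_DT`, `step`, and `state_at_update` are hypothetical names and the falling-body physics is a placeholder. It runs the same simulation at two very different render rates and compares the state at update #1000.

```python
FIXED_DT = 1.0 / 60.0  # physics always advances in steps of this size


def step(state):
    """One fixed-size physics update for a falling body: (position, velocity)."""
    position, velocity = state
    velocity += -9.8 * FIXED_DT
    position += velocity * FIXED_DT
    return (position, velocity)


def state_at_update(target_updates, render_dt, state=(10.0, 0.0)):
    """Run render frames of length render_dt until N physics updates happened."""
    accumulator = 0.0
    updates = 0
    while updates < target_updates:
        accumulator += render_dt  # one rendered frame's worth of time passed
        while accumulator >= FIXED_DT and updates < target_updates:
            state = step(state)
            accumulator -= FIXED_DT
            updates += 1
    return state


slow = state_at_update(1000, 1.0 / 10.0)    # rendering at 10 FPS
fast = state_at_update(1000, 1.0 / 1000.0)  # rendering at 1000 FPS
print(slow == fast)  # True
```

The states match exactly because `step` is called the same number of times with the same `FIXED_DT` in both runs; the render rate only changes how those calls are grouped into frames.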

You say extrapolation of position in the render thread is a big no-no. There is absolutely nothing wrong with that, I don't know where you got that idea. There is nothing wrong with it as long as you don't actually modify the state (which I don't).

You also call the internal design of my engine complex to the point it is susceptible to bizarre bugs. Complexity is an "eye of the beholder" thing. I have no problem understanding what is going on; maybe it's too complex from your point of view, but not from mine. However, it is not something I recommend to anybody that is trying to make a game (it is better to use an existing engine).

My solution does not have any problems, so I'm not sure why you are saying that we share the same problems.

If you manage to convince me that your model is also deterministic, while being able to render at any FPS (without a slowdown of the game), then you'll convince me that your simpler implementation is probably the better one. If you don't care about a deterministic game (which you don't, because you don't even believe in the existence of it) then my way of approaching things must seem unnecessarily complex. I totally agree. If I didn't care about stuff like that, then I too would have used a simpler method.

P.S. Do you have that link to the CIV V Engine? I really am interested in reading that.

6. This discussion inspired me to type up some stuff of why determinism is a nice thing to have.
http://blog.slapware.eu/game-engine/programming/deterministic-behavior/

I've also shown evidence that my engine is deterministic, so I assume this will convince you that determinism is indeed possible.

P.S. this blog-page is not yet officially published so it is subject to change. I am still in the process of creating a new version of the engine...

7. IF that is your definition of determinism, then every single solution I have ever built for this, including every single solution I will ever build, will be deterministic, and therefore I have absolutely no idea how ensuring that your engine was deterministic had any effect on your engine design.

Extrapolation on render thread == things go through walls before you realize they hit them. This is one of the reasons I abandoned several of my previous attempts, because of the potentially invalid physics state that would be rendered, even if the physics state itself was always consistent. Rather than solve that problem, you simply make it worse.

I keep pointing out problems your solution has, yet you continue to insist they are not problems.

Here's how you guarantee determinism in my model: Make all calculations depend on the previous frame's stale data. Ensure data gathering is done on the frame start sync step. Poof, it's now deterministic. Of course, it now has a frame of lag, too, just like your solution does, even with extrapolation of positions, because it is physically impossible for it to account for input it doesn't know about.
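A small sketch of why reading only the previous frame's data makes the result independent of processing order. This is an illustrative Python toy, not the engine: `blend`, `update_from_snapshot`, and `update_in_place` are hypothetical names, and the rule (each actor drifts halfway toward the average of the others) is made up for the demonstration.

```python
def blend(name, source):
    """Move one actor halfway toward the average of the other actors."""
    others = [value for key, value in source.items() if key != name]
    return source[name] + 0.5 * (sum(others) / len(others) - source[name])


def update_from_snapshot(prev, order):
    # Every actor reads only the previous frame's (stale) snapshot.
    return {name: blend(name, prev) for name in order}


def update_in_place(state, order):
    # Each actor sees whatever earlier actors already wrote this frame.
    state = dict(state)
    for name in order:
        state[name] = blend(name, state)
    return state


prev = {"a": 0.0, "b": 4.0, "c": 8.0}
names = list(prev)

forward = update_from_snapshot(prev, names)
backward = update_from_snapshot(prev, names[::-1])
print(forward == backward)  # True

print(update_in_place(prev, names) == update_in_place(prev, names[::-1]))  # False
```

With the snapshot version, worker threads can process actors in any order and still produce the same frame; the price, as noted above, is that everything reacts one frame late.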

Civ V link: http://developer.amd.com/gpu_assets/Firaxis%20LORE.pps

7. "If that is your definition of determinism"? Come on, it is a normal English word with a clear-cut definition. It is not like I'm making up words here. It's also really important for games (not all types of games though), so I thought you would be fairly familiar with the term.

My extrapolation does not invalidate the physics state. Assuming the update-thread runs at 60 FPS then at worst case the extrapolation will account for max. 16.6 ms of movement (and ~8.3 ms on average). This extrapolation is only used for rendering and not for modifying the physics state. If something goes through a wall because of extrapolation then at the next update frame it will already have been set to the right coordinates.
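Both sides of this disagreement can be seen in a few lines. The sketch below is a hypothetical Python illustration, not the slapware implementation: `extrapolate`, `UPDATE_DT`, and the wall position are assumptions chosen for the example. The drawn position is projected forward by at most one update interval, and the physics state itself is never written to.

```python
UPDATE_DT = 1.0 / 60.0  # update thread assumed to run at 60 updates per second


def extrapolate(position, velocity, time_since_update):
    """Position to draw; clamped so it never projects more than one tick ahead."""
    elapsed = min(max(time_since_update, 0.0), UPDATE_DT)
    return position + velocity * elapsed


physics_x, physics_v = 100.0, 30.0  # authoritative state from the last update
wall_x = 100.2                      # hypothetical wall just ahead of the object

drawn_x = extrapolate(physics_x, physics_v, UPDATE_DT / 2)  # average case
print(round(drawn_x, 6))   # 100.25
print(drawn_x > wall_x)    # True: drawn past the wall for this frame
print(physics_x)           # 100.0: the physics state is untouched
```

The physics state stays valid, which is the point made in this comment, while the rendered frame can briefly show the object beyond the wall, which is the objection raised in the reply above it.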

You keep pointing out problems where there are none, yet you continue to insist that there are. The comments you make, like the one on invalidating a physics state, make me doubt you have actually read it. I get the impression that you have just glanced over it and thought that it's just some overly complicated gibberish. I don't mind if you don't want to read it, but please don't come up with obviously wrong 'problems' that my engine has.

If you make all calculations depend on the previous frame and gather input at the start of the frame, then you're missing one important step. The physics should run at a fixed number of steps per second, shall we say 60? Now imagine, if you will, your game being rendered on a 120 Hz monitor. If you decide to render at 120 FPS you're only rendering 60 new frames per second. Any discrepancy between update-FPS and render-FPS will lead to nasty side effects! Simply increasing how many times you update per second won't help you that much. For your engine it is really weird to fix the update-thread to a certain FPS, as they are just 'jobs added to the queue'.

The best thing you said is "I have absolutely no idea how ensuring that your engine was deterministic had any effect on your engine design.". Maybe this is the reason why I have such a difficult time explaining to you how determinism should be enforced. On the other hand you say "every single solution I have ever built for this [...] will be deterministic", another obviously wrong statement. If you render your game at 1000 FPS, how often do you call your physics library? 1000 times per second, right? You want to update the positions 1000 times, otherwise rendering at such a high speed has no real benefit. Calculating physics with a time step determined by rendering speed will break determinism. Box2D physics are only deterministic if the timestep is constant.
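The effect of tying the time step to the render rate is easy to reproduce. The sketch below is a hypothetical Python example, not Box2D: `integrate` is an illustrative semi-implicit Euler loop for a falling body, run over the same one second of simulated time at two render rates.

```python
def integrate(steps, dt, position=10.0, velocity=0.0):
    """Advance a falling body using the render frame time as the time step."""
    for _ in range(steps):
        velocity += -9.8 * dt
        position += velocity * dt
    return position


at_10_fps = integrate(10, 1.0 / 10.0)        # one second, dt = 100 ms
at_1000_fps = integrate(1000, 1.0 / 1000.0)  # one second, dt = 1 ms
print(round(at_10_fps, 4), round(at_1000_fps, 4))  # 4.61 5.0951
```

Both runs cover the same simulated second, yet the body ends up almost half a unit apart, so two machines rendering at different speeds would disagree about the world.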

Civilization 5 is a turn-based game, which simplifies creating network protocols and replay files, so there is no real added benefit to adding determinism in such a game. This is why I stated previously that the engine for Civilization 5 has it very easy. An RTS has way more hurdles to overcome, so I'm more impressed with the engines used in Supreme Commander 2 and Starcraft 2, even if they don't look as beautiful. It is ignorant to think that CIV's approach is a one-solution-fits-all, which is exactly why I've posted the links on a deterministic non-blocking render loop.

If you'd like, I can come up with some resources that explain what determinism is, why it is important for (certain types of) games, why it is difficult to have determinism without fixing render-FPS and how determinism helps in simplifying network protocols for games (among other things).

Do you now understand why your game engine is not able to be deterministic without severe consequences?

Thanks for the link on CIV5's engine presentation. I really wish I was there to hear about the details :) Looking at those sheets really makes me want to continue experimenting with game engine techniques.

8. I didn't SAY your extrapolation invalidated the physics state, I said that you RENDER an invalid physics state!

I keep pointing out problems and then you keep misinterpreting them.

Uh, every single solution I have ever written involves constant timesteps. Because I use Box2D. So yes, it is all deterministic. Sorry. Doing physics with a constant timestep is as simple as using Box2D's accumulation method, which my method would do. Furthermore, I argue that trying to render frames in between physics calculations is worthless because the extrapolation simply introduces artifacts, and if you can't render physics at 60 FPS you're screwed anyway. Hence, physics are blocking, and if they aren't blocking you either 1. aren't deterministic or 2. are extrapolating positions wildly and are therefore rendering frames with an invalid physics state, even if the internal physics state is correct.

So I contend that trying to render faster than the physics timestep is useless, and therefore your solution is useless.
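
The accumulator pattern commonly used to drive a physics library with a constant timestep can be sketched as follows. `FixedStepper`, `maxStepsPerFrame`, and the `stepPhysics` callback are hypothetical names, and no Box2D API is assumed.

```cpp
#include <functional>

// Hypothetical helper; all names are illustrative.
struct FixedStepper {
    double step = 1.0 / 60.0;  // constant physics timestep in seconds
    double accumulator = 0.0;  // frame time not yet consumed by physics
    int maxStepsPerFrame = 8;  // cap so one long frame cannot stall the loop

    // Adds the elapsed frame time, then runs as many whole fixed steps as
    // fit. Any leftover time stays in the accumulator for later frames.
    // Returns the number of steps executed.
    int advance(double frameSeconds,
                const std::function<void(double)>& stepPhysics) {
        accumulator += frameSeconds;
        int steps = 0;
        while (accumulator >= step && steps < maxStepsPerFrame) {
            stepPhysics(step);  // the physics library is called here
            accumulator -= step;
            ++steps;
        }
        return steps;
    }
};
```

With a 1/60 s step and a 120 Hz render loop, roughly every other call to `advance` runs zero steps, so that frame draws the same physics state as the previous one unless some interpolation or extrapolation is layered on top.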

9. Rendering an invalid physics state when using extrapolation is sporadic, and if it happens it probably doesn't even last 10 ms. So you are telling me that you are limiting your game to 60 FPS just to be deterministic? Wow, gamers with monitors at any frequency other than 60 Hertz must be pretty disappointed with your engine. Wait, even if you do not want to be deterministic you are using fixed time steps? You are aware that it is possible to use dynamic length time steps so the physics keep up with the render FPS, right?

Let me guess what you are going to say: "If the player wants 120 Hz, I'll just crank up the physics to 120 FPS", showing that you still don't get what we are trying to achieve by ensuring determinism.

I 'can' render faster than the physics time step. If physics runs at a meager 20 FPS, everything still runs smoothly (because of a render FPS of 1000+ FPS and extrapolation). I can't even see those nasty physics artifacts you seem to dislike so much. At an update rate of 60 FPS this becomes even less of a problem. If you can call it a problem, as it isn't even noticeable.

1. I have actually implemented dynamic timesteps before, but I keep switching between dynamic and constant because I'm not sure which one has the most benefits yet. The primary reason I wasn't using dynamic timesteps was that I wasn't sure what kind of consequences they might have for the physics calculations. It wasn't so much a matter of ensuring determinism as simply not caring about determinism and being more concerned about not destabilizing the physics simulation by using timesteps that are so small they blow up the floating point inaccuracies.

You can render faster than the physics time step and I think it's useless. If your physics aren't rendering at a sufficiently high rate, everything else will suffer.

As far as I can tell, you have a more nuanced understanding of determinism than I do, however, in your efforts to ensure it, you have made multiple sacrifices that I was not willing to make even for a nondeterministic multithreaded scenario. Hence, even if I was trying to enforce determinism, I would not accept your solution, because I do not want to render an inconsistent physics step and I will not accept an additional frame of input lag.

I have, however, taken what you have said here to make a few small adjustments to my engine to ensure it actually is deterministic, since a primarily single-threaded engine should probably be deterministic anyway.

2. We clearly have a different opinion on where to prioritize. Let's just sum up the differences and call it a day.

1. Your engine uses multithreading that is easier to implement.
2. Your engine does not have a 1 frame (avg. ~8.3ms) lag for parsing input
3. Your engine will never cause physics artifacts (mine will sporadically have a render-artifact for one frame, most of the time not even visible depending on how hit boxes are created)

1. My engine has a faster render loop (does not have to wait on anything, ever, even if physics or AI turned out to be really slow in a CPU heavy game)
2. My engine is able to run in any FPS I like without having to sacrifice determinism (important for RTS games)
3. My engine will not slow down the game and physics, even if we are rendering at 2 FPS (important for network games)

Anything I forgot?

Oh P.S. If you are really interested in low-latency rendering, I recommend reading articles/interviews/... from John Carmack (creator of Doom). He is still actively developing and is considered by many people to be a God of game engine optimizations. He has a hobby in Virtual Reality, for which any input latency will make you puke.

3. I already pay attention to John Carmack but lately what he's been delving into separates from my own personal avenues of research. In particular he seems to be jumping on the texel-based rendering bandwagon, which I have many objections to.

I would like to note that my engine can still "run" at any FPS I want with a constant physics timestep, it's just that no physics objects will move during the update. Technically, animations and other things would still run, and in fact I could implement my own interpolation if I wanted to, but it would still be deterministic.

At this point I'm going to make it so my engine can either use a variable timestep or a constant timestep, which then lets the game using the engine choose whether or not it wants to be deterministic.
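
One way such a switch might look, as a sketch only. `TimestepMode`, `TimestepConfig`, and `chooseStep` are hypothetical names and do not describe the engine's actual interface. How many constant steps to run per frame is a separate question not covered here.

```cpp
// Hypothetical configuration; all names are illustrative.
enum class TimestepMode { Constant, Variable };

struct TimestepConfig {
    TimestepMode mode = TimestepMode::Constant;
    double constantStep = 1.0 / 60.0;     // used in Constant mode
    double maxVariableStep = 1.0 / 20.0;  // upper bound in Variable mode
};

// Returns the timestep to pass to the physics update for this frame.
// Constant mode ignores the frame time, which keeps the sequence of steps
// reproducible. Variable mode follows the frame time, clamped from above so
// that a long stall does not produce one huge step.
inline double chooseStep(const TimestepConfig& cfg, double frameSeconds) {
    if (cfg.mode == TimestepMode::Constant) return cfg.constantStep;
    return frameSeconds < cfg.maxVariableStep ? frameSeconds
                                              : cfg.maxVariableStep;
}
```

A game that needs replays or lockstep networking would pick `Constant`, and a game that only cares about smooth motion at any frame rate could pick `Variable`.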

10. Besides the 'flame-war' going on.

There are actually many disadvantages from such a multithreaded architecture, but I currently still like it.

The need of conceptually dividing the functionality turns out to be positive for maintainability in my case. The borders between the modules are hard to overcome and therefore you can't take the lazy path of just hacking something in between. A lot of trouble comes from adding fast features that are not separated enough. So in general I found that it makes it easier to work on individual problems. Still the overhead is large.

The general problem is that writing a multithreaded game engine isn't just a work of a few days, therefore the actual performance differences are hard to measure. My current architecture runs with 4 different modules which are basically communicating via double-buffering. And I would also encourage others to pursue a similar path, if they can live with a steep curve of getting things together in the first place. I believe that separating really pays off when going further along the road. Currently I wouldn't go back to a single threaded/module approach. Still it is possible that my opinion may change in the future.
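
A minimal sketch of the double-buffering idea, assuming one module writes and another reads, and that the swap happens only at a synchronization point where neither side is touching the buffers. The class `DoubleBuffer` is hypothetical, and the synchronization itself (for example a barrier at the end of a frame) is deliberately not shown.

```cpp
// Hypothetical two-slot buffer; T must be default-constructible.
template <typename T>
class DoubleBuffer {
public:
    // Writer side: the slot the producing module fills during a frame.
    T& back() { return buffers_[1 - front_]; }

    // Reader side: the slot the consuming module reads during a frame.
    const T& front() const { return buffers_[front_]; }

    // Exchanges the roles of the two slots. Call only while neither module
    // is accessing the buffers.
    void swap() { front_ = 1 - front_; }

private:
    T buffers_[2]{};
    int front_ = 0;
};
```

The reader always sees a complete snapshot from the previous swap, which is what keeps the modules decoupled, at the cost of the reader being one swap behind the writer.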

1. My graphics engine is required to run entirely by itself. I use Box2D, which is an encapsulated physics engine. I therefore consider the division of labor to be fairly well enforced. However, I discovered that there is such a thing as *too much* encapsulation in that it really starts to get in the way of more practical solutions. Consequently there are a few places where I deliberately tie things together very closely because doing it any other way has a tendency to create more problems than it solves. CEGUI is a great example of trying to do too much with too much isolation of functionality, rendering it a bloated, catastrophic mess.