September 5, 2011

The Problem of Vsync

If you were to write directly to the screen when drawing a bouncing circle, you would run into some problems. Because you aren't doing any buffering, your user might see a quarter of a circle for a frame. This can be solved through double buffering, which means you draw the circle onto a backbuffer, then "flip" (or copy) the completed image onto the screen. You will now only ever send a completely drawn scene to the monitor, but you will still have tearing issues. Tearing happens when you update the monitor's video buffer out of step with its refresh cycle: the monitor refreshes when you have only copied half of your new scene over the old one, so half the scanlines on the screen show the new scene and half still show the old scene, which gives the impression of tearing.

This can be solved with Vsync, which only flips the backbuffer right before the screen refreshes, effectively locking your frames per second to the refresh rate (usually 60 Hz, or 60 FPS). Unfortunately, Vsync with double buffering is implemented by simply blocking the entire program until the next refresh cycle. In DirectX, this problem is made even worse because the API blocks the program with a polling thread that runs at 100% CPU, sucking up an entire CPU core just waiting for the screen to enter a refresh cycle, often for almost 13 milliseconds. So your program eats an entire CPU core even though 90% of the time that core isn't doing anything but waiting around for the monitor.

This waiting introduces another issue - input lag. By definition, any input given during the current frame can only show up when the next frame is displayed. However, if you are using vsync and double buffering, the current frame on the screen was the LAST frame, and the CPU is now twiddling its thumbs until the monitor is ready to display the frame that you have already finished rendering. Because you already rendered that frame, the input has to wait until the end of the frame being displayed on the screen, at which point the frame that was already rendered is flipped onto the screen and your program finally realizes that the mouse moved. It now renders yet another frame taking this movement into account, but because of Vsync that frame is blocked until the next refresh cycle. This means that if you press a key just as a frame is put up on the monitor, you get two full frames of input lag, which at 60 FPS is 33 ms. I can ping a server 20 miles away with a ping of 21 ms. You might as well be in the next city with that much latency.
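To make the arithmetic above explicit, here is a minimal sketch. The function names are hypothetical and exist only for illustration; the only inputs are the refresh rate and the number of whole frames an input has to wait.

```cpp
// Duration of one refresh cycle in milliseconds (about 16.7 ms at 60 Hz).
constexpr double frame_time_ms(double refresh_hz) {
    return 1000.0 / refresh_hz;
}

// Worst-case input lag when an input has to wait for `frames_of_delay`
// full refresh cycles before it can show up on the screen.
constexpr double input_lag_ms(double refresh_hz, int frames_of_delay) {
    return frames_of_delay * frame_time_ms(refresh_hz);
}
```

Calling input_lag_ms(60.0, 2) gives roughly 33.3 ms, which is the two-frame figure quoted above.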

There is a solution to this - triple buffering. The idea is a standard flip mechanism commonly used in dual-thread lockless synchronization scenarios. With two backbuffers, the application can write to one and, once it's finished, tell the API, which marks it for flipping to the front buffer. Then the application starts drawing on the second, after waiting for any flipping operation to finish, and once it's done, marks that one for flipping to the front buffer and starts drawing on the first again. This way, the application can draw 2000 frames a second, but only 60 of those frames actually get flipped onto the monitor, using what is essentially a lockless flipping mechanism. Because the application is now effectively rendering 2000 frames per second, there is no more input lag. Problem solved.

Except not, because DirectX implements triple buffering in the most useless manner possible. DirectX just treats the extra buffer as part of a chain, and rotates through the buffers as necessary. The only advantage this has is that it avoids waiting for the backbuffer copy operation to finish before writing again, which is completely useless in an era where said copy operation would have to be measured in microseconds. Vsync still blocks the program, so this doesn't solve the input issue at all.
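For contrast, the flip mechanism that triple buffering is supposed to use can be sketched in a few lines. This is a generic lock-free triple buffer, not anything DirectX exposes: the TripleBuffer class, the Frame type and all member names are hypothetical, and it assumes exactly one rendering thread and one presenting thread. This particular variant never makes the renderer wait for a flip; it just overwrites the pending frame.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for a real render target (hypothetical).
struct Frame { int id = 0; };

class TripleBuffer {
public:
    // Renderer thread: the buffer currently being drawn into.
    Frame& back() { return slots_[back_]; }

    // Renderer thread: hand over the finished frame and take whatever
    // buffer was pending as the new back buffer. Never blocks.
    void publish() {
        const auto mine = static_cast<std::uint8_t>(back_ | kFresh);
        const std::uint8_t prev = pending_.exchange(mine, std::memory_order_acq_rel);
        back_ = static_cast<std::uint8_t>(prev & kIndexMask);
    }

    // Presenter thread (once per refresh): swap in the newest finished
    // frame if there is one. Returns false if nothing new was published.
    bool flip() {
        if (!(pending_.load(std::memory_order_acquire) & kFresh)) return false;
        const std::uint8_t prev = pending_.exchange(front_, std::memory_order_acq_rel);
        front_ = static_cast<std::uint8_t>(prev & kIndexMask);
        return true;
    }

    // Presenter thread: the frame currently on screen.
    const Frame& front() const { return slots_[front_]; }

private:
    static constexpr std::uint8_t kFresh = 0x4;      // pending slot holds an unseen frame
    static constexpr std::uint8_t kIndexMask = 0x3;  // low bits hold the slot index
    Frame slots_[3];
    std::uint8_t back_ = 0;                  // touched only by the renderer
    std::uint8_t front_ = 1;                 // touched only by the presenter
    std::atomic<std::uint8_t> pending_{2};   // shared hand-off slot
};
```

The renderer can call publish() as often as it likes; frames that are never picked up by flip() are simply overwritten, which is the "2000 rendered, 60 shown" behavior described above.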

However, there is a flag, D3DPRESENT_DONOTWAIT, that forces vsync to simply return an error if the refresh cycle isn't available. This would allow us to implement a hack resembling what triple buffering should be like by simply rolling our own polling loop and re-rendering things in the background on the second backbuffer. Problem solved!

Except not. It turns out that Nvidia and Intel don't bother implementing this flag, forcing Vsync to block no matter what you do. To make matters worse, this feature doesn't have an entry in D3DCAPS9, meaning the DirectX9 API just assumes that it exists, and there is no way to check whether it is supported. Of course, don't complain about this to anyone: of the 50% of people who asked about this and weren't simply ignored, almost all were immediately accused of bad profiling and told that the Present() function couldn't possibly be blocking with the flag on. I question the wisdom of people who ignore the fact that the code executed its main loop 2000 times a second with vsync off and 60 times with it on, and somehow come to the conclusion that Present() isn't blocking the code.
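For reference, the polling loop that D3DPRESENT_DONOTWAIT was supposed to make possible would look roughly like this. To keep the sketch self-contained it does not touch Direct3D at all: PresentResult, render_until_presented and the two callbacks are hypothetical stand-ins. In real D3D9 code, as far as I know, the flag is passed to the swap chain's Present() call and the "not ready yet" result is D3DERR_WASSTILLDRAWING, but treat that mapping as an assumption.

```cpp
#include <functional>

// Hypothetical result of a non-blocking present attempt.
enum class PresentResult { Presented, StillDrawing };

// Keep re-rendering with the freshest input until the driver accepts a
// frame. Returns how many frames were rendered for this one refresh.
int render_until_presented(const std::function<void()>& render,
                           const std::function<PresentResult()>& try_present) {
    int rendered = 0;
    do {
        render();      // poll input and draw the scene again
        ++rendered;
    } while (try_present() == PresentResult::StillDrawing);
    return rendered;
}
```

On a driver that ignores the flag, try_present would simply block and this loop would run once per refresh, which is exactly the behavior complained about above.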

Either way, we're kind of screwed now. Absolutely no feature in DirectX actually does what it's supposed to do, so there doesn't seem to be a way past this input lag.

There is, however, another option. Clever developers would note that to get around vsync's tendency to eat up CPU cycles like a pig, one could introduce a Sleep() call. So long as you leave enough time to render the frame, you can recover a large portion of the wasted CPU. A reliable way of doing this is to figure out how long the last frame took to render, subtract that from the frame time of the FPS you want to enforce (1000 / FPS milliseconds), and sleep for the remainder. By enforcing an FPS of something like 80 (12.5 ms per frame), you give yourself a bit of breathing room, but end up finishing rendering the frame around the same time it would have been presented anyway.
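A minimal sketch of that calculation, using std::chrono instead of the Win32 Sleep() call so it stays portable. The names sleep_budget and sleep_before_frame are hypothetical, and the sketch assumes you already measured how long the previous frame took to render.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

using Millis = std::chrono::duration<double, std::milli>;

// Time left over in one frame at `target_fps` after a render that took
// `last_render`. Clamped to zero when the render overran the frame.
Millis sleep_budget(double target_fps, Millis last_render) {
    const Millis frame_period(1000.0 / target_fps);
    return std::max(Millis(0.0), frame_period - last_render);
}

// Sleep away the spare time, then return so the caller can read input
// and render as late as possible.
void sleep_before_frame(double target_fps, Millis last_render) {
    std::this_thread::sleep_for(sleep_budget(target_fps, last_render));
}
```

With an 80 FPS cap and a 10 ms render, sleep_budget returns 2.5 ms. Keep in mind that an OS sleep can overshoot the requested duration, so in practice you may want to sleep slightly less than the budget.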

By timing your updates very carefully, you can execute a Sleep() call, then update all the inputs, then render the scene. This allows you to cut down the additional lag time by nearly 50% in ideal conditions, almost completely eliminating excess input lag. Unfortunately, if your game is already rendering at or below 100 FPS, it takes you at least 10 milliseconds to render a frame. With an 80 FPS cap (12.5 milliseconds per frame), that leaves you at most 2.5 milliseconds of extra time to look for input, which is of limited usefulness. This illustrates why Intel and Nvidia are unlikely to care about D3DPRESENT_DONOTWAIT - modern games will never render fast enough for substantial input lag reduction.

Remember when implementing the sleep that the amount of time it takes to render the frame should be the time difference between the two render calls, minus the amount of time spent sleeping, minus the amount of time Present() was blocking.
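That bookkeeping can be sketched as follows. FrameTimes, render_cost and timed are hypothetical names; the idea is to time the sleep and the Present() call separately and subtract both from the wall-clock time between two consecutive render calls.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::duration<double, std::milli>;

// Measurements taken over one trip through the main loop.
struct FrameTimes {
    Millis frame_to_frame;   // wall-clock time between two render calls
    Millis slept;            // time spent inside the sleep
    Millis present_blocked;  // time spent blocked inside Present()
};

// Actual cost of rendering: everything that was not sleeping or waiting.
Millis render_cost(const FrameTimes& t) {
    return t.frame_to_frame - t.slept - t.present_blocked;
}

// Helper that reports how long a callable took, e.g. the sleep or the
// Present() call.
template <typename F>
Millis timed(F&& f) {
    const auto start = Clock::now();
    f();
    return std::chrono::duration_cast<Millis>(Clock::now() - start);
}
```

For example, a 16 ms loop iteration that slept for 4 ms and blocked in Present() for 2 ms has a render cost of 10 ms, and that is the value to feed into the next frame's sleep calculation.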


  1. Interesting. As a hardcore gamer I've always spent a lot of time tweaking and trying to sync mouse input to the screen's refresh rate. It worked with id's Quake engines and the PS/2 rate utility back in the day.. but now we have "CPU swallowing" USB updating at 250/333 Hz ..

    About the CPU issue with vsync, I'd rather have 1 CPU core at 100% than my graphics card doing 2000 frames when I only get to see 60 of them. For the environment's sake ;)

  2. Unless you are rendering almost nothing on the screen, it is highly likely to be closer to 200 frames, which means two out of three frames would be discarded, which is much more reasonable. But of course, when you turn off vsync, your card would render 200 frames anyway, it would just try throwing them on to the screen as fast as possible. The only difference with triple buffering is that it discards a portion of the frames. So, the only way to reduce the number of frames the card is rendering is to turn on vsync and suffer input lag.

  3. Excellent article. I always used VSync because of the tearing, but never knew exactly how it is implemented.

    But is 33 ms too much waiting? I mean, for sure our internet connection has a very small ping for long distances, but how fast is the human brain at processing image input?
    Think about the original cinema, the old analog ones: the light that projects the current frame just 'blinks' at 20~30 FPS, and we don't notice that when looking at the screen.

    I have never programmed directly on a game engine, so please forgive me if I say something ridiculous, but I guess another option would be something like "Best Effort"; the render thread would be completely independent from the input pool, and would try to render in sync with the monitor but asynchronously from input; in the worst case the render would skip some input, something like when using console game emulators with big skip values.

  4. The input lag is very slightly noticeable when pressing a button, but primarily noticeable when you are dragging things around on the screen with the mouse. A hardware rendered mouse will reveal a very large and very obvious input lag, which is one of the primary motivators for trying to eliminate as much of it as possible. The fact that many people turn off vsync to get rid of the input lag indicates that it is noticeable.

  5. Guys, I'm facing a problem with my laptop:
    the contrast or brightness of my laptop's LED screen increases automatically and after a few seconds it comes back to its normal state. Does anyone have an idea why this happens and what the solution would be?
    Thanks in advance.