August 25, 2010

WavSaver

There is a documented bug in windows 7 that has pissed me off a few times and recently crippled a friend of mine, where a .wav file with corrupted metadata causes explorer.exe to go into an infinite loop. My friend has a large collection of wavs that somehow got corrupted, so I wrote this program to strip them of all metadata. Due to the nature of the bug, the program can't delete them (you must use the command prompt to do that), but rather creates a folder called "safe" with all the stripped wav files inside of it.

Hosted in case anyone else has corrupted wav files they need to save. Just stick it inside a folder and run it - it'll automatically strip all wav files in the same folder as the executable.

WavSaver

August 17, 2010

Pixel Perfect Hit Testing

After beating World of Goo after stabilizing things in my game and renaming it, I wondered how easy it was to decompile C# applications and simultaneously thought this would be a great opportunity to get pixel perfect hit testing to work on my engine. So, I decompiled GearGOD's composition example and quickly discovered that his method of detecting mouse messages was... well something completely different then his extremely bad attempt at explaining it to me had suggested.

Basically, he did not run into the window event issues that I was having because... he didn't use them. XNA keeps track of the mouse coordinates in its own separate update function, most likely using its special input hook, and hence there is no mousemove to keep track of. Instead of occurring when the user moves the mouse, the hit tests occur every single frame.

Hence, once you have utilized WS_EX_TRANSPARENT|WS_EX_COMPOSITED|WS_EX_LAYERED to make your window click-through-able, you then simply do a hit test on a given pixel after everything has been drawn, and swap out WS_EX_TRANSPARENT depending on the value. GetCursorPos and ScreenToClient will get the mouse coordinates you need, although they can be off your app window entirely so check for that too.
if(_dxDriver->MouseHitTest(GetMouseExact()))
  SetWindowLong(_window,GWL_EXSTYLE,((GetWindowLong(_window,GWL_EXSTYLE))&(~WS_EX_TRANSPARENT)));
else
  SetWindowLong(_window,GWL_EXSTYLE,((GetWindowLong(_window,GWL_EXSTYLE))|WS_EX_TRANSPARENT));
To get the pixel value, its a bit trickier. You have two options - you can make a lockable render target, or you can copy the render target to a temporary texture and lock that instead. The DirectX docs said that locking a render target is so expensive you should just copy it over, but after GearGOD went and yelled at me I tested the lockable render target method, and it turns out to be significantly faster. Futher speed gains can be achieved by making a 1x1 lockable render target and simply copying a single pixel from the backbuffer into the lockable render target and testing that.
void cDirectX_real::ActivateMouseCheck()
{
  if(_mousehittest) _mousehittest->Release();
  DX3D_device->CreateRenderTarget(1,1,_holdparams.BackBufferFormat,D3DMULTISAMPLE_NONE,0,TRUE,&_mousehittest,NULL);
}
bool cDirectX_real::MouseHitTest(const cPositioni& mouse)
{
  RECT rect = { mouse.x,mouse.y,mouse.x+1,mouse.y+1 };
  DX3D_device->StretchRect(_backbuffer,&rect,_mousehittest,0,D3DTEXF_NONE);

  if(mouse.x<0 || mouse.y < 0 || mouse.x > (int)_width || mouse.y > (int)_height)
    return false; //off the stage
  D3DLOCKED_RECT desc = { 0,0 };
  if(FAILED(_mousehittest->LockRect(&desc, 0,D3DLOCK_READONLY)))
    return true;

  unsigned char color = (*((unsigned long*)desc.pBits))>>24;
  _mousehittest->UnlockRect();
  return color>_alphacutoff;
}
Using this method, the performance hit is 620 FPS to 510 FPS at 1280x1024, which is fairly reasonable. However, my Planeshader SDK example is still at 0.9.71, which does not have this updated, fast version, so it will be using a much slower method to do it. The end result is the same though.

August 10, 2010

8-bit color cycling

Someone linked me to this awesome webpage that uses HTML5 to do 8-bit palette color cycling using Mark Ferrari's technique and art. I immediately wanted to implement it in my graphics engine, but soon realized that the technique is so damn old that no modern graphics card supports it anymore. So, I have come up with a pixel shader that creates the same functionality, either by having one image with an alpha channel containing the palette indices and a separate texture acting as the palette, or you can combine them into a single image. This is supposed to support variable palette sizes (up to 256) but I haven't had much ability to test the thing because its so damn hard to get the images formatted correctly. So while all of these variations i'm about to show you should work there is no guarantee they necessarily will.

Video Link

8-bit cycling multi-image

ps 2.0 HLSL
// Global variables
float frame;
float xdim;
float xoff;

// Samplers
sampler s0 : register(s0);
sampler s1 : register(s1);

float4 ps_main( float2 texCoord : TEXCOORD0 ) : COLOR0
{
float4 mainlookup = tex2D( s0, texCoord );
float2 palette = float2(mainlookup.a*xdim + xoff,frame);
mainlookup = tex2D(s1, palette);
return mainlookup;
}


ps 1.4 ASM
ps.1.4
texld r0, t0
mad r0.x, r0.a, c1, c2
mov r0.y, c0
phase
texld r1, r0
mov r0, r1

It is also possible to write the shader in ps.1.1 but it requires crazy UV coordinate hacks.

frame is a value from 0.0 to 1.0 (ps.1.4 will not allow you to wrap this value, but ps.2.0 will) that specifies how far through the palette animation you are.
xdim = 255/(width of palette)
xoff = 1/(2*(width of palette))

Note that all assembly registers correspond to a variable in order of its declaration. So, c0 = frame, c1 = xdim, c2 = xoff.

8-bit cycling single-image

ps 2.0 HLSL
// Global variables
float frame;
float xdim;
float xoff;

// Samplers
sampler s0 : register(s0);

float4 ps_main( float2 texCoord : TEXCOORD0 ) : COLOR0
{
float4 mainlookup = tex2D(s0, texCoord );
float2 palette = float2(mainlookup.a*xdim + xoff,frame);
mainlookup = tex2D(s0, palette);
mainlookup.a = 1.0f;
return mainlookup;
}


ps 1.4 ASM
ps.1.4
def c3, 1.0, 1.0, 1.0, 1.0
texld r0, t0
mov r1, r0
mad r1.x, r1.a, c1, c2
mov r1.y, c0
phase
texld r0, r1
mov r0.a, c3

frame is now a value between 0.0 and (palette height)/(image height).
xdim = 255/(image width)
xoff = 1/((image width)*2)

24-bit cycling

ps 2.0 HLSL
// Global variables
float frame;
float xdim;
float xoff;

// Samplers
sampler s0 : register(s0);
sampler s1 : register(s1);

float4 ps_main( float2 texCoord : TEXCOORD0 ) : COLOR0
{
float4 mainlookup = tex2D( s0, texCoord );
float2 palette = float2(mainlookup.a*xdim + xoff,frame*fElapsedTime);
float4 lookup = tex2D(s1, palette);
return lerp(mainlookup,lookup,lookup.a);
}


ps 1.4 ASM
ps.1.4
def c3, 1.0, 1.0, 1.0, 1.0
texld r0, t0
mov r2, c0
mad r2.x, r0.a, c1, c2
phase
texld r1, r2
mov r0.a, c3
lrp r0, r1.a, r1, r0


Variables are same as 8-bit multi-image.

All this variation does is make it possible to read an alpha value off of the palette texture, which is then interpolated between the palette and the original color value. This way, you can specify 0 alpha palette indexes to have full 24-bit color, and then just use the palette swapping for small, animated areas.

If I had infinite time, I'd write a program that analyzed a palette based image and re-assigned all of the color indexes based on proximety, which would make animating using this method much easier. This will stay as a proof of concept until I get some non-copyrighted images to play with, at which point I'll probably throw an implementation of it inside my engine.

August 8, 2010

P != NP

A paper has been published proving that P != NP.

http://www.scribd.com/doc/35539144/pnp12pt

So far, peer review has confirmed this finding and no errors have been found. This paper answers a question that has been called the most important question in the field of computer science.

http://en.wikipedia.org/wiki/P_versus_NP_problem

P != NP has long been suspected by the majority of computer scientists, due to the fact that no polynomial time algorithms have been discovered for more then 3000 known NP complete problems. The consequences of proving either answer are explained in the paper's introduction:

Later, Karp [Kar72] showed that twenty-one well known combinatorial problems, which include TRAVELLING SALESMAN, CLIQUE, and HAMILTONIAN CIRCUIT, were also NP-complete. In subsequent years, many problems central to diverse areas of application were shown to beNP-complete (see [GJ79] for a list). If P!=NP, we could never solve these problems efficiently. If, on the other hand P=NP, the consequences would be even more stunning, since every one of these problems would have a polynomial time solution. The implications of this on applications such as cryptography, and on the general philosophical question of whether human creativity can be automated, would be profound.

If this paper continues to stand up to peer scrutiny and is declared a valid answer to the P vs. NP problem, this could very well mark a major historic moment in computer science and mathematics.

August 6, 2010

Physics Networking

I'm still working on integrating physics into my game, but at some point here I am going to hit on that one major hurdle: syncing one physics environment with another that could be halfway across the globe. There are a number of ways to do this; some of them are bad, and some of them are absolutely terrible.

If any of you have played Transformice, you will know what I mean by terrible. While I did decompile the game, I never bothered to examine their networking code, but I would speculate that they are simply mass-updating all the clients and not properly interpolating received packets. Consequently, when things get laggy, everyone doesn't just hop around, they completely clip through objects, even ones that they should never clip though no matter what the other players are doing.

The question is, how do you properly handle physics networking without flooding the network with unnecessary packets, especially when you are dealing with very large numbers of physics objects spread across a large map with many people interacting in complex ways?

In a proper setup, a client processes input from the user, and sends this to the server. While it's waiting for a response, it will interpolate the physics by guessing what all the other players are doing. If there are no other players this interpolation can, for the most part, be considered to be perfectly accurate. This will generally hold true if there are players that are far enough away from each other they can't directly or indirectly influence each other's interpolation.

Meanwhile, the server receives the input a little while later - say, 150 ms. In an ideal scenario, the entire physics world is rewound 150 ms and then re-simulated taking into account the player's action. The server then broadcasts new physics locations and the player's action to all other clients.

All the other clients now get these packets another 150 ms later. In an ideal scenario, all they would need to know is that the other player pressed a button 300 ms ago, rewind the simulation that much, then re-simulate to take it into account.

We are, obviously, not in an ideal scenario.

What exactly does this entail? In any interpolation function, speed is gained by making assumptions that introduce possible errors. This error margin grows as the player has more and more objects he might have interacted with. However, this also applies to each physics object against each other physics object, and in turn each other physics object from that. Hence, this boils down to the math equation: p = nn!. That is a worst case scenario. Best case scenario is that none of the physics objects interact with each other, so error margin p = n and is therefore linear.

Hence, we now know that interaction with physics objects - or possibly, any sort of nonuniform physical force, like an explosion - is what creates the uncertainty problem. This is why the server always maintains its own physics world that is synced to everyone else, so that even if its not always right, its at least somewhat consistent. The question is, when do we need to send a physics packet, and when do we not need to send one?

If the player is falling through the air and jumps, and there is nothing he could possibly interact with, we can assume that any half-decent interpolation function will be almost perfect. Hence, we don't actually have to send any physics packets because the interpolation should be perfect. However, the more possible objects that are involved, the more uncertain things get and the more we need to send physics updates in case the interpolation functions get confused. In addition, we have to send packets for all affected objects as well. However, we only have to do this when the player gives input to the game. If the player's input does not change, then he is obeying the interpolation laws of everyone else and no physics update is needed, since the interpolation always assumes the player's input status has not changed.

Hence, in a semi-ideal scenario, we only have to send physics packets for the player and any physics objects he might have collided with, and only when the player changes their input status. Right?

But wait, that works for the server, but not all the other clients. Those clients receive those physics packets 150 ms late, and have to interpolate those as well, introducing more uncertainty that cannot be eliminated. In addition, we aren't even in a semi-ideal scenario - the interpolation functions become more unreliable over time regardless of the player's input status.

However, this uncertainty itself is still predictable. The more objects a player is potentially interacting with, the more uncertain those object states are. This grows exponentially when other players are in the mix because we simply cannot guess what they might do in those 150 ms.

Consequently, one technique would be to send physics updates for all potentially affected objects surrounding the player whenever the player changes their input or interacts with another physics object. This alone will only work when the player is completely alone, and even then require some intermediate packets for higher uncertainty. Hence, one can create an update pattern that looks something like this:

Physics packet rapidity: (number of potential interacting physics objects) * ( (other players) * (number of their potential interacting objects) )

More precise values could be attained by taking into account distance and understand exactly what and how the interpolation function is working. Building an interpolation function that is capable of rewinding and accurately re-simulating the small area around the player is obviously a very desirable course of action, because it removes the primary source of uncertainty and would allow for a lot more breathing room.

Either way, the general idea still stands - the closer your player is to a physics object, the more often that object (and the player) need to be updated. The closer your player is to another player, the update speed goes up exponentially. And always remember to send velocity data too!

I will make a less theoretical post on this after I've had an opportunity to do testing on what really works.