July 22, 2010

Assembly CAS implementation

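inline unsigned char BSS_FASTCALL asmcas(int *pval, int newval, int oldval)
{
  // Atomically: if (*pval == oldval) { *pval = newval; return 1; } else return 0;
  // Caveat: MSVC's docs advise against __fastcall in functions containing __asm,
  // since the compiler may reuse ECX/EDX before the block runs.
  unsigned char rval;
  __asm {
#ifdef BSS_NO_FASTCALL // with __fastcall, pval is already in ECX and newval in EDX
    mov EDX, newval
    mov ECX, pval
#endif
    mov EAX, oldval          // cmpxchg compares [ECX] against EAX
    lock cmpxchg [ECX], EDX  // if equal, stores EDX into [ECX] and sets ZF
    sete rval                // Note that sete sets a 'byte', not the word
  }
  return rval;
}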

This was an absolute bitch to get working in VC++, so maybe it will be useful to someone, somewhere, somehow. The GCC version I based this on can be found here.

Note that, obviously, this will only work on x86, and only in 32-bit builds: VC++ doesn't support inline __asm blocks on x64.
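If you're not stuck with 32-bit VC++ inline assembly, C++11's std::atomic gives you the same operation portably. A minimal sketch, assuming you can change the target to a std::atomic<int> (portable_cas and cas_increment are hypothetical names, not part of the original code):

```cpp
#include <atomic>

// Hypothetical portable counterpart of asmcas(): same argument order
// (target, new value, expected old value). Returns true if *pval held
// oldval and was replaced by newval. Takes std::atomic<int>* rather than
// int*, since standard C++ only guarantees atomicity for atomic objects.
inline bool portable_cas(std::atomic<int>* pval, int newval, int oldval)
{
    // On failure, compare_exchange_strong overwrites its first argument with
    // the value it actually found; oldval is a by-value copy here, so that
    // write-back is simply discarded.
    return pval->compare_exchange_strong(oldval, newval);
}

// Typical use: a retry loop, here a lock-free increment that returns the
// value it stored.
inline int cas_increment(std::atomic<int>* counter)
{
    int seen = counter->load();
    while (!portable_cas(counter, seen + 1, seen))
        seen = counter->load();  // another thread got there first; reload and retry
    return seen + 1;
}
```

The default memory ordering for compare_exchange_strong is sequentially consistent, which is in the same spirit as lock cmpxchg acting as a full barrier on x86.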

July 21, 2010

PlaneShader v0.9.7

PlaneShader is a high-speed 2D rendering engine, which at some point will have a lot of really cool features, but right now just has all the basics.

- Image manipulation
- Dynamic optimization
- Grouping and parent/child relationships
- Depth (z-axis)
- Advanced culling
- Tilesets
- Animation
- Sprite animation generation
- GUI system
- Masking
- Particle system (Not GPU based yet)
- Limited shader support as of v0.9
- Gradients
- Vista/win7 Desktop composition

Cooler features like a built-in lighting system, per-image shaders and distortion will be put in later, after I have a working demo of my game. I have not had the time to write a functional .NET wrapper for the engine yet, sorry. Keep in mind that this is an alpha test, and while it probably doesn't have any serious leaks or bugs, I would not recommend using it in production.

Precompiled exes can be found in examples/bin.

PlaneShader.zip

July 17, 2010

Desktop Composition

I have succeeded in compositing my graphics engine directly onto a Vista window, using a dynamic DLL load so it doesn't cause any problems on XP. I also managed to create a situation where the window is entirely transparent and impossible to click (clicks go straight through to whatever is underneath), which was an absolute bitch to get working. For future reference, if you ever want a click-through window, use CreateWindowEx with both WS_EX_TRANSPARENT and WS_EX_LAYERED. Using only one will not work, nor will any sort of window message handling. That is the ONLY WAY to make the window click-through. Consequently, to do opacity-based hit testing, you have to hook the mouse events for the entire desktop and manually notify your window after doing the calculations yourself (something I don't even want to think about).
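The "calculations yourself" part of opacity-based hit testing boils down to checking the alpha of the pixel under the cursor. A minimal sketch of just that decision, assuming the window's contents are kept as a tightly packed, top-down 32-bit RGBA buffer and the cursor position has already been converted to window client coordinates; AlphaSurface and hit_test_alpha are hypothetical names. Hooking the desktop-wide mouse events (for instance with a low-level mouse hook via SetWindowsHookEx and WH_MOUSE_LL) and notifying the window aren't shown:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical copy of the window's pixels. Assumed layout: top-down rows,
// 4 bytes per pixel (R, G, B, A), no row padding.
struct AlphaSurface {
    int width = 0, height = 0;
    std::vector<std::uint8_t> rgba;  // expected size: width * height * 4
};

// Returns true if the point (x, y), in client coordinates, lands on a pixel
// whose alpha is at least 'threshold', i.e. the click should count as ours.
inline bool hit_test_alpha(const AlphaSurface& s, int x, int y,
                           std::uint8_t threshold = 1)
{
    if (x < 0 || y < 0 || x >= s.width || y >= s.height)
        return false;  // outside the window entirely
    std::size_t idx = (static_cast<std::size_t>(y) * s.width + x) * 4 + 3;  // alpha byte
    return s.rgba[idx] >= threshold;
}
```

A threshold above 1 lets nearly invisible anti-aliased edges pass clicks through instead of catching them.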

Pics!

There are other options, and you don't get to see the subtle adjustments I make to whether a window is draggable or click-through, etc., but it gives you a rough idea.

Now, on to stencil buffers and masking, and maybe I'll release this thing only a week late! FFFFFFFFFFFFFFFFFFF-

July 13, 2010

Function Pointer Speed

So after a lot of misguided profiling, where I ended up just testing the stupid CPU cache and branch predictor and their ability to fucking predict what my code is going to do, I have, for the most part, demonstrated the following:

if(!(rand()%2)) footest.nothing();
else footest.nothing2();

is slightly faster than

(footest.*funcptr[rand()%2])();

where funcptr is an array of pointers to the possible member functions. I had suspected this after looking at the assembly: a basic member function pointer call like that takes around 11 instructions, whereas a normal function call is a single call instruction.
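The post doesn't show what footest or funcptr actually look like, so here's a hypothetical reconstruction of the two dispatch methods being compared (Foo, dispatch_branch and dispatch_table are made-up names). The member functions just count calls so the two paths can be checked against each other:

```cpp
// Hypothetical stand-in for 'footest'; the real class isn't shown.
struct Foo {
    int calls1 = 0, calls2 = 0;
    void nothing()  { ++calls1; }
    void nothing2() { ++calls2; }
};

// Array of pointers to member functions, playing the role of 'funcptr'.
typedef void (Foo::*FooMethod)();
static const FooMethod funcptr[2] = { &Foo::nothing, &Foo::nothing2 };

// Method 1: branch, then a direct call (the compiler knows the target).
inline void dispatch_branch(Foo& footest, int r)
{
    if (!(r % 2)) footest.nothing();
    else footest.nothing2();
}

// Method 2: index the table and call through the member pointer
// (an indirect call whose target is only known at run time).
inline void dispatch_table(Foo& footest, int r)
{
    (footest.*funcptr[r % 2])();
}
```

In a benchmark you'd call either function in a loop with rand() as r, the same way the snippets above do; both pick nothing() for even values and nothing2() for odd ones.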

In debug mode, however, if you have more than 2 possibilities, the switch statement's very existence takes up almost as many instructions as a single function pointer call, so the function pointer array technique is significantly faster with 3 or more possibilities. In release mode, on the other hand, if you write something like switch(rand()%3) with nothing but the function calls in each case, the whole damn thing gets its own super special optimization that reduces it to about 3 instructions, which makes the switch statement method slightly faster.
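For the 3+ case, a comparable hypothetical sketch (Bar, dispatch_switch and dispatch_table3 are illustrative names). Whether the switch actually collapses to a handful of instructions depends on the compiler and optimization settings, so treat the comments as expectations, not guarantees:

```cpp
// Hypothetical object with three possible calls.
struct Bar {
    int hits[3] = {0, 0, 0};
    void a() { ++hits[0]; }
    void b() { ++hits[1]; }
    void c() { ++hits[2]; }
};

// Switch version: with optimizations on, a compiler may lower a dense
// switch like this to a jump table, or fold it further once the calls
// are inlined. In a debug build it is typically a chain of compares.
inline void dispatch_switch(Bar& bar, int r)
{
    switch (r % 3) {
        case 0:  bar.a(); break;
        case 1:  bar.b(); break;
        default: bar.c(); break;
    }
}

// Table version: one indexed load plus an indirect call, in any build mode.
typedef void (Bar::*BarMethod)();
static const BarMethod bartable[3] = { &Bar::a, &Bar::b, &Bar::c };

inline void dispatch_table3(Bar& bar, int r)
{
    (bar.*bartable[r % 3])();
}
```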

In all of these cases, however, the speed difference over 1000 calls is about 0.007 milliseconds and varies wildly. The CPU is doing so much architectural optimization that it most likely doesn't matter which method is used. I do find it interesting that the switch statement gets super-optimized in certain situations, though.
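When a single measurement is mostly noise, one way to get steadier numbers is to repeat it several times and take the median. A minimal sketch using std::chrono (median_ns is a hypothetical helper, not something from the original profiling):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <vector>

// Hypothetical harness: times 'calls' invocations of f, repeats that 'runs'
// times, and returns the median run in nanoseconds. The median is less
// sensitive to one-off outliers (interrupts, cold caches) than a single run.
template <typename F>
long long median_ns(F f, int calls, int runs)
{
    assert(calls >= 0 && runs > 0);
    std::vector<long long> samples;
    samples.reserve(runs);
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < calls; ++i)
            f(std::rand());  // random input so the branch isn't trivially predictable;
                             // rand()'s own cost is included in every sample
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }
    // Partially sort so the middle element is the median.
    std::nth_element(samples.begin(), samples.begin() + runs / 2, samples.end());
    return samples[runs / 2];
}
```

f can be any callable taking an int, such as a lambda that dispatches through one of the methods above; compare medians rather than individual runs.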