i965: Use Compr4 instruction compression mode on G4X and newer.
No statistically significant performance difference at n=3 with either
openarena or my GL demo, but cutting program size seems like a good
thing to be doing for the hypothetical app that has a working set near
icache size.
i965: Share most of the WM functions between brw_wm_glsl.c and brw_wm_emit.c
The PINTERP code should be faster for brw_wm_glsl.c now since brw_wm_emit.c's
had been improved, and pixel_w should no longer stomp on a neighbor to dst.
i965: Collect GLSL src/dst regs up in generic code.
This matches brw_wm_emit.c, which we'll be using shortly. There's a
possible penalty here in that we'll allocate registers for unused channels,
since we aren't doing ref tracking like brw_wm_pass*.c does. However, my
measurements on GM965 don't show any for either OA or UT2004 with the GLSL
path forced.
mesa: Reduce the source channels considered in optimization passes.
Depending on the writemask or the opcode, we can often trim the source
channels considered used for dead code elimination. This saves actual
instructions on 965 in the non-GLSL path for glean glsl1, and cleans up
the writemasks of programs even further.
mesa: Add an optimization path to remove use of pointless MOVs.
GLSL code such as:
vec4 result = {0, 1, 0, 0};
gl_FragColor = result;
emits code like:
0: MOV TEMP[0], CONST[0];
1: MOV OUTPUT[1], TEMP[0];
and this replaces it with:
0: MOV TEMP[0], CONST[0];
1: MOV OUTPUT[1], CONST[0];
Even when the dead code eliminator fails to clean up a now-useless MOV
instruction (since it doesn't do live/dead ranges), this should at reduce
dependencies.
mesa: Fix up the remove_dead_code pass to operate on a channel basis.
This cleans up a bunch of instructions in GLSL programs to have limited
writemasks, which would translate to wins in shaders that hit the i965
brw_wm_glsl.c path by depending less on in-driver optimizations. It will
also help hit other optimization passes I'm looking at.