i965: Remove duplicate MRF writes in the FS backend.
This is quite common for multitexture sampling, and not only cuts down
on the second and later set of MOVs, but typically also allows
compute-to-MRF on the first set.
No statistically siginficant performance difference in nexuiz (n=3),
but it reduces instruction count in one of its shaders and seems like
a good idea.
We were skipping it if the instruction producing the value we were
going to compute-to-mrf used its result reg as a source reg. This
meant that the typical "write interpolated color to fragment color" or
"texture from interpolated texcoord" shader didn't compute-to-MRF.
Just don't check for the interference cases until after we've checked
if this is the instruction we wanted to compute-to-MRF.
Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08%
(n=3).
i965: Recognize saturates and turn them into a saturated mov.
On pre-gen6, this turns 4 instructions into 1. We could still do
better by folding the saturate into the instruction generating the
value if nobody else uses it, but that should be a separate pass.
glsl: Add a helper function for determining if an rvalue could be a saturate.
Hardware pretty commonly has saturate modifiers on instructions, and
this can be used in codegen to produce those, without everyone else
needing to understand clamping other than min and max.
glsl: Combine many instruction lowering passes into one.
This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The vector operator collects 2, 3, or 4 scalar components into a
vector. Doing this has several advantages. First, it will make
ud-chain tracking for components of vectors much easier. Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915). It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
glsl: Add ir_unop_sin_reduced and ir_unop_cos_reduced
The operate just like ir_unop_sin and ir_unop_cos except that they
expect their inputs to be limited to the range [-pi, pi]. Several
GPUs require this limited range for their sine and cosine
instructions, so having these as operations (along with a to-be-written
lowering pass) helps this architectures.
These new operations also matche the semantics of the
GL_ARB_fragment_program SCS instruction. Having these as operations
helps in generating GLSL IR directly from assembly fragment programs.
Use fetch shader instead of having fetch instruction in the vertex
shader. Allow to restrict shader update to a smaller part when
vertex buffer input layout changes.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
r600g: fix occlusion query on evergreen (avoid lockup)
Occlusion query on evergreen need the event index field to be
set otherwise we endup locking up the GPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
ir_to_mesa: Generate smarter code for some conditional moves
Condiation moves with a condition of (a < 0), (a > 0), (a <= 0), or (a
>= 0) can be generated with "a" directly as an operand of the CMP
instruction. This doesn't help much now, but it will help with
assembly shaders that use the CMP instruction.
mesa: pass gl_format to _mesa_init_teximage_fields()
This should prevent the field going unset in the future. See bug
http://bugs.freedesktop.org/show_bug.cgi?id=31544 for background.
Also remove unneeded calls to clear_teximage_fields().
Finally, call _mesa_set_fetch_functions() from the
_mesa_init_teximage_fields() function so callers have one less
thing to worry about.
glsl: Fix 'control reaches end of non-void function' warning.
Fix this GCC warning.
ir.cpp: In static member function
'static unsigned int ir_expression::get_num_operands(ir_expression_operation)':
ir.cpp:199: warning: control reaches end of non-void function
If an instruction writes reg but nothing later uses it, then we don't
need to bother doing it. Before, we were just killing code that was
never read after it was ever written.
This removes many interpolation instructions for attributes with only
a few comopnents used. Improves nexuiz high-settings performance .46%
+/- .12% (n=3) on my Ironlake.