llvmpipe: try to do more of rast_tri_3_16 with intrinsics
There was actually a large quantity of scalar code in these functions
previously. This tries to move more into intrinsics.
Introduce an sse2 mm_mullo_epi32 replacement to avoid sse4 dependency
in the new rasterization code.
It's now much more correct for gen6 than the old backend, with just 2
regressions I've found (one of which is common with pre-gen6 and will
be fixed by an array splitting IR pass).
This does leave the old Mesa IR backend getting used still when we
don't have GLSL IR, but the plan is to get GLSL IR input to the driver
for the ARB programs and fixed function by the next release.
i965: Fix gen6 pixel_[xy] setup to avoid mixing int and float src operands.
Pre-gen6, you could mix int and float just fine. Now, you get goofy
results.
Fixes:
glsl-arb-fragment-coord-conventions
glsl-fs-fragcoord
glsl-fs-if-greater
glsl-fs-if-greater-equal
glsl-fs-if-less
glsl-fs-if-less-equal
i965: Expand uniform args to gen6 math to full registers to get hstride == 1.
This is a hw requirement in math args. This also is inefficient, as
we're calculating the same result 8 times, but then we've been doing
that on pre-gen6 as well. If we're doing math on uniforms, though,
we'd probably be better served by having some sort of mechanism for
precalculating those results into another uniform value to use.
Fixes 7 piglit math tests.
intel_extensions: Add ability to set GLSL version via environment
Add ability to set the GLSL version used by the GLcontext by setting the
environment variable INTEL_GLSL_VERSION. For example,
env INTEL_GLSL_VERSION=130 prog args
If the environment variable is missing, the GLSL versions defaults to 120.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
r200: revalidate after radeon_update_renderbuffers
By calling radeon_draw_buffers (which sets the necessary flags
in radeon->NewGLState) and revalidating if NewGLState is non-zero
in r200TclPrimitive. This fixes an assert in libdrm (the color-/
depthbuffer was changed but not yet validated) and and stops the
kernel cs checker from complaining about them (when they're too
small).
Thanks to Mario Kleiner for the hint to call radeon_draw_buffer
(instead of my half-broken hack).
v2: Also fix the swtcl r200 path.
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This didn't produce a statistically significant performance difference
in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea
and is recommended by the HW team.
Having the single opcode write then read the reg meant that single
instruction opcodes had to consider their source regs to interfere
with their dest regs.
gallivm: Eliminate unsigned integer arithmetic from texture coordinates.
SSE support for 32bit and 16bit unsigned arithmetic is not complete, and
can easily result in inefficient code.
In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.
So remove uint_coord_type and uint_coord_bld to avoid inefficient
operations to sneak in the future.
We need to move the texture sampler resources out of the range of the vertex attribs.
We could probably improve this using an allocator but this is the simple answer for now.
makes mesa-demos/src/glsl/vert-tex work.