ir_to_mesa: Fix implementation of ir_binop_equal, ir_binop_notequal.
These binops are the vector-to-bool comparisons, not vec-to-bvec. We
likely want both operations avilable as expression, since 915 and 965
FS naturally does the vector version, while 965 VS can also naturally
do the scalar version. However, we can save that until later.
Fixes:
glsl-fs-vec4-operator-equal.shader_test
glsl-fs-vec4-operator-notequal.shader_test
glsl-vs-vec4-operator-equal.shader_test
glsl-vs-vec4-operator-notequal.shader_test
Now that we have glsl2 with if flattening in place, most shaders will
just work. Remaining failing shaders will mostly be due to loop
unrolling (in progress), some possible if flattening failures in
inlining functions (planning on fixing), and the register/instruction
count limits.
While the GLSL and GLSL-ES specs say that shaders shouldn't fail to
compile/link due to register/instruction limits, in practice we're not
the first vendor to expose GLSL on hardware with these limitations.
The benefit to application developers of providing a better language
for GPU programming is greater than the pain of having to handle
instruction limits (which they had to for ARB_fp on this hardware
anyway)
This error led to an assertion failure for some constructors of
non-square matrices. It only occured in matrices where the number of
columns was greater than the number of rows. It didn't even always
occur on those.
Fixes piglit glslparsertest case constructor-16.vert.
glsl: When doing algebraic simplification, make sure the type still matches.
When simplifying (vec4(1.0) / (float(x))) to rcp(float(x)), we forgot
to produce a vec4, angering ir_validate when starting alien-arena.
Fixes:
glsl-algebraic-add-zero-2
glsl-algebraic-div-one-2
glsl-algebraic-mul-one-2
glsl-algebraic-sub-zero-3
glsl-algebraic-rcp-sqrt-2
ir_constant: Don't assert on out-of-bounds array accesses
Several optimization paths, including constant folding, can lead to
accessing an ir_constant array with an out of bounds index. The GLSL
spec lets us produce "undefined" results, but it does not let us
crash.
Fixes piglit test case glsl-array-bounds-01 and glsl-array-bounds-03.
Apart from the fact that the radeon.h/r600_states.h editing is a nightmare, this
wasn't so bad.
passes piglit user-clip test now also trivial tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965: Use the implied move available in most brw_wm_emit brw_math() calls.
This saves an extra message reg move in the program, though I'm not
clear on whether it will have any performance impact other than cache
footprint. It will also fix those math calls on Sandybridge, where
the brw_eu_emit.c brw_math() support relies on the implied move being
used.
check_os_katmai_support checks that the operating system running on a
SSE-capable processor supports SSE. This is necessary for unpatched
2.2.x and earlier kernels. 2.4.x and later kernels support SSE.
check_os_katmai_support will disable SSE capabilities for 32-bit x86
operating systems for which there is no code path. Currently, this
function handles Linux, Windows, and several BSDs. Mac OS, Cygwin, and
Solaris are several operating systems with no code paths.
Rather than add code for the unhandled operating systems, remove this
function altogether. This will fix SSE detection on all recent 32-bit
x86 operating systems. This completely breaks functionality on unpatched
2.2.x and earlier kernels, although there are likely no Gallium3D users
on such operating systems.
draw_llvm: fix segfaults on non-SSE2 CPUs where it is disabled (v2)
Changes in v2:
- Change function name
Currently draw_llvm refuses to create itself on non-SSE2 CPUs due to
an alleged LLVM bug.
However, this is implemented improperly, because other parts of draw
still attempt to access draw->llvm, resulting in segfaults.
Instead, put the check in debug_get_option_draw_use_llvm, check that
before calling draw_llvm_create, and then check whether draw->llvm is
non-null everywhere else.
NOTE: Win64 is untested, and is thus currently disabled.
If you have such a system, please enable it and report whether it works.
To enable it, change src/gallium/auxiliary/translate/translate.c
Changes in v5:
- On Win64, preserve %xmm6 and %xmm7 as required by the ABI
- Use _WIN64 instead of WIN64
Changes in v4:
- Use x86_target() and x86_target_caps()
- Enable translate_sse in x86-64, but not in Win64
Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs
Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments
translate_sse is currently very limited to the point of
being useless in essentially all cases.
In particular, it only support some float32 and unorm8
formats and doesn't work on x86-64.
This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)
This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.
It passes the testsuite I wrote, but note that this is a major change, and more
testing would be great.