i965/skl: Always use a header for SIMD4x2 sampler messages
SKL+ overloads the SIMD4x2 SIMD mode to mean either SIMD8D or SIMD4x2
depending on bit 22 in the message header. If the bit is 0 or there is
no header we get SIMD8D. We always wand SIMD4x2 in vec4 and for fs pull
constants, so use a message header in those cases and set bit 22 there.
Based on an initial patch from Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We can't (or don't know how to) turn this off. But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.
Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline. Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.
NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node. Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers. Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers. This lets us simplify RA by
working in terms of neighbor groups.
NOTE: has the slight problem that it can't optimize out mov's for things
like:
MOV OUT[n], IN[m]
To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them). This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't
need to support an arbitrary mask. So for this case use a simple size
field rather than a bitmask.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
freedreno/ir3: legalize vs unused sam dst components
We probably could be more clever elsewhere and mask out components that
are not used. But either way, legalize should realize that there is
also a write-after-write hazard with texture sample instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Old compiler doesn't have ir3_block's.. so we need a special path. This
hack can be dropped when ir3_compiler_old is retired.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can
optionally have them. So we implicitly assume that ArrayID==0 always
exists for each file. This is why array_max[file] is never less than
zero.
You can tell from indirect_files(_read/written) if the legacy array-
id zero was actually used.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
At least temporarily, I need to fallback to old compiler still for
relative dest (for freedreno), but I can do relative src temp. Only
a temporary situation, but seems easy/reasonable for tgsi-scan to
track this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were invalidating si_screen:tm by calling
r600_destroy_common_screen() which frees the si_screen object. This
caused the driver to crash in LLVMDisposeTargetMachine() since we
were passing it an invalid pointer.
https://bugs.freedesktop.org/show_bug.cgi?id=88170
It doesn't work on Windows because of STDCALL calling convention -- it's
the callee responsibility to pop the arguments, and the number of
arguments vary with the prototype --, so the stack pointer ends up getting
corrupted.
This is just a non-invasive stop-gap fix. A proper fix would be more
elaborate, and require either:
- a variation of __glapi_noop_table which sets GL_INVALID_OPERATION
error
- stop using APIENTRY on all internal _mesa_* functions.
Tested with piglit gl-1.0-beginend-coverage (it now fails instead of
crashing).
VMware PR1350505
Reviewed-by: Brian Paul <brianp@vmware.com>
glapi: Force frame pointer elimination on Windows.
To catch mismatches in cdecl vs stdcall calling convention. See code
comment for more detailed explanation.
Tested with piglit gl-1.0-beginend-coverage (it now also crashes on
debug builds.)
VMware PR1350505.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a case where a transform feedback buffer is fed back as an index
buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- we don't usually need to flush TC L2
- we should flush KCACHE
(not really an issue now since we always flush KCACHE when updating
descriptors, but it could be a problem if we used CE, which doesn't
require flushing KCACHE)
- add an explicit VS_PARTIAL_FLUSH flag
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: use TC L2 for CP DMA operations with shader resources on CIK
So that TC L2 doesn't need to be flushed.
The only problem is with index buffers, which don't use TC.
A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: don't use TC L2 for updating descriptors on SI
It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA
when updating the same memory.
The solution for SI is to use uncached access here, because CP DMA doesn't
support cached access.
CIK will be handled in the next patch.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: only flush the right set of caches for CP DMA operations
That's either framebuffer caches or caches for shader resources.
The motivation is that framebuffer caches need to be flushed very rarely
here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: remove special handling of TGSI_INTERPOLATE_COLOR in shader codegen
It doesn't do anything useful. And colors are floating-point, so we can use
fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE
state only (in the next patch).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: implement VERTEXID_NOBASE and BASEVERTEX system values
Only done for completeness. Not used by anything yet.
Tested by advertising PIPE_CAP_VERTEXID_NOBASE.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: use ordered compares for SSG and face selection
Ordered compares are what you have in C. Unordered compares are the result
of negating ordered compares (they return true if either argument is NaN).
That special NaN behavior is completely useless here, and unordered
compares produce horrible code with all stable LLVM versions.
(I think that has been fixed in LLVM git)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- the relocs array is unused, remove it
- ndw is at most 115 (init), set 140 as the maximum
- compute needs 4 buffers per state, graphics only needs 1; set 4 as the maximum
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays
From GL 4.4 Core profile:
If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
performed for array elements transferred by any drawing command not taking a
type parameter, including all of the *Draw* commands other than *DrawEle-
ments*.
Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>