i965/fs: Let CSE handle logical sampler sends as expressions.
This will prevent some shader-db regressions when we start plumbing
logical sends through the optimizer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Pass a BAD_FILE register to the logical FB write when oMask is unused.
This will let the optimizer know that the sample mask value is unused
so its definition can be DCE'ed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This partially fixes CTS test:
GL44-CTS.enhanced_layouts.xfb_get_program_resource_api
The test now fails at a tes evaluation shader with unsized output arrays.
The ARB_enhanced_layouts spec says:
"It is a compile-time error to apply xfb_offset to the declaration of an
unsized array."
So this seems like a bug in the CTS.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94925
Pointed out by Karol Herbst on irc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The spec says gl_NextBuffer and gl_SkipComponents need to be
returned to userspace in the program interface queries.
We currently throw those away, this requires a complete piglit
run to make sure no drivers fallover due to the extra varyings.
This fixes:
GL45-CTS.program_interface_query.transform-feedback-built-in
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
glsl/ast: subroutineTypes can't be returned from functions.
These types can't be returned.
This fixes:
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
for the return type case.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This stops the offset being bumped again when and an explicit
alignment has already been applied.
Fixes alignment issues in:
GL44-CTS.enhanced_layouts.uniform_block_alignment
Note the test still fails due to unrelated issues with doubles.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
It appears we were over-allocating these arrays.
Previously we would use nir->num_uniforms directly for scalar
programs, and multiply it by 4 for vec4 programs.
Instead we should have been dividing by 4 in both cases to convert
from bytes to a gl_constant_value count. The size of gl_constant_value
is 4 bytes.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
st/mesa: fix setting of point_size_per_vertex in ES contexts
GL ES 2.0+ does not have a GL_PROGRAM_POINT_SIZE enable, unlike desktop
GL. So we have to go and check the last pre-rasterizer stage to see
whether it outputs a point size or not.
This fixes a number of dEQP tests that use a geometry or tessellation
shader to emit points primitives.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
mesa: skip level checking for FramebufferTexture*D if texture is zero
From the OpenGL 4.5 core spec:
"An INVALID_VALUE error is generated if texture is not zero and level is
not a supported texture level for textarget, as described above."
Other FramebufferTexture functions already do the right thing.
This fixes the main menu in F1 2015.
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
swr: [rasterizer] Do not define _mm256_storeu2_m128i with icc.
Fix build error with icc.
CXX libswrAVX_la-swr_clear.lo
icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor'
In file included from ./rasterizer/jitter/jit_api.h(31),
from swr_context.h(30),
from swr_clear.cpp(24):
./rasterizer/common/os.h(135): error: expected an identifier
void _mm256_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i a)
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Re-add the "return false" that was removed in 0c02d7002d
It seems that something went wrong when merging the patch. The patch
sent to the mailing list does not directly match what was committed.
https://lists.freedesktop.org/archives/mesa-dev/2016-May/118198.html
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
nvc0: remove outdated surfaces validation code for GK104
This code was used for validating surfaces with compute but now we use
pipe_image_view instead. Anyway, surfaces support should be
re-introduced properly once OpenCL happens.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
nvc0: do not always invalidate 3D CBs when using compute
Constant buffers are aliased between 3D and CP on Fermi, but we should
only invalidate them when a compute shader actually uses CBs and not
all the time after a lauching grid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
i965: Update compute workgroup size limit calculation for SIMD32.
This should have the side effect of enabling the ARB_compute_shader
extension on Gen8+ hardware and all Gen7 platforms that didn't
previously expose it (VLV and IVB GT1) due to the number of hardware
threads per subslice being insufficient in SIMD16 mode.
v2: Bump workgroup size limit for GLES too (Jordan).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The do32 INTEL_DEBUG option causes the back-end to try to generate a
SIMD32 program when compiling a compute shader regardless of the
specified compute shader workgroup size, which will be useful for
testing SIMD32 code generation in the most common case in which the
workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen
wouldn't be automatically enabled.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Extend back-end interface for limiting the shader dispatch width.
This replaces the current fs_visitor::no16() interface with
fs_visitor::limit_dispatch_width(), which takes an additional
parameter allowing the caller to specify the maximum dispatch width a
shader can be compiled with.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Remove pre-Gen7 register allocation class micro-optimization.
This was trying to save some one-time init on pre-Gen7 hardware under
the assumption that one would only ever need 1, 2, 4 and 8-wide
registers on those platforms. However nothing guarantees that those
will be the only VGRF sizes used after lowering and optimization. In
some cases we may end up with a temporary of different size being
allocated (e.g. by SIMD lowering to zip or unzip a multi-component
register region of a logical send instruction), and there is no
guarantee that they will be optimized away before register allocation
(especially since the compute_to_mrf coalescing pass is
rather... lacking...). Instead just allocate classes for all possible
VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we
encounter a variable of any other size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Don't mutate multi-component arguments in sampler payload set-up.
The Gen5+ sampler message payload construction code steps through the
coordinate and derivative components by induction like 'coordinate =
offset(coordinate, bld, 1)', the problem is that while doing that it
may step one past the end of the coordinate vector causing an
assertion failure in offset() if it happens to be a (single component)
immediate. Right now coordinates and derivatives are typically passed
as actual registers but that will no longer be the case when we start
propagating constants into logical messages.
Instead express coordinate components in closed form like
'offset(coordinate, bld, i)' -- The end result seems slightly more
readable that way and it allows passing the coordinate and derivative
registers by const reference instead of by value, so it seems like a
clean-up in its own right.
v2: Fold a few post-increment operators into the last MOV
statement. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Fix multiple ACP interference during copy propagation.
This is more fallout from cf375a3333.
It's possible for multiple ACP entries to interfere with a given VGRF
write, so we need to continue iterating even if an overlapping entry
has already been found.
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction. This prevents
cmod propagation from (mis)optimizing:
cmp.z.f0 tmp, ...
mov.z.f0 null, tmp
into:
cmp.z.f0 tmp, ...
which gives the negation of the flag result of the original sequence.
I could reproduce this easily with SIMD32 but I don't see any reason
why the problem would be SIMD32-specific, it was most likely working
by luck.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Skip remove_duplicate_mrf_writes() during SIMD32 runs.
The pass is disabled in SIMD16 dispatch mode for the same reason, it
cannot handle instructions that write multiple MRF registers at once.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Return 32 bit mask from fs_builder::sample_mask().
This doesn't actually handle the FS case, just add an assertion for
the moment so I don't forget to update it later on for SIMD32 fragment
shader dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Emit fixed-width null register regardless of the dispatch width.
brw_null_vec() cannot handle widths over 16 but it doesn't really
matter what width we specify for null registers because destination
regions have no width field at the hardware level.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Fix half() to handle more exotic register files.
horiz_offset() is able to deal with a superset of the register files
currently special-cased in half(). Just call horiz_offset() in all
cases.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Keep track of flag dependencies with byte granularity during scheduling.
This prevents false dependencies from being created between
instructions that write disjoint 8-bit portions of the flag register
and OTOH should make sure that the scheduler considers dependencies
between instructions that write or read multiple flag subregisters
at once (e.g. 32-wide predication or conditional mods).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Track flag register liveness with byte granularity.
This is required for correctness in presence of multiple 8-wide flag
writes (e.g. 8-wide instructions with a conditional mod set) which
update a different portion of the same 16-bit flag subregister. Right
now we keep track of flag dataflow with 16-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 16 channels wide,
which can cause live flag register updates to be dead code-eliminated
incorrectly.
Additionally this makes sure that we handle 32-wide flag writes and
reads which may span multiple flag subregisters so the current
approach of just setting/testing a single bit from the live set
wouldn't have worked.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Expose arbitrary channel execution groups to the IR.
This generalizes the current fs_inst::force_sechalf flag to allow
specifying channel enable groups other than 0 or 8. At some point it
will likely make sense to fix the vec4 generator to support arbitrary
execution groups and then move the definition of fs_inst::group into
backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/ir: Make BROADCAST emit an unmasked single-channel move.
Alternatively we could have extended the current semantics to 32-wide
mode by changing brw_broadcast() to emit multiple indexed MOV
instructions in the generator copying the selected value to all
destination registers, but it seemed rather silly to waste EU cycles
unnecessarily copying the exact same value 32 times in the GRF.
The vstride change in the Align16 path is required to avoid assertions
in validate_reg() since the change causes the execution size of the
MOV and SEL instructions to be equal to the source region width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Allow specifying arbitrary quarter control to FIND_LIVE_CHANNEL.
This makes FIND_LIVE_CHANNEL behave like a normal instruction for
non-zero quarter control. On Gen8+ we just leave the quarter control
field of the emitted FBL instruction set to the default value so the
hardware applies the expected shift to the execution mask signals. On
Gen7 we apply the offset manually by specifying a non-zero subregister
offset in the source region of the FBL instruction.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Allow specifying arbitrary execution sizes up to 32 to FIND_LIVE_CHANNEL.
Due to a Gen7-specific hardware bug native 32-wide instructions get
the lower 16 bits of the execution mask applied incorrectly to both
halves of the instruction, so the MOV trick we currently use wouldn't
work. Instead emit multiple 16-wide MOV instructions in 32-wide mode
in order to cover the whole execution mask.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/fs: Lower 32-wide scratch writes in the generator.
The hardware has messages that can write 32 32bit components at once
but the channel enable mask gets messed up. We need to split them
into several 16-wide scratch writes for the channel enables to be
applied correctly. The SIMD lowering pass cannot be used for this
because scratch writes are emitted rather late during register
allocation long after SIMD lowering has been done.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965/eu: Fix Gen7+ DP scratch message size calculation on Gen7.
Gen7 hardware expects the block size field in the message descriptor
to be the number of registers minus one instead of the log2 of the
number of registers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>