Teach the emitter that the two registers are sequential, and drop the
second arg entirely, in favor of a double-wide first argument.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This largely leaves the existing image logic alone. When image support
is added this will have to be harmonized somehow.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Issue a MEM_BARRIER. No idea if this is sufficient. As there are no
tests for this, it'll have to do for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
(address, length) pairs are uploaded to the driver constbuf as well to
make these values available to the shaders.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This makes PROGRAM_IMMEDIATE a first-class gl_register_file type, and
adds PROGRAM_BUFFER to the list. These are used purely inside
glsl_to_tgsi conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
glsl: keep track of ssbo variable being accessed, add access params
Currently any access params (coherent/volatile/restrict) are being lost
when lowering to the ssbo load/store intrinsics. Keep track of the
variable being used, and bake its access params in as the last arg of
the load/store intrinsics.
If the variable is accessed via an instance block, then 'variable'
points to the instance block variable and not the field inside the
instance block that we are accessing. In order to check access
parameters for the field itself we need to detect this case and keep
track of the corresponding field struct so we can extract the specific
field access information from there instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add tracking of struct field
v2 -> v3: minor adjustments based on Iago's feedback
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
glsl: always initialize image_* fields, copy them on interface init
Interfaces can have image properties set in case they are buffer
interfaces. Make sure not to lose this information.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
tgsi: add MEMBAR opcode to handle memoryBarrier* GLSL intrinsics
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add defines for the various bits
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
winsys/amdgpu: Process RADEON_FLAG_* independently from RADEON_DOMAIN_*
In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created
in VRAM if they get evicted to GTT. In general there's no need to
restrict any of the flags to any particular domains.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Failing to do this was resulting in the kernel driver unnecessarily
leaving open the possibility of CPU access to tiled BOs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93862
(This change shouldn't be backported to stable branches, because
released versions of xf86-video-amdgpu unnecessarily try to map the
front buffer)
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
nv50/ir: optimize mad/fma with third argument 0 to mul
Very modest effect, but it's clearly the right thing to do.
total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs : 910157 -> 910131 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
local gpr inst bytes
helped 0 55 85 85
hurt 0 26 20 20
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
nv50/ir: optimize shl(shr(a, c), c) to and(a, ~((1 << c) - 1))
Following shader-db results on GK110:
total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs : 910187 -> 910157 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
local gpr inst bytes
helped 0 18 821 821
hurt 0 0 0 0
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add a debug option to select the LLVM SI Machine Scheduler.
R600_DEBUG=sisched
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
glsl: double-precision values don't support interpolation
ARB_gpu_shader_fp64 spec says:
"This extension does not support interpolation of double-precision
values; doubles used as fragment shader inputs must be qualified as
"flat"."
Fixes the regressions added by commit 781d278:
arb_gpu_shader_fp64-double-gettransformfeedbackvarying
arb_gpu_shader_fp64-tf-interleaved
arb_gpu_shader_fp64-tf-interleaved-aligned
arb_gpu_shader_fp64-tf-separate
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93878
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
vc4: Throttle outstanding rendering after submission.
Just make sure that after we've submitted, we get to at least 5
(global) submits ago before we go on to do more. Prevents up to
seconds of lag with window movement in X with xcompmgr -c. There may
be useful tuning to do in the future, but for now this gets us
usability.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
vc4: Don't record the seqno of a failed job submit.
On an error return, the returned seqno will probably be unset, so we'd
lose track of what we've submitted so far for waiting on in the
future.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
i965/skl: Utilize new 5th bit for gateway messages
Modify comment as spotted by Matt, and Chris Forbes
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
nv50/ir: fix memory corruption when spilling and redoing RA
When RA fails, and we spill, we have to clean everything up before doing
RA again. We were forgetting to reset the hi/lo linked lists - at
least the hi list is guaranteed to still have pointers to now-deleted
RIG nodes.
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
v2: drop inline keyword
drop radeon_llvm_dispose_kernel_module wrapper
v3: move definitions to .c file
use in radeonsi
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The addition of spi_shader_col_format killed all color outputs
in precompiled shaders.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
v2: also set the alpha func (trivial)
glsl: add GL_OES_geometry_point_size and conditionalize gl_PointSize
For now this will be enabled in tandem with GL_OES_geometry_shader.
Should a driver come along that wants to separate them out, another
enable can be added.
Also adds the missed GL_OES_geometry_shader define in glcpp.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
compiler: move the glsl_types C wrapper alongside their C++ brethren
At a later stage we might want to split out the NIR specific [XXX:
which one was it], as to make things move obvious and rename the files
appropriately. This patch aims to split it out of nir.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Allows us to remove the SCons workaround :-)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
This way one can reuse it in glsl, nir or other infrastructure without
pulling nir as dependency.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Currently it's an empty library, although it'll be used to store common
code between GLSL and NIR that is compiler specific (rather than generic
as the one in src/util).
XXX: strictly speaking we could add a python/mako parser to generate the
relevant files instead including builtin_type_macros.h in such a manner.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
This currently just writes out the name of dump files, which can be useful
to easily correlate those files with other log outputs (driver debug output,
apitrace calls, etc.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
gallium/ddebug: make 'noflush' also affect 'always' mode
This changes the default behavior of 'always' mode to be consistent with
hang detection mode.
I have used this to more easily compare dumped command streams using diff.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
radeonsi: use llvm.amdgcn.s.barrier instead of llvm.AMDGPU.barrier.local
The new name for the intrinsic was introduced in LLVM r258558.
v2: use ternary operator instead of preprocessor
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When setting the conservative thread counts, I halved everything. That isn't
correct for the wm, which has nothing to do with actual thread counts. I suck.
BXT only has 1 slice, and there is some ambiguity about subslices, so just
reserve the max possible for now. It looks like this might fix:
piglit.spec.glsl-1_50.execution.variable-indexing.gs-output-array-vec4-index-wr.bxtm64.
I kind of question why that is, but it is what Jenkins says.
Mark is current running some of the other blacklisted tests on this patch. (it
effects anything requiring scratch space).
Cc: mesa-stable <mesa-stable@lists.freedesktop.org>
Cc: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>