r600: Correct IDIV if DST and SRC use the same temporary
In cases like
IDIV TEMP[0].xy TEMP[0].xx TEMP[1].yy
the result will be written to the same register that is also a source register.
Since the components are evaluated one by one, this may result in overwriting
the source value for a later operation. Work around this by adding another
temporary to store the result if the destination temporary index is equal to
one of the source temporary indices.
Fixes:
dEQP-GLES2.functional.shaders.operator.binary_operator.div.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 79fe00efb4.
This reverts commit f5e8b13f78.
This reverts commit d21c086d81.
They broke the Android build and I'd rather not leave it broken
for the long holiday weekend.
i965/miptree: Use cpu tiling/detiling when mapping
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit b499b85b0f.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We stream from a tiled and aligned source into an unaligned user buffer,
so we need to use _mm_storeu_si128.
Fixes: d21c086d81 (i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
intel/blorp: Support blits and clears on surfaces with offsets
For certain EGLImage cases, we represent a single slice or LOD of an
image with a byte offset to a tile and X/Y intratile offsets to the
given slice. Most of i965 is fine with this but it breaks blorp. This
is a terrible way to represent slices of a surface in EGL and we should
stop some day but that's a very scary and thorny path. This gets blorp
to start working with those surfaces and fixes some dEQP EGL test bugs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106629
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
radeonsi: fix color inputs/outputs for GS and tess
GS is tested, tessellation is untested.
Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
st/mesa: simplify lastLevel determination in st_finalize_texture
This fixes shader images where we always bind stObj->pt and not individual
gl_texture_images.
Roughly based on i965 commit 845ad2667a
which does a similar thing but for a different reason.
This fixes GL CTS assertion failures introduced by Ilia.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
swr/rast: Adjusted avx512 primitive assembly for msvc codegen
Optimize AVX-512 PA Assemble (PA_STATE_OPT). Reduced generated code by
about 4x, MSVC compiler was going crazy making temporaries and
split-loading inputs onto the stack unless explicit AVX-512 load ops
were added
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Added two new files for a wrapper function for initialization
v2: added missing include for single architecture builds
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
It's recommended by the instruction combining pass, and
RadeonSI also runs it.
This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.
Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size: 22560328 -> 22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)
Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size: 22479428 -> 22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
radv: rework how shaders are dumped when generating a hang report
Use a flag for the active stages instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
mesa: do not leak ctx->Shader.ReferencedProgram references
When glUseProgram is used, references to the included shaders are
added in ctx->Shader.ReferencedProgram. But those references are not
decreased when the shader data is deallocated. Thus, those shaders
are leaked.
Explicitely remove the pending references to these shaders.
Fixes: e6506b3cd2 ("mesa: retain gl_shader_programs after glDeleteProgram if they are in use")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
mesa: changes to expose OES_texture_view extension
Functionality already covered by ARB_texture_view, patch also
adds missing 'gles guard' for enums (added in f1563e6392).
Tested via arb_texture_view.*_gles3 tests and individual app
utilizing texture view with ETC2.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
radv: call nir_split_var_copies() before nir_lower_var_copies()
This doesn't nothing special currently because we don't create
any copy_var instructions, but this is needed for the next patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
i965: Use intel_bufferobj_buffer() wrapper in image surface state setup.
Instead of directly using intel_obj->buffer. Among other things
intel_bufferobj_buffer() will update intel_buffer_object::
gpu_active_start/end, which are used by glBufferSubData() to decide
which path to take. Fixes a failure in the Piglit
ARB_shader_image_load_store-host-mem-barrier Buffer Update/WaW tests,
which could be reproduced with a non-standard glGetTexSubImage
implementation (see bug report).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105351
Reported-by: Nanley Chery <nanleychery@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
i965: Handle non-zero texture buffer offsets in buffer object range calculation.
Otherwise the specified surface state will allow the GPU to access
memory up to BufferOffset bytes past the end of the buffer. Found by
inspection.
v2: Protect against out-of-range BufferOffset (Nanley).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
i965: Move buffer texture size calculation into a common helper function.
The buffer texture size calculations (should be easy enough, right?)
are repeated in three different places, each of them subtly broken in
a different way. E.g. the image load/store path was never fixed to
clamp to MaxTextureBufferSize, and none of them are taking into
account the buffer offset correctly. It's easier to fix it all in one
place.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106481
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Revert "mesa: simplify _mesa_is_image_unit_valid for buffers"
This reverts commit c0ed52f614. It was
preventing the image format validation from being done on buffer
textures, which is required to ensure that the application doesn't
attempt to bind a buffer texture with an internal format incompatible
with the image unit format (e.g. of different texel size), which is
not allowed by the spec (it's not allowed for *any* texture target,
whether or not there is spec wording restricting this behavior
specifically for buffer textures) and will cause the driver to
calculate texel bounds incorrectly and potentially crash instead of
the expected behavior.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106465
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
WQM is pretty reliable now on LLVM 7, so let us just use
DPP + WQM.
This gives approximately a 1.5% performance increase on the
vrcompositor built-in benchmark.
v2: Use ac_build_quad_swizzle.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
i965: add {X,A}BGR2101010 to 'intel_image_formats'
This patch adds {X,A}BGR2101010 entries to the list of supported
'intel_image_formats'.
Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
dri_util: Add R10G10B10{A,X}2 translation between DRI and mesa_format.
Add R10G10B10{A,X}2 translation between mesa_format and DRI format
to driGLFormatToImageFormat() and driImageFormatToGLFormat().
Bug: https://crbug.com/776093
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I'm guessing an earlier version of the website used to have the page
contents in <frames>, but this isn't the case anymore so just drop the
unnecessary `target="_main"` :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
tgsi/scan: add hw atomic to the list of memory accessing files
This fixes 4 out of 5 cases in:
arb_framebuffer_no_attachments-atomic on cayman.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>