Improves GLB2.7 performance 0.13% +/- 0.09% (n=104/105, outliers removed).
More importantly, once color glClear()s are done through blorp in the next
commit, this reduces regression in GLES3 conformance tests that rely on
queueing up many glClear()s and having the GPU report being still busy in
an ARB_sync query after that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Collects various statistical information for each shader
and total stats for contexts.
Printed with R600_DEBUG=sb,sbstat
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In some cases we use value::gvn_source field to link values that
are known to be equal before gvn pass (e.g. results of DOT4 in different
slots of the same alu group), but then source value may become dead later
and this confuses further passes.
This patch resets value::gvn_source to NULL in the dce_cleanup pass
if it points to dead value.
Fixes segfault during shader optimization with ETQW.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
r600g/sb: use simple heuristic to limit register pressure
It's not a complete register pressure tracking, yet it helps to prevent
register allocation problems in some cases where they were observed.
The problems are uncovered by false dependencies between fetch instructions
introduced by some recent changes in TGSI and/or default backend.
Sometimes we have code like this:
...
SAMPLE R5.xyzw, R5.xyzw
... store R5.xyzw somewhere
MOV R5.x, <next x coord>
MOV R5.y, <next y coord>
SAMPLE R5.xyzw, R5.xyzw
... <may be repeated a lot of times>
With 2D resources, z and w in SAMPLE src reg aren't used and can be simply
masked, but shader backend doesn't have this information, so it's
considered as data dependency by optimization algorithms.
This results in more clean shader code and may improve the quality of
optimized code produced by r600-sb due to eliminated false dependencies
in some cases.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The remaining bits happen to do nothing that
_swrast_span_render_start()/finish() don't do.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
intel: Move the S8 offset calc function near its remaining usage.
It's not really span code ever since we stopped using spans for S8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
intel: Ensure renderbuffers are current when mapping them.
In the case of renering to windows in X, we would render to stale buffers
(or not render at all!) if you hit a MapRenderbuffer as the first thing
done to your window after new buffers are ready to be collected in DRI2.
I think this also covers the weird comment about irb->mt being missing
sometimes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
mesa: Add a clarifying comment about rowStride of compressed textures.
I always forget how we do this for compressed textures.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
swrast: Always use MapTextureImage for mapping textures for swrast.
Now that everything goes through ImageSlices[], we can rely on the
driver's existing texture mapping function.
A big block of code goes away on Radeon that looks like it was to deal with
the validate that happened at SpanRenderStart, which no longer occurs since we
don't need validation for the MapTextureImage hook.
v2: Rewrite comment about ImageSlices, fix duplicated swImages, touch up
unmap loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
nouveau: Replace swrast_texture_image->Map usage with ->Buffer.
This code is trying to deal with providing a map in the case that
AllocTexImageBuffer was called, which is hooked up to the swrast variant.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
nouveau: Just use MapTextureImage instead of duplicating the logic.
MapTextureImage has the exact same logic, except it can also handle
swrast-allocated buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
swrast: Make a teximage's stored RowStride be in terms of bytes per row.
For hardware drivers with pitch alignment requirements, a
non-power-of-two-sized texture format won't end up being an integer number
of pixels per row. Also, avoids having to change our units between
MapTextureImage's rowStride and swrast's RowStride.
This doesn't fully convert the compressed texel fetch path, but does make
sure we don't drop any bits (not that we'd expect to).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
swrast: Replace ImageOffsets with an ImageSlices pointer.
This is a step toward allowing drivers to use their normal mapping paths,
instead of requiring that all slice mappings come from an aligned offset
from the first slice's map.
This incidentally fixes missing slice handling in FXT1 swrast.
v2: Use slice height helper function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Move slice height calculation to a helper function (recommeded by Brian).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
This function going to get used a lot more in upcoming patches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62).
v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
intel: Be more conservative in disabling tiling to save memory.
Improves GLB2.7 trex performance 1.01985% +/- 0.721366% on my IVB (n=10)
and by 3.38771% +/- 0.584241% (n=15) on my HSW, due to a 32x32 ARGB8888
cubemap going from untiled to tiled.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i965: Disable Z16 on contexts that don't require it.
It appears that Z16 on Intel hardware is in fact slower than Z24, so
people are getting surprisingly hurt when trying to use Z16 as a
performance-versus-precision tradeoff, or when they're targeting GLES2 and
that's all you get.
GL 3.0+ have Z16 on the list of required exact format sizes, but GLES
doesn't, so choose the better-performing layout in that case. Improves
GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
mesa: Add an extra clarifying set of braces to getter checking.
For this multi-page single statement, my thought the end was to that the
next block was mis-indented, rather than that the dropped indentation
actually indicated the end of the loop.
mesa: Fix error checking for getters consisting of only API versions.
In almost all of our cases, getters that are turned on for only some API
variants will have an extension listed as one of the things that can
enable it, and thus api_check gets set. For extra_gl30_es3 (used for
NUM_EXTENSIONS, MAJOR_VERSION, MINOR_VERSION) on a GL 2.1 context, though,
we would check twice, not find either one, but never actually throw the
error.
While we may provide the extension, we need to tell applications that they
can't actually use it:
An implementation can either set QUERY_COUNTER_BITS_ARB to the
value 0, or to some number greater than or equal to n. If an
implementation returns 0 for QUERY_COUNTER_BITS_ARB, then the
occlusion queries will always return that zero samples passed the
occlusion test, and so an application should not use occlusion
queries on that implementation.
i965: Move is_math/is_tex/is_control_flow() to backend_instruction.
These are entirely based on the opcode, which is available in
backend_instruction. It makes sense to only implement them in one
place.
This changes the VS implementation of is_tex() slightly, which now
accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those
aren't generated in the VS anyway, it should be fine.
This also makes is_control_flow() available in the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>