vbo: implement primitive merging for glBegin/End sequences
A surprising number of apps and benchmarks have poor code like this:
glBegin(GL_LINE_STRIP);
glVertex3fv(v1);
glVertex3fv(v2);
glEnd();
// Possibly some no-op state changes here
glBegin(GL_LINE_STRIP);
glVertex3fv(v3);
glVertex3fv(v4);
glEnd();
// repeat many, many times.
The above sequence can be converted into:
glBegin(GL_LINES);
glVertex3fv(v1);
glVertex3fv(v2);
glVertex3fv(v3);
glVertex3fv(v4);
glEnd();
Similarly for GL_POINTS, GL_TRIANGLES, etc.
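A minimal sketch of the merge test, assuming a simplified, hypothetical `struct prim` (mode, first vertex, vertex count) and hypothetical `PRIM_*` modes rather than Mesa's real `_mesa_prim` and GL enums. Two queued prims are merged only when they use the same list-type mode, their vertices are contiguous, and each holds whole primitives; a two-vertex line strip is relabeled as lines first, as in the example above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical primitive modes (not the real GL enum values). */
enum prim_mode { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES, PRIM_QUADS, PRIM_LINE_STRIP };

/* Hypothetical, simplified stand-in for a queued draw primitive. */
struct prim {
   enum prim_mode mode;
   unsigned start;   /* first vertex in the shared vertex buffer */
   unsigned count;   /* number of vertices */
};

/* Vertices per primitive for list-type modes that can be concatenated;
 * 0 means the mode is not merged (strips, loops, fans...). */
static unsigned verts_per_prim(enum prim_mode mode)
{
   switch (mode) {
   case PRIM_POINTS:    return 1;
   case PRIM_LINES:     return 2;
   case PRIM_TRIANGLES: return 3;
   case PRIM_QUADS:     return 4;
   default:             return 0;
   }
}

/* A two-vertex line strip draws exactly one segment: treat it as lines. */
static void normalize(struct prim *p)
{
   if (p->mode == PRIM_LINE_STRIP && p->count == 2)
      p->mode = PRIM_LINES;
}

static bool can_merge(const struct prim *a, const struct prim *b)
{
   unsigned n = verts_per_prim(a->mode);
   return n != 0 &&
          a->mode == b->mode &&
          a->start + a->count == b->start &&   /* vertices are contiguous */
          a->count % n == 0 && b->count % n == 0;
}

/* Merges adjacent compatible prims in place; returns the new prim count. */
static size_t merge_prims(struct prim *prims, size_t num)
{
   if (num == 0)
      return 0;
   size_t out = 0;
   normalize(&prims[0]);
   for (size_t i = 1; i < num; i++) {
      normalize(&prims[i]);
      if (can_merge(&prims[out], &prims[i]))
         prims[out].count += prims[i].count;
      else
         prims[++out] = prims[i];
   }
   return out + 1;
}
```

Strips are deliberately left alone: concatenating two longer strips would draw an extra segment or triangle joining them.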
Merging was already implemented for GL_QUADS in the display list code.
Now other prim types are handled and it's also done for immediate mode.
In one case:
                                  before     after
--------------------------------------------------
number of st_draw_vbo() calls:       141        45
number of _mesa_prims issued:       7520       632
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallium lies: buffer_size is not actually the buffer size but the available
size, i.e. 'buffer_size - buffer_offset', so by adding buffer_offset as well
we'd count the offset twice and incorrectly detect an overflow.
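A minimal sketch of the two checks, assuming `bytes_written` counts bytes already written past `buffer_offset`; the helper names are hypothetical and not gallium functions:

```c
#include <stdbool.h>

/* 'available' is what gallium calls buffer_size: the space left after
 * buffer_offset, i.e. total_size - buffer_offset. Hypothetical helpers. */

/* Wrong: adds the offset again, so it is counted twice. */
static bool overflows_wrong(unsigned buffer_offset, unsigned bytes_written,
                            unsigned bytes_to_write, unsigned available)
{
   return buffer_offset + bytes_written + bytes_to_write > available;
}

/* Right: the budget is already relative to buffer_offset. */
static bool overflows(unsigned bytes_written, unsigned bytes_to_write,
                      unsigned available)
{
   return bytes_written + bytes_to_write > available;
}
```

For a 1024-byte buffer bound at offset 512, writing 256 bytes fits, yet the first check reports an overflow.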
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
tgsi/ureg: make the dst register match the src indirection
In ureg, src registers could have an indirect register that was
either a temp or an addr register, while dst registers allowed
only addr. That made converting between them awkward, so
make them behave the same way and allow both temps and addr registers
as indirect files for both (tgsi supports it, just ureg didn't).
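A simplified sketch of the idea, using hypothetical structs rather than ureg's real `ureg_src`/`ureg_dst` bitfields: once the dst side also records which file the indirect register lives in, a src-to-dst conversion can copy the indirection unchanged instead of having to assume the addr file:

```c
#include <stdbool.h>

/* Hypothetical, simplified register files (not TGSI's enum values). */
enum reg_file { FILE_NULL, FILE_TEMP, FILE_ADDR, FILE_CONST };

struct src_reg {
   enum reg_file file;
   int index;
   bool indirect;
   enum reg_file indirect_file;  /* FILE_TEMP or FILE_ADDR */
   int indirect_index;
};

/* dst now carries the indirect file too, instead of implying FILE_ADDR. */
struct dst_reg {
   enum reg_file file;
   int index;
   bool indirect;
   enum reg_file indirect_file;
   int indirect_index;
};

/* With matching fields, the conversion preserves the indirection as-is. */
static struct dst_reg dst_from_src(struct src_reg s)
{
   struct dst_reg d = {
      .file = s.file,
      .index = s.index,
      .indirect = s.indirect,
      .indirect_file = s.indirect_file,
      .indirect_index = s.indirect_index,
   };
   return d;
}
```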
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
gallium: tgsi documentation updates and clarification for integer opcodes.
A lot of them were missing. Others were moved from the Compute ISA
to a new Integer ISA section as that seemed more appropriate.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
By eliminating this, we no longer need to copy between the linear and swizzled
layouts.
This is probably not quite ideal yet, since it is a bit more work for now. Some
optimization could be done by moving depth testing outside the fragment shader
loop (but that is tricky for early depth testing, as we have neither the mask
nor the interpolated z in the right order at hand).
The large amount of tile/untile code is no longer needed and will be deleted
in the next commit.
No piglit regressions.
v2: change a forgotten LAYOUT_NONE to LAYOUT_LINEAR.
v3: fix (bogus) uninitialized variable warnings, add comments, fix a bad type
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Assigning a struct only copies the members; any padding is left as is.
Thus this code:
struct foo_t foo;
foo = bar;
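leaves the padding of foo intact, i.e. uninitialized random garbage.
Since shader keys are compared bytewise, that garbage makes otherwise
identical keys compare unequal, so shaders were constantly recompiled.
This patch fixes that by initializing the struct to zero. For completeness,
memcpy is used to copy the key to the shader struct.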
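A minimal sketch of the fix, assuming a hypothetical `struct shader_key` whose `char` member is typically followed by padding before the `int` on common ABIs. Zeroing the whole struct before filling it, and copying with `memcpy`, keeps the padding deterministic so a bytewise comparison only sees the real fields:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical shader key with likely padding after 'flat_shade'. */
struct shader_key {
   char flat_shade;
   int nr_cbufs;
};

/* Bytewise comparison: padding bytes take part in the result. */
static bool keys_equal(const struct shader_key *a, const struct shader_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}

static void make_key(struct shader_key *key, char flat_shade, int nr_cbufs)
{
   memset(key, 0, sizeof(*key));   /* zero the padding as well */
   key->flat_shade = flat_shade;
   key->nr_cbufs = nr_cbufs;
}

/* memcpy copies every byte, including the zeroed padding. */
static void store_key(struct shader_key *dst, const struct shader_key *src)
{
   memcpy(dst, src, sizeof(*dst));
}
```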
NOTE: This is a candidate for the stable branches.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
v2: Removed extra libs as requested by Matt Turner.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
One build system for Linux/Unix-only drivers should be enough.
Additionally, the nouveau target was disabled anyway.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
r600g/sb: fix handling of interference sets in post_scheduler
post_scheduler clears the interference set for reallocatable values when
the value becomes live for the first time, and then updates it to take into
account the modified order of operations, but this was not handled properly
if the value first appears as a source in a copy operation.
Fixes issues with webgl demo: http://madebyevan.com/webgl-water/
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
r600g/sb: fix allocation of indirectly addressed input arrays
Some inputs may be preloaded into predefined GPRs,
so we can't reallocate arrays with such inputs.
Fixes issues with webgl demo: http://oos.moxiecode.com/js_webgl/snake/
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The new disassembler is not yet completely isolated from the further processing
in r600g/sb that is not required for printing the dump, so it is more likely
to fail on any unexpected features in the bytecode.
This patch adds an "sbdisasm" flag to R600_DEBUG that enables the new
disassembler in r600g/sb for shader dumps when shader optimization
is not enabled.
If shader optimization is enabled, the new disassembler is used by default.
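The selection rule above can be sketched as follows. The comma-separated parsing in `has_flag()` is a hypothetical stand-in (Mesa parses R600_DEBUG with its own debug-option helpers); only the decision in `use_sb_disasm()` mirrors the described behavior:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if 'flag' is a whole comma-separated token
 * in 'list', e.g. has_flag("ps,sbdisasm", "sbdisasm"). */
static bool has_flag(const char *list, const char *flag)
{
   size_t len = strlen(flag);
   const char *p = list;
   while (p && *p) {
      const char *end = strchr(p, ',');
      size_t tok_len = end ? (size_t)(end - p) : strlen(p);
      if (tok_len == len && strncmp(p, flag, len) == 0)
         return true;
      p = end ? end + 1 : NULL;
   }
   return false;
}

/* New disassembler: always with shader optimization, otherwise only
 * when "sbdisasm" is present in R600_DEBUG. */
static bool use_sb_disasm(const char *r600_debug, bool sb_optimize)
{
   return sb_optimize || (r600_debug && has_flag(r600_debug, "sbdisasm"));
}
```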
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
draw: Update for u_assembled_primitive -> u_assembled_prim rename.
The Mesa build is too complex to rely on a successful build to catch every
case. When refactoring, it is always a good idea to use git grep to avoid missing cases:
$ git grep u_assembled_primitive
src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c: u_assembled_primitive(in_prim);
The differences from the previous releases that affect st/egl are:
- logging macros are prefixed with an 'A'
- dequeueBuffer() and enqueueBuffer() require an additional argument for a
fence fd, acquired from libsync
Additionally, include gralloc_drm.h with extern "C".
The function returns the number of reduced/tessellated primitives for the
given vertex count.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
util/prim: assorted fixes for u_decomposed_prims_for_vertices()
Switch to '>=' for comparisons, which makes it obvious that the comparison for
PIPE_PRIM_QUAD_STRIP was wrong.
Add a minimum vertex count check for PIPE_PRIM_LINE_LOOP. Return 1 for
PIPE_PRIM_POLYGON with 3 vertices.
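A sketch of a vertex-count-to-primitive-count function written with '>=' minimum checks, using a hypothetical `enum prim` instead of gallium's PIPE_PRIM_* values; it follows the GL primitive rules and is not a copy of u_decomposed_prims_for_vertices():

```c
/* Hypothetical subset of primitive types. */
enum prim {
   PRIM_POINTS, PRIM_LINES, PRIM_LINE_LOOP, PRIM_LINE_STRIP,
   PRIM_TRIANGLES, PRIM_TRIANGLE_STRIP, PRIM_TRIANGLE_FAN,
   PRIM_QUADS, PRIM_QUAD_STRIP, PRIM_POLYGON
};

/* Number of primitives drawn for 'n' vertices. */
static unsigned prims_for_vertices(enum prim p, unsigned n)
{
   switch (p) {
   case PRIM_POINTS:         return n;
   case PRIM_LINES:          return n / 2;
   case PRIM_LINE_LOOP:      return n >= 2 ? n : 0;     /* includes closing segment */
   case PRIM_LINE_STRIP:     return n >= 2 ? n - 1 : 0;
   case PRIM_TRIANGLES:      return n / 3;
   case PRIM_TRIANGLE_STRIP:
   case PRIM_TRIANGLE_FAN:   return n >= 3 ? n - 2 : 0;
   case PRIM_QUADS:          return n / 4;
   case PRIM_QUAD_STRIP:     return n >= 4 ? (n - 2) / 2 : 0;
   case PRIM_POLYGON:        return n >= 3 ? 1 : 0;     /* 3 vertices: one polygon */
   }
   return 0;
}
```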
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
util/prim: use vertex count info in u_validate_pipe_prim()
As a side effect, primitives with adjacency are now correctly validated.
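A sketch of validation driven by a per-primitive vertex-count table, assuming it holds a minimum count and a per-primitive increment (the table, enum and names are hypothetical). Because the adjacency types get their own entries, they are checked against their real minimums:

```c
#include <stdbool.h>

/* Hypothetical primitive subset, including adjacency types. */
enum prim {
   PRIM_POINTS, PRIM_LINES, PRIM_LINE_STRIP, PRIM_TRIANGLES,
   PRIM_TRIANGLE_STRIP, PRIM_LINES_ADJ, PRIM_LINE_STRIP_ADJ,
   PRIM_TRIANGLES_ADJ, PRIM_TRIANGLE_STRIP_ADJ, PRIM_COUNT
};

/* Vertices needed for the first primitive, and for each additional one. */
struct vertex_count { unsigned min, incr; };

static const struct vertex_count counts[PRIM_COUNT] = {
   [PRIM_POINTS]             = { 1, 1 },
   [PRIM_LINES]              = { 2, 2 },
   [PRIM_LINE_STRIP]         = { 2, 1 },
   [PRIM_TRIANGLES]          = { 3, 3 },
   [PRIM_TRIANGLE_STRIP]     = { 3, 1 },
   [PRIM_LINES_ADJ]          = { 4, 4 },
   [PRIM_LINE_STRIP_ADJ]     = { 4, 1 },
   [PRIM_TRIANGLES_ADJ]      = { 6, 6 },
   [PRIM_TRIANGLE_STRIP_ADJ] = { 6, 2 },
};

/* Valid if there are enough vertices for at least one primitive. */
static bool validate_prim(enum prim p, unsigned nr)
{
   return p < PRIM_COUNT && nr >= counts[p].min;
}
```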
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
Group (or add) functions to decompose/reduce/assemble a primitive,
give them consistent names, and document them. Add u_prim_vertex_count() so
that the vertex count information can be used elsewhere.
u_assembled_primitive() will be removed in a follow-on commit.
[olv: fix a warning when -Wold-style-declaration is enabled]
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Zack Rusin <zackr@vmware.com>
While this is ignorant of dependency control, it's still good for a 0.39%
+/- 0.08% performance improvement on GLBenchmark 2.7 (n=548).
v2: Rewrite as a subclass of the base class for the FS instruction
scheduler, inheriting the same latency information.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
i965: Pull a couple of FS scheduling functions out to methods.
These will get virtualized as we add VS scheduling support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
i965: Share the register file enum between the two backends.
I need this so I can look at vec4 and fs registers' files from the same
.cpp file without namespaces. As far as I can tell we never rely on the
particular numerical values of the files, though I thought it sounded like
a good idea when doing the VS (it turns out having 0 be BAD_FILE is nicer).
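A small sketch of why a zero BAD_FILE is convenient, using a hypothetical subset of file names rather than i965's actual enum: any zero-initialized register descriptor is automatically recognizable as invalid:

```c
#include <stdbool.h>

/* Hypothetical shared enum; only BAD_FILE == 0 matters here. */
enum register_file { BAD_FILE = 0, GRF, MRF, IMM, UNIFORM };

struct reg {
   enum register_file file;
   int nr;
};

/* A zero-initialized struct reg has file == BAD_FILE. */
static bool reg_is_valid(const struct reg *r)
{
   return r->file != BAD_FILE;
}
```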
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
i965/vs: Do round-robin register allocation on gen6+ like we do in the FS.
This will give instruction scheduling more freedom to make better choices. No
statistically significant performance difference on GLB2.7 (n=93).
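A minimal linear sketch of round-robin selection with a hypothetical register count (the real allocator does this inside its graph-coloring pass). Starting the search after the last register handed out, rather than always at register 0, spreads values across the file, so fewer unrelated instructions end up reusing the same register and creating false dependencies that constrain the scheduler:

```c
#include <stdbool.h>

#define NUM_REGS 8   /* hypothetical register file size */

/* Returns a free register, searching from just after *last; -1 if none. */
static int alloc_round_robin(const bool in_use[NUM_REGS], int *last)
{
   for (int i = 1; i <= NUM_REGS; i++) {
      int r = (*last + i) % NUM_REGS;
      if (!in_use[r]) {
         *last = r;
         return r;
      }
   }
   return -1;
}
```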
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
draw/gs: don't crash when vs/gs signatures don't match
Instead of crashing, just fill the input slots that don't match with zeros;
that's the mandated behavior, and it avoids debug asserts.
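A sketch of the zero-fill rule, assuming slots are matched by a hypothetical integer semantic id (-1 meaning unused); the function and array layout are illustrative, not the draw module's real interface:

```c
#include <string.h>

#define MAX_SLOTS 4   /* hypothetical */

/* For each GS input slot, copy the VS output with the same semantic,
 * or fill the slot with zeros if the VS does not write it. */
static void fetch_gs_inputs(const int vs_semantic[MAX_SLOTS],
                            float vs_out[MAX_SLOTS][4],
                            const int gs_semantic[MAX_SLOTS],
                            float gs_in[MAX_SLOTS][4])
{
   for (int i = 0; i < MAX_SLOTS; i++) {
      int found = -1;
      for (int j = 0; j < MAX_SLOTS; j++) {
         if (gs_semantic[i] >= 0 && vs_semantic[j] == gs_semantic[i]) {
            found = j;
            break;
         }
      }
      if (found >= 0)
         memcpy(gs_in[i], vs_out[found], sizeof(gs_in[i]));
      else
         memset(gs_in[i], 0, sizeof(gs_in[i]));   /* unmatched slot: zeros */
   }
}
```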
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's valid because we reuse certain arithmetic operations
for both signed and unsigned types (e.g. uadd and umad, whose
names are somewhat unfortunate).
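A short illustration of why this reuse is sound, with hypothetical C functions standing in for the opcodes: unsigned 32-bit arithmetic wraps modulo 2^32, which produces exactly the bit pattern of two's-complement signed addition, and the low 32 bits of a product are also the same for both signednesses:

```c
#include <stdint.h>

/* Stand-in for an integer add on raw 32-bit register values. */
static uint32_t uadd(uint32_t a, uint32_t b)
{
   return a + b;   /* wraps modulo 2^32 */
}

/* Stand-in for multiply-add: low 32 bits of a * b + c. */
static uint32_t umad(uint32_t a, uint32_t b, uint32_t c)
{
   return a * b + c;
}
```

For example, the bit patterns of -3 and 5 add to 2, and -2 * 3 + 1 yields the bit pattern of -5.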
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
i965: Fix SNB GPU hangs when a blorp batch is the first thing to execute.
The GPU apparently goes looking for constants even though there are no
shader stages enabled, and gets stuck because we haven't told it there are
no constants to collect. If any other user of the 3D pipeline had run
(even the Render accel of the X server!) since power-on, then the in-GPU
constant buffers would have been set up with some contents we didn't use,
and we would succeed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56416
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dave Airlie <airlied@redhat.com>
NOTE: This is a candidate for the stable branches.
r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV
We are already emitting an EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
when this flush flag is set, so flushing the dest caches with a
SURFACE_SYNC should not be necessary.
The motivation for this change is that emitting a SURFACE_SYNC packet with
the CB bits set was causing compute shaders to hang on Cayman.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>