winsys/radeon: move managing GEM domains back to drivers
This partially reverts commit 363ff84475.
It caused severe performance drops in Nexuiz. Reported by Phoronix.
Tested by me on r300g and by IRC people on r600g.
i965 gen6: Fix incorrect order of dwords in gen6_update_sol_indices()
When updating SOL indices, we were accidentally putting the starting
index in dword 1 and the SVBI number to increment in dword 2--these
should be reversed. Usually both of these values are zero, so we
didn't see any problem. However, if a transform feedback operation
spans multiple batch buffers, the starting index will be nonzero.
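
A minimal sketch of the corrected dword order, for illustration only. The helper name and parameters are hypothetical and this is not the code in gen6_update_sol_indices(); only the placement of the two values comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the driver's code.  Only the dword order is
 * taken from the commit text. */
static void
set_sol_index_dwords(uint32_t dw[3], uint32_t svbi_to_increment,
                     uint32_t starting_index)
{
   assert(dw);
   /* dw[0] is left untouched; its contents are outside this sketch. */
   dw[1] = svbi_to_increment;   /* dword 1: SVBI number to increment */
   dw[2] = starting_index;      /* dword 2: starting index */
}
```

With both values zero the two orders are indistinguishable, which matches the observation that the bug only showed up once the starting index became nonzero.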
Fixes piglit test "EXT_transform_feedback/intervening-read output".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965 gen6: Fix transform feedback of triangle strips.
When rendering triangle strips, vertices come down the pipeline in the
order specified, even though this causes alternate triangles to have
reversed winding order. For example, if the vertices are ABCDE, then
the GS is invoked on triangles ABC, BCD, and CDE, even though this
means that triangle BCD is in the reverse of the normal winding order.
The hardware automatically flags the triangles with reversed winding
order as _3DPRIM_TRISTRIP_REVERSE, so that face culling and two-sided
coloring can be adjusted to account for the reversed order.
In order to ensure that winding order is correct when streaming
vertices out to a transform feedback buffer, we need to alter the
ordering of BCD to BDC when the first provoking vertex convention is
in use, and to CBD when the last provoking vertex convention is in
use.
To do this, we precompute an array of indices indicating where each
vertex will be placed in the transform feedback buffer; normally this
is SVBI[0] + (0, 1, 2), indicating that vertex order should be
preserved. When the primitive type is _3DPRIM_TRISTRIP_REVERSE, we
change this order to either SVBI[0] + (0, 2, 1) or SVBI[0] + (1, 0,
2), depending on the provoking vertex convention.
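
The slot selection described above can be sketched roughly as follows. The function and parameter names are hypothetical and the SVBI[0] base is left out; only the three offset triples come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: choose the destination slot (relative to SVBI[0])
 * for each of the three vertices of a triangle. */
static void
tf_vertex_slots(bool reversed_winding, bool first_vertex_provokes,
                unsigned slots[3])
{
   static const unsigned preserve[3]   = { 0, 1, 2 };
   static const unsigned first_conv[3] = { 0, 2, 1 };
   static const unsigned last_conv[3]  = { 1, 0, 2 };
   const unsigned *src = preserve;

   if (reversed_winding)
      src = first_vertex_provokes ? first_conv : last_conv;

   for (int i = 0; i < 3; i++)
      slots[i] = src[i];
}

/* Write each incoming vertex to its slot in the output triangle. */
static void
place_vertices(const char in[3], const unsigned slots[3], char out[3])
{
   for (int i = 0; i < 3; i++) {
      assert(slots[i] < 3);
      out[slots[i]] = in[i];
   }
}
```

Feeding the reversed triangle BCD through place_vertices() gives BDC with the first provoking vertex convention and CBD with the last, so the provoking vertex stays in its original position in both cases.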
Fixes piglit tests "EXT_transform_feedback/tessellation
triangle_strip" on Gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code for storing 1D, 2D and 3D tex images (whole or sub-images) was
all pretty similar. This consolidates those six paths.
v2: rework switch statement to catch unexpected targets
Reviewed-by: José Fonseca <jfonseca@vmware.com>
mesa: fix _mesa_store_texsubimage2d() for GL_TEXTURE_1D_ARRAY
For 1D arrays, map each slice separately. Note that this was handled
correctly in _mesa_store_teximage2d() but not here.
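
A rough sketch of per-slice storing, assuming a simplified stand-in for the texture image. The struct and function below are hypothetical and are not Mesa's types; in a real driver each slice would be mapped and unmapped through the driver hooks. For a 1D array, the yoffset and height of the sub-image select the first slice and the number of slices.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 1D array texture: every slice (array
 * layer) is a separately addressable row of texels. */
struct tex_1d_array {
   int width;            /* texels per slice */
   int num_slices;
   int bytes_per_texel;
   uint8_t **slice;      /* slice[i] holds width * bytes_per_texel bytes */
};

static void
store_subimage_1d_array(struct tex_1d_array *tex,
                        int xoffset, int first_slice,
                        int width, int num_slices,
                        const uint8_t *src, int src_row_stride)
{
   assert(xoffset >= 0 && xoffset + width <= tex->width);
   assert(first_slice >= 0 && first_slice + num_slices <= tex->num_slices);

   for (int i = 0; i < num_slices; i++) {
      /* "Map" one slice at a time instead of assuming the slices are
       * contiguous rows of a single 2D image. */
      uint8_t *dst = tex->slice[first_slice + i]
                   + (size_t) xoffset * tex->bytes_per_texel;
      memcpy(dst, src + (size_t) i * src_row_stride,
             (size_t) width * tex->bytes_per_texel);
   }
}
```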
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets rid of another renderbuffer->PutRow() call and _DepthBuffer
usage. We always work with 32-bit uint Z values now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Use Map/UnmapRenderbuffer() for the special, optimized cases we care about.
Note that we're dropping some seldom-used cases in the new fast-path
code, such as CI->RGB conversion and zooming.
Reviewed-by: Eric Anholt <eric@anholt.net>
swrast: move swrast_render_start/finish() call in drawpixels code
We don't want to call these functions where we'll be using
Map/UnmapRenderbuffer(). So push them further down in the drawpixels
cases so that we can switch over to Map/UnmapRenderbuffer() step by step.
Reviewed-by: Eric Anholt <eric@anholt.net>
swrast: new fast_draw_depth_stencil() for glDrawPixels(GL_DEPTH_STENCIL)
Stop using deprecated renderbuffer PutRow() function. Note that we
aren't using Map/UnmapRenderbuffer() yet because this call is inside
a swrast_render_start/finish() pair.
v2: use _mesa_pack_uint_24_8_depth_stencil_row(), per Eric.
swrast: remove the copy_depth_stencil_pixels() function
Hopefully glCopyPixels(GL_DEPTH_STENCIL) will be handled by the
fast copy function. Otherwise, just do the copy with separate
depth + stencil copies. That's effectively what the removed code
did anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
swrast: stop using depth/stencil wrappers in CopyPixels code
The functions that read depth/stencil values understand all (packed)
depth/stencil buffer formats now so there's no reason to use the
wrappers.
Also, improve the format checks in fast_copy_pixels() to catch mismatched
depth/stencil cases.
v2: fix the test for combined depth+stencil buffers, per Eric.
Stop using the deprecated renderbuffer Get/Put Row/Values functions.
Consolidate code paths, etc. The file is nearly half the size it used
to be!
Reviewed-by: Eric Anholt <eric@anholt.net>
Use format pack/unpack functions instead of deprecated renderbuffer
GetRow/PutRow functions.
v2: use get_stencil_address(), s/destVals/newVals/
Reviewed-by: Eric Anholt <eric@anholt.net>
mesa: remove gl_renderbuffer::PutMonoRow() and PutMonoValues()
The former was only used for clearing buffers. The latter wasn't used
anywhere! Remove them and all implementations of those functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
swrast: do depth/stencil clearing with Map/UnmapRenderbuffer()
Another step toward getting rid of the renderbuffer PutRow/etc functions.
v2: fix assorted depth/stencil clear bugs found by Eric
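
For illustration, clearing through a mapped pointer might look roughly like the sketch below. Both functions are hypothetical and assume the caller has already mapped the rectangle to be cleared, with the pointer at its first texel and the row stride in bytes; they are not the swrast functions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: clear a rectangle of 32-bit depth values in mapped memory. */
static void
clear_mapped_z32(uint8_t *map, int row_stride, int width, int height,
                 uint32_t clear_value)
{
   assert(row_stride >= width * (int) sizeof(uint32_t));
   for (int y = 0; y < height; y++) {
      uint32_t *row = (uint32_t *) (map + (long) y * row_stride);
      for (int x = 0; x < width; x++)
         row[x] = clear_value;
   }
}

/* Hypothetical: clear 8-bit stencil values, honoring the write mask so
 * that bits outside the mask keep their old contents. */
static void
clear_mapped_s8(uint8_t *map, int row_stride, int width, int height,
                uint8_t clear_value, uint8_t write_mask)
{
   assert(row_stride >= width);
   for (int y = 0; y < height; y++) {
      uint8_t *row = map + (long) y * row_stride;
      for (int x = 0; x < width; x++)
         row[x] = (uint8_t) ((row[x] & ~write_mask) |
                             (clear_value & write_mask));
   }
}
```

Walking rows by the stride rather than by the width keeps the sketch correct when the mapped rectangle is narrower than the buffer.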
Reviewed-by: José Fonseca <jfonseca@vmware.com>
mesa: move the format and type check before select_tex_image()
Move the format and type check before select_tex_image, or it will fail to
report the mismatch error if the teximage is null.
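
The ordering issue can be sketched as follows. All names and result codes here are hypothetical stand-ins and do not correspond to Mesa's actual error handling; the point is only that the format/type check runs before the image lookup can bail out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum check_result {
   CHECK_OK,
   CHECK_FORMAT_TYPE_MISMATCH,
   CHECK_NO_IMAGE
};

struct tex_image { int level; };   /* placeholder type */

static enum check_result
check_request(bool format_type_compatible, const struct tex_image *image)
{
   /* Validate format/type first, so a mismatch is reported even when
    * there is no image for the requested level. */
   if (!format_type_compatible)
      return CHECK_FORMAT_TYPE_MISMATCH;

   /* Only then look at the selected image, which may be NULL. */
   if (image == NULL)
      return CHECK_NO_IMAGE;

   return CHECK_OK;
}
```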
Reported-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Jian Zhao <jian.j.zhao@intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Fixed the build failure, fixed a warning where the attributes and error
arguments had been inverted, and fixed another call that was missing an
argument.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixes almost all of the transform feedback piglit tests. Remaining
are a few tests related to tessellation for
quads/trifans/tristrips/polygons with flat shading.
v2: Incorporate Paul's feedback (squash with previous, state flag note,
static assert, update FINISHME)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965/gen7: Move SOL stage disable to gen7_sol_state.c
We'll be growing more code in here as we actually enable the unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965/gen7: Add register definitions for GL_EXT_transform_feedback.
v2: Make the buffer enable bitfield take an index argument.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The code was relying on gs.prog_data's copy of the
number-of-verts-per-prim, which segfaulted on gen7 since it doesn't
make a GS program. We can easily calculate that value right here.
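
A sketch of computing that value directly from the transform feedback primitive mode. The enum and function are hypothetical stand-ins rather than the driver's code; the three modes mirror the ones EXT_transform_feedback accepts (points, lines, triangles).

```c
#include <assert.h>

/* Hypothetical stand-in for the transform feedback primitive mode. */
enum tf_mode {
   TF_POINTS,
   TF_LINES,
   TF_TRIANGLES
};

/* Returns the number of vertices per output primitive, or 0 for a mode
 * this sketch does not know about. */
static unsigned
verts_per_prim(enum tf_mode mode)
{
   switch (mode) {
   case TF_POINTS:    return 1;
   case TF_LINES:     return 2;
   case TF_TRIANGLES: return 3;
   }
   return 0;
}
```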
v2: Fix svbi_0_starting_index regression.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965 Gen6+: Invalidate VF address-based cache on flush
Although this is not well documented, there are in fact two separate
VF caches:
- an "index-based" cache (described in the Sandy Bridge PRM, vol 2
part 1, section 2.1.2 "Vertex Cache"). This cache stores URB
handles of vertex shader outputs; its purpose is to avoid redundant
invocations of the vertex shader when drawing in random access mode
(e.g. glDrawElements()), and the same vertex index is specified
multiple times. It is automatically invalidated between
3D_PRIMITIVE commands and between instances within a single
3D_PRIMITIVE command.
- an "address-based" cache (mentioned briefly in vol 2 part 1, section
1.7.4 "PIPE_CONTROL Command"). This cache stores the data read from
vertex buffers; its purpose is to avoid redundant memory accesses
when doing instanced drawing or when multiple 3D_PRIMITIVE commands
access the same vertex data. It needs to be manually invalidated
whenever new data is written to a buffer that is used for vertex
data.
Prior to this patch, it was not necessary for Mesa to explicitly
invalidate the address-based cache, because there were no reasonable
use cases in which the GPU would write to a vertex data buffer during
a batch, and inter-batch flushing was taken care of by the kernel.
However, with transform feedback, there is now a reasonable use case:
vertex data is written to a buffer using transform feedback, and then
that data is immediately re-used as vertex input in the next drawing
operation. To make this use case work, we need to flush the
address-based VF cache between transform feedback and the next draw
operation. Since we are already calling
intel_batchbuffer_emit_mi_flush() when transform feedback completes,
and intel_batchbuffer_emit_mi_flush() is intended to invalidate all
caches, it seems reasonable to add VF cache invalidation to this
function.
As with commit 63cf7fad13 (i965: Flush pipeline on
EndTransformFeedback), this is not an ideal solution. It
would be preferable to only invalidate the VF cache if the next draw
call was about to consume data generated by a previous draw call in
the same batch. However, since we don't have the necessary dependency
tracking infrastructure to figure that out right now, we have to
overzealously invalidate the cache.
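
For illustration only, the dependency tracking that this patch does not implement could look something like the sketch below. Every name here is hypothetical, buffers are reduced to integer ids, and flushing of the write side is ignored; it is a simplified model of the idea, not a proposal for the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TRACKED 16

/* Hypothetical per-batch record of buffers written by transform feedback. */
struct batch_tracker {
   unsigned written[MAX_TRACKED];
   int num_written;
   bool overflowed;   /* too many writes to track: assume the worst */
};

/* Start of a new batch; inter-batch flushing is handled elsewhere. */
static void
batch_reset(struct batch_tracker *b)
{
   b->num_written = 0;
   b->overflowed = false;
}

static void
note_tf_write(struct batch_tracker *b, unsigned buffer_id)
{
   for (int i = 0; i < b->num_written; i++)
      if (b->written[i] == buffer_id)
         return;
   if (b->num_written < MAX_TRACKED)
      b->written[b->num_written++] = buffer_id;
   else
      b->overflowed = true;
}

/* Returns true if the VF cache should be invalidated before a draw that
 * reads vertex data from vertex_buffer_id. */
static bool
draw_needs_vf_invalidate(struct batch_tracker *b, unsigned vertex_buffer_id)
{
   bool hit = b->overflowed;
   for (int i = 0; i < b->num_written && !hit; i++)
      hit = (b->written[i] == vertex_buffer_id);
   if (hit)
      batch_reset(b);   /* the whole cache is invalidated, so start over */
   return hit;
}
```

Under this model a draw only pays for an invalidation when it reads a buffer that transform feedback wrote earlier in the same batch.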
Fixes Piglit test "EXT_transform_feedback/immediate-reuse".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965 gen6: Resend binding table pointer after updating SOL bindings.
After creating new binding table entries for transform feedback, we
need to set the dirty flag BRW_NEW_SURFACES, so that a new binding
table pointer will be sent to the hardware. Otherwise the new binding
table entries will not take effect.
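
The dirty-flag pattern involved can be sketched as below. The bit value, struct, and function names are hypothetical; NEW_SURFACES merely stands in for BRW_NEW_SURFACES and the counter stands in for emitting a binding table pointer to the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NEW_SURFACES (1u << 0)   /* hypothetical bit for this sketch */

struct state {
   uint32_t dirty;
   unsigned binding_table_pointer_sends;
};

/* Creating new binding table entries must also mark the surfaces dirty. */
static void
update_sol_bindings(struct state *st)
{
   assert(st);
   /* ... new binding table entries would be created here ... */
   st->dirty |= NEW_SURFACES;
}

/* State upload only resends the pointer when the flag is set. */
static void
upload_state(struct state *st)
{
   assert(st);
   if (st->dirty & NEW_SURFACES)
      st->binding_table_pointer_sends++;
   st->dirty = 0;
}
```

Without the flag being set in update_sol_bindings(), upload_state() would skip the resend and the new entries would never take effect.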
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>