this needs revisiting, we really don't want to be flushing all 32 of these,
but currently we don't flush any of them, and it seems to have caused a regression
as reported on irc with doom3 on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
r300/compiler: Fix dataflow analysis bug with ELSE blocks
Writes within ELSE blocks were being ignored which prevented us from
discovering all possible writers for some register values.
Fixes piglit glsl-fs-raytrace-bug27060
r600g: modify block to only emit the first few dirty registers. (v2)
This gets me from 2200 to 1978 dwords for a gears frame.
This is due to us having some 32-dwords blocks in the SPI, that we only
modify the first dwords off.
v2: fix dirty reg count from Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.
This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are. However certain register need to always
be written so we add a new flag to say which ones should be always written
if used. (Note this could probably be done cleaner with a larger refactoring,
since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS might
be better off as a special state).
It also moves the need_bo to be a flags on the register now.
With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.
v2: fix some evergreen dirty bits.
Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.
Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965: Convert 3DPRIMITIVE command from struct-style to OUT_BATCH style.
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
i965: Allocate the whole URB to the VS and fix calculations for Gen6.
Since we never enable the GS on Sandybridge, there's no need to allocate
it any URB space.
Furthermore, the previous calculation was incorrect: it neglected to
multiply by nr_vs_entries, instead comparing whether twice the size of
a single VS URB entry was bigger than the entire URB space. It also
neglected to take into account that vs_size is in units of 128 byte
blocks, while urb_size is in bytes.
Despite the above problems, the calculations resulted in an acceptable
programming of the URB in most cases, at least on GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
glsl: Emit a warning when the left-hand operand of a comma has no effect
The expression
x = y, 5, 3;
will generate
0:7(9): warning: left-hand operand of comma expression has no effect
The warning is only emitted for the left-hand operands, becuase the
right-most operand is the result of the expression. This could be
used in an assignment, etc.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Revert "intel: Add spans code for the ARB_texture_rg support."
This reverts what remains of commit
28bab24e16. It was garbage, trying to
use a MESA_FORMAT enum as a preprocessor token, and I don't know how I
thought it was even tested.
Reviewed-by: Brian Paul <brianp@vmware.com>
swrast: Don't try to adjust_colors for <8bpc when handling R16, RG1616.
The GL_RED and GL_RG were tricking this code into executing, but it's
totally unprepared for a 16-bit channel and just rescaled the values
down to 0. We don't have anything with <8bit channels alongside >8bit
channels, so disabling it should be safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
mesa: Add renderbuffer accessors for R8/RG88/R16/RG1616.
This will replace the current (broken by trying to use an enum in the
preprocessor) spantmp2.h support I wrote for the intel driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
intel: Use Mesa core's renderbuffer accessors for depth.
Since we're using GTT mappings now (no manual detiling), there's
really nothing special to accessing these buffers, other than needing
the new RowStride field of gl_renderbuffer to accomodate padding.
Reduces the driver size by 2.7kb, and improves glean depthStencil
performance 3-10x (!)
Reviewed-by: Brian Paul <brianp@vmware.com>
mesa: Add a gl_renderbuffer.RowStride field like textures have.
This will allow some drivers to reuse the core renderbuffer.c get/put
row functions in place of using the spantmp.h macros. Note that
unlike textures, we use a signed integer here to allow for handling
FBO orientation.
Reviewed-by: Brian Paul <brianp@vmware.com>
swrast: Don't assert against glReadPixels of GL_RED and GL_RG.
Everything appears to already be in place for this. Fixes aborts in:
ARB_texture_rg/fbo-alphatest-formats-float
ARB_texture_rg/fbo-blending-formats-float.
Reviewed-by: Brian Paul <brianp@vmware.com>
intel: Use _mesa_base_tex_format for FBO texture attachments.
The _mesa_base_fbo_format variant doesn't handle some texture
internalformats, such as "3".
Fixes:
fbo-blending-formats.
fbo-alphatest-formats
EXT_texture_sRGB/fbo-alphatest-formats
Reviewed-by: Brian Paul <brianp@vmware.com>
The very presence of this extension breaks things.
This should bring us closer to being able to run Unigine Heaven.
The extension will be re-enabled once gl_InstanceID is implemented.
This was copy-and-paste from originally trying to get DP read/write
working reliably, and notably for other common messages (URB, sampler)
we weren't doing this.
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
It was accepting only GL_DUDV_ATI and not the specific sized format
GL_DU8DV8_ATI. Fixes assertion failure at startup in Shadowgrounds.
Reviewed-by: Brian Paul <brianp@vmware.com>
glcpp: Fix attempts to expand recursive macros infinitely (bug #32835).
The 095-recursive-define test case was triggering infinite recursion
with the following test case:
#define A(a, b) B(a, b)
#define C A(0, C)
C
Here's what was happening:
1. "C" was pushed onto the active list to expand the C node
2. While expanding the "0" argument, the active list would be
emptied by the code at the end of _glcpp_parser_expand_token_list
3. When expanding the "C" argument, the active list was now empty,
so lather, rinse, repeat.
We fix this by adjusting the final popping at the end of
_glcpp_parser_expand_token_list to never pop more nodes then this
particular invocation had pushed itself. This is as simple as saving
the original state of the active list, and then interrupting the
popping when we reach this same state.
With this fix, all of the glcpp-test tests now pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32835
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
gallium: include u_format.h for util_format_is_supported
Without it gcc complains:
nv50_screen.c: In function ‘nv50_screen_is_format_supported’:
nv50_screen.c:48: warning: implicit declaration of function ‘util_format_is_supported’
and handles it wrongly - util_format_is_supported returns boolean, which is typedef'ed
to uchar, but function without prototype is assumed to return int.
For me nv50_screen_is_format_supported was returning true for float formats without
--enable-texture-float...
nvc0: work around what looks like a code prefetch bug
Sounds very unlikely, but I don't have a better explanation at the
moment.
The GPU throws page faults at the first page after the code buffer
quite frequently on startup, and traces don't show us overflowing.