Excluding the existing liveout bits is a deviation from the textbook
algorithm. The reason for doing so was to determine if the value
changed, which means the fixed-point algorithm needs to run for another
iteration.
The simpler way to do that is to save the value from step (N-1) and
compare it to the new value at step N.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965/fs: Create the COPY() set for use in copy propagation dataflow.
This is the "COPY" set from Muchnick's textbook, which is necessary
to do the dataflow algorithm correctly.
v2: Simplify initialization based on Paul Berry's observation that
out_acp contains exactly what needs to be in the COPY set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965/fs: Rename setup_kills() to setup_initial_values().
Although this function currently only initializes the KILL set, it will
soon initialize other data flow sets as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
To compute the actual liveout/livein data flow values, we start with
some initial values and apply a fixed-point algorithm until they settle.
Previously, we iterated through all blocks, updating both liveout and
livein together in one pass. This is awkward, since computing livein
for a block requires knowing liveout for all parent blocks. Not all
of those parent blocks may have been processed yet.
This patch separates the two. First, we update liveout for all blocks.
At iteration N of the fixed-point algorithm, this uses livein values
from iteration N-1. Secondly, we update livein for all blocks. At
step N, this uses the liveout information we just computed (in step N).
This ensures each computation has a consistent picture of the data,
rather than seeing an random mix of data from steps N-1 and N depending
on the order of the blocks in the CFG data structure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965/fs: Rename "cont" to "progress" in dataflow algorithm.
This variable indicates that the fixed-point algorithm made changes to
the data at this step, so it needs to run for another iteration.
"progress" seems a nicer name for that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965/fs: Switch to a do-while loop in copy propagation dataflow.
The fixed-point algorithm needs to run at least once, so a do-while loop
is more natural.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The dataflow analysis used for global copy propagation is severely
broken, and I believe it doesn't actually do anything. Fixing it will
require a lot of changes, each of which might break things.
Once all the fixes land, we can re-enable this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Any decent compiler will do this for us, although doing this
will make grepping through the code alot easier.
v2: In both mixer and query interface
v3: rebase
Reviewed-by: Christian König <christian.koenig@amd.com> [v1]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Code should loop through and cleanup the three (VL_NUM_COMPONENTS) idct
buffers, rather than doing the first one three times.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Check if we have successfully allocated memory.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
st/xvmc: exit gracefully if we fail to create video buffer
Free any allocated memory and return BadAlloc if create_video_buffer()
has failed to create a buffer.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
st/vdpau: don't try to create video buffer when the format is FORMAT_NONE
Not seen in the wild yet, but seems like a reasonable thing to do.
[suggested by Christian]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
I was looking into some minor 422 issues/discrepencies I noticed long
ago using vdpau on my rv790.
I noticed that there is code that is halving height rather than width -
422 is full height AFAIK.
Making the changes below doesn't actually make any noticable difference
to what I was looking into.
Maybe there are more but here's three I've found so far
Reviewed-by: Christian König <christian.koenig@amd.com>
i965: STATIC_ASSERT that there aren't too many BRW_NEW_* flags.
We are getting close to the maximum number of BRW_NEW_* bits that can
be stored in brw->state.dirty.brw without overflowing 32 bits, and
geometry shaders are going to add more. Add a STATIC_ASSERT so that
we will be alerted when we need to switch to 64 bits.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
glsl: don't eliminate texcoords that can be set by GL_COORD_REPLACE
Tested by examining generated TGSI shaders from piglit/glsl-routing.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Henri Verbeet <hverbeet@gmail.com>
Tested-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
nv50: allow non-nv12 buffers to be created, just pass them through to vl
Since we expose non-NV12 formats as supported when there is no decoer
profile selected, make sure that those formats are actually allowed to
be allocated.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Previously, we were asserting that each driver specified an NConfigOptions
exactly equal to the number of options they supplied, leading to frequent
bugs when people would forget to adjust the value when adjusting driver
options. Instead, just overallocate the table by a bit and leave sanity
checking to the assert in findOption().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965: Improve comments for driver hooks in intel_buffer_object.c.
Consistently using a "The ___ driver hook." line at the the top of each
function's comment block makes it easy to see at a glance what function
is being implemented.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965: Split intel_upload code out into a separate file.
This code upload performs batched uploads via a BO. By moving it out to
a separate file, intel_buffer_objects.c only provides the core buffer
object functionality.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965: Move GL_APPLE_object_purgeable functionality into a new file.
GL_APPLE_object_purgeable creates a mechanism for marking OpenGL objects
as "purgeable" so they can be thrown away when system resources become
scarce. It specifically applies to buffer objects, textures, and
renderbuffers.
The intel_buffer_objects.c file provides core functionality for GL
buffer objects, such as MapBufferRange and CopyBufferSubData. Having
texture and renderbuffer functionality in that file is a bit strange.
The 2010 copyright on the new file is because Chris Wilson first added
this code in January 2010 (commit 755915fa).
v2: Actually remember to call the new dd table setup function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Clover needs the option component of llvm.
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
radeonsi: don't make scanout resources linear except for cursors
The surface allocator understands the scanout flag just fine.
This seems to improve performance for Ubuntu Unity on top of st/xorg
and it fixes the cursor.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This started as an attempt to add support for MSAA texture transfers and
MSAA depth-stencil decompression for the DB->CB copy path.
It has gotten a bit out of control, but it's for the greater good.
Some changes do not make much sense, they are there just to make it look
like the other driver.
With a few cosmetic modifications, r600_texture.c can be shared with
a symlink.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is glBlitFramebuffer support for MSAA surfaces as required by GL 3.0
and texturing as required by GL 3.2 and GL_ARB_texture_multisample.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: implement uncompressed MSAA rendering and color resolving
This is basic MSAA support which should work with most apps.
Some features are missing, those will be implemented by other commits.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
radeonsi: add flexible shader descriptor management and use it for sampler views
It moves all sampler view descriptors to a buffer.
It supports partial resource updates and it can also unbind resources
(required for FMASK texturing).
The buffer contains all sampler view descriptors for one shader stage,
represented as an array. On top of that, there are N arrays in the buffer,
which are used to emulate context registers as implemented by the previous
ASICs (each array is a context).
This uses the RCU synchronization approach to avoid read-after-write hazards
as discussed in the thread:
"radeonsi: add FMASK texture binding slots and resource setup"
CP DMA is used to clear the descriptors at context initialization and to copy
the descriptors from one context to the next.
v2: - use PKT3_DMA_DATA on CIK (I'll test CIK later)
- turn the bool CP DMA parameters into self-explanatory flags
- add a nice simple API for packet emission to radeon_winsys.h
- use 256 contexts, 128 causes texture corruption in openarena
radeonsi/compute: Let the state tracker do all the flushing
It shouldn't be necessary to call radeon_winsys::cs_flush() from
radeonsi_launch_grid(), because the state tracker is responsible for
flushing the pipeline at the appropriate time. The current behavior is
also wrong, because radeonsi_launch_grid() submits packets to the
compute ring, but when the state tracker calls pipe->flush() everything
is submitted to the graphics ring. This has the potential to create a
race condition.
The downside of removing this flush is that the compute dispatch packets
will be sent to the graphics ring rather than the compute ring.
In the future we will need to come up with a way to detect 'compute'
command streams and submit them to the appropriate ring.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
i965: Dump more information about batch buffer usage.
Previously, INTEL_DEBUG=bat would dump messages like:
intel_mipmap_tree.c:1643: Batchbuffer flush with 456b used
This only reported the space used for command packets, and didn't
report any information on the space used for indirect state.
Now it dumps:
intel_context.c:366: Batchbuffer flush with 6128b (pkt) + 4288b (state)
= 10416b (31.8%)
This conveniently shows the breakdown of space used for packets vs.
state, as well as the percentage of batchbuffer space.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
i965: Add Gen7 depth stall flushes before disabling depth in BLORP.
We emit these before configuring depth in the normal path, or actually
using the depth buffer in BLORP - we just failed to emit them when
disabling depth altogether.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
i965: Add Gen6 depth stall flushes before disabling depth in BLORP.
We emit these before configuring depth in the normal path, or actually
using the depth buffer in BLORP - we just failed to emit them when
disabling depth altogether.
On Sandybridge, this also requires the post_sync_nonzero flush.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
i965: Don't copy propagate bitcasts with source modifiers.
Previously, copy propagation would cause bitcast_f2u(abs(float)) to
be performed in a single step, but the application of source modifiers
(abs, neg) happens after type conversion, leading to incorrect results.
That is, for bitcast_f2u(abs(float)) we would in fact generate code to
do abs(bitcast_f2u(float)).
For example, whereas bitcast_f2u(abs(float)) might result in a register
argument such as
(abs)g2.2<0,1,0>UD
v2: Set interfered = true and break in register_coalesce instead of
returning false.
Reviewed-by: Paul Berry <stereoytpe441@gmail.com>
Necessary to avoid combining a bitcast and a modifier into a single
operation. Otherwise if safe, the MOV should be removed by
copy-propagation or register coalescing.
With this and the next patch, there are only four changes in shader-db:
all a single extra instruction. The code does something like
mov a.w, -b.x
and copy propagation doesn't work because it only handles no-op
swizzles. Seems acceptable, given the known limitation of our copy
propagation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereoytpe441@gmail.com>