From http://lists.freedesktop.org/archives/mesa-dev/2015-May/084883.html:
"There are no real error cases here, just dead code.
validate_render() is supposed to make sure we never call these
functions if the code can't actually render the primitives. The
fprintf()+return branches should really just contain assert(0) or
equivalent."
I also rearranged the if-else-block in render_quad_strip_verts to look
more like the other functions. A future patch is going to change a
bunch of that code anyway.
v2: Make "unreachable" message more descriptive. Suggested by Iago.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
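To illustrate the change described above, here is a minimal sketch of replacing an fprintf()+return branch with an assertion that the branch is dead code. The UNREACHABLE macro, the prim_sketch enum, and verts_per_prim_sketch() are hypothetical names made up for this note, not the helpers the patch itself uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an "unreachable" helper: trips an assertion
 * in debug builds and aborts otherwise, instead of printing and returning. */
#define UNREACHABLE(msg) do { assert(!msg); abort(); } while (0)

/* Hypothetical primitive types, for illustration only. */
enum prim_sketch { SK_POINTS, SK_LINES, SK_TRIANGLES, SK_UNSUPPORTED };

static int
verts_per_prim_sketch(enum prim_sketch p)
{
   switch (p) {
   case SK_POINTS:    return 1;
   case SK_LINES:     return 2;
   case SK_TRIANGLES: return 3;
   default:
      /* An earlier validation step is assumed to have rejected anything
       * else, so reaching this branch is a programming error. */
      UNREACHABLE("validation should have rejected this primitive");
   }
}
```

A descriptive message, as suggested for v2, makes the failing assertion self-explanatory in a backtrace.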
Using C99 initializers for the primitive arrays makes things more
readable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
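As a rough illustration of why C99 designated initializers read better for this kind of table, the sketch below indexes each entry by its enum name rather than relying on position. The enum, the table name, and the hardware codes are all made up for this example and do not come from the driver.

```c
#include <assert.h>

/* Hypothetical API-level primitive types. */
enum api_prim_sketch {
   API_POINTS,
   API_LINES,
   API_LINE_STRIP,
   API_TRIANGLES,
   API_PRIM_COUNT
};

/* Each entry names the index it belongs to, so reordering the enum or
 * adding a value cannot silently shift the mapping. The codes on the
 * right-hand side are invented for illustration. */
static const int hw_prim_sketch[API_PRIM_COUNT] = {
   [API_POINTS]     = 0x10,
   [API_LINES]      = 0x20,
   [API_LINE_STRIP] = 0x30,
   [API_TRIANGLES]  = 0x40,
};

static int
lookup_hw_prim_sketch(enum api_prim_sketch p)
{
   assert((unsigned)p < API_PRIM_COUNT);
   return hw_prim_sketch[p];
}
```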
Using C99 initializers for the primitive arrays makes things more
readable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using C99 initializers for the primitive arrays makes things more
readable.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
With NIR, it actually hurts things.
total instructions in shared programs: 6529329 -> 6528888 (-0.01%)
instructions in affected programs: 14833 -> 14392 (-2.97%)
helped: 299
HURT: 1
In all affected programs I inspected (including the single hurt one) the
pass CSE'd some multiplies and caused some reassociation (e.g., caused
(A * B) * C to be A * (B * C)) when the original intermediate result was
reused elsewhere.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
i965/fs: Use backend_instruction in predicated break peephole.
We're not using any fs_inst fields, and the next commit will make the
peephole used by the vec4 backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
i965/fs: Remove SNB embedded-comparison support from optimizations.
We never emit IF instructions with an embedded comparison (lost in the
switch to NIR), so this code is not used. If we want to re-add support,
we should have a pass that merges a CMP instruction with an IF or a
WHILE instruction after other optimizations have run.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
mesa: Add missing _mm_mfence() before streaming loads.
According to the Intel Software Development Manual (Volume 1: Basic
Architecture, 12.10.3 Streaming Load Hint Instruction):
Streaming loads may be weakly ordered and may appear to software to
execute out of order with respect to other memory operations.
Software must explicitly use fences (e.g. MFENCE) if it needs to
preserve order among streaming loads or between streaming loads and
other memory operations.
That is, a memory fence is needed to preserve the order between the GPU
writing the buffer and the streaming loads reading it back.
Reported-by: Joseph Nuzman <joseph.nuzman@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
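The ordering requirement quoted above can be sketched as follows. This is not the code from the patch: copy_streaming_sketch() is a hypothetical helper, and it assumes an x86 CPU with SSE4.1, a GCC or Clang compiler (for the target attribute), a 16-byte-aligned source, and a length that is a multiple of 16.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_mfence, _mm_storeu_si128 (SSE2) */
#include <smmintrin.h>  /* _mm_stream_load_si128 (SSE4.1) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes from src to dst using streaming
 * loads, with a fence issued before the first load. */
__attribute__((target("sse4.1")))
static void
copy_streaming_sketch(void *dst, const void *src, size_t len)
{
   assert(((uintptr_t)src & 15) == 0);  /* streaming loads need alignment */
   assert(len % 16 == 0);

   /* Order the streaming loads after all earlier memory operations. */
   _mm_mfence();

   for (size_t i = 0; i < len; i += 16) {
      __m128i v = _mm_stream_load_si128((__m128i *)((const char *)src + i));
      _mm_storeu_si128((__m128i *)((char *)dst + i), v);
   }
}
```

The fence is issued once, before the loop, because the concern is ordering between earlier writes to the buffer and the loads, not ordering among the loads themselves.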
There are three types of fast clears:
a. fast depth clears
b. fast singlesample color clears
c. fast multisample color clears
Function intel_miptree_is_fast_clear_capable() checks if a miptree
supports fast clears of type (b).
Rename the function to disambiguate what it does:
old: intel_miptree_is_fast_clear_capable
new: intel_miptree_supports_non_msrt_fast_clear
The function accidentally rejected multisampled color surfaces
because it thought they were singlesample array surfaces. Fix that by
explicitly rejecting surfaces with samples > 1.
This fix would have been needed before we enabled layered fast
singlesample color clears (introduced in gen8), which we want to do
eventually. For now, though, this patch changes no behavior; it just
fixes how the driver chooses its behavior.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
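A minimal sketch of the explicit check described above, assuming a simplified miptree with only a sample count and a layer count. The struct and function names are hypothetical, and the real function is expected to apply further checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a miptree. */
struct miptree_sketch {
   unsigned num_samples;  /* 0 or 1 means singlesample */
   unsigned num_layers;   /* > 1 means an array surface */
};

static bool
supports_non_msrt_fast_clear_sketch(const struct miptree_sketch *mt)
{
   assert(mt != NULL);

   /* Reject multisampled surfaces explicitly instead of relying on them
    * looking like array surfaces. */
   if (mt->num_samples > 1)
      return false;

   /* Layered singlesample fast clears are assumed not to be enabled yet. */
   if (mt->num_layers > 1)
      return false;

   return true;
}
```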
intel_tiling_supports_non_msrt_mcs() and
intel_miptree_is_fast_clear_capable() are not used outside of
intel_mipmap_tree.c.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We need a virtual destructor when at least one of the class' methods is virtual.
Failure to do so might lead to undefined behavior when destructing derived classes.
Fixes the following warning:
brw_vec4_gs_visitor.cpp: In function 'const unsigned int* brw::brw_gs_emit(brw_context*, gl_shader_program*, brw_gs_compile*, void*, unsigned int*)':
brw_vec4_gs_visitor.cpp:703:11: warning: deleting object of polymorphic class type 'brw::vec4_gs_visitor' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
delete gs;
Curro: This shouldn't be causing any actual bugs at the moment because
gen6_gs_visitor is the only subclass of vec4_visitor destroyed through
a pointer of a base class (vec4_gs_visitor *) and its destructor is
basically the same as its parent's. Anyway it seems sensible to change
this so it doesn't bite us in the future.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
glsl: set glsl error if binding qualifier used on global scope
Fixes following Piglit test:
global-scope-binding-qualifier.frag
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
i965: Assert on the number of combined UBO and SSBO binding table entries
In theory we can't break this assertion since the compiler frontend checks
that we don't exceed any of the individual limits, but it does not hurt to
be extra safe.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965: Reserve binding table space for SSBO surfaces
These share the space with UBO surfaces but we need to make sure we
allocate enough space for both sets (12 of each).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
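The two binding table changes above can be pictured with the sketch below: one shared block sized for 12 UBO plus 12 SSBO entries, and an assertion that the combined count fits. The macro and function names are hypothetical, and reserving the full block regardless of actual usage is an assumption of this sketch, not something the commit messages state.

```c
#include <assert.h>

#define SKETCH_MAX_UBO   12
#define SKETCH_MAX_SSBO  12
#define SKETCH_MAX_BLOCK_ENTRIES (SKETCH_MAX_UBO + SKETCH_MAX_SSBO)

/* Hypothetical helper: reserve the shared UBO/SSBO block starting at
 * next_entry, store its start in *block_start, and return the first
 * free entry after the block. */
static unsigned
assign_block_entries_sketch(unsigned next_entry, unsigned num_ubos,
                            unsigned num_ssbos, unsigned *block_start)
{
   /* The frontend is expected to enforce the individual limits; this
    * only double-checks the combined total. */
   assert(num_ubos + num_ssbos <= SKETCH_MAX_BLOCK_ENTRIES);

   *block_start = next_entry;
   return next_entry + SKETCH_MAX_BLOCK_ENTRIES;
}
```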
i965: Don't print line numbers with INTEL_DEBUG=optimizer.
The thing you want to do with the output files is diff them, which is
made more difficult by line numbers changing.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
nv30: always go through translate module on big-endian
It seems like things are either coming in slightly wrong, or perhaps
uploaded incorrectly, but either way passing them through the translate
module seems to fix everything. Eventually we should figure out what's
going wrong and fix it "for real", but this should do for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
nv30: pretend to have packed texture/surface formats
This puts us in line with what the DDX/DRI2 st are expecting. It also
happens to work... no idea why, but seems better to have it work than to
ask lots of questions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Fixes Gallium based DRI drivers failing to load on big endian hosts
because they can't find any matching fbconfigs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71789
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
glsl: reduce memory footprint of uniform_storage struct
The uniform will only be of a single type so store the data for
opaque types in a single array.
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
nir: Introduce new nir_intrinsic_load_per_vertex_input intrinsics.
Geometry and tessellation shaders process multiple vertices; their
inputs are arrays indexed by the vertex number. While GLSL makes
this look like a normal array, it can be very different behind the
scenes.
On Intel hardware, all inputs for a particular vertex are stored
together - as if they were grouped into a single struct. This means
that consecutive elements of these top-level arrays are not contiguous.
In fact, they may sometimes be in completely disjoint memory segments.
NIR's existing load_input intrinsics are awkward for this case, as they
distill everything down to a single offset. We'd much rather keep the
vertex ID separate, but build up an offset as normal beyond that.
This patch introduces new nir_intrinsic_load_per_vertex_input
intrinsics to handle this case. They work like ordinary load_input
intrinsics, but have an extra source (src[0]) which represents the
outermost array index.
v2: Rebase on earlier refactors.
v3: Use ssa defs instead of nir_srcs, rebase on earlier refactors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
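The addressing model described above can be sketched in plain C: each vertex has its own input block, possibly in a different memory segment, so the vertex index selects the block and the offset is applied only within it. The function name and the flat float layout are assumptions made for this illustration, not NIR or driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: vertex_base[v] points at the inputs of vertex v.
 * The blocks need not be contiguous with one another. */
static float
load_per_vertex_input_sketch(const float *const *vertex_base,
                             size_t num_vertices,
                             size_t vertex,   /* outermost array index */
                             size_t offset)   /* offset within the vertex */
{
   assert(vertex < num_vertices);
   return vertex_base[vertex][offset];
}
```

Folding the vertex index into a single flat offset would only work if the per-vertex blocks were laid out back to back, which the text above says is not guaranteed.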
nir/lower_io: Make get_io_offset() return a nir_ssa_def * for indirects.
get_io_offset() already walks the dereference chain and discovers
whether or not we have an indirect; we can just return that rather than
computing it a second time via deref_has_indirect(). This means moving
the call a bit earlier.
By returning a nir_ssa_def *, we can pass back both an existence flag
(via NULL checking the pointer) and the value in one parameter. It
also simplifies the code somewhat. nir_lower_samplers works in a
similar fashion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
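The "pointer as both flag and value" idea can be sketched outside NIR as follows. The index_step_sketch struct and get_offset_sketch() are hypothetical, and the sketch simplifies by keeping only the last indirect index it finds rather than combining several.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link in a dereference chain. */
struct index_step_sketch {
   unsigned stride;       /* size of one element at this level */
   unsigned const_index;  /* used when the index is a constant */
   const int *indirect;   /* non-NULL when the index is dynamic */
};

/* Walk the chain once. The constant part of the offset is written to
 * *const_offset; the return value is NULL when there is no indirect
 * index and otherwise points at the indirect value. */
static const int *
get_offset_sketch(const struct index_step_sketch *steps, size_t n,
                  unsigned *const_offset)
{
   const int *indirect = NULL;

   assert(const_offset != NULL);
   *const_offset = 0;

   for (size_t i = 0; i < n; i++) {
      *const_offset += steps[i].const_index * steps[i].stride;
      if (steps[i].indirect != NULL)
         indirect = steps[i].indirect;
   }
   return indirect;
}
```

The caller tests the returned pointer against NULL to choose between the direct and indirect paths, so no second walk of the chain is needed.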
st/mesa: set force_persample_interp if ARB_sample_shading is used
This is only half of the work. The next patch will handle
gl_SampleID/SamplePos, which is the other half of ARB_sample_shading.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
gallium: add per-sample interpolation control into rasterizer state
Required by ARB_sample_shading for drivers that don't want a shader variant
in st/mesa.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Roland Scheidegger <sroland@vmware.com>