Nicolai Hähnle
77c81164bc
radeonsi: support ARB_compute_variable_group_size
Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Marek Olšák
1b37e5541c
radeonsi: fix interpolateAt opcodes for .zw components
Not returning garbage in .zw seems pretty important.
This fixes:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.*
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
d4a8bf89ce
radeonsi: interpolate colors after interpolation weight shuffling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Nicolai Hähnle
8b1f9fd3b3
radeonsi: optionally run the LLVM IR verifier pass
This is enabled automatically if shader printing is enabled, or separately
by R600_DEBUG=checkir. This catches malformed IR before it causes a crash
in a later pass.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
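The enable rule above can be modelled in a few lines of C. This is a hypothetical sketch. The `DBG_*` bits, the option names other than `checkir`, and the `parse_debug_flags`/`should_verify_ir` helpers are assumptions for illustration. They are not the driver's actual option table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical debug bits; the real driver has its own option table. */
#define DBG_VS       (1u << 0) /* print vertex shaders */
#define DBG_PS       (1u << 1) /* print pixel shaders */
#define DBG_CS       (1u << 2) /* print compute shaders */
#define DBG_CHECK_IR (1u << 3) /* run the LLVM IR verifier */
#define DBG_ALL_SHADERS (DBG_VS | DBG_PS | DBG_CS)

/* Parse a comma-separated option string such as "ps,checkir".
 * Unknown options are ignored. */
static uint32_t parse_debug_flags(const char *str)
{
	static const struct { const char *name; uint32_t bit; } opts[] = {
		{ "vs", DBG_VS }, { "ps", DBG_PS }, { "cs", DBG_CS },
		{ "checkir", DBG_CHECK_IR },
	};
	uint32_t flags = 0;

	assert(str != NULL); /* callers pass "" when the variable is unset */
	while (*str) {
		size_t len = strcspn(str, ",");

		for (size_t i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
			if (strlen(opts[i].name) == len &&
			    strncmp(opts[i].name, str, len) == 0)
				flags |= opts[i].bit;
		}
		str += len;
		if (*str == ',')
			str++;
	}
	return flags;
}

/* The verifier runs when requested explicitly or when any shader is printed. */
static bool should_verify_ir(uint32_t flags)
{
	return (flags & (DBG_CHECK_IR | DBG_ALL_SHADERS)) != 0;
}
```

In a driver, the string would typically come from `getenv("R600_DEBUG")`, with an empty string substituted when the variable is unset.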
Marek Olšák
71a5cf6f3b
radeonsi: don't declare LDS in PS when ds_bpermute is used
I guess this is not needed because dead code elimination removes
the declaration.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
b2a694f079
radeonsi: use DDX/DDY directly in si_llvm_emit_ddxy_interp
We can finally do this, because the opcodes are scalar now.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
b57aef8033
radeonsi: simplify si_llvm_emit_ddxy
si_llvm_emit_ddxy is called once per element, so we don't have to generate
code for 4 elements at once.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
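Because the function handles one channel per call, each invocation only needs the derivative of a single scalar across a 2x2 pixel quad. The following CPU-side sketch models that with lane-id masks, assuming lane +1 is the pixel to the right and lane +2 is the pixel below. The mask constants and `emit_ddxy` are illustrative, and this is not the LLVM IR the driver actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Masks applied to a lane id to find the reference lane of its 2x2 quad
 * (assumed layout: +1 is the pixel to the right, +2 is the pixel below). */
#define TID_MASK_TOP_LEFT 0xfffffffcu /* coarse: top-left pixel of the quad */
#define TID_MASK_TOP      0xfffffffdu /* fine ddy: top pixel of the column */
#define TID_MASK_LEFT     0xfffffffeu /* fine ddx: left pixel of the row */

enum deriv { DDX, DDY, DDX_FINE, DDY_FINE };

/* Derivative of ONE scalar channel for lane `tid`; called once per channel. */
static float emit_ddxy(const float *vals, size_t num_lanes, unsigned tid,
                       enum deriv op)
{
	unsigned mask, offset, tl, trbl;

	assert(num_lanes % 4 == 0 && tid < num_lanes);

	if (op == DDX_FINE)
		mask = TID_MASK_LEFT;
	else if (op == DDY_FINE)
		mask = TID_MASK_TOP;
	else
		mask = TID_MASK_TOP_LEFT;

	offset = (op == DDX || op == DDX_FINE) ? 1 : 2; /* next X or next Y pixel */
	tl = tid & mask;
	trbl = tl + offset;
	return vals[trbl] - vals[tl];
}
```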
Marek Olšák
046c199c3a
radeonsi: don't call build_gep0 in si_llvm_emit_ddxy on VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
bcc55e1f32
radeonsi: use a helper function for BuildGEP(0, x)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
e20f7142a3
radeonsi: remove obsolete shader definitions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
8c6ea5a6ff
radeonsi: remove unnecessary #includes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
ab29788250
radeonsi: reload PS inputs with direct indexing at each use (v2)
The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.
26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)
v2: don't call load_input for fragment shaders in emit_declaration
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
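The percentages in these totals are relative changes from the old value to the new one. Below is a minimal sketch of that arithmetic, assuming the usual (new - old) / old convention. `percent_change` is a hypothetical helper, not taken from the report script that produced the numbers.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Negative values mean the new code uses less of the resource. */
static double percent_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For the SGPR line above, (1132676 - 1146340) / 1146340 * 100 is about -1.19, matching the log.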
Marek Olšák
007b512f9d
radeonsi: get rid of constant buffer preloading
26011 shaders in 14651 tests
Totals:
SGPRS: 1152636 -> 1146340 (-0.55 %)
VGPRS: 728198 -> 727371 (-0.11 %)
Spilled SGPRs: 3776 -> 2218 (-41.26 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35835152 -> 35841268 (0.02 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222372 -> 222559 (0.08 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
16be87c904
radeonsi: get rid of img/buf/sampler descriptor preloading (v2)
26011 shaders in 14651 tests
Totals:
SGPRS: 1251920 -> 1152636 (-7.93 %)
VGPRS: 728421 -> 728198 (-0.03 %)
Spilled SGPRs: 16644 -> 3776 (-77.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 36001064 -> 35835152 (-0.46 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222221 -> 222372 (0.07 %)
Wait states: 0 -> 0 (0.00 %)
v2: merge codepaths where possible
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
22797d7d83
radeonsi: rename get_sampler_desc -> load_sampler_desc
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
5f0a8fbcc8
radeonsi: cosmetic changes in si_shader.c
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
9 years ago
Marek Olšák
afaf27bff3
radeonsi: load streamout buffer descriptors before use (v2)
v2: inline the code and remove the conditional that's a no-op now
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
15a127bc2c
radeonsi: fix FP64 UBO loads with indirect uniform block indexing
No known tests.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
275c073c6a
radeonsi: export SampleMask from pixel shaders at full rate
Heaven and Valley write gl_SampleMask and not Z.
Use 16_ABGR instead of 32_ABGR if Z isn't written.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
546bc07349
radeonsi: don't preload constants at the beginning of shaders
LLVM can CSE the loads, thus we can always re-load constants before each
use. The decrease in SGPR spilling is huge.
The best improvements are the dumbest ones.
26011 shaders in 14651 tests
Totals:
SGPRS: 1453346 -> 1251920 (-13.86 %)
VGPRS: 742576 -> 728421 (-1.91 %)
Spilled SGPRs: 52298 -> 16644 (-68.17 %)
Spilled VGPRs: 397 -> 369 (-7.05 %)
Scratch VGPRs: 1372 -> 1344 (-2.04 %) dwords per thread
Code Size: 36136488 -> 36001064 (-0.37 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 219315 -> 222221 (1.33 %)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
63da0c991d
radeonsi: fix Gather4 with integer formats
The closed compiler does the same thing.
This fixes: GL45-CTS.texture_gather.*-int-* (18 tests)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
3e756f09d4
radeonsi: fix a crash in imageSize for cubemap arrays
Sometimes it was f32, other times it was i32. Now it's always i32.
This fixes:
GL45-CTS.texture_cube_map_array.image_texture_size.texture_size_compute_sh
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
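For cube map arrays, GLSL defines the third component of `imageSize` as the number of cubes, that is, the number of layer-faces divided by 6. The sketch below keeps the whole computation in 32-bit integers, which is the property the fix enforces. The struct, the function name and the `layer_faces` input are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct ivec3 { int32_t x, y, z; };

/* imageSize() for a cube map array, computed entirely in i32:
 * z is the number of cubes, not the number of layer-faces. */
static struct ivec3 image_size_cube_array(int32_t width, int32_t height,
                                          int32_t layer_faces)
{
	struct ivec3 size;

	assert(layer_faces % 6 == 0); /* each cube contributes 6 faces */
	size.x = width;
	size.y = height;
	size.z = layer_faces / 6;     /* integer division, no f32 round trip */
	return size;
}
```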
Marek Olšák
03708deed2
radeonsi: fix gl_PatchVerticesIn for tessellation evaluation shader
This fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
2975230fdc
radeonsi: always use the same function signature for llvm.SI.export
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Tom Stellard
63ed11cde9
radeonsi: Don't use global variables for tess lds
We were allocating global variables for the maximum LDS size
which made the compiler think we were using all of LDS, which
isn't the case.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Nicolai Hähnle
ea283779be
gallium/radeon: add radeon_llvm_bound_index for bounds checking
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
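Below is a scalar C model of bounds-clamping an array index. It is a sketch under two assumptions: a power-of-two length is handled with a single AND mask, and any other length with an unsigned minimum. `bound_index` here computes values directly, whereas a compiler-side helper like the one named above would build the equivalent IR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Force a possibly out-of-range (e.g. dynamically computed) index into
 * [0, num - 1]. Negative indices reinterpreted as unsigned are huge, so
 * they are handled by the same logic. */
static uint32_t bound_index(uint32_t index, uint32_t num)
{
	uint32_t max_index;

	assert(num > 0);
	max_index = num - 1;

	if (is_power_of_two(num))
		return index & max_index;                  /* wraps with one AND */
	return index <= max_index ? index : max_index; /* unsigned min */
}
```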
Nicolai Hähnle
6bba956073
gallium/radeon: use tgsi_scan_arrays for temp arrays
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Nicolai Hähnle
7c2295d7ef
gallium/radeon: allocate temps array info in radeon_llvm_context_init
Also, prepare for using tgsi_array_info.
This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Nicolai Hähnle
8dbf2a8570
radeonsi: add DRAWID parameter to vertex shaders
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Nicolai Hähnle
febb5dbf72
radeonsi: wire up TGSI_SEMANTIC_BASEINSTANCE
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Nicolai Hähnle
7f5a8dc27e
radeonsi: move spi_ps_input_addr override outside of the loop
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Nicolai Hähnle
287822ee33
radeonsi: drop unnecessary u_pstipple.h include
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Nicolai Hähnle
3e4c5693a1
radeonsi: do not pass the return type to buffer_load_const
Overriding it is not allowed anyway, and actually led to a crash when polygon
stippling was used with monolithic shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years ago
Marek Olšák
1e5f00f9d5
radeonsi: pre-generate shader logs for ddebug
This cuts down the overhead of si_dump_shader when ddebug is capturing
shader logs, which is done for every draw call unconditionally (that's
quite a lot of work for a draw call).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
18475aab6d
radeonsi: add empty lines after shader stats
to separate individual shaders dumped consecutively.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
dd66f9d3e7
radeonsi: move the shader key dumping to si_shader_dump
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
2596ae2b6e
radeonsi: emit PS exports last
This effectively removes s_waitcnt instructions after FP16 exports.
Before:
v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300
v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702
exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100
s_waitcnt expcnt(0) ; BF8C0F0F
v_cvt_pkrtz_f16_f32_e32 v0, v4, v5 ; 5E000B04
v_cvt_pkrtz_f16_f32_e32 v1, v6, v7 ; 5E020F06
exp 15, 1, 1, 0, 0, v0, v1, v0, v0 ; F800041F 00000100
s_waitcnt expcnt(0) ; BF8C0F0F
v_cvt_pkrtz_f16_f32_e32 v0, v8, v9 ; 5E001308
v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A
exp 15, 2, 1, 0, 0, v0, v1, v0, v0 ; F800042F 00000100
s_waitcnt expcnt(0) ; BF8C0F0F
v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C
v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E
exp 15, 3, 1, 1, 1, v0, v1, v0, v0 ; F8001C3F 00000100
s_endpgm ; BF810000
After:
v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300
v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702
v_cvt_pkrtz_f16_f32_e32 v2, v4, v5 ; 5E040B04
v_cvt_pkrtz_f16_f32_e32 v3, v6, v7 ; 5E060F06
exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100
v_cvt_pkrtz_f16_f32_e32 v4, v8, v9 ; 5E081308
v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A
exp 15, 1, 1, 0, 0, v2, v3, v0, v0 ; F800041F 00000302
v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C
v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E
exp 15, 2, 1, 0, 0, v4, v5, v0, v0 ; F800042F 00000504
exp 15, 3, 1, 1, 1, v6, v7, v0, v0 ; F8001C3F 00000706
s_endpgm ; BF810000
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
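The effect of the reordering can be reproduced with a simplified hazard model. An export reads its source VGPRs after it issues, so overwriting a VGPR that an in-flight export still reads requires `s_waitcnt expcnt(0)` first. The sketch below counts those waits. The `struct inst` encoding and the `W`/`E` macros are hypothetical, and everything except VGPR reuse is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VGPRS 256

enum op_kind { OP_WRITE, OP_EXPORT };

/* Simplified instruction: a VALU op writing `dst`, or a compressed
 * export reading `src[0]` and `src[1]`. */
struct inst {
	enum op_kind kind;
	int dst;
	int src[2];
};

/* Count the "s_waitcnt expcnt(0)" a naive scheduler must insert. */
static int count_export_waits(const struct inst *prog, size_t n)
{
	bool pending[NUM_VGPRS] = { false }; /* VGPRs read by in-flight exports */
	int waits = 0;

	for (size_t i = 0; i < n; i++) {
		if (prog[i].kind == OP_EXPORT) {
			for (int s = 0; s < 2; s++) {
				assert(prog[i].src[s] >= 0 && prog[i].src[s] < NUM_VGPRS);
				pending[prog[i].src[s]] = true;
			}
		} else {
			assert(prog[i].dst >= 0 && prog[i].dst < NUM_VGPRS);
			if (pending[prog[i].dst]) {
				/* expcnt(0) waits for ALL exports, so nothing stays pending. */
				waits++;
				for (int r = 0; r < NUM_VGPRS; r++)
					pending[r] = false;
			}
		}
	}
	return waits;
}

/* Shorthand for describing listings: W = VGPR write, E = export. */
#define W(d)    { OP_WRITE, (d), { 0, 0 } }
#define E(a, b) { OP_EXPORT, -1, { (a), (b) } }
```

Encoding the "Before" listing as four repetitions of W(0), W(1), E(0, 1) gives 3 waits, matching its three s_waitcnt instructions. The "After" listing gives 0. It is encoded as W(0) through W(3), E(0, 1), W(4), W(5), E(2, 3), W(6), W(7), E(4, 5), E(6, 7).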
Marek Olšák
0f7a6ea5e7
radeonsi: report accurate SGPR and VGPR spills
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
d227dbe272
radeonsi: add a workaround for a compute VGPR-usage LLVM bug
v2: use abort(), describe which LLVM version is affected
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
f4d1de7f86
radeonsi: use LLVMGetTypeKind to tell if an input is an array of descriptors
just a cleanup
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
785073ed0b
radeonsi: replace !tbaa with !invariant.load
no change in generated code thanks to dereferenceable(n)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
348b9a5b1c
radeonsi: set dereferenceable attribute on descriptor arrays
This allows moving the loads arbitrarily in the Sinking pass.
26002 shaders in 14643 tests
Totals:
SGPRS: 2080160 -> 2080160 (0.00 %)
VGPRS: 798875 -> 797826 (-0.13 %)
Spilled SGPRs: 108485 -> 79165 (-27.03 %)
Spilled VGPRs: 327 -> 327 (0.00 %)
Scratch VGPRs: 1656 -> 1652 (-0.24 %) dwords per thread
Code Size: 36127192 -> 35559780 (-1.57 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 212464 -> 212672 (0.10 %)
Wait states: 0 -> 0 (0.00 %)
PERCENTAGES / App      Shaders     SGPRs     VGPRs SpillSGPR SpillVGPR   Scratch  CodeSize  MaxWaves     Waits
(unknown)                    4         .         .         .         .         .         .         .         .
0ad                          6         .         .         .         .         .         .         .         .
alien_isolation           2938         .    0.04 %   -8.53 %         .         .   -0.71 %   -0.06 %         .
anholt                      10         .         .         .         .         .         .         .         .
batman_arkham_origins      589         .   -0.58 %  -79.54 %         .         .   -6.72 %    0.57 %         .
bioshock-infinite         1769         .   -0.65 %  -89.32 %         .         .   -4.73 %    0.48 %         .
borderlands2              3968         .   -0.31 %  -51.21 %         .         .   -4.09 %    0.22 %         .
brutal-legend              338         .   -0.03 %   -2.95 %         .         .   -0.06 %         .         .
civilization_beyond..      116         .         .  -14.17 %         .         .   -0.88 %         .         .
counter_strike_glob..     1142         .         .         .         .         .         .         .         .
dirt-showdown              541         .   -0.56 %  -40.14 %         .   -3.45 %   -1.82 %    0.35 %         .
dolphin                     22         .         .         .         .         .    0.16 %         .         .
dota2                     1747         .         .         .         .         .    0.01 %         .         .
europa_universalis_4        76         .   -0.23 %  -42.11 %         .         .   -0.96 %         .         .
f1-2015                    774         .   -0.09 %  -28.89 %         .         .   -2.60 %    0.09 %         .
furmark-0.7.0                4         .         .         .         .         .         .         .         .
gimark-0.7.0                10         .         .         .         .         .         .         .         .
glamor                      16         .         .         .         .         .         .         .         .
humus-celshading             4         .         .         .         .         .         .         .         .
humus-domino                 6         .         .         .         .         .         .         .         .
humus-dynamicbranching      24         .    0.71 %         .         .         .    0.29 %   -0.45 %         .
humus-hdr                   10         .         .         .         .         .         .         .         .
humus-portals                2         .         .         .         .         .         .         .         .
humus-volumetricfog..        6         .         .         .         .         .         .         .         .
left_4_dead_2             1762         .         .         .         .         .         .         .         .
metro_2033_redux          2670         .   -0.10 %   -7.15 %         .         .   -0.03 %         .         .
nexuiz                      80         .         .         .         .         .         .         .         .
pixmark-julia-fp32           2         .         .         .         .         .         .         .         .
pixmark-julia-fp64           2         .         .         .         .         .         .         .         .
pixmark-piano-0.7.0          2         .         .         .         .         .         .         .         .
pixmark-volplosion-..        2         .         .         .         .         .         .         .         .
plot3d-0.7.0                 8         .         .         .         .         .         .         .         .
portal                     474         .         .         .         .         .         .         .         .
sauerbraten                  7         .         .         .         .         .         .         .         .
serious_sam_3_bfe          392         .         .  -13.20 %         .         .   -1.81 %         .         .
supertuxkart                 4         .         .         .         .         .         .         .         .
talos_principle            324         .   -0.21 %  -18.39 %         .         .   -2.73 %    0.14 %         .
team_fortress_2            808         .         .         .         .         .         .         .         .
tesseract                  430         .    0.08 %  -68.57 %         .         .   -0.45 %         .         .
tessmark-0.7.0               6         .         .         .         .         .         .         .         .
thea                       172         .         .         .         .         .    0.03 %         .         .
ue4_effects_cave           299         .   -0.04 %  -10.15 %         .         .   -0.25 %    0.04 %         .
ue4_elemental              586         .   -0.02 %  -13.93 %         .         .   -0.13 %    0.02 %         .
ue4_lightroom_inter..       74         .   -0.17 %  -70.00 %         .         .   -1.27 %         .         .
ue4_realistic_rende..       92         .         .  -32.58 %         .         .   -0.35 %         .         .
unigine_heaven             322         .    0.12 %  -54.17 %         .         .   -1.42 %   -0.12 %         .
unigine_sanctuary          264         .         .         .         .         .         .         .         .
unigine_tropics            210         .         .         .         .         .         .         .         .
unigine_valley             278         .   -0.15 %  -40.74 %         .         .   -2.00 %    0.09 %         .
unity                       72         .         .         .         .         .    0.03 %         .         .
warsow                     176         .         .         .         .         .         .         .         .
warzone2100                  4         .         .         .         .         .    0.13 %         .         .
witcher2                  1040         .   -0.03 %  -86.28 %         .         .   -0.28 %    0.01 %         .
xcom_enemy_within         1236         .   -0.24 %  -63.54 %         .         .   -0.93 %    0.18 %         .
yofrankie                   82         .   -0.61 % -100.00 %         .         .   -0.83 %    0.41 %         .
--------------------------------------------------------------------------------------------------------------
Total                    26002         .   -0.13 %  -27.03 %         .   -0.24 %   -1.57 %    0.10 %         .
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
bccf9de4df
radeonsi: clean up shader value metadata code
No change in behavior.
BTW, tbaa_md_kind == 1, which was the magic number in the code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
d7d7e6adbe
radeonsi: remove LLVMNoUnwindAttribute uses
always set by gallivm
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
c4807505c0
radeonsi: fix a typo in SI_PARAM_LINEAR_* handling
introduced in 476e9cee1d
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
027ad71b57
radeonsi: print LLVM IRs to ddebug logs
Getting the LLVM IR of hanging shaders has never been easier.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
1c00086746
radeonsi: remove an obsolete comment
It's not true.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
Marek Olšák
4d1f32376d
radeonsi: don't interpolate colors if flatshading is enabled
use v_interp_mov for those
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
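On GCN, smooth interpolation evaluates P0 + i * P10 + j * P20 using the v_interp_p1/v_interp_p2 pair, with P10 = P1 - P0 and P20 = P2 - P0. By contrast, v_interp_mov copies one parameter unchanged. The scalar sketch below assumes the flat value is the one stored in P0; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-primitive attribute data as the interpolator sees it (assumed):
 * p0 is the value at vertex 0, p10 = p1 - p0, p20 = p2 - p0. */
struct attr_params {
	float p0, p10, p20;
};

/* Smooth: models v_interp_p1 (p0 + i*p10) followed by v_interp_p2 (+ j*p20). */
static float interp_smooth(struct attr_params a, float i, float j)
{
	return a.p0 + i * a.p10 + j * a.p20;
}

/* Flat: models v_interp_mov of P0; the barycentrics are not used at all. */
static float interp_flat(struct attr_params a)
{
	return a.p0;
}

/* Colors take the flat path when flat shading is enabled. */
static float interp_color(struct attr_params a, float i, float j, bool flatshade)
{
	return flatshade ? interp_flat(a) : interp_smooth(a, i, j);
}
```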
Marek Olšák
4accb02d7a
radeonsi: enable the barycentric optimization in all cases
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
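Below is a scalar sketch of the fix-up. It assumes the flag is bit 31 of the PRIM_MASK SGPR, and that when the bit is set, centroid (i,j) were not computed and equal center (i,j) for that primitive. The names are hypothetical; a driver would perform this select in LLVM IR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ij { float i, j; };

/* When both CENTER and CENTROID barycentrics are enabled: if bit 31 of
 * PRIM_MASK (bc_optimize) is set, the centroid (i,j) are not valid and the
 * center (i,j) are used instead. */
static struct ij select_centroid(uint32_t prim_mask, struct ij center,
                                 struct ij centroid)
{
	bool bc_optimize = (prim_mask >> 31) & 1;

	return bc_optimize ? center : centroid;
}
```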
Marek Olšák
476e9cee1d
radeonsi: compute only one set of interpolation (i,j) when MSAA is disabled
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
9 years ago
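Without MSAA, each pixel has a single sample at its center. Center, centroid and per-sample barycentrics are therefore identical, and one (i,j) pair per interpolation mode suffices. Below is a minimal sketch of that remapping; the enum and function names are hypothetical stand-ins for the driver's input-enable bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the interpolation locations a shader may request. */
enum interp_loc { LOC_CENTER, LOC_CENTROID, LOC_SAMPLE };

/* With a single sample located at the pixel center, all three locations
 * produce identical barycentrics, so everything maps to CENTER. */
static enum interp_loc effective_location(enum interp_loc requested,
                                          bool msaa_enabled)
{
	return msaa_enabled ? requested : LOC_CENTER;
}

/* Number of (i,j) pairs to compute for one interpolation mode
 * (perspective or linear), given which locations the shader uses. */
static int num_ij_pairs(const bool used[3], bool msaa_enabled)
{
	bool needed[3] = { false, false, false };
	int count = 0;

	for (int loc = LOC_CENTER; loc <= LOC_SAMPLE; loc++) {
		if (used[loc])
			needed[effective_location((enum interp_loc)loc, msaa_enabled)] = true;
	}
	for (int loc = 0; loc < 3; loc++)
		count += needed[loc];
	return count;
}
```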