llvmpipe: generate two shader variants, one omits triangle in/out testing
When we know that a 4x4 pixel block is entirely inside a triangle,
use the jit function which omits the in/out test code.
Results in a few percent speedup in many tests.
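
A minimal sketch of the variant selection described above. The names
(shade_block_fn, shader_variants, select_variant) and the stub shaders are
hypothetical stand-ins for illustration, not the actual llvmpipe API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for a JIT-compiled shader entry point that
 * shades one 4x4 block; the real jit function takes more arguments. */
typedef int (*shade_block_fn)(int x, int y);

enum { VARIANT_FULL = 0, VARIANT_PARTIAL = 1, NUM_VARIANTS = 2 };

/* One compiled shader, two entry points. */
struct shader_variants {
    shade_block_fn fn[NUM_VARIANTS];
};

/* Stand-in for the variant that omits the in/out test. */
static int shade_block_full(int x, int y)
{
    (void)x; (void)y;
    return VARIANT_FULL;
}

/* Stand-in for the variant that tests each pixel against the edges. */
static int shade_block_partial(int x, int y)
{
    (void)x; (void)y;
    return VARIANT_PARTIAL;
}

/* Pick the cheaper variant when the block is known to be fully inside. */
static shade_block_fn
select_variant(const struct shader_variants *v, bool block_fully_inside)
{
    assert(v != NULL);
    return v->fn[block_fully_inside ? VARIANT_FULL : VARIANT_PARTIAL];
}
```

The caller decides once per 4x4 block, so the per-pixel test cost is only
paid for blocks that straddle a triangle edge.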
Adjust definition of empty_bin according to what's actually in empty
bins. We often have a state packet before/after load commands.
Still need to do something about the fence packets.
llvmpipe: do the final pixel in/out triangle test in the fragment shader
The test to determine which of the pixels in a 2x2 quad are inside the
triangle is now done in the fragment shader rather than in the calling
C code. This is a little faster but there are a few more things to do.
Note that the step[] array elements are in a different order now. Rather
than being in row-major order for the 4x4 grid, they're in "quad-major"
order. The setup of the step arrays is a little more complicated now.
So is the coarse/intermediate tile test code, but some lookup tables
help with that.
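
A sketch of what "quad-major" ordering could look like for a 4x4 grid.
The exact order of quads and of pixels within a quad is an assumption
here (left-to-right, then top-to-bottom at both levels), and the helper
names and the dcdx/dcdy parameters are hypothetical.

```c
#include <assert.h>

/* Map a quad-major index (0..15) to an (x, y) position in the 4x4 grid.
 * Assumed layout: indices 0-3 are the top-left 2x2 quad, 4-7 the
 * top-right, 8-11 the bottom-left, 12-15 the bottom-right. */
static void quad_major_to_xy(int i, int *x, int *y)
{
    assert(i >= 0 && i < 16);
    int quad = i / 4;    /* which 2x2 quad */
    int within = i % 4;  /* which pixel inside that quad */
    *x = (quad % 2) * 2 + (within % 2);
    *y = (quad / 2) * 2 + (within / 2);
}

/* Fill a step array for one edge: the change in the edge function at
 * each pixel relative to the block origin, stored in quad-major order. */
static void build_step_array(int dcdx, int dcdy, int step[16])
{
    for (int i = 0; i < 16; i++) {
        int x, y;
        quad_major_to_xy(i, &x, &y);
        step[i] = x * dcdx + y * dcdy;
    }
}
```

With this layout the four step values for any one quad are contiguous,
so the shader can load them as a single 4-wide vector.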
Next steps:
- early-cull 2x2 quads which are totally outside the triangle.
- skip the in/out test for fully contained quads
- make the in/out comparison code tighter/faster.
It was pretty confusing having an entity named "bin" and another named
"bins", not least because sometimes there was a need to talk about >1
of the "bins" objects, which couldn't be pluralized any further...
Scene is a term used in a bunch of places to talk about what a binner
operates on, so it's a decent choice here.
llvmpipe: reorganization of binning data structures and functions
New lp_bins struct contains all bin information.
Move more bin-related code into lp_bin.[ch]
Use new/updated bin-access functions to hide implementation details.
The result is more/cleaner separation between the setup and rast components.
This will make double-buffering of the bins easier, etc.
Previously, each triangle had a pointer to the state to use for shading.
Now we insert state-change commands into the bins. When we execute one
of those commands we just update a 'current state' pointer and use that
pointer when calling the jit shader.
When inserting state-change commands into a bin we check if the previous
command was also a state-change command and simply replace it. This
avoids accumulating useless/redundant state-change commands.
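
A minimal sketch of the replace-on-insert behaviour described above.
The types and functions (cmd_bin, bin_push, bin_set_state) and the
fixed-size command array are hypothetical simplifications, not the
actual lp_bin data structures.

```c
#include <assert.h>
#include <stddef.h>

enum cmd_type { CMD_TRIANGLE, CMD_SET_STATE };

struct bin_cmd {
    enum cmd_type type;
    const void *arg;   /* triangle data or shading state */
};

#define MAX_CMDS 64

/* Simplified bin: a fixed array of commands. */
struct cmd_bin {
    struct bin_cmd cmds[MAX_CMDS];
    size_t count;
};

/* Append any command to the bin. */
static void bin_push(struct cmd_bin *bin, enum cmd_type type, const void *arg)
{
    assert(bin->count < MAX_CMDS);
    bin->cmds[bin->count].type = type;
    bin->cmds[bin->count].arg = arg;
    bin->count++;
}

/* Insert a state-change command. If the previous command was also a
 * state change, nothing was drawn with it, so overwrite it in place. */
static void bin_set_state(struct cmd_bin *bin, const void *state)
{
    if (bin->count > 0 && bin->cmds[bin->count - 1].type == CMD_SET_STATE) {
        bin->cmds[bin->count - 1].arg = state;
        return;
    }
    bin_push(bin, CMD_SET_STATE, state);
}
```

Two state changes in a row leave a single command in the bin; a state
change after a triangle is appended as usual.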
llvmpipe: execute shaders on 4x4 blocks instead of 8x2
This matches the convention used by the recursive rasterizer.
Also fixed assorted typos, comments, etc.
Now tri-z.c, gears.c, etc. look basically right but there are still some
cracks in triangle rasterization.