XBox:Framework:UI Rendering
From MadoxLabs
I've been playing with various ways to render the UI. I will log all the results on this page.
Any UI widget is made up of a number of components, which are basically sprites, and may also have child widgets. Child widgets are entirely contained inside the parent's child area. Each component requires a texture to render, and if the UI contains many textures we normally need to make a draw call per texture. Also, I used to limit child widgets to the parent's area with a scissor rect, which required a new draw call every time I had to change the scissor rect.
So the UI is tricky to render in one shot, because it has several features that get in the way:
- Handle multiple texture sources
- Handle limiting to parent area
- Text output
My simple test renders 200 quads of 100x100 pixels; a random half use texture A and the other half use texture B. Each timing is the result of several runs of varying length.
Mode 1
The first, naive test rendered each set of 100 quads in its own draw call using user primitives. This is pretty poor, since it means the quads using texture A can't be interleaved in depth with the quads using texture B.
Time: 146 ns
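Mode 1 boils down to bucketing the quads by texture and issuing one draw call per bucket, so the number of draw calls equals the number of distinct textures. A minimal Python sketch of that bucketing, assuming each quad is a hypothetical (texture, depth) tuple rather than the real widget data:

```python
from collections import defaultdict

def batch_by_texture(quads):
    """Group quads into one batch per texture (one draw call per batch).

    quads: list of (texture_name, depth) tuples, a hypothetical stand-in
    for the real sprite data.
    """
    batches = defaultdict(list)
    for texture, depth in quads:
        batches[texture].append(depth)
    return dict(batches)

quads = [("A", 0.9), ("B", 0.5), ("A", 0.1), ("B", 0.7)]
batches = batch_by_texture(quads)
print(len(batches))   # 2 draw calls, one per texture
print(batches["A"])   # [0.9, 0.1]: all A quads go out together in one call
```

Because each bucket is submitted as its own call, the order is fixed per texture, which is what stops quads of different textures from being interleaved.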
Mode 2
The second test sets both textures in the shader and attaches a value to each vertex that says which texture to use. This lets me render everything in one draw call, with each quad at its proper depth. I had to use an entire COLOR semantic to hold a 1 or a 2, though, so this isn't optimal.
I also played with defining a clip region in the shader. The vertex shader passes the position to the pixel shader in the spare components of the COLOR semantic that holds the texture flag, so the pixel shader can test whether each pixel is clipped. This works well, and I can make a per-quad clip region by putting the rect values in another COLOR semantic.
Time: 100 ns
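The per-pixel clip test is just a point-in-rect check. A CPU-side Python sketch of the same comparisons, which can be handy for checking clip rects before they go to the GPU; the function name and the (x_min, y_min, x_max, y_max) tuple layout are illustrative assumptions matching the Clip.x/.y/.z/.w comparisons in the pixel shader:

```python
def is_clipped(px, py, clip):
    """CPU-side mirror of the pixel shader's clip test.

    clip is (x_min, y_min, x_max, y_max), i.e. Clip.x/.y/.z/.w.
    Points exactly on an edge are kept, because the shader only clips
    on strict < and > comparisons.
    """
    x_min, y_min, x_max, y_max = clip
    return px < x_min or px > x_max or py < y_min or py > y_max

clip = (0, 0, 100, 50)
print(is_clipped(50, 25, clip))   # False: inside the rect
print(is_clipped(150, 25, clip))  # True: right of the rect
print(is_clipped(100, 50, clip))  # False: on the corner, so kept
```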
Mode 3
I changed from using UserPrimitives to using a DynamicVertexBuffer and an IndexBuffer. I'm not sure why you would ever choose UserPrimitives over a DynamicVertexBuffer. There is also a limit to how many UserPrimitives you can draw per frame; it looks like the limit is just over 10000.
Time: 35 ns
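With an IndexBuffer each quad needs only 4 unique vertices plus 6 indices (two triangles). A Python sketch of filling the index data, assuming a hypothetical corner order of top-left, top-right, bottom-right, bottom-left for each quad's vertices:

```python
def quad_indices(quad_count):
    """Build an index list that draws each quad as two triangles.

    Assumes each quad's 4 vertices are stored consecutively as
    top-left, top-right, bottom-right, bottom-left. Both triangles are
    clockwise on screen, which the default counter-clockwise culling keeps.
    """
    indices = []
    for q in range(quad_count):
        base = 4 * q
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return indices

# 16-bit indices address at most 65536 vertices, i.e. 16384 quads per buffer.
MAX_QUADS_16BIT = 65536 // 4

print(quad_indices(2))  # [0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7]
```

The index data never changes for a given quad count, so it can be built once and only the vertex data needs to go through the DynamicVertexBuffer each frame.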
Mode 4
I moved the clip rect out of the shader and into the vertex data. Each vertex now has a COLOR0 semantic that holds the parent widget's clip rect. This is a bit inefficient, since all 4 vertices of a quad carry the same clip rect, so three of the four copies are wasted. What I should try is setting up a 2nd vertex stream and using vfetch to get that data from there! Anyhow, this method seems to work fine and has minimal effect on time.
Time: 37 ns
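The waste from copying the clip rect into every vertex is easy to put a number on. A small Python calculation, assuming the clip rect is a float4 (16 bytes) as in the vertex layout above:

```python
FLOAT_BYTES = 4
CLIP_RECT_BYTES = 4 * FLOAT_BYTES  # one float4 clip rect per vertex
VERTICES_PER_QUAD = 4

def redundant_clip_bytes(quad_count):
    """Bytes spent on duplicate clip rects when every vertex carries one.

    A quad needs its clip rect only once, so 3 of the 4 copies are redundant.
    """
    return quad_count * (VERTICES_PER_QUAD - 1) * CLIP_RECT_BYTES

print(redundant_clip_bytes(200))  # 9600
```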
Another change was that I took the 200x200 textures and placed them in a 4096x4096 texture, to simulate stitching many UI textures together into one. The UV coords were changed so they still address just the 200x200 part. Using a giant texture didn't change the time per frame at all.
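Addressing a sub-image inside the atlas only requires dividing its pixel rect by the atlas size. A Python sketch, where the placement of the 200x200 image at pixel (200, 0) is a hypothetical example:

```python
def atlas_uv_rect(x, y, w, h, atlas_w, atlas_h):
    """Convert a sub-image's pixel rect inside an atlas into UV coords.

    Returns (u0, v0, u1, v1): the top-left and bottom-right corners in
    the [0, 1] texture space the sampler expects.
    """
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)

# A 200x200 image placed at pixel (200, 0) in a 4096x4096 atlas.
print(atlas_uv_rect(200, 0, 200, 200, 4096, 4096))
# (0.048828125, 0.0, 0.09765625, 0.048828125)
```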
A final note: by setting the Position.Z value properly, the UI can be drawn all in one shot with the widgets' z ordering rendered correctly. I have to make sure that only one widget exists at a given depth.
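One way to guarantee a unique depth per widget is to number the widgets in draw order and spread them evenly over the depth range. A minimal Python sketch, assuming the default LessEqual depth test (a smaller z is closer to the viewer) and a hypothetical (name, children) tuple for the widget tree:

```python
def assign_depths(root, near=0.0, far=1.0):
    """Give every widget a unique z in (near, far], children in front of parents.

    root is a (name, children) tuple. Widgets are numbered in pre-order
    (a parent before its children), and each later widget gets a smaller z,
    i.e. closer to the viewer under a LessEqual depth test.
    """
    order = []

    def visit(node):
        name, children = node
        order.append(name)
        for child in children:
            visit(child)

    visit(root)
    step = (far - near) / len(order)
    return {name: far - i * step for i, name in enumerate(order)}

tree = ("window", [("button", []), ("panel", [("label", [])])])
print(assign_depths(tree))
# {'window': 1.0, 'button': 0.75, 'panel': 0.5, 'label': 0.25}
```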
Shader so far:
// UI RENDERER
// No need for World/View/Projection - just scale the coords down to [-1,1]
float2 scaleSurfaceToProj; // (2/width, 2/height)
// Both textures are bound at once so a single draw call can sample either one
texture tex0;
texture tex1;
sampler2D sam0 = sampler_state { Texture = <tex0>; };
sampler2D sam1 = sampler_state { Texture = <tex1>; };
struct VertexShaderInput
{
    float4 Position : POSITION0;
    float2 Texture : TEXCOORD0;
    float4 Clip : COLOR0;
};
struct VertexShaderOutput
{
    float4 Position : POSITION0;
    float2 Texture : TEXCOORD0;
    float4 Clip : COLOR0;
    float3 Location : COLOR1;
};
VertexShaderOutput VS(VertexShaderInput vertex)
{
    VertexShaderOutput output;
    // Convert screen coords to output position: scale to [0,2], then translate to [-1,1].
    // Subtract half a pixel (the Direct3D 9 half-pixel offset) so texel centers line up with pixel centers.
    float2 halfPixel = 0.5 * scaleSurfaceToProj;
    float2 pos = ((vertex.Position.xy * scaleSurfaceToProj) - 1.0f) - halfPixel;
    // Output the position with the y axis inverted (screen y points down, clip-space y points up).
    output.Position = float4(pos.x, pos.y * -1, vertex.Position.z, 1.0);
    output.Texture = vertex.Texture;
    output.Location.xy = output.Position.xy; // copy the position for use in clipping
    output.Location.z = vertex.Position.w;   // pass the texture flag on
    output.Clip = vertex.Clip;               // pass the clip rect on
    return output;
}
float4 PS(VertexShaderOutput pixel) : COLOR0
{
    // Clipped pixels are drawn opaque black here, which makes them visible.
    // To hide them instead, discard them (e.g. clip(-1)) so they write neither color nor depth.
    float4 clipped = {0,0,0,1}; // color of something that is clipped
    float4 ret;
    // is the pixel out of bounds?
    if (pixel.Location.x < pixel.Clip.x) ret = clipped;
    else if (pixel.Location.x > pixel.Clip.z) ret = clipped;
    else if (pixel.Location.y < pixel.Clip.y) ret = clipped;
    else if (pixel.Location.y > pixel.Clip.w) ret = clipped;
    // get the color from the texture selected by the flag
    else if (pixel.Location.z == 0)
        ret = tex2D(sam0, pixel.Texture);
    else
        ret = tex2D(sam1, pixel.Texture);
    return ret;
}
technique Technique1
{
    pass Pass1
    {
        SrcBlend = SRCALPHA;
        DestBlend = INVSRCALPHA;
        AlphaBlendEnable = TRUE;
        ZEnable = TRUE;
        VertexShader = compile vs_3_0 VS();
        PixelShader = compile ps_3_0 PS();
    }
}
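The vertex shader's coordinate math can be checked on the CPU. A Python mirror of VS(), assuming a hypothetical 1280x720 surface for the example; it reproduces the scale, the half-pixel subtraction and the y inversion line for line:

```python
def screen_to_clip(x, y, width, height):
    """Mirror of the vertex shader's position math, in Python.

    scaleSurfaceToProj is (2/width, 2/height); the half-pixel offset is
    subtracted and y is negated because screen y grows downward while
    clip-space y grows upward.
    """
    scale_x, scale_y = 2.0 / width, 2.0 / height
    half_x, half_y = 0.5 * scale_x, 0.5 * scale_y
    pos_x = (x * scale_x - 1.0) - half_x
    pos_y = (y * scale_y - 1.0) - half_y
    return pos_x, -pos_y

# A point half a pixel in from the top-left corner maps (up to float
# rounding) onto the clip-space corner (-1, 1): the half-pixel shift at work.
print(screen_to_clip(0.5, 0.5, 1280, 720))
```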
Mode 5
Another change I got working optimizes sending the clip areas to the GPU. If there are 100s of quads to write but only a handful of clip regions, duplicating the regions for all 100s of quads is a waste. Instead, I have a second vertex buffer that only contains the unique clip areas. Each quad then contains an index into the second vertex buffer, so the vertex shader can load the clip area as needed.
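On the CPU side this amounts to deduplicating the clip rects into a table and storing a table index per quad. A Python sketch, with hypothetical panel and dialog rects as the example input:

```python
def build_clip_table(quad_clips):
    """Deduplicate per-quad clip rects into a shared table plus indices.

    quad_clips: one clip rect (tuple) per quad.
    Returns (table, indices): table holds each distinct rect once (the
    contents of the second vertex buffer); indices[i] is the table slot
    quad i should look up.
    """
    slot_of = {}
    table = []
    indices = []
    for rect in quad_clips:
        if rect not in slot_of:
            slot_of[rect] = len(table)
            table.append(rect)
        indices.append(slot_of[rect])
    return table, indices

panel = (0, 0, 400, 300)
dialog = (100, 100, 300, 200)
table, indices = build_clip_table([panel, panel, dialog, panel])
print(table)    # [(0, 0, 400, 300), (100, 100, 300, 200)]
print(indices)  # [0, 0, 1, 0]
```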
To do this, the two vertex buffers are set up separately:
struct CrapVertex
{
    public Vector4 Position; // The W coordinate is the flag for which texture to use
    public Vector2 Texture;
    public Single ClipIndex; // index of this quad's clip area in the second vertex buffer
    public static int SizeInBytes
    {
        get { return 7 * sizeof(float); }
    }
}
struct ClipVertex
{
    public Vector4 Position; // the clip area, in screen coords
    public static int SizeInBytes
    {
        get { return 4 * sizeof(float); }
    }
}
But the vertex declaration contains both of them combined:
readonly VertexElement[] VertexElements = new VertexElement[] {
    new VertexElement(0, 0, VertexElementFormat.Vector4, VertexElementMethod.Default, VertexElementUsage.Position, 0),
    new VertexElement(0, 16, VertexElementFormat.Vector2, VertexElementMethod.Default, VertexElementUsage.TextureCoordinate, 0),
    new VertexElement(0, 24, VertexElementFormat.Single, VertexElementMethod.Default, VertexElementUsage.Position, 1),
    new VertexElement(1, 0, VertexElementFormat.Vector4, VertexElementMethod.Default, VertexElementUsage.Position, 2)
};
Note the last vertex element is set to come from vertex stream 1, and the rest are from 0.
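The byte offsets in each VertexElement restart at 0 for every stream and must agree with the struct layouts. A quick Python check of the arithmetic, using the sizes of XNA's Single, Vector2 and Vector4 formats:

```python
# Byte sizes of the element formats used in the declaration.
FORMAT_BYTES = {"Single": 4, "Vector2": 8, "Vector4": 16}

def layout(formats):
    """Return (offsets, stride) for elements packed back to back in one stream."""
    offsets, offset = [], 0
    for fmt in formats:
        offsets.append(offset)
        offset += FORMAT_BYTES[fmt]
    return offsets, offset

stream0 = layout(["Vector4", "Vector2", "Single"])  # Position, Texture, ClipIndex
stream1 = layout(["Vector4"])                       # clip area
print(stream0)  # ([0, 16, 24], 28)
print(stream1)  # ([0], 16)
```

The strides, 28 and 16 bytes, match CrapVertex.SizeInBytes (7 floats) and ClipVertex.SizeInBytes (4 floats).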
In the vertex shader, ClipIndex is set up as 'float4 Clip : POSITION1;'. Use this with vfetch to load the clip area from stream 1, which is set up as POSITION2.
float4 tmp;
int i = (int)vertex.Clip.x; // the index is stored in the first component only
asm {
    vfetch tmp, i, position2;
};
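tmp now holds the clip area in screen coords, so convert it to [-1,1] like before. The y values also have to be inverted, just like output.Position.y, so the clip area is in the same space as Location:
output.Clip.xz = ((tmp.xz * scaleSurfaceToProj.x) - 1.0f) - halfPixel.x;
// Inverting y reverses the order, so the screen bottom (tmp.w) becomes the minimum (Clip.y) and the screen top (tmp.y) the maximum (Clip.w)
output.Clip.yw = -(((tmp.wy * scaleSurfaceToProj.y) - 1.0f) - halfPixel.y);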
The pixel shader stays the same. vfetch only exists on the XBox, so when debugging on the PC I have to stick with Mode 4 and a single vertex stream.
Update 2010-10-10
Mode 6
I thought of another rendering method that seems to work well. Using multiple textures with an 'if' in the pixel shader gives pretty poor performance, so I tested packing all the textures into a Texture3D instead. Each z-axis slice of the Texture3D can be treated as a regular texture.
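Picking a slice means choosing the third (w) texture coordinate. A Python sketch of that choice, assuming the slices are sampled at their centers: with linear filtering along z, sampling exactly at a slice center gives that slice full weight, so neighbouring slices don't bleed in.

```python
def slice_w(slice_index, depth):
    """Normalized w coordinate of the center of a Texture3D slice.

    Sampling exactly at the slice center keeps linear filtering along z
    from blending in the neighbouring slice.
    """
    return (slice_index + 0.5) / depth

print([slice_w(i, 2) for i in range(2)])  # [0.25, 0.75]
```

In the pixel shader the per-quad flag would then become the w coordinate of a tex3D lookup instead of the condition of an 'if'.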
I set up the test app to do this with a 2-layer Texture3D:
- Quads: 1500
- Texture: 2048 x 2048 x 2
- Time spent: 60s
The following are the total times spent per frame in the Render() call (in microseconds):
- Single texture atlas with all textures baked into it: 32.16 us. This is the baseline for doing things normally.
- 2 separate textures with an 'if' in the pixel shader: 36.50 us
- Texture3D: 32.25 us
The time spent using the 3D texture is almost the same as with a single regular texture. This means that if the texture atlas grows too large, we can add more layers along the z axis to get more space.
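Expressed relative to the single-atlas baseline, the measured times above work out as follows (a small Python calculation on the reported numbers):

```python
def overhead_pct(time_us, baseline_us):
    """Percent extra time relative to the baseline."""
    return (time_us - baseline_us) / baseline_us * 100

BASELINE_US = 32.16  # single texture atlas
for name, t in [("2 textures + if", 36.50), ("Texture3D", 32.25)]:
    print(f"{name}: {overhead_pct(t, BASELINE_US):+.1f}%")
# 2 textures + if: +13.5%
# Texture3D: +0.3%
```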