My day job as VP Engineering for Mobile at Mozilla is to manage an engineering organization of some 130 engineers. Most of my time is spent working with people and helping them be effective at building software. That means a lot of meetings, a lot of planning, and a lot of HR-type busywork.
As much as I miss putting my PhD in Computer Science to use these days, having such a large engineering organization behind me has its perks. The amount of stuff “I” can get done by finding so many engineers the right things to work on and removing any roadblocks they face is really amazing, and it makes up for the lost hacking time.
Still, to avoid feeling like a useless paper pusher, I usually have a hacking project going on the side. I can’t really justify spending time hacking at work: there are simply too many meetings to attend, emails to reply to, and problems to solve. However, whenever I sit on a plane without WiFi, I can’t effectively do my VP job, since that job is all about communication. Time on the plane is hacking time, and as it happens, I travel a lot. I obtained Global Services status with United two years in a row, almost exclusively flying Economy (that must be some sort of world record).
What’s a layer anyway?
In Gecko, our rendering engine, we try to detect when frames (a “div” is a frame, for example) are animated, and if so, we put those frames into their own layer. Each layer can be rendered to an independent texture, forming a tree of layers that represents the visible part of the document. A compositor then draws those layers into the frame buffer (“frame buffer” is the OpenGL term; in practice, on most systems it means the window).
Gecko has a couple of different kinds of layers. Color layers consist of a single color: the body element of a document is usually white, and we use a color layer to draw that opaque, white rectangle. Content layers (internally called Thebes layers, for historical reasons) are where we render arbitrary content such as text. Image layers hold a single image and are a special case of content layers.
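To make the structure concrete, here is a minimal Python sketch of a layer tree under heavily simplified assumptions. The class names mirror the layer kinds described above, but their fields, the `composite` function, and the tuple-based rectangles are hypothetical and are not Gecko’s API. The compositor draws the children of a container layer in Z order (bottom-most first), so later layers overwrite earlier ones.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds


@dataclass
class ColorLayer:
    name: str
    rect: Rect
    color: str  # every pixel of the layer has this one color

    def pixel(self, x: int, y: int) -> str:
        return self.color


@dataclass
class ContentLayer:
    """Arbitrary pre-rendered content (text, etc.) stored as a texture."""
    name: str
    rect: Rect
    texture: List[List[str]]  # indexed [row][column], relative to rect

    def pixel(self, x: int, y: int) -> str:
        return self.texture[y - self.rect[1]][x - self.rect[0]]


@dataclass
class ImageLayer(ContentLayer):
    """A content layer whose texture holds exactly one image."""


@dataclass
class ContainerLayer:
    children: list = field(default_factory=list)  # Z order, bottom-most first


def composite(root: ContainerLayer, width: int, height: int):
    """Painter's algorithm: draw children in Z order into a frame buffer.

    Returns the frame buffer and the number of pixel writes performed."""
    frame_buffer = [["" for _ in range(width)] for _ in range(height)]
    writes = 0
    for layer in root.children:
        x0, y0, x1, y1 = layer.rect
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                frame_buffer[y][x] = layer.pixel(x, y)  # overwrite lower layers
                writes += 1
    return frame_buffer, writes


root = ContainerLayer([
    ColorLayer("body", (0, 0, 4, 2), "white"),
    ImageLayer("star", (1, 0, 3, 2), [["*", "*"], ["*", "*"]]),
])
frame_buffer, writes = composite(root, 4, 2)
# frame_buffer[0] == ["white", "*", "*", "white"]; writes == 12 for 8 visible pixels
```

The result is correct, but 4 of the 12 writes go to body pixels that the image immediately overwrites.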
When rendering the visible part of a document, we already try to skip invisible frames, but when frames are animated (moving around), we often end up with a layer tree in which multiple layers are painted on top of each other, partially hiding each other. The compositor draws these layers in Z order, so the result is correct, but we sometimes composite pixels that are guaranteed to be occluded by layers drawn right on top of them. On desktop this is wasteful from a power consumption perspective, but in practice it is usually not a big deal. On mobile, on the other hand, it can cause significant performance problems. Mobile systems often have unified memory (texture data and the frame buffer share memory with the CPU) with fairly low memory bandwidth. Over-compositing (drawing pixels that aren’t visible in the end) wastes that precious bandwidth. In extreme cases this can cause the frame rate of animations to drop below our target of 60 frames per second.
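The cost of over-compositing can be estimated with simple arithmetic. The Python sketch below assumes a hypothetical 1920×1200 screen and 4 bytes per pixel (RGBA8888), and counts frame-buffer writes only (texture reads would add more traffic); none of these numbers come from the post. Compositing a full-screen opaque layer that is entirely hidden by another full-screen layer doubles the pixels written per frame.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds

# Assumptions for this estimate (not figures from the post).
BYTES_PER_PIXEL = 4   # RGBA8888 frame buffer
TARGET_FPS = 60       # the animation frame-rate target


def area(r: Rect) -> int:
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def clip(r: Rect, screen: Rect) -> Rect:
    return (max(r[0], screen[0]), max(r[1], screen[1]),
            min(r[2], screen[2]), min(r[3], screen[3]))


def overdraw_ratio(layer_rects: List[Rect], screen: Rect) -> float:
    """Pixels composited per screen pixel; 1.0 means no over-compositing."""
    composited = sum(area(clip(r, screen)) for r in layer_rects)
    return composited / area(screen)


def frame_buffer_write_bytes_per_second(screen: Rect, ratio: float) -> float:
    """Frame-buffer write traffic only; texture reads come on top of this."""
    return area(screen) * ratio * BYTES_PER_PIXEL * TARGET_FPS


screen = (0, 0, 1920, 1200)     # hypothetical resolution
home_screen = [screen, screen]  # hypothetical: color layer + content layer, both full screen
ratio = overdraw_ratio(home_screen, screen)  # 2.0
```

Under these assumptions, the hidden color layer alone adds about 553 MB per second of frame-buffer writes (1920 × 1200 × 4 × 60 bytes) on top of the content that is actually visible.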
Flatfish is a tablet we have ported FirefoxOS to. It has a high-resolution screen and a comparatively weak GPU, so over-compositing can cause the frame rate to drop. On the home screen, for example, we were compositing a color layer (blue in the image below) that was completely hidden by a content layer (yellow star). Setting each pixel in the frame buffer to black before copying the actual content over it caused us to miss the 60 FPS target for home screen animations.
To solve this problem, I wrote a little patch (bug 911471) for the layers system that walks the children of a container layer in reverse Z order and accumulates a region of pixels that are guaranteed to be covered by opaque pixels (some layers might be transparent; those are not added to the region). As we make our way through the list, we don’t have to composite any pixel that is covered by a layer we paint later (remember, we are walking in reverse Z order), because it would be overwritten by an opaque pixel anyway. We use this information to shrink the scissor rectangle the compositor uses when compositing each layer. The scissor rectangle describes the bounds of the OpenGL draw operation we use to composite.
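The occlusion pass can be sketched in Python as follows. This illustrates the idea and is not the code from bug 911471: the `Rect`, `Layer`, and `compute_scissor_rects` names and the simple list-of-disjoint-rectangles region type are assumptions made for the example. Walking the children top-most first, each layer’s visible region is its bounds minus the opaque region accumulated so far, and its scissor rect is the bounding box of that visible region.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rect:
    """Half-open pixel rectangle: x0 <= x < x1, y0 <= y < y1."""
    x0: int
    y0: int
    x1: int
    y1: int

    def is_empty(self) -> bool:
        return self.x0 >= self.x1 or self.y0 >= self.y1

    def intersect(self, o: Rect) -> Rect:
        return Rect(max(self.x0, o.x0), max(self.y0, o.y0),
                    min(self.x1, o.x1), min(self.y1, o.y1))

    def subtract(self, o: Rect) -> List[Rect]:
        """Split self minus o into at most four disjoint rectangles."""
        i = self.intersect(o)
        if i.is_empty():
            return [self]
        parts = [Rect(self.x0, self.y0, self.x1, i.y0),  # band above
                 Rect(self.x0, i.y1, self.x1, self.y1),  # band below
                 Rect(self.x0, i.y0, i.x0, i.y1),        # strip to the left
                 Rect(i.x1, i.y0, self.x1, i.y1)]        # strip to the right
        return [p for p in parts if not p.is_empty()]


# A region is represented as a list of pairwise-disjoint rectangles.
def region_subtract(region: List[Rect], cut: List[Rect]) -> List[Rect]:
    for c in cut:
        region = [piece for r in region for piece in r.subtract(c)]
    return region


def region_bounds(region: List[Rect]) -> Optional[Rect]:
    if not region:
        return None
    return Rect(min(r.x0 for r in region), min(r.y0 for r in region),
                max(r.x1 for r in region), max(r.y1 for r in region))


@dataclass
class Layer:
    name: str
    rect: Rect
    opaque: bool  # transparent layers never occlude anything


def compute_scissor_rects(children: List[Layer]) -> Dict[str, Optional[Rect]]:
    """children are in Z order (bottom first). Returns each layer's scissor
    rect, or None if the layer is completely occluded and can be skipped."""
    covered: List[Rect] = []  # opaque pixels of layers drawn later (above)
    scissors: Dict[str, Optional[Rect]] = {}
    for layer in reversed(children):  # reverse Z order: top-most first
        visible = region_subtract([layer.rect], covered)
        scissors[layer.name] = region_bounds(visible)
        if layer.opaque:
            # visible is exactly the part of this layer not yet in covered,
            # so appending it keeps the rectangles in covered disjoint.
            covered += visible
    return scissors


screen = Rect(0, 0, 100, 100)
home = compute_scissor_rects([Layer("color", screen, True),
                              Layer("content", screen, True)])
# home["color"] is None: the color layer is fully occluded and can be skipped
```

Only opaque layers grow the covered region, so a transparent overlay leaves the scissor rects of the layers below it unchanged.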
Not a perfect solution, yet
This approach is not optimal, because the scissor rectangle is just a rectangle, and a layer might be only partially occluded. Such partial occlusion is properly described by the region we accumulate (regions can consist of multiple rectangles), but when setting the scissor rectangle I have to take the bounds of the region to paint, since GL doesn’t support a scissor region. This can still cause over-compositing. However, in essentially every test case I have seen, this doesn’t matter: layers tend to be occluded by exactly one other layer, not by a set of layers that each partially occlude it.
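The limitation is easy to see with a small example. In this hypothetical Python sketch (all helper names are illustrative), a 100×100 layer is occluded first by a single layer covering its left half, then by two layers covering its top-left and bottom-right quarters. In the first case the visible region is a single rectangle and its bounds are exact. In the second, the visible region consists of the two remaining diagonal quarters, whose bounding box is the whole layer, so the scissor rect does not shrink at all and 5,000 occluded pixels are still composited.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel bounds


def area(r: Rect) -> int:
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def subtract(r: Rect, c: Rect) -> List[Rect]:
    """r minus c as at most four disjoint rectangles."""
    ix0, iy0 = max(r[0], c[0]), max(r[1], c[1])
    ix1, iy1 = min(r[2], c[2]), min(r[3], c[3])
    if ix0 >= ix1 or iy0 >= iy1:
        return [r]  # no overlap
    parts = [(r[0], r[1], r[2], iy0), (r[0], iy1, r[2], r[3]),
             (r[0], iy0, ix0, iy1), (ix1, iy0, r[2], iy1)]
    return [p for p in parts if area(p) > 0]


def visible_region(layer: Rect, occluders: List[Rect]) -> List[Rect]:
    region = [layer]
    for c in occluders:
        region = [p for r in region for p in subtract(r, c)]
    return region


def bounds(region: List[Rect]) -> Optional[Rect]:
    if not region:
        return None
    return (min(r[0] for r in region), min(r[1] for r in region),
            max(r[2] for r in region), max(r[3] for r in region))


def scissor_waste(layer: Rect, occluders: List[Rect]) -> int:
    """Occluded pixels that still fall inside the bounding scissor rect."""
    region = visible_region(layer, occluders)
    if not region:
        return 0
    return area(bounds(region)) - sum(area(p) for p in region)


layer = (0, 0, 100, 100)
one_occluder = [(0, 0, 50, 100)]                      # left half
two_occluders = [(0, 0, 50, 50), (50, 50, 100, 100)]  # diagonal quarters
```

Here `scissor_waste(layer, one_occluder)` is 0, while `scissor_waste(layer, two_occluders)` is 5000: half of the layer is over-composited even though the region describes the occlusion exactly.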
It is possible to solve this problem precisely by splitting the actual draw operation into multiple draws with different scissor rects. This might be slow, however, since it submits work to the GPU pipeline multiple times. A faster approach is probably to split the draw into multiple quads and draw all of them with one GPU call. Since this is a rare case to begin with, I am not sure we will need this additional optimization. We can always add it later.
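The two precise alternatives can be contrasted in a hypothetical Python sketch that only builds the data each one would submit, without calling any real GL API. The visible region is assumed to be a list of disjoint rectangles. Option 1 issues one scissored draw of the full layer quad per rectangle. Option 2 emits two triangles per rectangle into a single vertex array, with texture coordinates mapping the layer’s texture onto its bounds, so the whole region can go to the GPU in one draw call. The `(x, y, u, v)` vertex layout is an assumption.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]            # (x0, y0, x1, y1), half-open
Vertex = Tuple[float, float, float, float]  # (x, y, u, v)


def scissored_draws(layer: Rect, region: List[Rect]) -> List[Tuple[Rect, Rect]]:
    """Option 1: one draw call per rectangle. Each call draws the full layer
    quad, clipped by that rectangle used as the scissor rect."""
    return [(scissor, layer) for scissor in region]


def batched_quads(layer: Rect, region: List[Rect]) -> List[Vertex]:
    """Option 2: a single vertex array with two triangles per rectangle,
    suitable for submitting the whole region in one draw call."""
    lx0, ly0, lx1, ly1 = layer
    width, height = lx1 - lx0, ly1 - ly0

    def vert(x: int, y: int) -> Vertex:
        # Texture coordinates map the layer's texture onto its bounds.
        return (float(x), float(y), (x - lx0) / width, (y - ly0) / height)

    vertices: List[Vertex] = []
    for x0, y0, x1, y1 in region:
        a, b, c, d = vert(x0, y0), vert(x1, y0), vert(x1, y1), vert(x0, y1)
        vertices += [a, b, c, a, c, d]  # two triangles covering the rectangle
    return vertices


layer = (0, 0, 100, 100)
region = [(0, 50, 50, 100), (50, 0, 100, 50)]  # two visible quarters
draws = scissored_draws(layer, region)         # 2 draw calls
vertices = batched_quads(layer, region)        # 12 vertices, 1 draw call
```

The batched version trades a slightly larger vertex array for fewer draw calls, which is the motivation given above for preferring it.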
What’s the SOC for this tablet?
We were initially using a Freescale SOC with a Vivante GPU core. We had difficulty doing a full-screen animation at 60 FPS (think “panning the home screen”) because of over-compositing (color layers, and also ref layers that we use for cross-process content). We have since switched to a different SOC (for unrelated reasons).
Which SOC did you switch to? Likely to be the SOC used in the production tablet?
This is a bit of an insane idea, but what about using hardware Z rejection (drawing opaque stuff front to back with the Z coordinates set correctly, then transparent stuff back to front with its Z coordinates too)?
Hardware is usually very efficient at depth culling. This of course requires more memory since you’d be using a Z-buffer, and would only help in the cases the current approach doesn’t catch (and hurt otherwise).
I think that would work and it would catch complex corner cases, but as you said, it would increase the memory bandwidth used, since the GPU has to read from and write to the Z buffer.
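To make the trade-off in this exchange concrete, here is a software simulation in Python of the depth-test scheme, under simplifying assumptions: one depth value per layer derived from its Z order, a test that passes when a fragment is strictly closer, and a counter for every Z-buffer read and write. It models what the depth test decides, not how any particular GPU implements early Z rejection. Hidden opaque pixels are rejected before being shaded, but every fragment now costs a Z-buffer read, and every surviving opaque fragment also costs a write.

```python
from typing import List, Tuple

Layer = Tuple[str, Tuple[int, int, int, int], bool]  # (name, rect, opaque)


def composite_with_depth(layers: List[Layer], width: int, height: int):
    """layers are in Z order, bottom first.

    Returns (color_writes, depth_accesses)."""
    n = len(layers)
    depth = [[float("inf")] * width for _ in range(height)]  # far plane
    color_writes = depth_accesses = 0

    def draw(index: int, write_depth: bool) -> None:
        nonlocal color_writes, depth_accesses
        _, (x0, y0, x1, y1), _ = layers[index]
        z = n - index  # higher in Z order -> smaller depth -> closer
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                depth_accesses += 1  # the depth test reads the Z buffer
                if z < depth[y][x]:
                    color_writes += 1  # fragment survives and is shaded
                    if write_depth:
                        depth[y][x] = z
                        depth_accesses += 1  # depth write

    # Pass 1: opaque layers front to back, testing and writing depth.
    for i in reversed(range(n)):
        if layers[i][2]:
            draw(i, write_depth=True)
    # Pass 2: transparent layers back to front, testing but not writing depth.
    for i in range(n):
        if not layers[i][2]:
            draw(i, write_depth=False)
    return color_writes, depth_accesses


home = [("color", (0, 0, 4, 4), True), ("content", (0, 0, 4, 4), True)]
color_writes, depth_accesses = composite_with_depth(home, 4, 4)
# 16 color writes (vs. 32 with the painter's algorithm), but 48 Z-buffer accesses
```

A transparent layer below an opaque one is rejected by the depth test in the second pass, while a transparent layer on top is still blended over the opaque content.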