
[/caption]
The Firefox Quantum release is getting close. It brings many performance improvements, including the super fast CSS engine that we brought over from Servo.
But there’s another big piece of Servo technology that’s not in Firefox Quantum quite yet, though it’s coming soon. That’s WebRender, which is being added to Firefox as part of the Quantum Render project.
WebRender is known for being extremely fast. But WebRender isn’t really about making rendering faster. It’s about making it smoother.
With WebRender, we want apps to run at a silky smooth 60 frames per second (FPS) or better no matter how big the display is or how much of the page is changing from frame to frame. And it works. Pages that chug along at 15 FPS in Chrome or today’s Firefox run at 60 FPS with WebRender.
So how does WebRender do that? It fundamentally changes the way the rendering engine works to make it more like a 3D game engine.
Let’s take a look at what this means. But first…
In the article on Stylo, I talked about how the browser goes from HTML and CSS to pixels on the screen, and how most browsers do this in five steps.
We can split these five steps into two halves. The first half basically builds up a plan. To make this plan, it combines the HTML and CSS with information like the viewport size to figure out exactly what each element should look like: its width, height, color, etc. The end result is something called a frame tree or a render tree.
The second half, painting and compositing, is what a renderer does. It takes that plan and turns it into pixels to display on the screen.
But the browser doesn’t just have to do this once for a web page. It has to do it over and over again for the same web page. Any time something changes on this page (for example, a div is toggled open), the browser has to go through a lot of these steps.
Even in cases where nothing’s really changing on the page (for example, when you’re scrolling or highlighting some text on the page), the browser still has to go through at least some of the second part again to draw new pixels on the screen.
If you want things like scrolling or animation to look smooth, they need to be going at 60 frames per second.
You may have heard this phrase, frames per second (FPS), before without being sure what it meant. I think of this like a flip book. It’s like a book of drawings that are static, but you can use your thumb to flip through so that it looks like the pages are animated.
In order for the animation in this flip book to look smooth, you need to have 60 pages for every second in the animation.
The pages in this flip book are made out of graph paper. There are lots and lots of little squares, and each of the squares can only contain one color.
The job of the renderer is to fill in the boxes in this graph paper. Once all of the boxes in the graph paper are filled in, it is finished rendering the frame.
Now, of course there is not actual graph paper inside of your computer. Instead, there’s a section of memory in the computer called a frame buffer. Each memory address in the frame buffer is like a box in the graph paper… it corresponds to a pixel on the screen. The browser will fill in each slot with the numbers that represent the color in RGBA (red, green, blue, and alpha) values.
When the display needs to refresh itself, it will look at this section of memory.
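To make the graph-paper picture concrete, here is a minimal Python sketch of a frame buffer. It is stored as a flat, row-major list with one RGBA slot per pixel. The `FrameBuffer` class and its methods are hypothetical illustrations, not Firefox code. A real frame buffer is raw memory, usually packed as bytes.

```python
class FrameBuffer:
    """A hypothetical frame buffer: one RGBA slot per pixel, stored row by row."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Every slot starts as opaque black: (red, green, blue, alpha), each 0-255.
        self.slots = [(0, 0, 0, 255)] * (width * height)

    def index(self, x, y):
        # Row-major layout: row y starts after y full rows of `width` slots.
        return y * self.width + x

    def set_pixel(self, x, y, rgba):
        self.slots[self.index(x, y)] = rgba

    def get_pixel(self, x, y):
        return self.slots[self.index(x, y)]


fb = FrameBuffer(4, 3)
fb.set_pixel(2, 1, (255, 0, 0, 255))  # fill one box with opaque red
print(fb.index(2, 1))      # 6
print(fb.get_pixel(2, 1))  # (255, 0, 0, 255)
```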
Most computer displays will refresh 60 times per second. This is why browsers try to render pages at 60 frames per second. That means the browser has 16.67 milliseconds to do all of the setup (CSS styling, layout, painting) and fill in all of the slots in the frame buffer with pixel colors. This window between two frames (16.67 ms) is called the frame budget.
Sometimes you hear people talk about dropped frames. A dropped frame is when the system doesn’t finish its work within the frame budget. The display tries to get the new frame from the frame buffer before the browser is done filling it in. In this case, the display shows the old version of the frame again.
A dropped frame is kind of like if you tore a page out of that flip book. It would make the animation seem to stutter or jump because you’re missing the transition between the previous page and the next.
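The frame budget is just 1000 ms divided by the refresh rate. The sketch below also models dropped frames in a deliberately simplified way. It assumes each frame’s work starts right after the previous refresh and is thrown away if it runs past the budget. The function and parameter names are hypothetical.

```python
REFRESH_RATE_HZ = 60
FRAME_BUDGET_MS = 1000 / REFRESH_RATE_HZ  # about 16.67 ms per frame


def displayed_frames(work_times_ms, budget_ms=FRAME_BUDGET_MS):
    """Return, for each refresh, the number of the frame the display shows.

    Simplified model: if a frame's work runs over budget, that frame is
    dropped and the display shows the previous frame again.
    """
    shown = []
    last_complete = None  # nothing has been drawn yet
    for frame_number, work_ms in enumerate(work_times_ms):
        if work_ms <= budget_ms:
            last_complete = frame_number  # finished in time
        shown.append(last_complete)
    return shown


print(round(FRAME_BUDGET_MS, 2))          # 16.67
print(displayed_frames([5, 12, 30, 8]))  # [0, 1, 1, 3]: frame 2 was dropped
```

In the output, frame 1 appears twice in a row, which is the missing flip-book page.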
So we want to make sure that we get all of these pixels into the frame buffer before the display checks it again. Let’s look at how browsers have historically done this, and how that has changed over time. Then we can see how we can make this faster.
Note: Painting and compositing is where browser rendering engines differ most from each other. Single-platform browsers (Edge and Safari) work a bit differently than multi-platform browsers (Firefox and Chrome) do.
Even in the earliest browsers, there were some optimizations to make pages render faster. For example, if you were scrolling content, the browser would keep the part that was still visible and move it. Then it would paint new pixels in the blank spot.
This process of figuring out what has changed and then only updating the changed elements or pixels is called invalidation.
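Here is a minimal sketch of that early scrolling optimization. It treats the viewport as a list of already-painted rows. Rows that stay visible are shifted up, and only the rows exposed at the bottom are painted. It only handles scrolling down (`dy >= 0`), and all names are hypothetical.

```python
def scroll_down(viewport_rows, scroll_top, dy, paint_row):
    """Scroll the viewport down by dy rows, reusing rows that stay visible.

    viewport_rows: painted rows currently on screen (top to bottom).
    scroll_top: index of the document row at the top of the viewport.
    paint_row: callback that paints one document row (the expensive part).
    """
    height = len(viewport_rows)
    new_top = scroll_top + dy
    reusable = max(0, height - dy)  # rows that are still on screen
    kept = viewport_rows[dy:]       # move them up by dy (empty if dy >= height)
    # Only the blank strip exposed at the bottom gets fresh paint.
    fresh = [paint_row(r) for r in range(new_top + reusable, new_top + height)]
    return kept + fresh, new_top


painted = []

def paint_row(r):
    painted.append(r)  # record which rows actually had to be painted
    return f"row {r}"

rows = [paint_row(r) for r in range(4)]
painted.clear()
rows, top = scroll_down(rows, 0, 1, paint_row)
print(rows)     # ['row 1', 'row 2', 'row 3', 'row 4']
print(painted)  # [4]
```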
As time went on, browsers started applying more invalidation techniques, like rectangle invalidation. With rectangle invalidation, you figure out the smallest rectangle around each part of the screen that changed. Then, you only redraw what’s inside those rectangles.
This really reduces the amount of work that you need to do when there’s not much changing on the page… for example, when you have a single blinking cursor.
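Rectangle invalidation can be sketched as follows for the blinking-cursor case. The code compares the old and new frames and finds the smallest rectangle around the changed pixels. Then it copies only that rectangle. Using one rectangle for the whole frame is a simplification, since browsers track a rectangle for each changed region. The function names are hypothetical.

```python
def dirty_rect(old, new):
    """Smallest rectangle (x0, y0, x1, y1), end-exclusive, around every changed pixel.

    Returns None when nothing changed.
    """
    xs, ys = [], []
    for y, (old_row, new_row) in enumerate(zip(old, new)):
        for x, (before, after) in enumerate(zip(old_row, new_row)):
            if before != after:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def redraw(screen, new, rect):
    """Copy only the pixels inside rect from the new frame; return how many."""
    if rect is None:
        return 0
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            screen[y][x] = new[y][x]
    return (x1 - x0) * (y1 - y0)


old = [[0] * 4 for _ in range(3)]
new = [row[:] for row in old]
new[1][2] = 1                     # the blinking cursor turns on
screen = [row[:] for row in old]
rect = dirty_rect(old, new)
print(rect)                       # (2, 1, 3, 2)
print(redraw(screen, new, rect))  # 1 pixel redrawn instead of 12
```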
But that doesn’t help much when large parts of the page are changing. So the browsers came up with new techniques to handle those cases.
Using layers can help a lot when large parts of the page are changing… at least, in certain cases.
The layers in browsers are a lot like layers in Photoshop, or the onion skin layers that were used in hand-drawn animation. Basically, you paint different elements of the page on different layers. Then you place those layers on top of each other.
They have been a part of the browser for a long time, but they weren’t always used to speed things up. At first, they were just used to make sure pages rendered correctly. They corresponded to something called stacking contexts.
For example, if you had a translucent element, it would be in its own stacking context. That meant it got its own layer so you could blend its color with the color below it. These layers were thrown out as soon as the frame was done. On the next frame, all the layers would be repainted again.
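Blending a translucent layer with the color below it is usually done with the standard "source-over" formula: `result = src * alpha + dst * (1 - alpha)`. This sketch uses straight (non-premultiplied) alpha over an opaque destination. Real engines commonly work with premultiplied alpha instead, and the function name is hypothetical.

```python
def blend_over(src_rgb, alpha, dst_rgb):
    """Blend a translucent color (opacity `alpha`) onto the opaque color below it."""
    return tuple(round(s * alpha + d * (1 - alpha)) for s, d in zip(src_rgb, dst_rgb))


# A 50%-opaque red element over a blue background.
print(blend_over((255, 0, 0), 0.5, (0, 0, 255)))  # (128, 0, 128)
```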
But often the things on these layers didn’t change from frame to frame. For example, think of a traditional animation. The background doesn’t change, even if the characters in the foreground do. It’s a lot more efficient to keep that background layer around and just reuse it.
So that’s what browsers did. They retained the layers. Then the browser could just repaint layers that had changed. And in some cases, layers weren’t even changing. They just needed to be rearranged: for example, if an animation was moving across the screen, or something was being scrolled.
This process of arranging layers together is called compositing. The compositor starts with source bitmaps (the background and the scrollable content) and a destination bitmap, which is the bitmap that is displayed on the screen.
First, the compositor would copy the background to the destination bitmap.
Then it would figure out what part of the scrollable content should be showing. It would copy that part over to the destination bitmap.
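The two compositing steps can be sketched with bitmaps represented as 2D lists of characters. The sketch assumes the background leaves a blank box where the scrollable content shows, and that the content is at least as wide as that box. All names are hypothetical.

```python
def composite(background, content, box, scroll_y):
    """Build the destination bitmap from two source bitmaps.

    background: screen-sized 2D list with a blank box where content goes.
    content: 2D list of the scrollable content, usually taller than the box.
    box: (x, y, width, height) of the scroll box on screen.
    scroll_y: how many content rows have been scrolled past.
    """
    # Step 1: copy the background into the destination bitmap.
    dest = [row[:] for row in background]
    # Step 2: copy only the visible slice of the scrollable content.
    x, y, w, h = box
    for dy in range(h):
        dest[y + dy][x:x + w] = content[scroll_y + dy][:w]
    return dest


background = [list("####"), list("#..#"), list("#..#"), list("####")]
content = [list("ab"), list("cd"), list("ef")]
frame = composite(background, content, (1, 1, 2, 2), scroll_y=1)
print(["".join(row) for row in frame])  # ['####', '#cd#', '#ef#', '####']
```

Scrolling then only changes `scroll_y`. Neither source bitmap needs to be repainted.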
This reduced the amount of painting that the main thread had to do. But it still means that the main thread is spending a lot of time on compositing. And there are lots of things competing for time on the main thread.
I’ve talked about this before, but the main thread is kind of like a full-stack developer. It’s in charge of the DOM, layout, and JavaScript. And it also was in charge of painting and compositing.
Every millisecond the main thread spends doing paint and composite is time it can’t spend on JavaScript or layout.
But there was another part of the hardware that was lying around without much work to do. And this hardware was specifically built for graphics. That was the GPU, which games have been using since the late 90s to render frames quickly. And GPUs have been getting bigger and more powerful ever since then.
So browser developers started moving things over to the GPU.
There are two tasks that could potentially move over to the GPU: painting the layers, and compositing them together.
It can be hard to move painting to the GPU. So for the most part, multi-platform browsers kept painting on the CPU.
But compositing was something that the GPU could do very quickly, and it was easy to move over to the GPU.
Some browsers took this parallelism even further and added a compositor thread on the CPU. It became a manager for the compositing work that was happening on the GPU. This meant that if the main thread was doing something (like running JavaScript), the compositor thread could still handle things for the user, like scrolling content up when the user scrolled.
So this moves all of the compositing work off of the main thread. It still leaves a lot of work on the main thread, though. Whenever we need to repaint a layer, the main thread needs to do it, and then transfer that layer over to the GPU.
Some browsers moved painting off into another thread (and we’re working on that in Firefox today). But it’s even faster to move this last little bit of work, painting, to the GPU.
So browsers started moving painting to the GPU, too.
Browsers are still in the process of making this shift. Some browsers paint on the GPU all of the time, while others only do it on certain platforms (like only on Windows, or only on mobile devices).
Painting on the GPU does a few things. It frees up the CPU to spend all of its time doing things like JavaScript and layout. Plus, GPUs are much faster at drawing pixels than CPUs are, so it speeds painting up. It also means less data needs to be copied from the CPU to the GPU.
But maintaining this distinction between paint and composite still has some costs, even when they are both on the GPU. This distinction also limits the kinds of optimizations that you can use to make the GPU do its work faster.
This is where WebRender comes in. It fundamentally changes the way we render, removing the distinction between paint and composite. This gives us a way to tailor the performance of our renderer to give you the best user experience on today’s web, and to best support the use cases that you will see on tomorrow’s web.
This means we don’t just want to make frames render faster… we want to make them render more consistently and without jank. And even when there are lots of pixels to draw, like on 4K displays or WebVR headsets, we still want the experience to be just as smooth.
The optimizations above have helped pages render faster in certain cases. When not much is changing on a page (for example, when there’s just a single blinking cursor), the browser will do the least amount of work possible.
Breaking up pages into layers has expanded the number of those best-case scenarios. If you can paint a few layers and then just move them around relative to each other, then the painting+compositing architecture works well.
But there are also trade-offs to using layers. They take up a lot of memory and can actually make things slower. Browsers need to combine layers where it makes sense… but it’s hard to tell where it makes sense.
This means that if there are a lot of different things moving on the page, you can end up with too many layers. These layers fill up memory and take too long to transfer to the compositor.
Other times, you’ll end up with one layer when you should have multiple layers. That single layer will be continually repainted and transferred to the compositor, which then composites it without changing anything.
This means you’ve doubled the amount of drawing you have to do, touching each pixel twice without getting any benefit. It would have been faster to simply render the page directly, without the compositing step.
And there are lots of cases where layers just don’t help much. For example, if you animate background color, the whole layer has to be repainted anyway. These layers only help with a small number of CSS properties.
Even if most of your frames are best-case scenarios (that is, they only take up a tiny bit of the frame budget), you can still get choppy motion. For jank to be noticeable, only a couple of frames need to fall into worst-case scenarios.
These scenarios are called performance cliffs. Your app seems to be moving along fine until it hits one of these worst-case scenarios (like animating background color), and all of a sudden your app’s frame rate topples over the edge.
But we can get rid of these performance cliffs.
How do we do this? We follow the lead of 3D game engines.
What if we stopped trying to guess what layers we need? What if we removed this boundary between painting and compositing and just went back to painting every pixel on every frame?
This may sound like a ridiculous idea, but it actually has some precedent. Modern day video games repaint every pixel, and they maintain 60 frames per second more reliably than browsers do. And they do it in an unexpected way… instead of creating these invalidation rectangles and layers to minimize what they need to paint, they just repaint the whole screen.
Wouldn’t rendering a web page like that be way slower?
If we paint on the CPU, it would be. But GPUs are designed to make this work.
GPUs are built for extreme parallelism. I talked about parallelism in my last article about Stylo. With parallelism, the machine can do multiple things at the same time. The number of things it can do at once is limited by the number of cores that it has.
CPUs usually have between 2 and 8 cores. GPUs usually have at least a few hundred cores, and often more than 1,000 cores.
These cores work a little differently, though. They can’t act completely independently like CPU cores can. Instead, they usually work on something together, executing the same instruction on different pieces of the data.
This is exactly what you need when you’re filling in pixels. Each pixel can be filled in by a different core. Because it can work on hundreds of pixels at a time, the GPU is a lot faster at filling in pixels than the CPU… but only if you make sure all of those cores have work to do.
Because cores need to work on the same thing at the same time, GPUs have a pretty rigid set of steps that they go through, and their APIs are pretty constrained. Let’s take a look at how this works.
First, you need to tell the GPU what to draw. This means giving it shapes and telling it how to fill them in.
To do this, you break up your drawing into simple shapes (usually triangles). These shapes are in 3D space, so some shapes can be behind others. Then you take all of the corners of those triangles and put their x, y, and z coordinates into an array.
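The sketch below shows that step for rectangles. Each rectangle at depth `z` becomes two triangles, and every corner's x, y, and z values go into one flat array. Real graphics APIs often add an index buffer so that shared corners are stored only once. This sketch simply repeats them. The function names are hypothetical.

```python
def rect_to_triangles(x, y, w, h, z):
    """Split an axis-aligned rectangle at depth z into two triangles."""
    top_left, top_right = (x, y, z), (x + w, y, z)
    bottom_left, bottom_right = (x, y + h, z), (x + w, y + h, z)
    return [
        (top_left, top_right, bottom_left),      # upper-left triangle
        (top_right, bottom_right, bottom_left),  # lower-right triangle
    ]


def vertex_array(rects):
    """Flatten every triangle corner into one flat array of x, y, z values."""
    flat = []
    for rect in rects:
        for triangle in rect_to_triangles(*rect):
            for corner in triangle:
                flat.extend(corner)
    return flat


# One 10x5 rectangle at depth 0.5: 2 triangles x 3 corners x 3 coordinates.
verts = vertex_array([(0, 0, 10, 5, 0.5)])
print(len(verts))  # 18
print(verts[:9])   # [0, 0, 0.5, 10, 0, 0.5, 0, 5, 0.5]
```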
Then you issue a draw call: you tell the GPU to draw those shapes.
From there, the GPU takes over. All of the cores will work on the same thing at the same time. They will find all of the corners of these shapes (vertex shading), figure out which pixels each shape covers (rasterization), and then go through each covered pixel and figure out what color it should be (pixel shading).
This last step can be done in different ways. To tell the GPU how to do it, you give the GPU a program called a pixel shader. Pixel shading is one of the few parts of the GPU that you can program.
Some pixel shaders are simple. For example, if your shape is a single color, then your shader program just needs to return that color for each pixel in the shape.
Other times, it’s more complex, like when you have a background image. You need to figure out which part of the image corresponds to each pixel. You can do this in the same way an artist scales an image up or down… put a grid on top of the image that corresponds to each pixel. Then, once you know which box corresponds to the pixel, take samples of the colors inside that box and figure out what the color should be. This is called texture mapping because it maps the image (called a texture) to the pixels.
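The grid-and-sample idea can be sketched for a grayscale texture. The texture is divided into one box per destination pixel, and the texels inside each box are averaged. This box filter is one illustrative choice. GPUs do this in hardware samplers with filtering modes such as nearest-neighbor and bilinear. The function names are hypothetical.

```python
def sample_box(texture, x, y, dst_w, dst_h):
    """Average the texels in the box of `texture` that maps to destination pixel (x, y)."""
    tex_h, tex_w = len(texture), len(texture[0])
    # Grid the texture into dst_w x dst_h boxes; each box holds at least one texel.
    x0 = x * tex_w // dst_w
    x1 = max(x0 + 1, (x + 1) * tex_w // dst_w)
    y0 = y * tex_h // dst_h
    y1 = max(y0 + 1, (y + 1) * tex_h // dst_h)
    samples = [texture[ty][tx] for ty in range(y0, y1) for tx in range(x0, x1)]
    return sum(samples) / len(samples)


def texture_map(texture, dst_w, dst_h):
    """Map a grayscale texture onto a dst_w x dst_h block of pixels."""
    return [[sample_box(texture, x, y, dst_w, dst_h) for x in range(dst_w)]
            for y in range(dst_h)]


texture = [
    [0, 100, 200, 200],
    [0, 100, 200, 200],
    [50, 50, 10, 30],
    [50, 50, 30, 10],
]
print(texture_map(texture, 2, 2))  # [[50.0, 200.0], [50.0, 20.0]]
```

When scaling up, a box is smaller than one texel, so each pixel simply takes the texel it falls in.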
The GPU will call your pixel shader program on each pixel. Different cores will work on different pixels at the same time, in parallel, but they all need to be using the same pixel shader program. When you tell the GPU to draw your shapes, you tell it which pixel shader to use.
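To tie these steps together, here is a sequential sketch of one draw call. First it rasterizes a triangle by testing each pixel center against the triangle's three edges. Then it runs a single pixel shader on every covered pixel. On a GPU, the per-pixel loop runs in parallel across cores. The sketch leaves out the tie-breaking rule that GPUs use so that pixels on an edge shared by two triangles are not shaded twice. All names are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: tells which side of the edge a->b the point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(triangle, width, height):
    """List the pixels whose centers fall inside the triangle (or on its edges)."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Inside when the point is on the same side of all three edges.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.append((x, y))
    return covered


def solid_color_shader(color):
    """The simplest pixel shader: every pixel of the shape gets the same color."""
    return lambda x, y: color


def gradient_shader(width):
    """A slightly fancier shader: brightness depends on the pixel's x position."""
    return lambda x, y: round(255 * x / (width - 1))


def draw_call(target, triangle, shader):
    """Rasterize one triangle, then run the same pixel shader on every covered pixel."""
    height, width = len(target), len(target[0])
    for x, y in rasterize(triangle, width, height):
        target[y][x] = shader(x, y)


target = [[0] * 4 for _ in range(4)]
draw_call(target, ((0, 0), (4, 0), (0, 4)), solid_color_shader(9))
print(target)  # [[9, 9, 9, 9], [9, 9, 9, 0], [9, 9, 0, 0], [9, 0, 0, 0]]
```

A draw call takes exactly one shader. Drawing part of the page with a different shader means issuing another draw call.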
For almost any web page, different parts of the page will need to use different pixel shaders.
Because the shader applies to all of the shapes in the draw call, you usually have to break up your draw calls into multiple groups. These are called batches. To keep all of the cores as busy as possible, you want to create a small number of batches which have lots of shapes in them.
So that’s how the GPU splits up work across hundreds or thousands of cores. It’s only because of this extreme parallelism that we can think of rendering everything on each frame. Even with the extreme parallelism, though, it’s still a lot of work. You still need to be smart about how you do this. Here’s where WebRender comes in…
Let’s go back to look at the steps the browser goes through to render the page. Two things will change here. First, there is no longer a distinction between paint and composite: they become part of the same step, done on the GPU. Second, layout now passes off a different data structure to render. Instead of a frame tree, it passes off a display list.
The display list is a set of high-level drawing instructions. It tells us what we need to draw without being specific to any graphics API.
Whenever there’s something new to draw, the main thread gives that display list to the RenderBackend, which is WebRender code that runs on the CPU.
The RenderBackend’s job is to take this list of high-level drawing instructions and convert it to the draw calls that the GPU needs, which are batched together to make them run faster.
Then the RenderBackend will pass those batches off to the compositor thread, which passes them to the GPU.
The RenderBackend wants to make the draw calls it’s giving to the GPU as fast to run as possible. It uses a few different techniques for this.
The best way to save time is to not do the work at all.
First, the RenderBackend cuts down the display list. It figures out which display items will actually be on the screen. To do this, it looks at things like how far down the scroll is for each scroll box.
If any part of a shape is inside the box, then it is included. If none of the shape would have shown up on the page, though, it’s removed. This process is called early culling.
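Early culling can be sketched as a rectangle-intersection test against the visible area. To keep it short, the sketch assumes a single vertically scrolled viewport, whereas WebRender checks every scroll box. `DisplayItem` is a hypothetical stand-in, not WebRender's actual display item type.

```python
from dataclasses import dataclass


@dataclass
class DisplayItem:
    kind: str      # e.g. "rect", "text", "image" (illustrative names)
    x: float       # position and size in page coordinates
    y: float
    width: float
    height: float


def early_cull(items, scroll_y, viewport_w, viewport_h):
    """Keep only display items with at least some part inside the visible box."""
    top, bottom = scroll_y, scroll_y + viewport_h
    return [
        item for item in items
        if item.x < viewport_w and item.x + item.width > 0
        and item.y < bottom and item.y + item.height > top
    ]


display_list = [
    DisplayItem("rect", 0, 0, 800, 100),       # header, scrolled off the top
    DisplayItem("text", 40, 950, 700, 100),    # partly visible: kept
    DisplayItem("image", 40, 1200, 300, 300),  # fully visible: kept
    DisplayItem("rect", 0, 3000, 800, 200),    # footer, far below: culled
]
visible = early_cull(display_list, scroll_y=1000, viewport_w=800, viewport_h=600)
print([item.kind for item in visible])  # ['text', 'image']
```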
Now we have a tree that only contains the shapes we’ll use. This tree is organized into those stacking contexts we talked about before.
Effects like CSS filters and stacking contexts make things a little complicated. For example, let’s say you have an element that has an opacity of 0.5 and it has children. You might think that each child is transparent… but it’s actually the whole group that’s transparent.
Because of this, you need to render the group out to a texture first, with each box at full opacity. Then, when you’re placing it in the parent, you can change the opacity of the whole texture.
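A one-row example shows why the order matters. A parent at opacity 0.5 has a blue child and a red child that overlap at one pixel. If the 0.5 is applied to each child separately, the blue child shows through the red one. If the group is rendered at full opacity into an intermediate texture first and the whole texture is then faded, only the red child is visible. The sketch uses straight-alpha source-over blending over an opaque background, and all names are hypothetical.

```python
WHITE, RED, BLUE = (255, 255, 255), (255, 0, 0), (0, 0, 255)


def blend_over(src, alpha, dst):
    """Source-over blend of color `src` at opacity `alpha` onto opaque `dst`."""
    return tuple(s * alpha + d * (1 - alpha) for s, d in zip(src, dst))


def fade_each_child(children, width, opacity, background):
    """Incorrect: applies the parent's opacity to every child separately."""
    out = [background] * width
    for color, start, end in children:  # later children are on top
        for x in range(start, end):
            out[x] = blend_over(color, opacity, out[x])
    return out


def render_group_then_fade(children, width, opacity, background):
    """Correct: render the group at full opacity into a texture, then fade it once."""
    texture = [None] * width  # None means nothing drawn here
    for color, start, end in children:
        for x in range(start, end):
            texture[x] = color  # full opacity: the top child wins
    return [background if c is None else blend_over(c, opacity, background)
            for c in texture]


# A parent at opacity 0.5 with a blue child and a red child overlapping at x = 1.
children = [(BLUE, 0, 2), (RED, 1, 3)]
wrong = fade_each_child(children, 3, 0.5, WHITE)
right = render_group_then_fade(children, 3, 0.5, WHITE)
print(wrong[1])  # (191.25, 63.75, 127.5): blue shows through the red child
print(right[1])  # (255.0, 127.5, 127.5): only the red child is visible
```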
These stacking contexts can be nested… that parent might be part of another stacking context. Which means it has to be rendered out to another intermediate texture, and so on.
Creating the space for these textures is expensive. As much as possible, we want to group things into the same intermediate texture.
To help the GPU do this, we create a render task tree. With it, we know which textures need to be created before other textures. Any textures that don’t depend on others can be created in the first pass, which means they can be grouped together in the same intermediate texture.
So in the example above, we’d first do a pass to output one corner of a box shadow. (It’s slightly more complicated than this, but this is the gist.)
In the second pass, we can mirror this corner all around the box to place the box shadow on the boxes. Then we can render out the group at full opacity.
Next, all we need to do is change the opacity of this texture and place it where it needs to go in the final texture that will be output to the screen.
By building up this render task tree, we figure out the minimum number of offscreen render targets we can use. That’s good, because as I mentioned, creating the space for these render target textures is expensive.
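One simple way to turn a render task tree into passes is to put tasks with no dependencies in pass 1 and run every other task one pass after its latest dependency. Tasks in the same pass never depend on each other, so they are candidates for sharing one intermediate texture. This is an illustrative sketch, not WebRender's actual algorithm. The task names follow the box-shadow example, and `other_shadow_corner` is a hypothetical second shadow elsewhere on the page.

```python
def assign_passes(deps):
    """Group render tasks into passes using the render task tree.

    deps maps each task to the tasks whose output textures it reads. A task with
    no dependencies goes in pass 1; any other task runs one pass after the
    latest of its dependencies. Assumes the tree has no cycles.
    """
    passes = {}

    def pass_of(task):
        if task not in passes:
            passes[task] = 1 + max((pass_of(d) for d in deps[task]), default=0)
        return passes[task]

    grouped = {}
    for task in deps:
        grouped.setdefault(pass_of(task), []).append(task)
    return dict(sorted(grouped.items()))


deps = {
    "shadow_corner": [],                  # draw one corner of the box shadow
    "other_shadow_corner": [],            # hypothetical independent task
    "opacity_group": ["shadow_corner"],   # mirror the corner, render group at full opacity
    "screen": ["opacity_group", "other_shadow_corner"],  # fade and place in final frame
}
print(assign_passes(deps))
# {1: ['shadow_corner', 'other_shadow_corner'], 2: ['opacity_group'], 3: ['screen']}
```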
It also helps us batch things together.
As we talked about before, we need to create a small number of batches which have lots of shapes in them.
Paying attention to how you create batches can really speed things up. You want to have as many shapes in the same batch as you can. This is for a couple of reasons.
First, whenever the CPU tells the GPU to do a draw call, the CPU has to do a lot of work. It has to do things like set up the GPU, upload the shader program, and test for different hardware bugs. This work adds up, and while the CPU is doing this work, the GPU might be idle.
Second, there’s a cost to changing state. Let’s say that you need to change the shader program between batches. On a typical GPU, you need to wait until all of the cores are done with the current shader. This is called draining the pipeline. Until the pipeline is drained, other cores will be sitting idle.
Because of this, you want to batch as much as possible. For a typical desktop PC, you want to have 100 draw calls or fewer per frame, and you want each call to have thousands of vertices. That way, you’re making the best use of the parallelism.
We look at each pass from the render task tree and figure out what we can batch together.
At the moment, each of the different kinds of primitives requires a different shader. For example, there’s a border shader, and a text shader, and an image shader.
We believe we can combine a lot of these shaders, which will allow us to have even bigger batches, but this is already pretty well batched.
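At its simplest, batching within each pass means grouping primitives by the shader they need, so that each group becomes one draw call. The sketch deliberately ignores draw order. A real batcher has to respect draw order when translucent primitives overlap. The names and sample primitives are hypothetical.

```python
from collections import defaultdict


def build_batches(primitives):
    """Group primitives into one batch per (pass, shader) pair.

    primitives: list of (pass_number, shader_name, shape) tuples.
    """
    batches = defaultdict(list)
    for pass_number, shader, shape in primitives:
        batches[(pass_number, shader)].append(shape)
    return dict(sorted(batches.items()))


primitives = [
    (1, "text", "headline"),
    (1, "image", "logo"),
    (1, "text", "paragraph"),
    (1, "border", "card"),
    (1, "text", "caption"),
    (2, "image", "opacity group"),
]
batches = build_batches(primitives)
print(len(primitives), "primitives ->", len(batches), "draw calls")  # 6 primitives -> 4 draw calls
print(batches[(1, "text")])  # ['headline', 'paragraph', 'caption']
```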
We’re almost ready to send it off to the GPU. But there’s a little bit more work we can eliminate.
Most web pages have lots of shapes overlapping each other. For example, a text field sits on top of a div (with a background) which sits on top of the body (with another background).
When it’s figuring out the color for a pixel, the GPU could figure out the color of the pixel in each shape. But only the top layer is going to show. This is called overdraw and it wastes GPU time.
So one thing you could do is render the top shape first. For the next shape, when you get to that same pixel, check whether or not there’s already a value for it. If there is, then don’t do the work.
There’s a little bit of a problem with this, though. Whenever a shape is translucent, you need to blend the colors of the two shapes. And in order for it to look right, that needs to happen back to front.
So what we do is split the work into two passes. First, we do the opaque pass. We go front to back and render all of the opaque shapes. We skip any pixels that are behind others.
Then, we do the translucent shapes. These are rendered back to front. If a translucent pixel falls on top of an opaque one, it gets blended into the opaque one. If it would fall behind an opaque shape, it doesn’t get calculated.
This process of splitting the work into opaque and alpha passes and then skipping pixel calculations that you don’t need is called Z-culling.
While it may seem like a simple optimization, this has produced very big wins for us. On a typical web page, it vastly reduces the number of pixels that we need to touch, and we’re currently looking at ways to move more work to the opaque pass.
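Here is a one-row sketch of the two passes. A per-pixel depth value records the nearest opaque shape drawn so far, which plays the role of the GPU's depth buffer (z-buffer). The opaque pass runs front to back, and the alpha pass runs back to front, skipping pixels hidden behind opaque shapes. It counts how many times a pixel shader would run. The shapes, colors, and names are hypothetical.

```python
def z_cull_render(shapes, width, background=(255, 255, 255)):
    """Render one row of pixels with an opaque pass and then an alpha pass.

    shapes: list of (z, rgb, alpha, start, end) covering pixels start..end-1;
    a larger z is closer to the viewer, and alpha == 1.0 means opaque.
    Returns (pixels, shaded), where shaded counts pixel-shader runs.
    """
    pixels = [background] * width
    depth = [float("-inf")] * width  # z of the nearest opaque pixel so far
    shaded = 0

    # Opaque pass: front to back. Anything behind an already-written pixel is skipped.
    opaque = sorted((s for s in shapes if s[2] == 1.0), key=lambda s: -s[0])
    for z, rgb, _, start, end in opaque:
        for x in range(start, end):
            if z > depth[x]:
                pixels[x], depth[x] = rgb, z
                shaded += 1

    # Alpha pass: back to front so blending happens in the right order.
    translucent = sorted((s for s in shapes if s[2] < 1.0), key=lambda s: s[0])
    for z, rgb, alpha, start, end in translucent:
        for x in range(start, end):
            if z > depth[x]:  # skip pixels hidden behind an opaque shape
                pixels[x] = tuple(c * alpha + d * (1 - alpha) for c, d in zip(rgb, pixels[x]))
                shaded += 1
    return pixels, shaded


GRAY, BLUE, WHITE = (200, 200, 200), (0, 0, 255), (255, 255, 255)
RED, GREEN = (255, 0, 0), (0, 255, 0)
shapes = [
    (0, GRAY, 1.0, 0, 8),     # body background
    (1, BLUE, 1.0, 2, 6),     # div background
    (2, WHITE, 1.0, 3, 5),    # text field
    (0.5, GREEN, 0.5, 2, 4),  # translucent shape hidden behind the div
    (3, RED, 0.5, 4, 8),      # translucent overlay on top
]
pixels, shaded = z_cull_render(shapes, 8)
print(shaded)  # 12, versus 20 if every shape shaded every pixel it covers
```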
At this point, we’ve prepared the frame. We’ve done as much as we can to eliminate work.
We’re ready to set up the GPU and render our batches.
The CPU still has to do some painting work. For example, we still render the characters (called glyphs) that are used in blocks of text on the CPU. It’s possible to do this on the GPU, but it’s hard to get a pixel-for-pixel match with the glyphs that the computer renders in other applications. So people can find it disorienting to see GPU-rendered fonts. We are experimenting with moving things like glyphs to the GPU with the Pathfinder project.
For now, these things get painted into bitmaps on the CPU. Then they are uploaded to something called the texture cache on the GPU. This cache is kept around from frame to frame because glyphs usually don’t change.
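The texture cache is essentially memoization keyed by glyph. A glyph is painted on the CPU and uploaded only the first time it is needed, and later frames reuse it. This sketch does not model eviction. `GlyphCache` and the lambda rasterizer are hypothetical stand-ins, not WebRender's API.

```python
class GlyphCache:
    """Hypothetical texture cache for glyphs rasterized on the CPU.

    Each (font, size, character) is rasterized and "uploaded" once, then reused
    on later frames. `rasterize` is a stand-in for a real CPU glyph rasterizer.
    """

    def __init__(self, rasterize):
        self.rasterize = rasterize
        self.textures = {}  # (font, size, char) -> bitmap living on the GPU
        self.uploads = 0

    def get(self, font, size, char):
        key = (font, size, char)
        if key not in self.textures:  # cache miss: paint on the CPU, then upload
            self.textures[key] = self.rasterize(font, size, char)
            self.uploads += 1
        return self.textures[key]

    def draw_text(self, font, size, text):
        return [self.get(font, size, ch) for ch in text]


cache = GlyphCache(lambda font, size, char: f"bitmap({font}, {size}, {char!r})")
cache.draw_text("Sans", 16, "hello")  # frame 1
print(cache.uploads)                  # 4: h, e, l, o ('l' is reused)
cache.draw_text("Sans", 16, "hello")  # frame 2
print(cache.uploads)                  # still 4: nothing new to paint
```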
Even though this painting work is staying on the CPU, we can still make it faster than it is now. For example, when we’re painting the characters in a font, we split up the different characters across all of the cores. We do this using the same technique that Stylo uses to parallelize style computation… work stealing.
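Here is a sequential simulation of the work-stealing idea. Each worker takes tasks from the front of its own deque. When a worker's deque runs dry, it steals from the back of the longest remaining deque, so no core sits idle while glyphs are still waiting. This only illustrates the scheduling policy. In Rust, a thread pool such as the one in the Rayon crate provides real work stealing across threads. The function name is hypothetical.

```python
from collections import deque


def work_stealing(task_lists):
    """Sequentially simulate workers that steal work once their own queue runs dry.

    task_lists: one list of tasks per worker, e.g. the glyphs it was handed first.
    Each round, every worker takes the next task from the front of its own deque.
    A worker with an empty deque instead steals from the back of the longest one.
    Returns the tasks each worker ended up running, in order.
    """
    queues = [deque(tasks) for tasks in task_lists]
    done = [[] for _ in queues]
    while any(queues):
        for i, queue in enumerate(queues):
            if queue:
                done[i].append(queue.popleft())
            else:
                victim = max(queues, key=len)
                if victim:
                    done[i].append(victim.pop())
    return done


# Worker 0 was handed every glyph; worker 1 starts idle and steals from the back.
print(work_stealing([list("abcdef"), []]))
# [['a', 'b', 'c'], ['f', 'e', 'd']]
```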
We look forward to landing WebRender in Firefox as part of Quantum Render in 2018, a few releases after the initial Firefox Quantum release. This will make today’s pages run more smoothly. It also gets Firefox ready for the new wave of high-resolution 4K displays, because rendering performance becomes more critical as you increase the number of pixels on the screen.
But WebRender isn’t just useful for Firefox. It’s also critical to the work we’re doing with WebVR, where you need to render a different frame for each eye at 90 FPS at 4K resolution.
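Some quick arithmetic shows why this matters. It assumes "4K" means 3840x2160 for each eye and compares against a 1080p monitor at 60 FPS. The baseline is an illustrative choice, not a figure from this article.

```python
def pixels_per_second(width, height, fps, views=1):
    """How many pixels a renderer must fill every second."""
    return width * height * fps * views


desktop = pixels_per_second(1920, 1080, 60)      # a 1080p monitor at 60 FPS
vr = pixels_per_second(3840, 2160, 90, views=2)  # assuming 3840x2160 per eye, 90 FPS
print(f"{desktop:,}")       # 124,416,000
print(f"{vr:,}")            # 1,492,992,000
print(vr // desktop)        # 12
print(round(1000 / 90, 2))  # 11.11 ms frame budget at 90 FPS
```

That is twelve times as many pixels per second, with a frame budget about a third shorter than at 60 FPS.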
An early version of WebRender is currently available behind a flag in Firefox. Integration work is still in progress, so the performance is currently not as good as it will be when that is complete. If you want to keep up with WebRender development, you can follow the GitHub repo, or follow Firefox Nightly on Twitter for updates on the whole Quantum Render project.
Lin is an engineer on the Mozilla Developer Relations team. She tinkers with JavaScript, WebAssembly, Rust, and Servo, and also draws code cartoons.
More articles by Lin Clark…