My thinking: if I'm going to build websites that are fast and reliable, I need to really understand the mechanics of each step a browser goes through to render a web page, so that each can be considered and optimised during development. This post is a fairly high-level summary of what I've learned about the end-to-end process.
A lot of this is based on the fantastic (and FREE!) Website Performance Optimization course by Ilya Grigorik and Cameron Pittman on Udacity. I'd highly recommend checking it out.
Also very helpful was the article How Browsers Work: Behind the scenes of modern web browsers by Paul Irish and Tali Garsiel. It's from 2011 but many of the fundamentals of how browsers work remain relevant at the time of writing this blog post.
Ok, here we go. The process can be broken down into these main stages:
- Start to parse the HTML
- Fetch external resources
- Parse the CSS and build the CSSOM
- Execute the JavaScript
- Merge DOM and CSSOM to construct the render tree
- Calculate layout and paint
1. Start to parse the HTML
When the browser begins to receive the HTML data of a page over the network, it immediately sets its parser to work to convert the HTML into a Document Object Model (DOM).
The Document Object Model (DOM) is the data representation of the objects that comprise the structure and content of a document on the web.
The first step of this parsing process is to break down the HTML into tokens that represent start tags, end tags, and their contents. From that it can construct the DOM.
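To make the tokenise-then-build idea concrete, here is a deliberately simplified sketch. The function names tokenize and buildTree are my own, and real parsers follow the much more involved algorithm in the HTML specification (attributes, comments, doctype, void elements and error recovery are all ignored here).

```javascript
// Split markup into start-tag, end-tag and text tokens.
// Simplification: attributes are skipped and malformed markup is not handled.
function tokenize(html) {
  const tokens = [];
  const pattern = /<\/([a-z][a-z0-9]*)\s*>|<([a-z][a-z0-9]*)[^>]*>|([^<]+)/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    if (match[1]) {
      tokens.push({ type: 'endTag', name: match[1].toLowerCase() });
    } else if (match[2]) {
      tokens.push({ type: 'startTag', name: match[2].toLowerCase() });
    } else if (match[3].trim()) {
      tokens.push({ type: 'text', value: match[3].trim() });
    }
  }
  return tokens;
}

// Turn the token stream into a tree using a stack of open elements.
function buildTree(tokens) {
  const root = { name: '#document', children: [] };
  const stack = [root];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === 'startTag') {
      const element = { name: token.name, children: [] };
      parent.children.push(element);
      stack.push(element); // this element is now the innermost open one
    } else if (token.type === 'endTag') {
      if (stack.length > 1) stack.pop(); // close the innermost open element
    } else {
      parent.children.push({ name: '#text', value: token.value });
    }
  }
  return root;
}

const tokens = tokenize('<html><body><h1>Hello</h1><p>World</p></body></html>');
const dom = buildTree(tokens);
console.log(tokens.map((token) => token.type).join(' '));
console.log(JSON.stringify(dom, null, 2));
```

Because the tree is built from a stream of tokens, a structure like this can grow as more markup arrives, which is what lets the browser start work before the whole HTML file has downloaded.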
2. Fetch external resources
When the parser comes across an external resource like a CSS or JavaScript file, it goes off to fetch those files. The parser will continue as a CSS file is being loaded, although the CSS will block rendering until it has been loaded and parsed (more on that in a bit).
JavaScript files are a little different - by default they block parsing of the HTML whilst the JavaScript file is loaded and then executed. There are two attributes that can be added to script tags to mitigate this: defer and async. Both allow the parser to continue whilst the JavaScript file is loaded in the background, but they differ in when the script is executed. More on that in a bit too, but in summary:
defer means that the execution of the file will be delayed until the parsing of the document is complete. If multiple files have the defer attribute, they will be executed in the order that they were discovered in the HTML.
<script type="text/javascript" src="script.js" defer></script>
async means that the file will be executed as soon as it loads, which could be during or after the parsing process, and therefore the order in which async scripts are executed cannot be guaranteed.
<script type="text/javascript" src="script.js" async></script>
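The ordering rules for defer and async can be illustrated with a toy model. This is only a sketch under my own assumptions: executionOrder, the file names, and the loadedAt and parsingDoneAt timings (in milliseconds) are all hypothetical, and it ignores parser-blocking scripts and the time each script takes to run.

```javascript
// Toy model of when defer and async scripts run.
// scripts: listed in document order, each with a hypothetical download-complete time.
// parsingDoneAt: hypothetical time at which HTML parsing finishes.
function executionOrder(scripts, parsingDoneAt) {
  let lastDeferRun = parsingDoneAt;
  const scheduled = scripts.map((script) => {
    if (script.mode === 'async') {
      // async: runs as soon as it has loaded, regardless of parsing or order.
      return { name: script.name, runsAt: script.loadedAt };
    }
    // defer: waits for parsing to finish AND for every earlier deferred script.
    lastDeferRun = Math.max(lastDeferRun, script.loadedAt);
    return { name: script.name, runsAt: lastDeferRun };
  });
  // Array.prototype.sort is stable, so deferred scripts that become runnable
  // at the same moment keep their document order.
  return scheduled.sort((a, b) => a.runsAt - b.runsAt).map((entry) => entry.name);
}

const order = executionOrder(
  [
    { name: 'a-defer.js', mode: 'defer', loadedAt: 300 },
    { name: 'b-async.js', mode: 'async', loadedAt: 50 },
    { name: 'c-defer.js', mode: 'defer', loadedAt: 20 },
    { name: 'd-async.js', mode: 'async', loadedAt: 400 },
  ],
  200
);
console.log(order);
// [ 'b-async.js', 'a-defer.js', 'c-defer.js', 'd-async.js' ]
```

Note how c-defer.js finishes downloading first but still runs after a-defer.js, whereas the two async scripts simply run in the order they happen to finish loading.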
Preloading resources
As an aside, modern browsers will continue to scan the HTML whilst blocked and 'look ahead' to what external resources are coming up and then download them speculatively. The manner in which they do this varies between different browsers so cannot be relied upon to behave a certain way. In order to mark a resource as important and therefore more likely to be downloaded early in the rendering process, a link tag with rel="preload" can be used.
<link href="style.css" rel="preload" as="style" />
3. Parse the CSS and build the CSSOM
You may well have heard of the DOM before, but have you heard of the CSSOM (CSS Object Model)? Before I started researching this topic a little while back, I hadn't!
The CSS Object Model (CSSOM) is a map of all CSS selectors and relevant properties for each selector in the form of a tree, with a root node, sibling, descendant, child, and other relationship. The CSSOM is very similar to the Document Object Model (DOM). Both of them are part of the critical rendering path which is a series of steps that must happen to properly render a website.
The CSSOM, together with the DOM, is used to build the render tree, which is in turn used by the browser to lay out and paint the web page.
Similar to HTML files and the DOM, when CSS files are loaded they must be parsed and converted to a tree - this time the CSSOM. It describes all of the CSS selectors on the page, their hierarchy and their properties.
Where the CSSOM differs from the DOM is that it cannot be built incrementally, as a rule that appears later in the CSS can override an earlier one, depending on specificity and source order. This is why CSS blocks rendering: until all CSS is parsed and the CSSOM built, the browser can't know where and how to position each element on the screen.
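To see why a later rule can change the outcome, here is a rough sketch of how a winning declaration might be picked from the rules that match one element. The helper names (specificity, compareSpecificity, winningValue) are my own, and the model is heavily simplified: it only counts IDs, classes and type selectors, and ignores !important, inline styles, pseudo-classes, attribute selectors and stylesheet origin.

```javascript
// Returns [ids, classes, types] for a simple selector such as "body div.note".
function specificity(selector) {
  const ids = (selector.match(/#[\w-]+/g) || []).length;
  const classes = (selector.match(/\.[\w-]+/g) || []).length;
  const types = (selector.match(/(^|[\s>+~])[a-z][\w-]*/gi) || []).length;
  return [ids, classes, types];
}

// Compare two specificity triples from left to right.
function compareSpecificity(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// rules: the rules that already match one element, in source order.
// Higher specificity wins; on a tie, the later rule wins.
function winningValue(rules, property) {
  let winner = null;
  for (const rule of rules) {
    if (!(property in rule.declarations)) continue;
    if (
      winner === null ||
      compareSpecificity(specificity(rule.selector), specificity(winner.selector)) >= 0
    ) {
      winner = rule;
    }
  }
  return winner ? winner.declarations[property] : undefined;
}

const matchingRules = [
  { selector: 'p', declarations: { color: 'black', 'font-size': '16px' } },
  { selector: '.intro', declarations: { color: 'blue' } },
  { selector: 'p', declarations: { 'font-size': '14px' } },
];
console.log(winningValue(matchingRules, 'color')); // blue
console.log(winningValue(matchingRules, 'font-size')); // 14px
```

If the browser had rendered after seeing only the first rule, the paragraph would have been painted black at 16px and then needed repainting once the remaining rules arrived.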
4. Execute the JavaScript
How and when the JavaScript resources are loaded will determine exactly when this happens, but at some point they will be parsed, compiled and executed. Different browsers have different JavaScript engines to perform this task. Parsing JavaScript can be an expensive process in terms of a computer's resources, more so than other types of resource, which is why optimising it is so important in achieving good performance. Check out this fantastic post for a deeper dive into how the JavaScript engine works.
Load events
Once the HTML has been fully parsed into the DOM and any synchronous and deferred JavaScript has executed, the DOMContentLoaded event is fired on the document. For any scripts that require access to the DOM, for example to manipulate it in some way or listen for user interaction events, it is good practice to wait for this event before running them.
document.addEventListener('DOMContentLoaded', (event) => {
  // You can now safely access the DOM
});
Once everything else, such as async JavaScript and images, has finished loading, the load event is fired on the window.
window.addEventListener('load', (event) => {
  // The page has now fully loaded
});
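One thing to watch out for: a listener that is added after DOMContentLoaded has already fired (for example from an async script) will never run. A common pattern is to check document.readyState first. The helper below is a minimal sketch; the name onDomReady and the optional doc parameter are my own, with the parameter only there so the helper can be tried against a stand-in object outside a browser.

```javascript
// Run `callback` once the DOM is parsed, even if that has already happened.
// `doc` defaults to the global document in a browser.
function onDomReady(callback, doc = globalThis.document) {
  if (doc.readyState === 'loading') {
    // Still parsing: wait for the event.
    doc.addEventListener('DOMContentLoaded', callback, { once: true });
  } else {
    // 'interactive' or 'complete': parsing has already finished.
    callback();
  }
}

// Only attempt the real call where a document exists (i.e. in a browser).
if (typeof document !== 'undefined') {
  onDomReady(() => {
    console.log('DOM is ready');
  });
}
```

document.readyState moves from 'loading' to 'interactive' to 'complete', so anything other than 'loading' means the parser is done.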
5. Merge DOM and CSSOM to construct the render tree
The render tree is a combination of the DOM and CSSOM, and represents everything that will be rendered onto the page. That does not necessarily mean all nodes in the render tree will be visually present: for example, nodes with styles of opacity: 0 or visibility: hidden will be included (and those with opacity: 0 may still be read by a screen reader), whereas those set to display: none will not be included. Additionally, tags such as <head> that do not contain any visual information will always be omitted.
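The filtering described above can be sketched in a few lines. This is an illustration only: the node shape ({ tag, style, children }), the buildRenderTree function and the NON_VISUAL_TAGS list are my own inventions, with style standing in for the computed styles the browser would get from the CSSOM.

```javascript
// Tags that never produce anything visual (an illustrative, incomplete list).
const NON_VISUAL_TAGS = new Set(['head', 'script', 'meta', 'link', 'title', 'style']);

// Returns a render-tree node, or null if the DOM node should be left out.
function buildRenderTree(node) {
  if (NON_VISUAL_TAGS.has(node.tag)) return null;
  const style = node.style || {};
  // display: none removes the node and its whole subtree.
  if (style.display === 'none') return null;
  const children = (node.children || [])
    .map((child) => buildRenderTree(child))
    .filter((child) => child !== null);
  return { tag: node.tag, style, children };
}

const page = {
  tag: 'html',
  children: [
    { tag: 'head', children: [{ tag: 'title' }] },
    {
      tag: 'body',
      children: [
        { tag: 'h1', style: { opacity: '0' } },
        { tag: 'p', style: { visibility: 'hidden' } },
        { tag: 'aside', style: { display: 'none' }, children: [{ tag: 'p' }] },
      ],
    },
  ],
};

const renderTree = buildRenderTree(page);
console.log(renderTree.children.map((node) => node.tag)); // [ 'body' ]
console.log(renderTree.children[0].children.map((node) => node.tag)); // [ 'h1', 'p' ]
```

The invisible h1 and p stay in the tree because they still take up space, while the aside and everything inside it disappear entirely.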
As with JavaScript engines, different browsers have different rendering engines.
6. Calculate layout and paint
Now that we have a complete render tree the browser knows what to render, but not where to render it. Therefore the layout of the page (i.e. every node's position and size) must be calculated. The rendering engine traverses the render tree, starting at the top and working down, calculating the coordinates at which each node should be displayed.
Once that is complete, the final step is to take that layout information and paint the pixels to the screen.
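As a rough picture of the layout pass, here is a toy version that stacks every box vertically. It is a sketch under heavy assumptions: the layout function and node shape are hypothetical, every box is treated as a full-width block, and leaf heights are given up front rather than measured from text, which is far simpler than what a real rendering engine does.

```javascript
// Walk the tree top-down, giving each node a position and size.
function layout(node, x, y, width) {
  const box = { tag: node.tag, x, y, width, height: 0, children: [] };
  let cursorY = y;
  for (const child of node.children || []) {
    const childBox = layout(child, x, cursorY, width);
    box.children.push(childBox);
    cursorY += childBox.height; // the next sibling starts below this one
  }
  // Leaves use their given height (a stand-in for measured content);
  // containers are as tall as their stacked children.
  box.height = box.children.length > 0 ? cursorY - y : node.height || 0;
  return box;
}

function printBoxes(box, depth = 0) {
  console.log(`${'  '.repeat(depth)}${box.tag}: y=${box.y}, height=${box.height}`);
  box.children.forEach((child) => printBoxes(child, depth + 1));
}

const boxes = layout(
  {
    tag: 'body',
    children: [
      { tag: 'h1', height: 40 },
      { tag: 'div', children: [{ tag: 'p', height: 20 }, { tag: 'p', height: 20 }] },
      { tag: 'footer', height: 30 },
    ],
  },
  0,
  0,
  800
);
printBoxes(boxes);
```

Even in this toy version, changing the height of the h1 shifts everything below it, which hints at why layout changes can be expensive on a real page.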
And voila! After all that, we have a fully rendered web page!
Top comments (40)
Great post. Loved the images. Thank you.
Thanks, really appreciate it!
Nice article, but I've long argued that the information in the diagram in section 3 cannot possibly be right. In the absence of a DOM document (remember, that's not applied until section 5), there's no way to turn
body { font-size: 16px } div { font-size: 14px }
into a tree structure in which a div rule is a child of a body rule. You could possibly turn
body { font-size: 16px } body div { font-size: 14px }
into such a tree structure, though personally I have my doubts that browsers actually do this, since I really can't see how doing so would significantly help evaluate the cascade.
The CSSOM is real and is indeed tree structured, but it's a very different sort of tree, where stylesheets are towards the top of the tree and each has rule children, which have selector and declaration-block children. Each declaration block has declaration children which have name and value children.
Thanks for the feedback, yes I see what you're saying. It's a great point. I'm happy to admit I'm no expert on the inner workings of the CSSOM. I'll do some more investigation and see if I can update the diagram to better reflect what's happening 👍
From what I've read on Firefox source code, James is right.
You can actually make the CSS tree at #3. Both trees are constructed at the same time. Rules are just rules, they are attached in reverse order from what you expect, you can actually build the entire rule tree without having the DOM, it's the DOM that does the lookup on the CSSOM when it's time to render.
It's more like the body rule is the child of the div rule, CSS is inverted. The lookup is the inverse of what common sense says it is. It's a literal inverted tree, it actually is.
What do you mean by both trees? The DOM tree and the CSSOM tree, or the CSSOM tree and the render tree? Have you got a link to the relevant bit of the Firefox source code?
Thanks for putting this together so succinctly!
Over on A List Apart, we have a series that goes into great depth on this as well as how assistive technologies play into it. The roll-up is here: alistapart.com/article/from-url-to...
Awesome, thanks, I'll check it out!
How did you create these diagrams? The style of them looks SICK as hell :+)
Thanks! 😀 I draw them in Photoshop. The typeface is one I made, intending to open source it soon.
awesome work !
An interviewer asked me this question and I failed to answer it and to properly describe the process of browser rendering when a user visits a URL. After reading your article I'm pretty sure that I'll be able to explain it to someone in a better way. I really like your article and have started following you to read similar articles in the future.
Thank you, @james Starkie.
Very kind, thank you, glad I could help 😊
Hi @jstarmx , thanks for writing this great post.
Could you please fix the link for "Check out this fantastic post for a deeper dive into how the JavaScript engine works"? It takes me to this same page. Thank you!
Ah whoops!! Thanks for the heads up, sorted now 😊
So if CSS is only render-blocking, and not parser-blocking, does it make sense to preload the CSS file according to this article web.dev/articles/defer-non-critica...? Normally we put link:css in the head anyway, so it will be downloaded really soon without blocking the HTML parsing process, so preloading it makes no sense?
Hello, may I translate your article into Chinese? I would like to share it with more developers in China. I will credit the original author and link to the original source.
Yes absolutely, please go ahead, that would be great :)
Thank you very much!
Great post, I have one question - in the case of Ajax or a SPA, are all of these steps executed or is there any difference?
Thanks! These steps would still be executed, the differences would come afterwards really. In the case of a SPA, most would only happen once (on initial load), the trade-off being that those steps will likely take longer because there are more assets to load up-front. But the last couple of steps will execute every time the DOM is updated (e.g. navigating to a new 'page') as the layout will need to be recalculated and repainted each time.
Makes sense, thanks.
Thank you. Your post is interesting