
Chromium Spelunking: Life and Times

In the last post, I read through one of the guides to the network stack and summarized my findings, with two more to go. In this post, I'll cover those two subsequent documents, and then plot how I'll start digging in deeper.

Life of a URLRequest

This document is a top-down summary of how URLs are fetched, meaning it begins with some function that says "here's a URL, go get it" and probably ends with some details of TCP connections and HTTP transactions. I tend to think in the opposite order: bottom-up. So, I want to understand how TCP connections are handled and what the API is for that implementation. Once I've got that down, I want to know how the next higher layer (HTTP?) operates and what its API is. And so on.

Preliminaries

This document begins with some general observations, which may help when it comes time to unravel how to find instances of the dozens of classes involved here.

  • URLRequestContext is the top-level entry point for loading a URL, and creates URLRequest instances. It seems like it encapsulates the "top half" of the network stack, down to where actual network connections occur.
  • That second level is encapsulated in HttpNetworkSession, which handles network streams, socket pools, and so on.
  • Following a pattern that is common in Chromium, sets of callbacks for users of the network stack are bundled together in "Delegate" classes, in this case URLRequest::Delegate (specific to a request) and NetworkDelegate (global to the URLRequestContext). A toy sketch of this pattern follows the list.
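
To make that concrete for myself, here's a minimal toy sketch of the delegate pattern -- the class and method names are invented for illustration, not Chromium's actual interfaces:

```cpp
#include <iostream>
#include <string>

// Hypothetical, simplified stand-in for URLRequest::Delegate:
// per-request callbacks implemented by the embedder.
class RequestDelegate {
 public:
  virtual ~RequestDelegate() = default;
  virtual void OnResponseStarted(int status_code) = 0;
  virtual void OnReadCompleted(const std::string& chunk) = 0;
};

// Hypothetical stand-in for a request that reports progress to its delegate.
class ToyRequest {
 public:
  explicit ToyRequest(RequestDelegate* delegate) : delegate_(delegate) {}
  void Start() {
    // A real request would do network I/O; here we just invoke the callbacks.
    delegate_->OnResponseStarted(200);
    delegate_->OnReadCompleted("hello");
  }

 private:
  RequestDelegate* delegate_;  // not owned
};

// An embedder implements the delegate to observe the request's lifecycle.
class LoggingDelegate : public RequestDelegate {
 public:
  void OnResponseStarted(int status_code) override {
    std::cout << "status: " << status_code << "\n";
  }
  void OnReadCompleted(const std::string& chunk) override {
    std::cout << "read: " << chunk << "\n";
  }
};

int main() {
  LoggingDelegate delegate;
  ToyRequest request(&delegate);
  request.Start();
}
```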

There are some details about how other parts of Chromium communicate with the network stack via Mojo, but for the moment my focus is within that boundary, so I'll ignore that. In fact, that makes quite a bit of this document irrelevant to our purposes.

Tip to Toe and Back

  • network::URLLoader (part of the network Mojo service, as the network:: namespace suggests) creates a URLRequest. This is handed to network::ResourceScheduler to actually start the request. This suggests that a URLRequest doesn't start immediately on creation -- something to look out for later.

  • URLRequest gets an implementation of URLRequestJob from the URLRequestJobFactory. Specifically, that will be a URLRequestHttpJob instance.

  • URLRequestHttpJob attaches cookies to the request (and probably some other stuff!) and then makes an HttpCache::Transaction and activates it. It seems the HTTP cache is a read-through cache, as on a miss the cache is responsible for the next steps:

  • Use the HttpNetworkLayer to create a new HttpNetworkTransaction. The document says it "transparently wraps" this object, but it's unclear what that might mean.

  • HttpNetworkTransaction then gets an HttpStream from the HttpStreamFactory.

I imagine that by the time we have an HttpStream, we're in the lower of the two "big layers", but I don't see any mention of HttpNetworkSession here. Presumably HttpStream is an abstraction for a connection that can carry requests and responses, but doesn't get into the specifics of HTTP versions or connection mechanisms. Continuing with the process of creating an HttpStream (assuming the simple case with no pre-existing sockets):
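
Before going further, here's how I picture the layering described so far, as a toy sketch. All of the names and signatures here are my own simplifications, not the real Chromium classes:

```cpp
#include <iostream>
#include <string>

// Toy stand-ins for the layers described above; each layer delegates to the
// one below, mirroring URLRequest -> URLRequestHttpJob ->
// HttpCache::Transaction -> HttpNetworkTransaction -> HttpStream.
struct ToyStream {
  std::string Fetch(const std::string& url) {
    return "response for " + url;  // stands in for socket I/O
  }
};

struct ToyNetworkTransaction {
  std::string Run(const std::string& url) {
    ToyStream stream;  // obtained from a stream factory in reality
    return stream.Fetch(url);
  }
};

struct ToyCacheTransaction {
  std::string Run(const std::string& url) {
    // Read-through cache: on a miss, fall through to the network layer.
    ToyNetworkTransaction network;
    return network.Run(url);
  }
};

struct ToyHttpJob {
  std::string Run(const std::string& url) {
    // The real job attaches cookies here before handing off to the cache.
    ToyCacheTransaction cache;
    return cache.Run(url);
  }
};

int main() {
  ToyHttpJob job;
  std::cout << job.Run("https://example.com/") << "\n";
}
```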

  • HttpStreamFactory::Job needs to get a client socket (which it will store in a ClientSocketHandle) from the ClientSocketPoolManager. It sounds like this object is where proxies might get hooked in, probably with some recursive calls, but in this simple case it relies on the TransportClientSocketPool. I suppose "Transport" here means a direct network connection carrying HTTP/x (so, not proxied). There's a ClientSocketPoolBase and ClientSocketPoolBaseHelper involved here, too -- are you getting some strong Java vibes here?

    In this case the pool is empty, so it needs to create a new connection, via a TransportConnectJob (there's that word "job" again...). This will handle DNS resolution, which is probably fascinating with the advent of DoH but out of scope for me at the moment.

  • The HttpStreamFactory::Job gets the connection object (wrapped in a ClientSocketHandle) and creates an HttpBasicStream (I'm guessing this is a subclass of HttpStream), which it passes back to the HttpNetworkTransaction.

  • The HttpNetworkTransaction then passes the request headers and body to the HttpBasicStream, which uses an HttpStreamParser to write them to the stream. That's an interesting use of a "parser", but OK. (A toy sketch of this write-then-parse step follows the list.)

  • The HttpStreamParser then waits for the response headers, parses them, and sends them back up the stack: HttpNetworkTransaction, HttpCache::Transaction (which probably caches a copy, if possible), URLRequestHttpJob (which saves cookies), and finally URLRequest.

    This section mentions HTTP/1.x, so it's possible that H2 and QUIC diverge from this process somewhere before this point.

  • The body is read by passing buffers all the way up and down the stack.

  • Once the request is complete, HttpNetworkTransaction determines whether the connection is reusable -- depending on the response headers, the state of the connection, and so on -- and either returns it to the pool or destroys it.
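
To illustrate that write-then-parse step, here's a minimal, self-contained sketch of serializing a request and pulling the status code out of a response status line. This is my own toy code, not the real HttpStreamParser:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Serialize a minimal HTTP/1.1 request, roughly what a stream parser
// would write to the socket.
std::string SerializeRequest(const std::string& host, const std::string& path) {
  std::ostringstream out;
  out << "GET " << path << " HTTP/1.1\r\n"
      << "Host: " << host << "\r\n"
      << "\r\n";
  return out.str();
}

// Parse the status code out of a response status line like
// "HTTP/1.1 200 OK"; returns -1 on malformed input.
int ParseStatusLine(const std::string& status_line) {
  std::istringstream in(status_line);
  std::string version;
  int status = -1;
  if (!(in >> version >> status)) return -1;
  return status;
}

int main() {
  std::cout << SerializeRequest("example.com", "/");
  std::cout << "status: " << ParseStatusLine("HTTP/1.1 200 OK") << "\n";
}
```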

All of that seems comprehensible enough to provide a scaffolding for understanding this later. I've noted a few questions that I'd like to answer, too:

  • What is a "job"? This seems like a pattern akin to factories and builders, but maybe more specific to the network stack or Chromium (like delegates).

  • Where do H2 and QUIC diverge in this process?

  • What do things look like, at this level of detail, when there's a proxy involved?

  • Where does TLS fit in?

Happily, most of these are covered in the remainder of the document.

Ownership (??!)

The next bit of the document contains a comically complex ownership diagram that seems to combine ownership, inheritance, templating, and interfaces. It has footnotes for additional information that does not appear "clearly" in the diagram! Perhaps this will be a useful reference for me later as I try to avoid introducing use-after-free or double-free bugs.

Socket Pools

Socket pools are keyed by a "group name", such that connections with the same group name can be used interchangeably. This is made up of a host, port, protocol, and "privacy mode".
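
As a toy sketch of that keying scheme (my own simplification; the real key type is more involved):

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical group key: sockets with the same key are interchangeable.
struct GroupKey {
  std::string host;
  int port;
  std::string protocol;  // e.g. "http", "ssl"
  bool privacy_mode;

  bool operator<(const GroupKey& other) const {
    return std::tie(host, port, protocol, privacy_mode) <
           std::tie(other.host, other.port, other.protocol, other.privacy_mode);
  }
};

struct ToySocket {};

// Idle sockets, grouped by key; a request for a given key can reuse any
// socket in that key's bucket.
std::map<GroupKey, std::vector<ToySocket>> idle_sockets;

int main() {
  GroupKey key{"example.com", 443, "ssl", false};
  idle_sockets[key].push_back(ToySocket{});
  return idle_sockets[key].empty() ? 1 : 0;
}
```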

Sockets aren't OS-level sockets, and it seems there are a number of implementations of sockets, all with their own pools. In fact, these can be layered, so a higher-level socket utilizes a lower-level socket. I suppose the obvious case here is a TLS socket utilizing a TCP socket. ConnectJob is another "job" implementation here, in this case performing the operations to initiate a socket connection.
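
That layering might look something like this toy sketch -- invented names, and the real socket interfaces are much richer:

```cpp
#include <iostream>
#include <memory>
#include <string>

// A common socket interface, implemented at several layers.
class ToySocket {
 public:
  virtual ~ToySocket() = default;
  virtual void Write(const std::string& data) = 0;
};

// Lowest layer: stands in for a TCP connection.
class ToyTcpSocket : public ToySocket {
 public:
  void Write(const std::string& data) override {
    std::cout << "tcp write: " << data << "\n";
  }
};

// Higher layer: "encrypts" and delegates to the socket below it.
class ToyTlsSocket : public ToySocket {
 public:
  explicit ToyTlsSocket(std::unique_ptr<ToySocket> lower)
      : lower_(std::move(lower)) {}
  void Write(const std::string& data) override {
    lower_->Write("<encrypted>" + data + "</encrypted>");
  }

 private:
  std::unique_ptr<ToySocket> lower_;
};

int main() {
  ToyTlsSocket tls(std::make_unique<ToyTcpSocket>());
  tls.Write("GET / HTTP/1.1");
}
```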

There are some details here of the class relationships that I will want to refer back to.

Proxies

HttpStreamFactory::Job uses a "Proxy Service" to determine which proxies to use for a request. Each proxy then exposes a socket pool for connections via that proxy, and HttpStreamFactory gets a socket from the appropriate pool.
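
As a rough sketch of that resolution step (invented names; a real proxy service consults system settings, PAC scripts, and so on):

```cpp
#include <iostream>
#include <string>
#include <vector>

// The resolved proxy list for a URL; "DIRECT" means no proxy.
std::vector<std::string> ResolveProxies(const std::string& url) {
  // A real proxy service would inspect the URL against its configuration.
  (void)url;
  return {"PROXY proxy.example:3128", "DIRECT"};
}

int main() {
  // Try each entry in order; each one would select a different socket pool.
  for (const std::string& proxy : ResolveProxies("https://example.com/")) {
    std::cout << "would try: " << proxy << "\n";
  }
}
```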

HTTP/2

HTTP/2 (still called SPDY in much of the code) has a slightly different "shape" from HTTP/1.x. It works over a TCP connection just like HTTP/1.x, and can be negotiated during the TLS handshake. It allows multiple concurrent streams within a single session (= TCP connection). The network stack will multiplex multiple concurrent requests over a single session, but it appears that's not done via another layer of connection pooling. Rather, the HttpStreamFactory::Job creates a SpdySession and from that a SpdyHttpStream, which it passes to the HttpNetworkTransaction. But it's not clear from the text how an existing SpdySession would be used to create a new SpdyHttpStream.
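
A toy sketch of that multiplexing, with one session carrying several streams -- invented names, not the actual Spdy* classes:

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// One session per (host, port); streams are multiplexed on it by ID.
class ToyH2Session {
 public:
  int OpenStream(const std::string& path) {
    int id = next_stream_id_;
    next_stream_id_ += 2;  // client-initiated HTTP/2 stream IDs are odd
    streams_[id] = path;
    return id;
  }
  std::size_t num_streams() const { return streams_.size(); }

 private:
  int next_stream_id_ = 1;
  std::map<int, std::string> streams_;
};

int main() {
  // Three concurrent requests share one session (one TCP connection).
  ToyH2Session session;
  session.OpenStream("/index.html");
  session.OpenStream("/style.css");
  session.OpenStream("/app.js");
  std::cout << session.num_streams() << " streams on one connection\n";
}
```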

There's some extra optimization here to avoid making multiple TCP connections to a server that supports HTTP/2.

QUIC

QUIC (the transport beneath HTTP/3) has a very different shape from HTTP/1.x. To begin with, it operates over UDP, not TCP. A server's support for QUIC is advertised in headers, so the browser must "remember" which servers support QUIC and try to connect with QUIC when that server is next used.

When a server supports QUIC, HttpStreamFactory will "race" two jobs -- one for QUIC and one for all previous protocols -- and pick the one that gets a stream first. This strategy is reminiscent of the "happy eyeballs" algorithm for IPv4 and IPv6. It gets the best performance for the user at the cost of "wasting" some connections.
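
Here's a toy sketch of racing two connection attempts and taking whichever finishes first, using plain threads. Chromium's actual jobs are asynchronous state machines, so this is just the shape of the idea:

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

// Stand-in for a connection attempt that takes some amount of time.
std::string Connect(const std::string& protocol, int latency_ms) {
  std::this_thread::sleep_for(std::chrono::milliseconds(latency_ms));
  return protocol;
}

int main() {
  std::promise<std::string> winner;
  std::future<std::string> result = winner.get_future();
  std::once_flag won;

  // Race QUIC against TCP+TLS; whichever connects first sets the result.
  auto race = [&](const std::string& protocol, int latency_ms) {
    std::string connected = Connect(protocol, latency_ms);
    std::call_once(won, [&] { winner.set_value(connected); });
  };
  std::thread quic(race, "QUIC", 30);
  std::thread tcp(race, "TCP+TLS", 50);

  std::cout << "winner: " << result.get() << "\n";
  quic.join();
  tcp.join();
}
```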

Proxy support in Chrome

I set out to read this document in the previous post, but on closer inspection it's not especially relevant. It mostly covers how proxies are configured, and mostly from the perspective of someone doing the configuring.

It does link to crbug 969859 where support for QUIC proxies was disabled by default. As with many Chromium bugs, it and the blocked/blocking bugs are pretty low on details!

Next Steps

That exhausts the "obvious" sources of documentation, although I'm sure I'll find more as I proceed. Chromium development has a common practice of putting documentation in Google Docs. These are usually (but not always) linked from somewhere (a CL, a bug, or maybe the source), and they are sometimes publicly readable (I won't be able to comment on anything that is not). These documents are generally "design documents", so they discuss a proposed change along with alternatives and potential impacts. What they do not do is document how things work -- they generally only make sense if you understand the state of the codebase before the proposed change, and only if no subsequent change has touched the same code.

I hope it's clear why this situation is a nightmare from an approachability perspective!

I have two next steps in mind:

  • Begin exploring the code from the bottom up (so, beginning with some of the simpler socket pool implementations). I have written a useful script to help me dig up the "hidden documentation" for a piece of code, so I'll be interested to see how that works in practice.
  • Try to write a curl-like utility that embeds the network stack and fetches the URL given on the command line. I expect this will be a substantial amount of work -- I think it involves building a new "embedder" and likely implementing lots of complex delegate methods -- but I might learn something from the attempt even if I don't finish it.

So far I've just been passively "absorbing" information, and that's typically not a great way to learn, so I am inclined to get a start on the curl-like utility just to get my fingers on the keyboard for a bit.
