# curl (2022-11-12)

I was invited by Daniel Stenberg to work with him on curl improvements sponsored by the Sovereign Tech Fund, an initiative of the German government to strengthen digital infrastructure and open source in the public interest. Daniel blogged about it.

Via this blog, I try to give updates on my ongoing work in this project, not least for transparency. This is deeply technical gobbledygook.

## Connection Filters Merged

Yesterday, Daniel merged my PR #9855, which introduces "Connection Filters" to curl. I wrote about the motivation and what we are trying to achieve in my previous blog post. The changes are purely internal and not visible in curl's API. As a user of curl, you do not need to know anything about them.

As a programmer, you might be curious what they do and how they work, though. The concept is not new. Apache httpd has used something similar for ages, and such filters are quite common in networking stacks.

## Naming

Naming things is the most difficult thing. Names should be easy to remember and convey some sense of what something is about. I was thinking of calling them "Connection Handlers", but curl already has "Protocol Handlers" and often just calls those "handlers", so adding another kind under the same name would only be confusing.

Filters are common in music production. You pass audio through a filter and then to the next filter to get the effects you desire, with the last filter putting the sound into the output or into a recording, or both.

Curl's connection filters are similar. We take the filters we need for a URL request, stack them on top of each other, make the connection, write the request into the filter chain and read the response back from it. They are a bit broader in scope than a one-directional filter in music, but they share the stacking (or chaining) behaviour. So the name is not a complete miss.
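
To make the stacking concrete, here is a minimal sketch in C of how such a chain can work. Everything in it (`struct conn_filter`, `do_send`, the filter names) is hypothetical and far simpler than curl's actual internals; it only illustrates the chaining idea:

```c
#include <stdio.h>
#include <string.h>

/* A hypothetical connection filter: each instance knows the filter
 * below it and transforms data on its way down the chain. This is
 * a simplified illustration, not curl's real internal API. */
struct conn_filter {
  const char *name;
  struct conn_filter *next;   /* the filter below us, or NULL */
  void (*do_send)(struct conn_filter *cf, const char *data);
};

/* Bottom of the chain: pretend to write to the network socket. */
static void socket_send(struct conn_filter *cf, const char *data)
{
  printf("[%s] writing %zu bytes to the wire\n", cf->name, strlen(data));
}

/* A filter in the middle: "encrypt" the data, then hand it down.
 * It does not know or care what the next filter is. */
static void tls_send(struct conn_filter *cf, const char *data)
{
  printf("[%s] encrypting %zu bytes into TLS records\n",
         cf->name, strlen(data));
  cf->next->do_send(cf->next, data);
}

int main(void)
{
  struct conn_filter sock = { "socket", NULL, socket_send };
  struct conn_filter tls  = { "tls", &sock, tls_send };

  /* The request is written into the top of the chain; each filter
   * does its part and passes the result on. */
  tls.do_send(&tls, "GET / HTTP/1.1\r\nHost: curl.se\r\n\r\n");
  return 0;
}
```

Reading a response works the same way in the opposite direction: the bottom filter reads from the wire and each filter above it undoes its transformation.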

## Protocols

Stacking connection filters fits very well with how network protocols work together. When you do an https: request, no matter what software you use, the following happens:

 1. get https://curl.se
 2. -> transformed into an HTTP/1.1 request
   GET / HTTP/1.1
   Host: curl.se
   ...
 3. -> transformed into encrypted TLS records
 4. -> written onto a socket
 5. -> transformed into TCP packets (IP packets)
 6. -> wrapped up into Layer 0 to the next hop

This is a chain of transformations on the sender side and, in reverse order, on the receiver side. In curl's case, steps 5 and 6 are done by the operating system and step 1 happens in the libcurl API (or on the command line). Step 2 is done by curl's HTTP protocol handler, and steps 3 and 4 now live in connection filters.

Why is this good? Well, TCP does not really know about TLS (and vice versa), and HTTP/1.1 is the same with or without TLS. HTTP/1.1 does not care about TCP either: in practice it almost always runs over TCP, but it does not have to. Even HTTP/3 is only loosely coupled to QUIC (which is tightly coupled to TLS+UDP, but that may be a topic for another blog post).

The internet grew quite nicely with this layering/wrapping/chaining approach to protocols, because it supports using existing things in ways that were not thought of initially. That is a property we'd also like to have in more parts of curl's internals.

## Locality

Curl is a large library. Connection filters help to localize functionality. By that we mean that the details of how a transformation is done, like en-/decrypting the data, are confined to a few source files.

If you compare curl's implementation of proxy tunneling (e.g. establishing a connection through an HTTP proxy) before and after we added connection filters, you'll see that all of it now lives in one source file. Its header exposes a single method, Curl_cfilter_http_proxy_add(), to the rest of curl. Meaning: for a connection, we add the proxy filter when needed, and protocol handlers no longer have to care about this.
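
To picture the mechanics, here is a hedged sketch of such an "add a filter" call, reusing the hypothetical `struct conn_filter` from the earlier example. Curl's real Curl_cfilter_http_proxy_add() has a different signature and does much more (it has to drive the CONNECT handshake, for instance):

```c
/* Hypothetical sketch only. Once its tunnel through the proxy is
 * established, an HTTP proxy filter simply passes data through
 * to the filter below it. */
static void proxy_send(struct conn_filter *cf, const char *data)
{
  cf->next->do_send(cf->next, data);
}

static struct conn_filter proxy = { "http-proxy", NULL, proxy_send };

/* Insert the proxy filter into an existing chain, e.g. between the
 * SSL filter and the socket filter, so that the TLS handshake runs
 * through the tunnel. */
static void http_proxy_add(struct conn_filter **at)
{
  proxy.next = *at;
  *at = &proxy;
}
```

A protocol handler would call something like this once during connection setup when a proxy is configured, and never think about proxies again.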

The SOCKS proxy filter has become similarly isolated, and recently, so has the support for the haproxy protocol.

Next on my list are the SSL filters. Those already exist and are in use, but they are not yet "local" enough in how they manage their data. Also, they are not as chainable as they need to be.

Filter implementations residing in only a few source files is one aspect of locality. The other aspect is that a filter implementation should not need to know what the next filter in the chain is. The SSL filter currently assumes that the next filter is either a socket filter or another SSL filter. That will not do for HTTP/2 proxying.
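
In code, "chainable" means that a filter only ever talks to its neighbour through the chain's function pointers, never by inspecting what kind of filter it is. A sketch, again with the hypothetical types from above:

```c
/* A chainable SSL filter delegates blindly. Whether the next filter
 * is a socket filter, another SSL filter, or an HTTP/2 proxy stream
 * must not matter to it. (Hypothetical sketch.) */
static void ssl_send(struct conn_filter *cf, const char *data)
{
  /* not chainable: if (next_is_socket(cf->next)) write(fd, ...) ... */
  cf->next->do_send(cf->next, data); /* a real filter encrypts first */
}
```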

## Utilities

There are other aspects of curl that we may also implement as connection filters. Timeout and progress handling is one thing we are thinking of; a filter does not have to do protocol work. Add a progress filter to a connection and tell it that you will transfer 10 MB: it will see the number of bytes written and report progress accordingly. No need to get the 20+ protocol handlers involved in this.

Or, when requested, add a debug filter that dumps all data sent and received. Insert it at different places in the chain to dump data before encryption, or the raw data in network I/O. Again, without impacting other implementations.
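
A hedged sketch of such a utility filter, again building on the hypothetical `struct conn_filter` from above: it counts the bytes flowing through for progress reporting. A debug filter would look the same, only dumping the data instead of counting it:

```c
/* Hypothetical progress filter: no protocol knowledge at all, it
 * only observes the bytes passing through and reports. */
static size_t progress_total = 10 * 1024 * 1024; /* told: 10 MB coming */
static size_t progress_sent;

static void progress_send(struct conn_filter *cf, const char *data)
{
  progress_sent += strlen(data);
  fprintf(stderr, "progress: %zu/%zu bytes (%.0f%%)\n",
          progress_sent, progress_total,
          100.0 * (double)progress_sent / (double)progress_total);
  cf->next->do_send(cf->next, data); /* pass the data along unchanged */
}
```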

## Next Tasks

Cleaning up the SSL filters (with their many backend implementations) is next. Then we need a new filter for HTTP/2 sessions. Once we have that, we can make filter chains like SSL -> H2-PROXY -> SSL -> socket to do proxy tunneling via HTTP/2.

And after that, we take what we learned from the HTTP/2 filter and create an HTTP/3 filter. Then we start addressing the various issues around HTTP version selection, name resolving, happy eyeballs and network switching.

It will be interesting work in the coming months!