Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move HTTP base Parser to namespace Http #894

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yadij
Copy link
Contributor

@yadij yadij commented Sep 5, 2021

Correct a design flaw placing the HTTP parsing logics all under HTTP/1
specific namespace. Syntax details are shared with future HTTP
versions and will be needed by their parsers when added.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct a design flaw placing the HTTP parsing logics all under HTTP/1 specific namespace

The existing Http::One::Parser class hierarchy has serious design flaws, but I do not see why its location in Http::One is wrong. I wonder whether those other flaws has led to the above (incorrect AFAICT) conclusion.

To make progress, I suggest that we try to establish what the class currently known as Http::One::Parser should be and then come back to the question of where that class should be located. I can bootstrap that discussion with a proposal echoing the current Http::One::Parser documentation and API:

  1. Http::One::Parser should be a class responsible for parsing an HTTP/1 message prefix (i.e. start-line followed by field-lines).

I would not be violently against a more broad definition (in view of theoretically possible future code changes):

  1. Http::One::Parser should be a class responsible for parsing an HTTP/1 message (i.e. start-line followed by field-lines and message-body).

The above are rough definitions that omit some details and need polishing, but I think they are usable in this discussion. Both imply that the class is in the right namespace.

Clearly, this PR implies that Http::One::Parser should be something else instead. What should it be, in your opinion?

FWIW, I could not extract the answer from the PR changes because the proposed HTTP protocol parser definition is too broad to describe any class -- there are just too many very different parsing needs related to HTTP (e.g., reason-phrase parsing, Cache-Control header field parsing, HPACK parsing, chunked extension parsing, etc., etc.). One class simply cannot meaningfully/usefully represent all those needs, no matter how abstract that class is, and the existing Http::One::Parser overall API does not really try to do that.


P.S. There are some other problems with the PR, but I think we should focus on this key issue first before diving into details.

@yadij
Copy link
Contributor Author

yadij commented Sep 7, 2021

When this Parser class was designed and implemented there was an assumption that HTTP/2+ being binary would need an entirely different set of Parser internals, thus a different Parser class. In practice that turned out to be wrong and HTTP/2 needs to share these pieces of the basic parsing for version selection and field value parsing. PR #893 has been published as example.

@rousskov
Copy link
Contributor

rousskov commented Sep 7, 2021

When this Parser class was designed and implemented there was an assumption that HTTP/2+ being binary would need an entirely different set of Parser internals, thus a different Parser class. In practice that turned out to be wrong and HTTP/2 needs to share these pieces of the basic parsing for version selection and field value parsing.

"Pieces of basic HTTP parsing" reusable across HTTP/1 and HTTP/2 message parsers do not belong to a single C++ class. For example, they do not belong to the Http::One::Parser class (as defined in my review and for any other reasonable definition I can imagine).

This misunderstanding is exactly why I suggested that we define what Http::One::Parser class is (or should be) before we discuss whether that class should be moved.

@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Sep 15, 2021
@yadij
Copy link
Contributor Author

yadij commented Sep 30, 2021

HTTP/1 and HTTP/2 syntax and use of TCP are still highly intertwined. HTTP/2 only redefines the framing structure and transport encoding. It does not change the HTTP header parsing and validation needs. HTTP/1 parser needs to handle sudden arrival of the HTTP/2 magic prefix instead of any HTTP/1 request message.

You can see this by how PR #893 does not change the Http::Parser API methods or how they are used by most of the Squid code - just slight changes their results depending on protocol version parsed.

@rousskov
Copy link
Contributor

HTTP/1 and HTTP/2 syntax and use of TCP are still highly intertwined.

Regardless of the accuracy of that assertion (which I hope we do not need to perfect and debate here), my concerns are not directly related to its validity. Even if we assume, for the sake of the argument, that "HTTP/1 and HTTP/2 syntax and use of TCP are highly intertwined", this PR is moving things in the wrong direction (due to largely unrelated concerns detailed earlier, along with a specific suggestion on how to resolve them).

HTTP/1 parser needs to handle sudden arrival of the HTTP/2 magic prefix

I am not sure it needs to do that, but even if it does, it should use HTTP/2 parsing code for that (and vice versa). And, again, this assertion is largely unrelated to the suggested move of the C++ class in question AFAICT: The best place for the HTTP/1 message (prefix) parser class does not depend on whether that parser needs to know about other protocol magic prefixes.

You can see this by how PR #893 does not change the Http::Parser API methods or how they are used by most of the Squid code - just slight changes their results depending on protocol version parsed.

I do not think that PR 893 is a good example or uses case, but I suggest to (continue to) discuss problems with that PR in that PR.

@yadij
Copy link
Contributor Author

yadij commented Sep 30, 2021

Since you keep referring to your initial comment as your reasoning I shall address each point individually...

The existing Http::One::Parser class hierarchy has serious design flaws, but I do not see why its location in Http::One is wrong.

It is wrong location because Http::One and src/http/one are the wrong places to be defining code that parses HTTP/2 and later versions messages. That location and namespace should be restricted to the logic parsing syntax defined only for HTTP/1 messages (eg chunked encoding, request-line, response-line, HTTP/1 message framing).

I wonder whether those other flaws has led to the above (incorrect AFAICT) conclusion.

The only information I have on these alluded "design flaws" is that a) you have a history of saying the same thing about all features not designed by yourself, and b) the Parser we have now has been highly modified from original design model to meet your imposed design changes to make in "perfect". So "what flaws?"

To make progress, I suggest that we try to establish what the class currently known as Http::One::Parser should be and then come back to the question of where that class should be located. I can bootstrap that discussion with a proposal echoing the current Http::One::Parser documentation and API:

  1. Http::One::Parser should be a class responsible for parsing an HTTP/1 message prefix (i.e. start-line followed by field-lines).

No. There are 3 distinct message semantics and framing in HTTP/1. They are parsed by Http1::RequestParser, Http1::ResponseParser, and Http1::TeChunkedParser.

The class being touched by this PR provides the logic for handling the generic HTTP semantic sections "first-line", magic version detection, "mime-header", commonly used delimiters, etc. Those parts are shared with HTTP/2+ versions even while those versions have distinctly different Parser (eg Http2::FrameParser) handling how the generic HTTP message parts are framed.

None of the version-specific parsing is changed by this PR.

I would not be violently against a more broad definition (in view of theoretically possible future code changes):

  1. Http::One::Parser should be a class responsible for parsing an HTTP/1 message (i.e. start-line followed by field-lines and message-body).

As mentioned, HTTP/1 has three syntax:

A) request := first-line CRLF mime-header CRLF payload
B) response : = status-line CRLF mime-header CRLF payload
C) payload := [ chunk-size ';' mime-header CRLF] #octets

(A) is Http1::RequestParser, (B) is Http1::ResponseParser, and (C) is Http1::TeChunkedParser.

HTTP/2 has one syntax:
D) h2c := first-line header-frame(s) data-frame(s)

header-frame contains two HPACK encoded blobs parsed by mime-header parser. The first of those blobs provide the semantic details for first-line and status-line. The second provided mime-header.

data-frame contains raw octets. Which is the generic HTTP "payload".

Thus you can see HTTP has generic semantic structures which are presented by both versions. The API to fetch these semantic details are what Http::Parser provides. The inherited child classes perform version-specific parsing to fetch those details.

The above are rough definitions that omit some details and need polishing, but I think they are usable in this discussion. Both imply that the class is in the right namespace.

The primary "detail" I see being ignored to reach that implication is the existence of HTTP/2+ versions.

FYI, HTTP is being updated to separate HTTP/1.1 messaging[1] behaviour, HTTP/2 messaging[2] and HTTP semantics[3] syntax. Both [1] and [2] depend on shared details defined by [3].

[1] https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-messaging-19.txt (Http1::RequestParser and Http1::ResponseParser)
[2] https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-http2bis-05.txt (Http2::FrameParser)
[3] https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19.txt (Http::Parser)

This PR acknowledges those distinctions by moving the generic logic and API of Http1::Parser to Http::Parser.

Clearly, this PR implies that Http::One::Parser should be something else instead. What should it be, in your opinion?

We should not have a Http::One::Parser, or at most it should be the logic shared by the various HTTP/1 messages and not needed by other HTTP versions. I have not found any such HTTP/1 specific thing though - even the CRLF delimiter is needed by HTTP/2 parser at times.

The Http::Parser moved by this PR is implemented as a parent class such that Squid ports I/O can be handled by either Http1::RequestParser, Http1::ResponseParser, Http1::TeChunkedParser, Http2::FrameParser or Http3::FrameParser.
IF we find things which are shared by HTTP/1 message parsers

FWIW, I could not extract the answer from the PR changes because the proposed HTTP protocol parser definition is too broad to describe any class

I disagree. As demonstrated by the HTTP WG production of [3] above "HTTP protocol" definition has semantic structures which are completely agnostic of version. Those parts can absolutely be represented by an Http::Parser class, most of them already are. With C++ inheritence we can share that code into version-specific or message-specific Parser classes. That is how/what "good OO design" looks like.

-- there are just too many very different parsing needs related to HTTP (e.g., reason-phrase parsing, Cache-Control header field parsing, HPACK parsing, chunked extension parsing, etc., etc.). One class simply cannot meaningfully/usefully represent all those needs, no matter how abstract that class is,

Strawman argument there. Nobody is attempting to implement a single class that does everything. Notice how all the parsers for the features you mention are left untouched by both this PR and the associated one this supports. Even the Http1::Request/ResponseParser only change in minor ways from the symbol renaming.

and the existing Http::One::Parser overall API does not really try to do that.

Indeed. I don't know why you keep bringing up or insisting on designs using that model of "single class which does everything".

This PR is simply moving the logic needed by both HTTP/1 and HTTP/2 to be outside the HTTP/1-specific namespace. Leaving the version-specific logic in their own HttpN::FooParser classes, untouched.

@rousskov
Copy link
Contributor

TLDR:

// Amos proposal (as far as I can tell; with an empty One::Parser I added)
namespace Http {
    /// The API to fetch HTTP semantic details.
    class Parser {...};

    namespace One {
        /// The API to fetch HTTP/1 semantic details.
        using Parser = ::Http::Parser;

        /// HTTP/1 request prefix (i.e. request-line plus headers) parser.
        class RequestParser: public Parser {...};

        /// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
        class ResponseParser: public Parser {...};

        /// HTTP/1 chunked-encoding parser
        class TeChunkedParser: public Parser {...};
    }
}

Please adjust the above sketch to accurately represent what you are proposing, but keep that ("empty" if you wish) Http::One::Parser there to help us make progress and compare apples to apples. See further below for details.

// Alex view of the existing code after bug fixes and One::Parser renaming
namespace Http {
    namespace One {
        /// HTTP/1 message prefix (i.e. first-line plus headers) parser.
        class MessageParser {...};

        /// HTTP/1 request prefix (i.e. request-line plus headers) parser.
        class RequestParser: public MessageParser {...};

        /// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
        class ResponseParser: public MessageParser {...};

        /// HTTP/1 chunked-encoding parser
        class TeChunkedParser {...}; // not a MessageParser child!
    }
}

I doubt HTTP/2 will follow exactly the same pattern due to messages encapsulated inside frames, but, if it does, we will eventually have something like this:

// Alex view of a possible distant future
namespace Http {
     /// HTTP message prefix (i.e. stuff before the body) parser.
     class MessageParser {...};

    namespace One {
        /// HTTP/1 message prefix (i.e. first-line plus headers) parser.
        class MessageParser: public Http::MessageParser {...};

        /// HTTP/1 request prefix (i.e. request-line plus headers) parser.
        class RequestParser: public MessageParser {...};

        /// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
        class ResponseParser: public MessageParser {...};

        /// HTTP/1 chunked-encoding parser
        class TeChunkedParser {...}; // not a MessageParser child!
    }

    namespace Two {
        /// HTTP/2 message prefix (i.e. HEADERS payloads) parser.
        class MessageParser: public Http::MessageParser {...};
    }
}

Alex: I do not see why Http::One::Parser location in Http::One is wrong.

Amos: It is wrong location because Http::One and src/http/one are the wrong places to be defining code that parses HTTP/2 and later versions messages.

You are saying that the Http::One::Parser location is wrong because of Y. I agree with Y, but see no valid connection between the Http::One::Parser location and Y. To me, that implies that your definition of Http::One::Parser differs from mine:

  • My rough definition is "HTTP/1 message prefix parser". It is based, in part, on the official code and detailed in the original PR review.

  • I am still not sure what your definition is because I could not find a clear one in your response. It is possible that you are using something like "a parser of HTTP message prefix or a parser of HTTP body encodings/frames". I disagree that Http::One::Parser is that today. I disagree that Http::One::Parser (or even Http::Parser, if any) should become that tomorrow. It is possible that you are using something like "The API to fetch HTTP semantic details" but given the list of the "semantic details" you have mentioned, that definition is equivalent to the "prefix or body parts parser" definition I disagree with.

We are less likely to make progress here without a shared understanding of what Http::One::Parser is (and should become if we want to change it). I still think that we should focus on developing that understanding. The next step in that direction, IMO, is for you to provide a definition so that I do not have to discuss something based on my guesses of what you have in mind. The discussion below may help with that, but we should try not to lose that focus. I have also provided design sketches above in hope to provide a common framework for developing those definitions.

Amos: That location and namespace should be restricted to the logic parsing syntax defined only for HTTP/1 messages (eg chunked encoding, request-line, response-line, HTTP/1 message framing).

I agree that Http::One namespace should be used for HTTP/1-specific things.

Alex: The existing Http::One::Parser class hierarchy has serious design flaws

Amos: what flaws?

A comprehensive list does not really matter right now, but I will mention two flaws that may be relevant:

  1. Http::One::TeChunkedParser should not derive from Http::One::Parser. A few chunking-specific bits that were forced into Http::One::Parser and its other kids (e.g., HTTP_PARSE_CHUNK_EXT state) should be moved away. The chunking parser does not really use most of what its parent class has to offer, and the bits it does use often stand in the way. The bogus overloaded firstLineSize() implementation that is documented as having "no meaning" is a telltale sign of that design mistake!

  2. Http::One::Parser mixes up its primary goal (i.e. HTTP/1 message prefix parsing) with generally reusable parsing bits that should live outside that class (e.g., definitely WhitespaceCharacters() and probably skipLineTerminator()).

Alex: 1. Http::One::Parser should be a class responsible for parsing an HTTP/1 message prefix (i.e. start-line followed by field-lines).

Amos: No. There are 3 distinct message semantics and framing in HTTP/1. They are parsed by Http1::RequestParser, Http1::ResponseParser, and Http1::TeChunkedParser.

Sorry, I see no connection between "No" and the sentence following that word. The two classes you mention are correctly derived from Http::One::Parser, so the essence of any Http::One::Parser definition ought to apply to them as well -- they only detail/specialize what their parent class does! The current TeChunkedParser parent is one of the design flaws, so it has no bearing on what the parent class should be.

Amos: The class being touched by this PR provides the logic for handling the generic HTTP semantic sections "first-line", magic version detection, "mime-header", commonly used delimiters, etc.

I disagree that Http::One::Parser is such a class today. The existing code is clearly specific to HTTP/1. That does not mean that some bits and pieces cannot be reused elsewhere, of course, but the class is essentially an HTTP/1 message prefix parser today.

I am not sure we will need a "generic HTTP semantic sections parser" at all. If we develop such a need in the future, that class can be created, of course, but this existing-class-moving PR is not about that.

Amos: None of the version-specific parsing is changed by this PR.

The concern is not the unchanged parsing code internals, but the changed location of their class.

All of that version-specific code is changed by moving it to a different namespace.

I am not sure what you mean exactly here, but I see HTTP/1-specific code being moved into version-agnostic Http namespace, and that is wrong. Http1magic is the most basic/obvious example.

A) request := first-line CRLF mime-header CRLF payload
B) response : = status-line CRLF mime-header CRLF payload
C) payload := [ chunk-size ';' mime-header CRLF] #octets
D) h2c := first-line header-frame(s) data-frame(s)

Even if we ignore its oversimplifications and bugs, the above summary does not imply it is a good idea to introduce a single C++ parent class for the above component parsers. The problems with the existing chunked handling code illustrate what happens when the wrong parent class is forced on a parser. And even if such a single parent class will become needed in the future, it is not what Http::One::Parser is today.

Alex: The above are rough definitions that omit some details and need polishing, but I think they are usable in this discussion. Both imply that the class is in the right namespace.

Amos: The primary "detail" I see being ignored to reach that implication is the existence of HTTP/2+ versions.

HTTP/2+ versions existence has no effect on my definition. It is not an omitted detail; it is an irrelevant one (for my definition)!

You are probably thinking about your "The API to fetch HTTP semantic details" definition (if that is what you are proposing as a definition -- I am not sure, please clarify). HTTP/2 would be relevant to that hypothetical API, of course. If so, please note that I am not even trying to define that Http::Parser. I am defining Http::One::Parser and asking you to do the same. It would be easier to make progress starting from something we (hopefully) agree on -- Http::One::RequestParser and Http::One::ResponseParser -- and then moving up, one layer at a time. Even if a layer does not add anything new, it can still be defined, of course.

This PR acknowledges those distinctions by moving the generic logic and API of Http1::Parser to Http::Parser.

Both the official code and I disagree that Http::One::Parser represents generic HTTP parsing logic. It does not even represent version-agnostic message sectioning logic well. Thus, that class should not be moved.

We should not have a Http::One::Parser

I doubt that, but I would be happy if it is gone without duplicating Http::One::RequestParser and Http::One::ResponseParser code. To make progress, you can assume (for the sake of agreeing on one Http::One::Parser definition as a starting point) that Http::One::Parser exists but is essentially empty. See TLDR sketches.

or at most it should be the logic shared by the various HTTP/1 messages and not needed by other HTTP versions.

Yes, and not needed by other HTTP/1 components that do not work on the "message" level. For example, a chunk size parser is not a message-level thing and should not have an Http::One::MessageParser as a parent.

I have not found any such HTTP/1 specific thing though - even the CRLF delimiter is needed by HTTP/2 parser at times.

There is a big difference between needing X to parse some protocol component and having a parent class member X for parsing that component. In many cases, specific parsers do not need a common parent class because the set of Xs they need differ too much. They are much better off reusing stand-alone functions or low-level classes dedicated to parsing specific things.

HTTP/1 chunked encoding and HTTP/2 HPACK parsers are good examples of two HTTP parsers that are unlikely to benefit from a common parent class.

The Http::Parser moved by this PR is implemented as a parent class such that Squid ports I/O can be handled by either Http1::RequestParser, Http1::ResponseParser, Http1::TeChunkedParser, Http2::FrameParser or Http3::FrameParser.
IF we find things which are shared by HTTP/1 message parsers

The existing Http::One::Parser (being moved by this PR) does not represent an "HTTP port traffic parser" API. It represents a message (prefix) parsing API. Moving that class will not create the API you envision, of course.

Alex: the proposed HTTP protocol parser definition is too broad to describe any class

Amos: I disagree. As demonstrated by the HTTP WG ... "HTTP protocol" definition has semantic structures which are completely agnostic of version.

That fact does not mean "HTTP protocol parser" is specific enough to describe a class. Sharing of certain concepts across protocol versions is very natural, but does not imply that there should be a single C++ class that represents all those concepts (and especially a single C++ class that represents a parser of all those shared concepts).

However, we do not need to agree on this, I think. When two protocol versions have two parsers that share code we can move that code into a reusable routine or a parent class (whichever works best for that code). If that process results in the creation of an Http::Parser class, then we will have an Http::Parser class.

Amos: Those parts can absolutely be represented by an Http::Parser class, most of them already are. With C++ inheritence we can share that code into version-specific or message-specific Parser classes. That is how/what "good OO design" looks like.

Good design uses inheritance to implement useful abstractions. Just because two things are described in the same protocol document does not mean they are based on the same abstraction useful in Squid context. Yes, all protocol element parsers are parsers, but that is not a useful abstraction in Squid context (yet?). Good abstractions do not result in "has no meaning" virtual method implementations, for example.

Alex: ... there are just too many very different parsing needs related to HTTP (e.g., reason-phrase parsing, Cache-Control header field parsing, HPACK parsing, chunked extension parsing, etc., etc.). One class simply cannot meaningfully/usefully represent all those needs, no matter how abstract that class is,

Amos: Strawman argument there. Nobody is attempting to implement a single class that does everything.

It is not a strawman argument because the "HTTP protocol parser" definition this argument rejects does not exclude any of the parsers I listed. The argument matches that definition! If you want to narrow down the list, then we need to find a more specific definition. And when we are done, I bet it will not match the existing Http::One::Parser class well (because of its flaws), but will be similar to the "HTTP/1 message prefix parser" I have proposed.

Alex: and the existing Http::One::Parser overall API does not really try to do that.

Amos: Indeed. I don't know why you keep bringing up or insisting on designs using that model of "single class which does everything".

I do not. This PR proposes such a model -- "HTTP protocol parser" -- not me.

This PR is simply moving the logic needed by both HTTP/1 and HTTP/2 to be outside the HTTP/1-specific namespace. Leaving the version-specific logic in their own HttpN::FooParser classes, untouched.

This PR moves a problematic API (that we cannot even agree on how to define!) and HTTP/1-specific code into the version-agnostic namespace. We should not do any of that, regardless of how "simple" that change looks.


Amos: you have a history of saying the same thing about all features not designed by yourself, and b) the Parser we have now has been highly modified from original design model to meet your imposed design changes to make in "perfect".

I am ignoring both insults as factually incorrect and irrelevant to this PR.

@yadij
Copy link
Contributor Author

yadij commented Oct 22, 2021

Responding to your TLDR pseudo-code examples (only). You came close, however the "headers parser" is a separate "class HttpHeader" in the current code and not touched by this PR.

// Alex view of the existing code after bug fixes and One::Parser renaming
namespace Http {
    namespace One {
        /// HTTP/1 message prefix (i.e. first-line plus headers) parser.
        class MessageParser {...};

        /// HTTP/1 request prefix (i.e. request-line plus headers) parser.
        class RequestParser: public MessageParser {...};

        /// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
        class ResponseParser: public MessageParser {...};

        /// HTTP/1 chunked-encoding parser
        class TeChunkedParser {...}; // not a MessageParser child!
    }
}

This has several issues:

  1. using the word "message" - HTTP is parsed in sections, not a whole message at a time. These Parser are used by higher level code which is about assembling/processing "message" from randomly arriving I/O, they do not even see all of a message (eg the payload section). This is why the parent class is just called Parser today.

  2. the "plus headers" notes on your parser. Nothing in Http namespace parse any of the Mime section (yet). One day hopefully, but this is not true of "current state".

I doubt HTTP/2 will follow exactly the same pattern due to messages encapsulated inside frames,

HTTP/2 introduces the term "frame". However it is fully backward compatible to describe the parts of an HTTP/1 and even HTTP/0 message structure as a series of frames in ASCII coding that cannot be interleaved. When I refer to a "frame" of HTTP/1 etc I mean one of those parts, not the whole message.

Latest IETF specification which is about to be assigned RFC number is https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19.txt#section-6

  // invariant HTTP message order (aka message frames) per section 6
  HTTP-message = framing-control-data header payload trailer

  framing-control-data  = request-line CRLF
                        / status-line CRLF
                        / hpack( *pseudo-header )

  header  = field-name ":" *OWSP field-value CRLF
          / hpack( *(field-name field-value) )

  payload  = *octet

  trailer  = header

HTTP/0 has just two frames (framing-control-data payload), previously described as (request-line CRLF payload).

HTTP/1 calls "framing-control-data" a "first-line" and puts the payload framing data into the headers frame Transfer-Encoding header.

but, if it does, we will eventually have something like this:

// Alex view of a possible distant future
namespace Http {
     /// HTTP message prefix (i.e. stuff before the body) parser.
     class MessageParser {...};

    namespace One {
        /// HTTP/1 message prefix (i.e. first-line plus headers) parser.
        class MessageParser: public Http::MessageParser {...};

        /// HTTP/1 request prefix (i.e. request-line plus headers) parser.
        class RequestParser: public MessageParser {...};

        /// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
        class ResponseParser: public MessageParser {...};

        /// HTTP/1 chunked-encoding parser
        class TeChunkedParser {...}; // not a MessageParser child!
    }

    namespace Two {
        /// HTTP/2 message prefix (i.e. HEADERS payloads) parser.
        class MessageParser: public Http::MessageParser {...};
    }
}

The above has all the same mistakes from your view of current code. Worse, it compounds the "message" naming to conflate "message" with a single HTTP/2 "frame".

We have previously had a discussion about how "byte" could be as small as 7 bits or as large as 128 depending on CPU designs - such that "char == 1 byte" is trouble and "octet" more accurate descriptor term.

Similarly an HTTP "message" is semantically invariant but the I/O representation of message varies between versions from ASCII to binary and also can be spread out across packets or entire UDP connections - thus very much trouble. "frame" (like "octet") is a more precise term that describes "a sequence of adjacent octets" with shared semantic ordering in the protocol message.
The HTTP/2 and HTTP/3 parsers will need to handle the arrival of individual frames to incrementally parse the message.

Amos view of a possible not-so-distant future (HTTP/2 parts already implemented in PR #893, HTTP/3 parts undergoing testing in experimental branch not yet submitted)

namespace Http {
    /// The API to fetch HTTP semantic details.
    class Parser {...};

    /// HTTP header and trailer (i.e. mime) parser.
    class HeaderParser {...};

    namespace One {
        /// The API to fetch HTTP semantic details.
        using Parser = ::Http::Parser;

        /// The API to fetch MIME/1.0 semantic details.
        using HeaderParser = ::Http::HeaderParser;

        /// HTTP/1 request prefix (i.e. request-line plus headers) parser.
        class RequestParser: public Parser {...};

        /// HTTP/1 reply prefix (i.e. status-line plus headers) parser.
        class ResponseParser: public Parser {...};

        /// HTTP/1 payload encoding parser (i.e. chunk-size plus headers plus payload plus trailer)
        class TeChunkedParser: public Parser, public HeaderParser {...};
    }

    namespace Two {
        /// The API to fetch HTTP semantic details.
        using Parser = ::Http::Parser;

        /// The API to fetch MIME/1.0 semantic details.
        using HeaderParser = ::Http::HeaderParser;

        /// HTTP/2 frame (i.e. request-line/status-line/pseudo-headers, control-message, mime-header, payload, trailer) parser.
        class FrameParser: public Parser, public HeaderParser {...};

        /// HPACK decoder/parser.
        class HpackDecoder {...};
    }

    namespace Three {
        /// The API to fetch HTTP semantic details.
        using Parser = ::Http::Parser;

        /// The API to fetch MIME/1.0 semantic details.
        using HeaderParser = ::Http::HeaderParser;

        /// HTTP/3 frame (ie QUIC payload) parser.
        class FrameParser: public Parser, public HeaderParser {...};

       // work is incomplete, maybe parsers for HTTP payload encodings etc needed.
    }
}

@rousskov
Copy link
Contributor

You came close, however the "headers parser" is a separate "class HttpHeader" in the current code and not touched by this PR.

I think this is just a minor terminology misunderstanding. Today, Http::One::Parser isolates the message prefix (the start line plus the headers), parses the start line, and also does some preprocessing of the isolated message headers (e.g., unfoldMime()). There is also getHostHeaderField() that actually parses the headers, but we can ignore that oddity if you wish. I call those actions "parsing", but there are probably other words that can describe them. Tomorrow, these classes will do even more to extract information from the isolated message headers (probably using other helper classes), so "parsing" would be even more appropriate.

Whatever we end up calling those actions does not really change anything as far as this PR change request is concerned. The PR problem this change request is pointing out is not about naming things. It is about placing things.

This has several issues:

  1. using the word "message" - HTTP is parsed in sections, not a whole message at a time. These Parser are used by higher level code which is about assembling/processing "message" from randomly arriving I/O, they do not even see all of a message (eg the payload section). This is why the parent class is just called Parser today.

This is a similar minor and irrelevant naming disagreement. I see no problem with calling a message parser a MessageParser, even if that parser parses that message incrementally rather than "a whole message at a time".

These parses do not do much with message bodies, so we could name them MessagePrefixParser to be a bit more precise. I was following the current Squid naming tradition of ignoring the body aspect when naming message (prefix) parsers. Again, this naming disagreement is not relevant to the problem this change request is discussing.

  1. the "plus headers" notes on your parser. Nothing in Http namespace parse any of the Mime section (yet). One day hopefully, but this is not true of "current state".

I believe I have fully covered this naming misunderstanding at the beginning of this response.

HTTP/2 introduces the term "frame". ...
Latest IETF specification ... is https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19.txt#section-6

I did not notice any significant disagreements or conflicts in the above section of your response compared to what I have said. The framing-control-data BNF and related figures in your response appear to be merging two sets of different syntax rules into one, presumably to illustrate message structure similarities, but you do not need to convince me that all HTTP versions have similar message structure. Thankfully, IETF draft-ietf-httpbis-semantics-19.txt does not contain that kind of Frankenstein BNF, but I see no point in arguing about the quality of that model if you find it useful.

However, it may be important to point out explicitly that the "message abstraction" draft-ietf-httpbis-semantics-19.txt is talking about is represented by the Http::Message class (or Http::MessagePrefix or whatever name you prefer -- we are not discussing naming right now), not the proposed Http::Parser class (or Http::HeaderParser or whatever name you prefer)!


// Alex view of a possible distant future
namespace Http {
    ...
    namespace Two {
        /// HTTP/2 message prefix (i.e. HEADERS payloads) parser.
        class MessageParser: public Http::MessageParser {...};
    }
}

The above has all the same mistakes from your view of current code. Worse, it compounds the "message" naming to conflate "message" with a single HTTP/2 "frame".

Just to correct the misunderstanding in the second sentence: The API sketches I posted are not about parsing HTTP/2 frames. As far as HTTP/2 is concerned, the above Two::MessageParser sketch is about what parsing HEADERS frames payload. That parser receives HEADERS payload and extracts/creates the corresponding Http::Two::Message object (or equivalent). We will also parse the frames, of course, but that is a different story.

I hope I have addressed the other "mistakes" earlier in this response.

an HTTP "message" is semantically invariant but the I/O representation of message varies between versions from ASCII to binary and also can be spread out across packets or entire UDP connections - thus very much trouble.

I do not know why you are saying these well-known facts. You can safely assume that I know what HTTP message is, and how it is transferred. If something appears to suggest the opposite, there is probably another (correct) way to interpret what I said.

Amos view of a possible not-so-distant future

namespace Http {
    /// The API to fetch HTTP semantic details.
    class Parser {...};

    /// HTTP header and trailer (i.e. mime) parser.
    class HeaderParser {...};

    namespace One {
        /// The API to fetch MIME/1.0 semantic details.
        using HeaderParser = ::Http::HeaderParser;
    }

    namespace Two {
        /// The API to fetch MIME/1.0 semantic details.
        using HeaderParser = ::Http::HeaderParser;
    }
}

I (still) do not know what "The API to fetch HTTP semantic details" is so I am ignoring the Http::Parser class for now. Hopefully, that problem will be resolved once we resolve the HeaderParser problem.

HTTP/1 header syntax differs from HTTP/2 header syntax. Thus, we will need a distinct header parsing class for each protocol version (and not using Http::HeaderParser references to some version-agnostic header parser in the Http namespace). If those future HTTP/1 and HTTP/2 header parsers happen to share some code, it is possible that we will place that shared code into the version-agnostic Http::HeaderParser class. Right now, the existing header-handling code in Http::One::Parser is specific to HTTP/1 so it should not be moved into the version-agnostic Http namespace.

We may be making progress though: The above C++ sketch is a good illustration for some of the key problems this change request is about! And I hope my response will remove the naming issues out of the way (for now).

@squid-anubis squid-anubis added the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Aug 8, 2022
@squid-anubis squid-anubis removed the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Aug 23, 2022
@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Feb 13, 2023
@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Jan 9, 2024
@yadij yadij added the S-waiting-for-PR Closure of other PR(s), current or future, is expected (and usually required) label Oct 2, 2024
@yadij
Copy link
Contributor Author

yadij commented Oct 2, 2024

Waiting on PR #1908

@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels Oct 10, 2024
@kinkie kinkie added the M-ignored-by-merge-bots https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-labels label Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-ignored-by-merge-bots https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-labels S-waiting-for-PR Closure of other PR(s), current or future, is expected (and usually required)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants