Invalid rendering of a list followed by another block #371
Hiya! ❤️ Thanks for this comprehensive report; I'm pleased to see more and more folks using Comrak for Markdown manipulation and not just HTML rendering! Unfortunately, I don't have time to look into it fully at the moment; I'm moving overseas in a couple of days. If anyone else would like to chip in (or compare and contrast with cmark-gfm's behaviour), that'd be awesome!
A couple things I noticed about your AST, which you alluded to
When I look at an AST that is built normally, each item is wrapped in a paragraph, whether the list is tight or not. So, for example,

- one
- two

generates:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document sourcepos="1:1-2:5" xmlns="http://commonmark.org/xml/1.0">
<list sourcepos="1:1-2:5" type="bullet" tight="true">
<item sourcepos="1:1-1:5">
<paragraph sourcepos="1:3-1:5">
<text sourcepos="1:3-1:5" xml:space="preserve">one</text>
</paragraph>
</item>
<item sourcepos="2:1-2:5">
<paragraph sourcepos="2:3-2:5">
<text sourcepos="2:3-2:5" xml:space="preserve">two</text>
</paragraph>
</item>
</list>
</document>

Now, I'm basing this off the AST output at https://gitlab-org.gitlab.io/ruby/gems/gitlab-glfm-markdown/?text=-%20one%0A-%20two%0A. But that should be accurate, and in fact when I look at the code at lines 679 to 693 in 120a36c
So I don't think that is fragile; it's probably the correct way. One other thing I noticed in the AST you provided is that you had two

wdyt?
@digitalmoksha thanks for your input. Indeed, this is what I ended up doing: wrapping every list item in a paragraph. My point is that it's not obvious from the CommonMark specification (for example, its AST definition) that items must contain a paragraph. This is how comrak parses Markdown in practice, but in theory inline text or inline code should be allowed as well, and that alternative isn't rendered properly.

I understand that making the pretty printer compliant is more work. As comrak is open source and best effort, I think it's a reasonable answer to say "we always assume that items contain a paragraph node, and it's on you". But then maybe this invariant should be made clear somewhere: either in the documentation, or through the types themselves (by somehow making it impossible to have an item without a paragraph)? At least in my experience, it wasn't obvious at all what the problem was from looking at the spec and at comrak's types.

Yet another possibility is to wrap item content in a paragraph on the fly when pretty-printing (no need to actually allocate a new node, I guess; just take the same code path as if the content were wrapped in a paragraph when it's not), so that with minimal change the pretty printer would correctly handle a larger class of ASTs. What do you think? I'm happy to help on either front, implementation or documentation.
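The on-the-fly option suggested here could look roughly like the following sketch. It assumes a hypothetical, simplified node enum: `Node`, `render_blocks`, `render_item` and `render_inlines` are illustrative names, not comrak's API. When a list item's children are bare inlines, the renderer handles each run of them exactly as it would handle a paragraph's contents, without allocating a wrapper node. For brevity, top-level blocks are simply separated by blank lines.

```rust
// Hypothetical, simplified node model for illustration only; this is not
// comrak's arena-based AstNode/NodeValue API.
#[derive(Debug)]
enum Node {
    List(Vec<Node>), // children are expected to be Item nodes
    Item(Vec<Node>),
    Paragraph(Vec<Node>),
    Code(String), // inline code span
    Text(String),
}

fn is_inline(node: &Node) -> bool {
    matches!(node, Node::Code(_) | Node::Text(_))
}

/// Render a run of inline nodes as they would appear inside a paragraph.
fn render_inlines(inlines: &[Node]) -> String {
    let mut out = String::new();
    for node in inlines {
        match node {
            Node::Text(t) => out.push_str(t),
            Node::Code(c) => {
                out.push('`');
                out.push_str(c);
                out.push('`');
            }
            // Blocks are never passed here by the functions below.
            _ => {}
        }
    }
    out
}

fn render_item(item: &Node) -> String {
    let children = match item {
        Node::Item(children) => children,
        other => return render_block(other),
    };
    let mut parts = Vec::new();
    let mut i = 0;
    while i < children.len() {
        if is_inline(&children[i]) {
            // Take the same code path as a Paragraph for a run of bare
            // inlines, without allocating a wrapper node.
            let start = i;
            while i < children.len() && is_inline(&children[i]) {
                i += 1;
            }
            parts.push(render_inlines(&children[start..i]));
        } else {
            parts.push(render_block(&children[i]));
            i += 1;
        }
    }
    // Later lines of a multi-block item are indented under the "- " marker.
    format!("- {}", parts.join("\n\n").replace('\n', "\n  "))
}

fn render_block(node: &Node) -> String {
    match node {
        Node::Paragraph(inlines) => render_inlines(inlines),
        Node::List(items) => items.iter().map(render_item).collect::<Vec<_>>().join("\n"),
        Node::Item(_) => render_item(node),
        // A stray inline outside an item is treated as a one-node paragraph.
        inline => render_inlines(std::slice::from_ref(inline)),
    }
}

/// Render top-level blocks, separated by blank lines.
fn render_blocks(blocks: &[Node]) -> String {
    blocks.iter().map(render_block).collect::<Vec<_>>().join("\n\n")
}
```

With this approach, an item holding a bare code span and an item holding the same span inside a paragraph render identically. The sketch does not show whether this would fit comrak's real cm.rs renderer.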
I'd make the observation that not every list item's contents must necessarily be in a paragraph; e.g.:

- ```
  xyz
  ```

This produces the following XML AST:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<list type="bullet" tight="true">
<item>
<code_block xml:space="preserve">xyz
</code_block>
</item>
</list>
</document>

But a code block is, like a paragraph, a block type. The spec and DTD are explicit (the spec not very clearly so, imo) that list items can only contain blocks, and therefore cannot contain, e.g., a code inline. See § 5 Container blocks:
A container block can only have other blocks as its contents, and list items are container blocks.

As you've noticed, the DTD agrees: <!ELEMENT item (%block;)*>

So I think when you say "in theory inline text or inline code should be allowed as well": by the theory of the spec, it should not be allowed; it's just that Comrak's type definitions are too loose. The inline documentation is actually explicit about this too: lines 42 to 44 in 56581d7.

And lines 419 to 425 in 56581d7.

See also lines 614 to 623 in 56581d7.
I mention these not to make my point loudly, but because these functions (derived from the spec) are important to how the parser ensures that the correct types of things are nested (breaking out of elements when necessary), and also to how it chooses what to process and when. Comrak parses things correctly for itself, of course, but it doesn't prevent users from creating a non-conformant document, with the result that its formatters misbehave when given one. The options I can see right now are (reworded for clarity):
My feelings on them are:
Please let me know what you think!
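The nesting rule discussed in this comment can be sketched as a containment predicate: container blocks, including list items, contain only blocks, as in the DTD's <!ELEMENT item (%block;)*>. This is a minimal toy under stated assumptions. The `Kind` enum and `can_contain` function are hypothetical, cover only a handful of node types, and are not the comrak functions referenced in the comment.

```rust
/// Hypothetical node categories, a tiny subset for illustration; comrak's
/// real NodeValue has many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Document,
    List,
    Item,
    Paragraph,
    CodeBlock,
    Emph,
    Text,
    Code,
}

fn is_block(kind: Kind) -> bool {
    matches!(
        kind,
        Kind::Document | Kind::List | Kind::Item | Kind::Paragraph | Kind::CodeBlock
    )
}

/// Whether `parent` may directly contain `child` in this toy model.
fn can_contain(parent: Kind, child: Kind) -> bool {
    match parent {
        // Container blocks hold only blocks; items may only appear in lists.
        Kind::Document | Kind::Item => {
            is_block(child) && child != Kind::Item && child != Kind::Document
        }
        // Lists are meta-containers: they hold only items.
        Kind::List => child == Kind::Item,
        // Paragraphs hold only inlines.
        Kind::Paragraph => !is_block(child),
        // Code block content is literal text stored in the node, not children.
        Kind::CodeBlock => false,
        // Some inlines (emphasis) hold other inlines...
        Kind::Emph => !is_block(child),
        // ...while others (text, code spans) have no children at all.
        Kind::Text | Kind::Code => false,
    }
}
```

Under this predicate, an item may hold a paragraph or a code block (as in the xyz example) but not a bare code span, which is the tree shape from the original report.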
Thanks for the elaborate and considerate answer, @kivikakk. I might have got confused by reading the AST definition; I thought

I thus agree that you should absolutely not make comrak non-compliant just for that use case; it's not worth it (I agree with your feelings on 1.). I think 2. isn't that difficult, because block elements and inline elements are clearly delimited, but it would break backward compatibility (you could just have an additional layer of enum type
This is a good point, although I don't really remember where I tried to look for this in the first place 😅; I clearly missed the documentation you pointed to, though.
I think this sounds good, though the same question arises: how to make users aware that they should run
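Option 2, an extra layer of enum types separating blocks from inlines, could look roughly like this sketch. The `Block`, `ListItem` and `Inline` types and `count_code_spans` are hypothetical. As the next reply points out, comrak's actual tree is an arena of generic nodes, so adopting something like this would mean changing the tree type entirely.

```rust
/// A hypothetical, strongly typed Markdown tree in which the block/inline
/// split is enforced by the type system. This is NOT how comrak represents
/// documents.
#[derive(Debug)]
enum Block {
    Paragraph(Vec<Inline>),
    CodeBlock(String),
    List(Vec<ListItem>),
}

/// A list item can only hold blocks, so an inline code span cannot be
/// placed in it directly: such a tree fails to compile.
#[derive(Debug)]
struct ListItem(Vec<Block>);

#[derive(Debug)]
enum Inline {
    Text(String),
    Code(String),
    Emph(Vec<Inline>),
}

/// Count inline code spans anywhere in a block; traversal code can rely on
/// the structure without runtime containment checks.
fn count_code_spans(block: &Block) -> usize {
    fn in_inline(i: &Inline) -> usize {
        match i {
            Inline::Code(_) => 1,
            Inline::Text(_) => 0,
            Inline::Emph(children) => children.iter().map(in_inline).sum(),
        }
    }
    match block {
        Block::Paragraph(inlines) => inlines.iter().map(in_inline).sum(),
        Block::CodeBlock(_) => 0,
        Block::List(items) => items
            .iter()
            .flat_map(|ListItem(blocks)| blocks.iter())
            .map(count_code_spans)
            .sum(),
    }
}
```

Because ListItem holds only Vec<Block>, writing ListItem(vec![Inline::Code(...)]) is a type error, so the tree from the original report cannot be constructed at all.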
It's a little more complicated than that. The node types you see in the

Even if we could solve that (which would mean changing the tree type entirely), there's still the complexity that some block types can contain only other blocks, some block types contain only inlines, some inlines can contain other inlines, and other inlines may not have any children at all. There's also the question of how far we'd want to go; see

Without an exhaustive typed definition, there's always a bit further one could go, but it would also greatly affect how parsing and processing work, since the code (modeled after
I think I agree! A simple check can just run through the tree and ensure that
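Here is a minimal sketch of such a check. It walks the tree depth-first and reports the first list item that directly holds an inline child. The sketch assumes a hypothetical toy tree: `Node`, `Kind`, `find_inline_in_item` and the `node`/`leaf` helpers are illustrative names, not comrak's API, and this is not the method added in #425.

```rust
/// Hypothetical toy tree; comrak's real nodes live in an arena and carry a
/// NodeValue, so a real check would walk that structure instead.
#[derive(Debug)]
struct Node {
    kind: Kind,
    children: Vec<Node>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Document,
    List,
    Item,
    Paragraph,
    Text,
    Code,
}

fn is_block(kind: Kind) -> bool {
    matches!(kind, Kind::Document | Kind::List | Kind::Item | Kind::Paragraph)
}

/// Depth-first walk; returns the child-index path from the root to the
/// first list item that directly contains an inline node, if any.
fn find_inline_in_item(node: &Node, path: &mut Vec<usize>) -> Option<Vec<usize>> {
    if node.kind == Kind::Item && node.children.iter().any(|c| !is_block(c.kind)) {
        return Some(path.clone());
    }
    for (i, child) in node.children.iter().enumerate() {
        path.push(i);
        if let Some(found) = find_inline_in_item(child, path) {
            return Some(found);
        }
        path.pop();
    }
    None
}

/// Convenience constructors for building test trees.
fn node(kind: Kind, children: Vec<Node>) -> Node {
    Node { kind, children }
}

fn leaf(kind: Kind) -> Node {
    Node { kind, children: Vec::new() }
}
```

For the tree from the original report, the check returns the path of the item holding the bare code span. For a tree built by the parser, where item contents are wrapped in paragraphs, it returns None.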
Ah, I see. I might have overlooked the complexity.
Sure, I can take a stab at it!
#425 now provides a method to detect the original offending case (
Thank you!
We're using comrak to build a Markdown document programmatically and output it to a file (as Markdown). At some point, we produce a list of items, where each item is a code snippet contained in a NodeValue::Code. The NodeValue::Code nodes are added directly as children of the list items, which are themselves added to the NodeValue::List node. Then some other content is appended.

When rendered with format_markdown, comrak doesn't insert a newline to separate the list from the next element. If the next element turns out to be, e.g., a Paragraph, then it is rendered as:

instead of

This is an issue, because the first one is semantically different; under the CommonMark spec it is equivalent to:
Here is a debug print (in comrak 0.17.0) of the AST being invalidly rendered:
test2.log
Looking at the rendering code in cm.rs, I found this strange bit: comrak/src/cm.rs, lines 407 to 421 in 26ad754.

It turns out the renderer only inserts a newline after a list when the next node is code or another list. I don't really see why: I believe we should always insert a newline (except when the list is the last child of a block).
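The rule proposed here can be sketched with a pending-newline counter: always request a blank line after a list, and if nothing follows, the request is simply never emitted. The `Emitter`, `Block` and `render` names are hypothetical. The counter is only loosely inspired by the self.need_cr field mentioned in this issue and is not comrak's cm.rs code.

```rust
/// Hypothetical emitter with a pending-newline counter; loosely inspired by
/// the need_cr field mentioned in the issue, not comrak's actual code.
struct Emitter {
    out: String,
    need_cr: u8, // 0 = nothing pending, 1 = line break, 2 = blank line
}

impl Emitter {
    fn new() -> Self {
        Emitter { out: String::new(), need_cr: 0 }
    }

    /// Request at least `n` newlines before the next piece of output.
    fn request(&mut self, n: u8) {
        self.need_cr = self.need_cr.max(n);
    }

    /// Flush pending newlines (never at the start of the output), then write.
    fn write(&mut self, s: &str) {
        if !self.out.is_empty() {
            for _ in 0..self.need_cr {
                self.out.push('\n');
            }
        }
        self.need_cr = 0;
        self.out.push_str(s);
    }
}

/// Toy block model: a tight list whose items are single lines of inline
/// Markdown, or a paragraph.
enum Block {
    List(Vec<String>),
    Paragraph(String),
}

fn render(blocks: &[Block]) -> String {
    let mut e = Emitter::new();
    for block in blocks {
        match block {
            Block::List(items) => {
                for item in items {
                    e.write(&format!("- {item}"));
                    e.request(1); // tight list: one line break between items
                }
                // Proposed rule: always ask for a blank line after a list,
                // whatever the next sibling is. If the list is the last child,
                // nothing else is written, so the request is never flushed.
                e.request(2);
            }
            Block::Paragraph(text) => {
                e.write(text);
                e.request(2);
            }
        }
    }
    e.out
}
```

Without the blank line, the output would be "- `b`\nafter", where "after" becomes a lazy continuation line of the last item's paragraph. That is the semantic change the report describes.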
However, the bug above doesn't occur when parsing and re-emitting Markdown from a source file, e.g. with the comrak binary. After some investigation, it seems that comrak always wraps list item contents in a NodeValue::Paragraph, and Paragraph does insert a newline after itself: comrak/src/cm.rs, line 564 in 26ad754.
However, when the list is tight, I think this bunch of code cancels out the newline inserted by paragraphs: comrak/src/cm.rs, line 100 in 26ad754.

in_tight_list_item doesn't seem to depend on the position of the item in the list). Or some other mechanism that sets self.need_cr to 2 and properly inserts the required line.

In any case, one possible workaround is to always wrap the content of list items in a NodeValue::Paragraph. However, it feels a bit fragile, and I think the renderer should always emit a newline instead of relying, to function properly, on the unspoken invariant that list items are wrapped in paragraphs.

If the maintainers agree with the baseline, I can tentatively submit a patch.
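The workaround described here can also be applied as a normalization pass before rendering. This sketch wraps each run of inline children of a list item in a single paragraph and recurses into nested lists. It uses a hypothetical toy tree: `Node` and `wrap_item_inlines` are illustrative names, and with comrak you would create real NodeValue::Paragraph nodes instead.

```rust
/// Hypothetical toy tree for illustration only; not comrak's node types.
#[derive(Debug, PartialEq)]
enum Node {
    Document(Vec<Node>),
    List(Vec<Node>),
    Item(Vec<Node>),
    Paragraph(Vec<Node>),
    Code(String),
    Text(String),
}

impl Node {
    fn is_inline(&self) -> bool {
        matches!(self, Node::Code(_) | Node::Text(_))
    }
}

/// Recursively rewrite the tree so that every run of consecutive inline
/// children of a list item is wrapped in a single Paragraph.
fn wrap_item_inlines(node: &mut Node) {
    match node {
        Node::Item(children) => {
            let old = std::mem::take(children);
            let mut run: Vec<Node> = Vec::new();
            for child in old {
                if child.is_inline() {
                    run.push(child);
                } else {
                    // A block ends the current inline run.
                    if !run.is_empty() {
                        children.push(Node::Paragraph(std::mem::take(&mut run)));
                    }
                    children.push(child);
                }
            }
            if !run.is_empty() {
                children.push(Node::Paragraph(run));
            }
            // Recurse into nested blocks (e.g. sublists).
            for child in children.iter_mut() {
                wrap_item_inlines(child);
            }
        }
        Node::Document(children) | Node::List(children) | Node::Paragraph(children) => {
            for child in children.iter_mut() {
                wrap_item_inlines(child);
            }
        }
        Node::Code(_) | Node::Text(_) => {}
    }
}
```

The pass is idempotent: items whose contents are already paragraphs are left unchanged.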