Add ppx. #89

Merged
merged 35 commits into from
Apr 1, 2016

Conversation

aantron
Contributor

@aantron aantron commented Feb 26, 2016

This is a preliminary (but working) implementation of a PPX rewriter for TyXML, based on the HTML5 parser in Markup.ml. Here are usage examples, paired with the dumped TyXML expressions that are generated:

Basic

module Html5 = Html5.M
(* Source *)
let _ = [%tyxml "<img src='foo' alt='bar' id='baz'>"]
(* Generated *)
let _ =
  [Html5.img ~src:(Html5.Xml.W.return "foo") ~alt:(Html5.Xml.W.return "bar")
     ~a:[Html5.a_id (Html5.Xml.W.return "baz")] ()]

Antiquotation

module Html5 = Html5.M
(* Source *)
let _ = [%tyxml "<p>foo<em>bar</em></p><p>" (pcdata "bar") "</p>"]
(* Generated *)
let _ =
  [Html5.p
     (Html5.Xml.W.cons (Html5.pcdata "foo")
        (Html5.Xml.W.cons
           (Html5.em
              (Html5.Xml.W.cons (Html5.pcdata "bar") (Html5.Xml.W.nil ())))
           (Html5.Xml.W.nil ())));
  Html5.p (Html5.Xml.W.cons (Html5.pcdata "bar") (Html5.Xml.W.nil ()))]

SVG

module Html5 = Html5.M
module Svg = Svg.M
(* Source *)
let _ = [%tyxml "<p><svg><g id='foo'/></svg></p>"]
(* Generated *)
let _ =
  [Html5.p
     (Html5.Xml.W.cons
        (Html5.svg
           (Svg.Xml.W.cons
              (Svg.g ~a:[Svg.a_id (Svg.Xml.W.return "foo")]
                 (Svg.Xml.W.nil ())) (Svg.Xml.W.nil ())))
        (Html5.Xml.W.nil ()))]

Errors have location information, although without aantron/markup.ml#6, these are points instead of ranges, and, for errors inside a tag, the location reported is the start of the tag:

let _ = [%tyxml "<ul><ul></ul></ul>"]
File "test.ml", line 1, characters 18-18:
Error: This expression has type
         ([> Html5_types.ul ] as 'a) Html5.elt Html5.Xml.W.tlist =
           'a Html5.elt list
       but an expression was expected of type
         ([< Html5_types.ul_content_fun ] as 'b) Html5.elt Html5.list_wrap =
           'b Html5.elt list
       Type 'a Html5.elt = 'a Html5.M.elt is not compatible with type
         'b Html5.elt = 'b Html5.M.elt 
       Type 'a = [> `Ul ] is not compatible with type
         'b = [< `Li of Html5_types.li_attrib ] 
       The second variant type does not allow tag(s) `Ul

This would resolve #68. I will amend the commit to say "Closes #68" before this is merged into master, when review is done.

This makes TyXML depend on Markup.ml. However, projects using TyXML would not pull in Markup.ml as a run-time dependency, because it is used only in TyXML's PPX.

How it works and reading order

The PPX itself does a pretty straightforward translation of a Markup.ml signal stream into a TyXML expression tree. However, since different TyXML elements and attributes have different signatures, some type information is needed for the PPX to be able to assemble the TyXML tree correctly. This type information is generated by a second PPX rewriter called ppx_reflect, which is a build tool that runs on html5_sigs.mli, svg_sigs.mli, and html5_types.mli.
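
As a rough illustration of the translation step described above, here is a minimal sketch that folds a flat signal stream into a tree. The `signal` and `tree` types are toy stand-ins invented for this example: the real Markup.ml signals also carry namespaces and attributes, and the real PPX builds OCaml AST nodes rather than a plain tree.

```ocaml
(* Toy stand-ins for Markup.ml signals; namespaces and attributes are
   omitted. These types are assumptions made for illustration only. *)
type signal =
  | Start_element of string
  | End_element
  | Text of string

(* Toy stand-in for the TyXML expression tree that the PPX assembles. *)
type tree =
  | Element of string * tree list
  | Pcdata of string

(* Consume signals until the matching End_element (or the end of input),
   returning the sibling nodes built and the signals left over. *)
let rec children signals =
  match signals with
  | [] -> ([], [])
  | End_element :: rest -> ([], rest)
  | Text s :: rest ->
    let siblings, rest = children rest in
    (Pcdata s :: siblings, rest)
  | Start_element name :: rest ->
    let inner, rest = children rest in
    let siblings, rest = children rest in
    (Element (name, inner) :: siblings, rest)

let assemble signals = fst (children signals)

(* Signals for <p>foo<em>bar</em></p> *)
let example =
  assemble
    [ Start_element "p"; Text "foo";
      Start_element "em"; Text "bar"; End_element;
      End_element ]
```

The type information mentioned above comes in at the `Element` case: the real rewriter has to decide, per element, whether children are passed as a list, a single argument, or labeled arguments.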

For a bottom-up review, you can look at the files in order: ppx_common.mli, ppx_attribute_value.mli, ppx_element_content.mli, ppx_sigs_reflected.mli, ppx_reflect.ml, ppx_namespace.mli, ppx_attribute.mli, ppx_element.mli, ppx_tyxml.ml.

Notes

  • There is an ambiguity in interpreting newlines: was the newline inserted by the user using \n, or by creating a new line in the file? This affects how source location information is reported. I chose to always assume the latter, because it seems that people would want to have a multiline template more than inserting pointless \n escapes into markup meant for TyXML trees – unless, perhaps, they are creating <pre> elements.
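
A minimal sketch of what that assumption means for location reporting: every newline character in the string is treated as a real line break in the file. The function name `position_of_offset` is hypothetical, and the result is relative to the start of the string's contents, so a real implementation would still have to add the literal's own starting position.

```ocaml
(* Map a character offset inside a string literal's contents to a
   (line, column) pair, treating every newline character as a real line
   break in the source file. Lines are 1-based and columns 0-based. *)
let position_of_offset (s : string) (offset : int) : int * int =
  let line = ref 1 and column = ref 0 in
  for i = 0 to min offset (String.length s) - 1 do
    if s.[i] = '\n' then begin
      incr line;
      column := 0
    end else
      incr column
  done;
  (!line, !column)
```

If the user had written the escape \n instead, the reported line would be too large by one for everything after it, which is the trade-off accepted above.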

Issues

(Edited to reflect status)

  • Support for date format in <input> min, max. Is there a good library for parsing this?
  • User-facing documentation. Considering saving this for another PR.
  • Make strict parsing optional.
  • Support for namespaced attributes like xmlns, xml:space, etc.
  • Proper OASIS packaging: a Findlib package which provides an executable but no library. For now, I created a library package that has no visible modules.
  • Don't assume OCaml ≥ 4.02. Did the OASIS flag eliminate this concern?
  • Generalize antiquotations. They are currently only supported as the entire content of an element. It should be possible to fix this after dealing with aantron/markup.ml#6 (Improve location reporting). That should allow antiquotations in attributes and interspersed in content. We should probably leave this for a future PR.
  • Switch to re completely.
  • Break up Ppx_attributes.parse and Ppx_tyxml.markup_expr.
  • Change strategy for merging antiquotations with parser output (explained in a comment in the code).
  • Consider changing the quotation name.
  • Consider allowing specification of the module name used for qualifying combinators. I assumed that the HTML5 module in scope will be called Html5, and similarly there will be a module Svg. This is in line with what pa_tyxml expects. However, these obviously shadow the real Html5 and Svg.
  • Deal with string delimiter representation in 4.03.
  • Ensure 4.03 compatibility.
  • Fix labeled argument filter (see comments below).
  • Module-qualify top-level calls to pcdata and other combinators.
  • Whether the quotation is HTML or SVG is determined automatically from the elements inside, but this does not work for a, because there are a elements in both HTML and SVG. Deal with this.
  • Document clearly how locations are adjusted before being reported to users in errors.
  • Annotate a_fs_rows and a_fs_cols.

Also, as we have agreed, the PPX would benefit from having a test suite (as would the rest of TyXML).

@aantron
Contributor Author

aantron commented Feb 26, 2016

@Drup Regarding #72, eventually, all attributes whose TyXML functions don't have "normal" names should be marked with [@@reflect.attribute ...], so that should make them easier to find and deal with on an ongoing basis. Right now, I think I have all of the ones in html5_sigs.mli marked, but none of the ones in svg_sigs.mli (an oversight). Will address that on the next amend, after I take a little break :)

@Drup
Member

Drup commented Feb 26, 2016

This is incredible work! I'll review a bit later, but just to answer some of your comments:

  • I still believe we should support sloppy parsing, for two reasons. 1) People will want to copy/paste some HTML from the web, and even if the typing is strict, given that the parser is more resilient, we can accept it (type systems are not known to bend gracefully when constrained). 2) This: Templating and ppx for tyxml #68 (comment)

Also, is it possible to only raise warnings for those errors ?

> I am not sure what is the proper way to declare, in OASIS, a findlib package which provides an executable but no library. For now, I created a library package that has no visible modules.

You can look at lwt's ppx for an example.

  • I think we should change the quotations, at least to support pure SVG quotations too. [%html ..] and [%svg ...] look good to me (with the qualified version tyxml.html ..). Also, it could be quite handy to allow [%html.Html5.R .. ], which would then use the specified module.

@Drup Drup added this to the 4.0 milestone Feb 26, 2016
@aantron
Contributor Author

aantron commented Feb 26, 2016

I'm open to changing the name of the quotation. However, I want to note that it is already possible to have a pure SVG quotation. You simply write SVG and Markup.ml will detect what it is:

let _ = [%tyxml "<g/>"]
let _ = [Svg.g (Svg.Xml.W.nil ())]

Unfortunately, this is problematic with the a element, which occurs in both HTML and SVG.

@dinosaure
Member

👍 I need this!

@Drup
Member

Drup commented Mar 18, 2016

I'm reading through the code more seriously this time. It's actually a bit bigger than I expected.

  • The implementation of ppx_reflect is not very reasonable. Either we use ppx_tools/rewriter, or we make a custom implementation. I'm probably going to do that myself.
  • str will have to go. Either use re or one of the string libraries.
  • I'm not fond of your coding style, but it's documented, which is very good. I'm going to change various things in the coding style regardless.
  • Ppx_attributes.parse and Ppx_tyxml.markup_expr are really too big/convoluted, though, we will need to break them up a bit.
  • The comments inside ppx_tyxml, and in particular the details about the overuse of references, should probably be solved before release.

@aantron
Contributor Author

aantron commented Mar 18, 2016

I wasn't aware of ppx_tools/rewriter, but it makes sense. By "not very reasonable", do you mean that it is packaged as a ppx whose actual purpose is a side effect? In that case, I agree. Otherwise, please clarify. Also, please let me know soon if '"probably" going to do it [yourself]' becomes definitely will or will not, so I can decide what to focus on.

Do you want me to wait on amendments so you can make the coding style adjustment, or you want to do it at the end?

@Drup
Member

Drup commented Mar 18, 2016

> do you mean that it is packaged as a ppx whose actual purpose is a side effect

That, and the raw string emission.

I'm definitely going to do it, yes. I'm putting them there: https://github.com/ocsigen/tyxml/tree/ppx

@Drup
Member

Drup commented Mar 18, 2016

@aantron Could you detail what should be the content of labeled_attributes exactly ? The implementation of reflect seems to contain a weird heuristic which doesn't work anymore on 4.03.

@Drup
Member

Drup commented Mar 29, 2016

@aantron I realized we can't transform ppx_reflect into a proper ppx: the input is a signature and the output is a module. I cleaned up the code emission to use ast_helper instead.

I changed Str to Re_str. We still need to replace it with proper use of re, but that can wait.

@aantron
Contributor Author

aantron commented Mar 29, 2016

@Drup: Oops, I somehow missed the question about labeled_attributes. It is a reflection of which elements have attributes whose values must be passed as labeled arguments in TyXML, e.g. img's ~src. Hope that answers it. I will make a note to add this information to the comment in ppx_sigs_reflected.mli.

I can change to proper usage of re – it shouldn't be a problem. BTW, the only reason I used Str to begin with was to avoid adding a dependency myself, but since you now have TyXML depending on re, I won't use it (or rather Re_str) anymore.

What is your approximate time table for releasing 4.0? I don't, of course, intend to be making last-minute changes – hope to be done, except for nits, well before release. But I do need to take a bit more of a break from (public) OCaml, at the moment.

@Drup
Member

Drup commented Mar 29, 2016

> What is your approximate time table for releasing 4.0

When it's ready, as early as possible. End of week would be nice.

> It is a reflection of which elements have attributes whose values must be passed as labeled arguments in TyXML

Then I really don't understand the code. Why are you using this sort of weird heuristic instead of just looking if there is a label ?

@aantron
Contributor Author

aantron commented Mar 29, 2016

Can you clarify what you mean by heuristic? Are you referring to the additional filtering beyond just looking for labels? Not all labeled arguments are used to pass attributes; some are used to pass child elements, such as ~figcaption to figure.

@Drup
Member

Drup commented Mar 29, 2016

By heuristic, I'm mostly talking about things like that.

@aantron
Contributor Author

aantron commented Mar 29, 2016

Yes, in 4.02 the first case matches types of the form _ wrap when the label is required, which in the signature all happen to be attribute types. The second case matches types of the form _ elt wrap when the label is optional, which all happen to be element types, so the reflector needs to disregard those labels. The third case is like the first, but for optional arguments. The fourth case is for everything else, which is never an attribute.
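
To make the four cases easier to follow, here is a minimal sketch of the same filter over a toy model of argument types. The types `ty`, `label`, and `verdict` and the function `classify` are invented for illustration; the actual reflector matches on Parsetree nodes, which is not reproduced here.

```ocaml
(* Toy model of the argument types that appear in the signatures. *)
type ty =
  | Wrap of ty      (* _ wrap *)
  | Elt of ty       (* _ elt *)
  | Base of string  (* anything else, e.g. "uri" *)

type label =
  | Required of string  (* ~name:... *)
  | Optional of string  (* ?name:... *)
  | Unlabeled

type verdict = Attribute of string | Not_attribute

let classify (label : label) (ty : ty) : verdict =
  match label, ty with
  (* Case 1: required label on a _ wrap type, always an attribute. *)
  | Required name, Wrap _ -> Attribute name
  (* Case 2: optional label on _ elt wrap, a child element to disregard. *)
  | Optional _, Wrap (Elt _) -> Not_attribute
  (* Case 3: like case 1, but for optional arguments. *)
  | Optional name, Wrap _ -> Attribute name
  (* Case 4: everything else is never an attribute. *)
  | _ -> Not_attribute
```

The order of cases 2 and 3 matters: the more specific `Wrap (Elt _)` pattern has to be tried before the general `Wrap _` one.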

This code does not make sense as is with the different representation of labeled arguments in 4.03.

In any case, even for the 4.02 version, I need to rearrange it and comment it. I had trouble comprehending it a month after writing it.

By the way, what are we doing about the parts of the AST that are incompatible between 4.02 and 4.03?

@Drup
Member

Drup commented Mar 30, 2016

As discussed, I pushed a version that replaces the handling of antiquotations by inserting dummy elements, letting markup work and replace them afterwards. It's quite brittle right now, since everything is accepted everywhere. I'll revisit tomorrow.

> By the way, what are we doing about the parts of the AST that are incompatible between 4.02 and 4.03?

I would really like to avoid cppo, so we will have to be creative. :]

> Yes, in 4.02 the first case matches

I think we should just annotate such function arguments with [@attributes]. It's only for special labeled attributes, there aren't that many of them, so it's fine.

@aantron
Contributor Author

aantron commented Mar 30, 2016

> As discussed, I pushed a version that replaces the handling of antiquotations by inserting dummy elements, letting markup work and replace them afterwards. It's quite brittle right now, since everything is accepted everywhere. I'll revisit tomorrow.

Ok, thanks. Do you want to apply this to attribute antiquotations yourself as well, or should I do it?

> > By the way, what are we doing about the parts of the AST that are incompatible between 4.02 and 4.03?

> I would really like to avoid cppo, so we will have to be creative. :]

I haven't done an exhaustive search recently, but IIRC the only sore point now is the new type Parsetree.constant replacing Asttypes.constant. cppo might actually be a nicer solution than some others I can immediately think of, such as conditionally compiling different versions of some module. We could also contribute to ppx_tools, they already have Ast_convenience.get_str. We could add get_delimiter to give the delimiter instead (cc @alainfrisch). But then we would be waiting for the next ppx_tools releases for both 4.02 and 4.03.

> I think we should just annotate such functions arguments with [@attributes]. It's only for special labeled attributes, there aren't that many of them, so it's fine.

In that case we will have to maintain a small parser for that kind of annotation. IMO it's much easier to keep the approach we have now. The labeled argument attributes are actually quite regular, in that their name always matches the name of the attribute, and their nature is always given by their type (α wrap where α is not _ elt). That makes them easy to detect. Adding and maintaining annotations that carry the same information seems redundant and somewhat error-prone.

let gen_token, find_expr =
  let r = ref 0 in
  let tbl = H.create 17 in
  let make_id () = incr r ; "tyxml" ^ string_of_int !r in
Contributor Author

It seems to me that a more "distinctive" (ugly) prefix than tyxml would be a good idea, just in case of conflict with something the user chooses somehow.

Member

Yes, that can be changed later easily, let's make it work before :)

@Drup
Member

Drup commented Mar 30, 2016

So, the new implementation of the placeholder ... is a bit trickier than expected. Issues are things like that:

utop # [%tyxml "<p>foo" (pcdata "bar") "</p>"] ;;
let _ = [Html5.p (Html5.Xml.W.cons (Html5.pcdata "footyxml1") (Html5.Xml.W.nil ()))];;
- : [> Html5_types.p ] Html5.elt list = [<p>footyxml1</p>]

utop # [%tyxml "<p id=foo"bla"></p>"] ;;
let _ = [Html5.p ~a:[Html5.a_id (Html5.Xml.W.return "footyxml1")] (Html5.Xml.W.nil ())];;
- : [> Html5_types.p ] Html5.elt list = [<p id="footyxml1"></p>] 

Basically, we need to do plaintext search. I'm not really fond of the idea. We also need a better token (with a terminator, in particular).
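
A minimal sketch of what a plaintext search with a terminated token could look like. Everything here is an assumption made for illustration: the prefix `__tyxml_antiquot_`, the `$` terminator, the `piece` type, and the use of strings in place of OCaml expression ASTs. The terminator is what keeps token 1 from being mistaken for a prefix of token 10.

```ocaml
type piece = Text of string | Antiquot of string

(* Hypothetical token shape: prefix, counter, terminator. *)
let prefix = "__tyxml_antiquot_"
let terminator = '$'

let table : (string, string) Hashtbl.t = Hashtbl.create 17
let counter = ref 0

(* Register an antiquoted expression (a string stands in for the AST)
   and return the placeholder to splice into the markup. *)
let gen_token (expr : string) : string =
  incr counter;
  let token = Printf.sprintf "%s%d%c" prefix !counter terminator in
  Hashtbl.add table token expr;
  token

(* Index of the first occurrence of [sub] in [s] at or after [from]. *)
let find_sub s sub from =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

(* Split parser output text into literal pieces and antiquotations. *)
let rec split (s : string) : piece list =
  match find_sub s prefix 0 with
  | None -> if s = "" then [] else [Text s]
  | Some start ->
    (match String.index_from_opt s start terminator with
     | None -> [Text s]
     | Some stop ->
       let token = String.sub s start (stop - start + 1) in
       let before = String.sub s 0 start in
       let after = String.sub s (stop + 1) (String.length s - stop - 1) in
       let here =
         match Hashtbl.find_opt table token with
         | Some expr -> [Antiquot expr]
         | None -> [Text token]
       in
       (if before = "" then [] else [Text before]) @ here @ split after)
```

With this, the text "foo" followed by a token splits into a literal piece and an antiquotation instead of being emitted as one pcdata node.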

@aantron
Contributor Author

aantron commented Mar 30, 2016

Yeah, I assumed you were going to do a search in content. It's trickier for attributes, however, because not all attribute types support a (immediately) sensible notion of concatenation (e.g. concatenating a prefix "5" with an AST standing for some integer expression, in the case of an integer-valued attribute). So it may make sense to ban antiquotations that are not the entire attribute value.

@Drup
Member

Drup commented Mar 30, 2016

@aantron Ok, the whole thing is implemented now. The error cases are mostly decent. I need to clean up a bit more, but that's okay. Locations need a lot of work, though.

There is one error case that is not great:

# let bar = "p" [%tyxml "<"bar">" "</"bar">"] ;;
Error: Bad token '(' in tag: invalid start character

It should be rejected, clearly, but we need a better failure mode.

> It's trickier for attributes, however, because not all attribute types support a sensible notion of concatenation

I decided to simply forbid it. It's simpler and more uniform. It works quite well now! :)

@Drup
Member

Drup commented Mar 30, 2016

@aantron I completed the test framework to work with the ppx (and added some basic ones).

@Drup
Member

Drup commented Mar 31, 2016

I just added comment handling. @aantron What is `PI?

@aantron
Contributor Author

aantron commented Mar 31, 2016

`PI stands for XML processing instructions (<?target content?>). Per the standard, the HTML parser does not emit them, but the writer accepts them.

@Drup
Member

Drup commented Mar 31, 2016

I fixed up ppx_reflect a bit wrt argument extraction, so it should be a bit more ready for 4.03 (not tested yet, will do a second run when we have the bits we need in ppx_tools).

I also added the following forms to the ppx extension: [%svg ...] and [%html5 ...]. They can be prefixed by tyxml. for disambiguation and can be followed by a module name that will be used as the implementation.
For example: [%html5.F ...]
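
A minimal sketch of how such extension names could be decoded, assuming the three accepted forms %tyxml, %(tyxml.)html5(.Mod) and %(tyxml.)svg(.Mod). The `lang` type and `parse_name` function are hypothetical names, and the sketch does not check that the trailing part is a valid module path.

```ocaml
type lang = Auto | Html5 | Svg

(* Return the language and the optional implementation module path,
   or None if the name is not one of the accepted forms. *)
let parse_name (name : string) : (lang * string option) option =
  let parts = String.split_on_char '.' name in
  (* Drop the optional "tyxml." qualifier, but keep a bare "tyxml". *)
  let parts =
    match parts with
    | "tyxml" :: (_ :: _ as rest) -> rest
    | _ -> parts
  in
  let modname rest =
    match rest with
    | [] -> None
    | _ -> Some (String.concat "." rest)
  in
  match parts with
  | ["tyxml"] -> Some (Auto, None)
  | "html5" :: rest -> Some (Html5, modname rest)
  | "svg" :: rest -> Some (Svg, modname rest)
  | _ -> None
```

For instance, "tyxml.html5.F" and "html5.F" both decode to the HTML language with module F.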

  1. Location errors are .. not great.
  2. There is no enforcement of namespacing yet.

@aantron
Contributor Author

aantron commented Mar 31, 2016

Alright. Regarding locations, are you going to fix them, since you're on a roll? Let me know if you want me to jump in :p

Taking care of 2. in aantron/markup.ml#12 as per IRC discussion, but still thinking about it more carefully first.

@Drup
Member

Drup commented Mar 31, 2016

Next things I plan to do:

  • Minor location fixes (for things not related to Markup's location directly)
  • Documentation

@aantron
Contributor Author

aantron commented Mar 31, 2016

Ok, after giving it more thought, I'm not convinced that specifying the language to Markup.ml is necessary for usage in TyXML (or a good idea). Given that it would have a complicated interaction with element context, I'd rather put it on hold.

For TyXML's [%html5 ...] and [%svg ...] I think you want to do something like this:

  • Select context depending on the quotation:

    • If [%html5], parse with ?context:None
    • If [%svg], parse with ?context:(Some (`Fragment "svg"))

    The only purpose of ?context here is to disambiguate the HTML and SVG a elements. In all other cases, you can rely on Markup.ml to "synthesize" the namespace.

  • Take the namespaces of top-level elements, and check they match the quotation. Here, make an exception for the svg element – if [%html5], it becomes Html5.svg.

I think that covers all cases – IIRC a and svg are the only elements that need any special treatment.
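
The two steps above can be sketched as follows. This is a toy model under stated assumptions: `quotation`, `context_of`, and `top_level_ok` are hypothetical names, elements are represented as (namespace, local name) pairs, and the parser call itself is left out.

```ocaml
type quotation = Html5 | Svg

(* Namespace URIs for HTML and SVG elements. *)
let html_ns = "http://www.w3.org/1999/xhtml"
let svg_ns = "http://www.w3.org/2000/svg"

(* Fragment context to hand to the parser, one case per quotation. *)
let context_of = function
  | Html5 -> None
  | Svg -> Some (`Fragment "svg")

(* A top-level element is acceptable when its namespace matches the
   quotation; <svg> inside [%html5] is the one exception, since it
   becomes Html5.svg. *)
let top_level_ok quotation (ns, local_name) =
  match quotation with
  | Html5 -> ns = html_ns || (ns = svg_ns && local_name = "svg")
  | Svg -> ns = svg_ns
```

Only the a element depends on the context for its namespace; for every other element the check simply confirms what the parser inferred.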

EDIT: I guess I could move this kind of check into Markup.ml (modulo special-casing of svg, that is a TyXML thing), but want to get your opinion.

@aantron
Contributor Author

aantron commented Mar 31, 2016

If you are keeping the [%tyxml] quotation, you may also want to find the first element in the resulting list AST, get its language module, and qualify the top-level pcdata and tot (comments) function ASTs with that same module. For [%html5], [%svg], the module should already be chosen by the quotation.

Drup and others added 25 commits April 2, 2016 00:43
Use Re_str temporarily.
Also added OPAM dependency on ppx_tools and conditioned building of
ppx_reflect on --enable-ppx.
It works with string only because we emit a .ml, it wouldn't work if
we were giving the generated AST to the compiler ...
Accepted syntaxes:
%tyxml (auto)
%(tyxml.)html5(.Mod)
%(tyxml.)svg(.Mod)
Only builds by the "make" command are fixed by this change.
TyXML does not have <frameset> or <frame>, so these attributes are
pointless.

Also removed PPX machinery for handling these attributes' values.
@Drup
Member

Drup commented Apr 1, 2016

Let's merge this first version, we will continue work in various other PRs!

@Drup Drup merged commit 173dbd4 into ocsigen:master Apr 1, 2016
@aantron
Contributor Author

aantron commented Apr 1, 2016

🎉

Agreed. The PR was getting quite monstrous.

@Drup
Member

Drup commented Apr 1, 2016

> For TyXML's [%html5 ...] and [%svg ...] I think you want to do something like this: [...]

This looks good. I also think we don't need the [%tyxml] notation. Can you implement this ?

@aantron aantron deleted the ppx branch April 21, 2016 14:55
Development

Successfully merging this pull request may close these issues.

Templating and ppx for tyxml
3 participants