Switch to plurimath gem #149
Comments
@ronaldtse If you happen to have a test suite, or a technical description of the input format, that would be very helpful. |
I'll do my best, but it won't be very reliable. For example https://www.electropedia.org/iev/iev.nsf/display?openform&ievref=102-02-13 — I can probably detect and convert |
Thanks! I've extracted some from IEV. If @w00lf has a set of troublesome examples, it would be great to check them too. |
@ronaldtse Short follow-up: converting HTML math expressions to AsciiMath is going fine. It's certainly doable, and I've already developed a tool which supports many of the features used in IEV. The difficult part is telling HTML math apart from rich text. It's easy for a human but not necessarily for a computer. Detecting numbers isn't reliable, because they may be used in different contexts. Detecting operators isn't reliable, because a minus can be confused with a dash. Detecting But perhaps it isn't needed at all? Perhaps we can keep HTML math as rich text, and IEC will gradually convert it to formulas during their ongoing work on these concepts? I know that it will take years. The question is whether they really need anything more than that. And we need rich text conversion from HTML to AsciiDoc anyway. |
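To illustrate why detection is the hard part, here is a minimal Ruby sketch of a naive detector. The method name `looks_like_math?` and the character set are assumptions made up for this illustration; they are not taken from the actual tool mentioned above.

```ruby
# Hypothetical naive detector (illustration only, not the real tool):
# flags a fragment as math when it contains a digit or a character
# that could be an operator, including U+2212 (minus) and U+2013 (en dash).
def looks_like_math?(fragment)
  operator_like = /[+\-=<>\u2212\u2013]/
  fragment.match?(/\d/) || fragment.match?(operator_like)
end

looks_like_math?("a \u2212 b = 3")          # => true, as intended
looks_like_math?("see IEV 102-02-13")       # => true, but this is a reference, not math
looks_like_math?("voltage \u2013 see note") # => true, the dash is taken for a minus
looks_like_math?("electric field")          # => false
```

The second and third calls are the kind of false positives described above: digits and dashes occur in ordinary prose too, so neither signal is sufficient on its own.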
We have the following agreement with the IEV team on semantic enrichment:
Given that it is very difficult to bring semantic enrichment to 100%, I think best effort is acceptable. We have to further consider that any "units" used in the IEV should also be converted into semantic units, i.e. UnitsML. For now let's delegate the decision on what "good enough" in math means here to you, since you are knee deep in this 😉 |
Then I guess heuristics will do. |
I'm pretty sure that some concepts need to be fixed, otherwise we'll end up with nasty false positives. One example is https://www.electropedia.org/iev/iev.nsf/display?openform&ievref=102-03-30, this fragment precisely: |
In this case the heuristic could know that “forefinger” is too long for a math symbol, but it's by no means a great rule. Let us also report this to IEC. |
Length checks will not work. There are formulas which would be broken this way, for example this one in https://www.electropedia.org/iev/iev.nsf/display?openform&ievref=112-01-13:
|
@ronaldtse What should we do with lone Greek letters which aren't part of longer mathematical formulas, as in the following example (https://www.electropedia.org/iev/iev.nsf/display?openform&ievref=103-07-03):
|
True, this probably will require manual conversion.
They should be converted to normal |
@ronaldtse What should we do if a given formula cannot be represented in AsciiMath, typically due to unsupported symbols? For example in 103-03-01:
Fall back to MathML, perhaps? Use LaTeX or Unicode? Or do you have a better idea? Opening a feature request in AsciiMath may work too in the long run. |
Perhaps a better question would be: Given that AsciiMath is generally preferred but unsuitable for more complicated formulas, which syntax should be supplemental: LaTeX or MathML? |
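A rough Ruby sketch of such a fallback decision follows. It assumes, as a crude proxy, that a formula is representable in easy-to-edit AsciiMath when the converted text is pure ASCII. The helper name `choose_notation` and the `fallback:` parameter are hypothetical, and the default of `:mathml` is only a placeholder, since the choice of supplemental syntax is exactly what is being asked here.

```ruby
# Hypothetical helper (illustration only): picks the notation to store
# for one formula. Assumes "pure ASCII" is an acceptable proxy for
# "representable in easy-to-edit AsciiMath".
def choose_notation(asciimath, fallback: :mathml)
  asciimath.ascii_only? ? :asciimath : fallback
end

choose_notation("sum_(i=1)^n i^2")               # => :asciimath
choose_notation("a \u2299 b")                    # => :mathml
choose_notation("a \u2299 b", fallback: :latex)  # => :latex
```

Keeping the fallback as a parameter means the LaTeX-versus-MathML question can stay open without changing the detection logic.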
Short follow-up. The current plan is:
While it sounds odd, there is a rationale for that.
ad 1. HTML math is sequential in its nature, AsciiMath is sequential too, MathML is more structural. It's far easier to convert HTML math to AsciiMath than to MathML. My almost-complete-converter to AsciiMath is simpler and smaller than my work-in-progress-converter to MathML.
ad 2. However, AsciiMath does not support some of the features used in MathML, especially special characters which need to be written in Unicode rather than using their English names composed of ASCII characters. That's why some HTML math formulas cannot be represented as AsciiMath in the easy-to-edit form. Or maybe I'm wrong and
ad 3. However, AsciiMath is easier for users, and we want to have AsciiMath when possible. That's why we'll try to convert it back to AsciiMath and use some other notation when it's impossible. |
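The point about both notations being sequential can be shown with a small Ruby sketch. It handles only `<i>`, `<sub>` and `<sup>` through plain string substitution. The method name `html_math_to_asciimath` is hypothetical and the code is not the converter referred to above, only an illustration of why a tag-by-tag rewrite is enough for simple cases.

```ruby
# Hypothetical sequential converter for a tiny subset of HTML math markup.
# Each tag is rewritten in place; no tree structure is built.
def html_math_to_asciimath(html)
  html
    .gsub(%r{<sub>(.*?)</sub>}) { "_(#{$1})" } # subscript  -> _( ... )
    .gsub(%r{<sup>(.*?)</sup>}) { "^(#{$1})" } # superscript -> ^( ... )
    .gsub(%r{</?i>}, "")                       # italics carry no meaning in AsciiMath
    .gsub(/\s+/, " ")
    .strip
end

html_math_to_asciimath("<i>U</i><sub>0</sub> = <i>R</i><i>I</i><sup>2</sup>")
# => "U_(0) = RI^(2)"
```

A MathML target would instead need explicit nesting (for example grouping a base with its subscript), which is why a converter of this shape does not carry over directly.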
@skalee I fully agree with the statements. Steps 2-3 will normalize the AsciiMath, so it's good. |
This task will depend on the |
In order to reduce code duplication in projects, extract logic to another gem. It looks like the most up-to-date version is here: https://github.com/metanorma/stepmod-utils/blob/728bd50bf609afd6c7ef0a6848f45a8419a57819/lib/stepmod/utils/html_to_asciimath.rb.
Extracted from #144: