Output NISO STS XML format from an ISO OBP HTML #7
Comments
ruby 3.3.0, branch |
@roberthopman sorry for the delay in replying!
No, it is supposed to provide content. Right now, the output is incorrect. The task is to fix the output. So there are 3 steps:
This is now updated in #6, which now creates a document structure using the … It already does a reasonable transform of the HTML file into STS by declarative building. There are a number of TODOs in the code:
We're getting there.
@ronaldtse I have a few questions related to this
Are there any guidelines or a mapping available from all the HTML classes to sts-ruby classes? Or are there some example documents that I can use as a reference for what the expected output should be for the
From:
The ISO OBP HTML is actually rendered from data of an XML format called "NISO STS" (the ISO flavor of it).
Instead of just the HTML output, we also want to output the NISO XML format.
Use case
Some ISO authors have to start documents from the ISO website as they are unable to obtain the STS files.
On the ISO OBP, informative content, such as vocabulary, is freely available. The best approach is to give them an automated way to extract this data.
Mechanism
The steps shall be as follows:
1. Fetch index.html for a particular URN.
2. Given index.html, convert it into an STS XML document (using the code in the PR "Add obp2sts command and related code to convert OBP HTML to STS XML", #6), and then use the new sts gem to write it as STS XML.
This is a Ruby script that somewhat parses index.html; it's not yet complete. It is provided in:
CLI:
=> writes out:
- output/index.html.xml: STS XML file generated by obp2sts
- output/index.html.sts.xml: STS XML file generated by the sts gem given output/index.html.xml as input
Library:
Work to be done
- The StsHtml class completely converts all content from HTML to STS
- The output of StsHtml#to_xml is properly parseable by the sts gem (main branch)
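As a rough illustration of the class-to-element mapping asked about in the comments, the conversion step described under Mechanism could be sketched as below. This is not the obp2sts code or the sts gem API. The class names (sts-section, sts-sec-title, sts-p), the helper names (html_to_sts, convert_children, output_paths) and the mapping table are all assumptions made for this example. It uses only REXML from the Ruby standard distribution, so it assumes the input is well-formed XML; real OBP pages would likely need a proper HTML parser.

```ruby
require "rexml/document"

# Hypothetical mapping from OBP HTML class names to STS element names.
# These class names are assumptions for illustration, not a verified list.
CLASS_TO_STS = {
  "sts-section"   => "sec",
  "sts-sec-title" => "title",
  "sts-p"         => "p"
}.freeze

# Walk the HTML tree and copy every mapped element into the STS tree.
# Only exact, single-valued class attributes are matched in this sketch.
def convert_children(html_node, sts_parent)
  html_node.each_element do |child|
    sts_name = CLASS_TO_STS[child.attributes["class"]]
    if sts_name.nil?
      # Unmapped wrapper: descend without emitting an STS element.
      convert_children(child, sts_parent)
    elsif sts_name == "sec"
      sec = sts_parent.add_element("sec")
      sec.add_attribute("id", child.attributes["id"]) if child.attributes["id"]
      convert_children(child, sec)
    else
      # Leaf-like element: keep only its first text node in this sketch.
      sts_parent.add_element(sts_name).text = child.text.to_s.strip
    end
  end
end

# Build a minimal <standard><body>...</body></standard> document.
def html_to_sts(html)
  source = REXML::Document.new(html)
  target = REXML::Document.new
  body = target.add_element("standard").add_element("body")
  convert_children(source.root, body)
  target
end

# Derive the two output paths named in the issue from an input file name.
def output_paths(input_path, dir = "output")
  base = File.basename(input_path)
  [File.join(dir, "#{base}.xml"), File.join(dir, "#{base}.sts.xml")]
end
```

A call such as html_to_sts(File.read("index.html")) would return a REXML document that can be written to the first path from output_paths("index.html"). Parsing that file again is only a well-formedness check; whether the sts gem accepts it still has to be verified separately.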