metanorma/isodoc

isodoc: Processor to generate HTML/Word from Metanorma XML

Purpose

This Gem converts documents in the Metanorma document model into HTML and Microsoft Word.

Usage

The Gem contains the subclasses IsoDoc::HtmlConvert (for HTML output) and IsoDoc::WordConvert (for Word output). They are initialised with the following rendering parameters:

i18nyaml

YAML file giving internationalisation equivalents for keywords in rendering output; see https://github.com/metanorma/metanorma-iso#document-attributes for further documentation

bodyfont

Font for body text

headerfont

Font for header text

monospacefont

Font for monospace text

titlefont

Font for document title text (currently used only in GB)

script

The ISO 15924 code for the main script that the standard document is in; used to pick the default fonts for the document

alt

Generate alternate rendering (currently used only in ISO)

compliance

Generate alternate rendering (currently used only in GB)

htmlstylesheet

Stylesheet for HTML output

htmlcoverpage

Cover page for HTML output

htmlintropage

Introductory page for HTML output

scripts

Scripts page for HTML output

scripts-pdf

Scripts page for HTML > PDF output

wordstylesheet

Stylesheet for Word output

standardstylesheet

Secondary stylesheet for Word output

header

Header file for Word output

wordcoverpage

Cover page for Word output

wordintropage

Introductory page for Word output

ulstyle

Style identifier in Word stylesheet for unordered lists

olstyle

Style identifier in Word stylesheet for ordered lists

suppressheadingnumbers

Suppress heading numbers for clauses (does not apply to annexes)

The IsoDoc gem classes themselves are abstract (though their current implementation contains rendering specific to the ISO standard). Subclasses of the IsoDoc gem classes are specific to different standards, and are associated with templates and stylesheets specific to the rendering of those standards. Subclasses also provide the default values for the rendering parameters above; the parameters should be passed in only to override those defaults.

e.g.

IsoDoc::Iso::HtmlConvert.new(
  bodyfont: "Zapf Chancery",
  headerfont: "Comic Sans",
  monospacefont: "Andale Mono",
  alt: true,
  script: "Hans",
  i18nyaml: "i18n-en.yaml"
)
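
The relationship between subclass defaults and caller-supplied parameters can be pictured as a hash merge in which the caller's values win. The following is a minimal sketch under that assumption; the class `SketchConvert`, the method `default_fonts`, and the font names are hypothetical illustrations, not the gem's implementation.

```ruby
# Hypothetical illustration only: not the isodoc implementation.
class SketchConvert
  attr_reader :options

  # Defaults that a standards-specific subclass might supply, chosen by script.
  # The font names are placeholders.
  def default_fonts(script)
    if script == "Hans"
      { bodyfont: "SimSun", headerfont: "SimHei", monospacefont: "Courier New" }
    else
      { bodyfont: "Cambria", headerfont: "Cambria", monospacefont: "Courier New" }
    end
  end

  def initialize(overrides = {})
    script = overrides.fetch(:script, "Latn")
    # Caller-supplied values take precedence over the defaults.
    @options = default_fonts(script).merge(overrides)
  end
end

converter = SketchConvert.new(script: "Hans", bodyfont: "Zapf Chancery")
puts converter.options[:bodyfont]   # value supplied by the caller
puts converter.options[:headerfont] # default selected by the script
```

Only the parameters that differ from the defaults need to be passed in; everything else falls through to whatever the subclass defines.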

Conversion is performed by the convert method, which takes three arguments: the filename to be used for the output (once its file type suffix is stripped), the XML document string to be converted (optional), and a "debug" argument (optional), which stops execution before the output file is generated. If the document string is nil, the document is read from the file named by the first argument. So:

# generates test.html
IsoDoc::Iso::HtmlConvert.new({}).convert("test.xml")

# generates test.doc, with Chinese font defaults rather than Roman
IsoDoc::Iso::WordConvert.new({script: "Hans"}).convert("test.xml")

# generates test.html, based on file1.xml
IsoDoc::Iso::HtmlConvert.new({}).convert("test", File.read("file1.xml"))

# generates HTML output for the given input string, but does not save it to disk.
IsoDoc::Iso::HtmlConvert.new({}).convert("test", <<~"INPUT", true)
  <iso-standard xmlns="http://riboseinc.com/isoxml">
    <preface><foreword>
    <note>
      <p id="_f06fd0d1-a203-4f3d-a515-0bdba0f8d83f">These results are based on a
      study carried out on three different types of kernel.</p>
    </note>
    </foreword></preface>
  </iso-standard>
  INPUT
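
The output names in the comments above follow from stripping the file type suffix of the first argument and appending the suffix of the output format. A minimal sketch of that derivation, assuming a simple regular-expression strip; the helper `output_name` is hypothetical and is not part of the gem's API:

```ruby
# Hypothetical helper: not part of the isodoc API.
# Removes the final file type suffix (if any) and appends the output suffix.
def output_name(filename, suffix)
  filename.sub(/\.[^.\/]+\z/, "") + ".#{suffix}"
end

puts output_name("test.xml", "html")
puts output_name("test", "doc")
```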

Note
In the HTML stylesheets specific to standards, the cover page and intro page must be XHTML fragments, not HTML fragments. In particular, unlike Word HTML, all HTML attributes need to be quoted: <p class="MsoToc2">, not <p class=MsoToc2>.
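
One way to catch unquoted attributes before handing a cover page or intro page to the converter is to check that the fragment parses as XML. The sketch below uses REXML, which ships with Ruby as a standard or bundled gem depending on the Ruby version; the helper name `xhtml_fragment?` is hypothetical, and the check only tests XML well-formedness, not validity as XHTML.

```ruby
require "rexml/document"

# Hypothetical helper: wraps the fragment in a dummy root element so that
# several sibling elements are accepted, then tries to parse it as XML.
def xhtml_fragment?(fragment)
  REXML::Document.new("<root>#{fragment}</root>")
  true
rescue REXML::ParseException
  false
end

puts xhtml_fragment?('<p class="MsoToc2">Contents</p>')
puts xhtml_fragment?('<p class=MsoToc2>Contents</p>')
```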

Converting Word output into “Native Word” (.docx)

This gem relies on html2doc to generate Microsoft Word documents.

Please see this post-processing procedure to convert output into a native-docx document.