Dumpers and loaders: Separate concept of syntax and datamodel #687
Comments
Most packages I'm familiar with assume that |
cmungall added a commit that referenced this issue on Dec 7, 2023
cmungall added a commit that referenced this issue on Dec 21, 2023
cmungall added a commit that referenced this issue on Mar 12, 2024
cmungall added a commit that referenced this issue on Mar 13, 2024
cmungall added a commit that referenced this issue on Mar 14, 2024
* First pass at oboformat and obographs conformance suite. See owlcollab/oboformat#146 and geneontology/obographs#106. Note that this may move from the OAK repo in the longer term.
* Allow choice of format when exporting OWL. Adding missing features to obo conversion
* lint
* lint
* Lint. Removing prints.
* Added format_utilities.py -- see #687
* lint
* Fixing spelling mistakes
* dumper
* Fixed tests and dumper
Co-authored-by: Nico Matentzoglu <[email protected]>
Currently the `Dumper.dump()` method accepts a `syntax` argument; this is sometimes called `format` in other frameworks, e.g. robot.
The problem here is that this is frequently ambiguous, especially in the context of OAK, which is pluralistic and supports multiple ways of modeling and serializing ontologies, e.g. OWL mapped to RDF and serialized as RDF/XML; SKOS (natively RDF) serialized as Turtle.
This is compounded with loaders, where we might want to use a suffix to guess the underlying model and serialization and choose the appropriate parser. Unlike the owlapi, rdflib requires the format of rdf to be known in advance (and in my experience this is a good thing - there is a lot of confusion caused by the owlapi cycling through multiple parsers and models).
Examples:
* `.owl` clearly means the OWL data model. In the OBO universe this is conventionally mapped to RDF and serialized as RDF/XML, but outside this universe the serialization is more typically Turtle, and may not be an RDF serialization at all.
* `.xml` means OWL/XML as far as the OWLAPI is concerned, but rdflib uses this to mean RDF/XML (which is very different!).
* `.rdf` might typically mean some kind of RDF serialization of OWL, but SKOS is valid for the ontology-like artefacts in OAK and it can also be serialized as `.rdf`. Same for the extended RDFS-like model used by schema.org. On top of this, again, we don't know if this means RDF/XML, Turtle, N-Quads...
On top of this, there are various aliases (e.g. `ttl` vs `turtle`). Frameworks like pyoxigraph use MIME types to try to enforce some kind of standard, but this seems overkill.
Proposal:
* `model` argument
* `syntax` and sensible defaults
A bipartite syntax would be written as `.model.syntax`. For example, `.owl.ttl`, `.skos.nt`.
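A minimal sketch of how a loader might read the `.model.syntax` convention off a file name. The helper name `split_model_syntax` is hypothetical and not part of OAK; it simply treats the last two suffixes as model and syntax, and it assumes a single suffix carries no model.

```python
from pathlib import PurePath
from typing import Optional, Tuple


def split_model_syntax(path: str) -> Tuple[Optional[str], str]:
    """Split a file name into (model, syntax) from its trailing suffixes.

    Hypothetical helper for illustration only. With a single suffix the
    model is left as None so that a later step can apply a default.
    """
    # PurePath.suffixes returns e.g. ['.owl', '.ttl'] for 'go.owl.ttl'.
    suffixes = [s.lstrip(".").lower() for s in PurePath(path).suffixes]
    if not suffixes:
        raise ValueError(f"no suffix to interpret in {path!r}")
    if len(suffixes) == 1:
        return None, suffixes[0]
    # Only the last two suffixes are considered; earlier ones are ignored.
    return suffixes[-2], suffixes[-1]
```

One limitation of this sketch: any dotted name component just before the final suffix is taken as the model (e.g. a version number), so a real implementation would probably check the model against a known list.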
There is a potential argument for a tripartite model here, because of OWL mapping to RDF, and to reduce the ambiguity of `.owl.xml`. However, this is likely overkill.
Something like:
Unambiguous OWL syntaxes
* `.owx`
* `.ofn`
* `.omn`
Model optional; if specified, MUST be `owl`.
OWL layered on RDF
* `owl.ttl` (aka `turtle`)
* `owl.nt` (aka `ntriples`)
* `owl.rdfxml` (maps to `xml` syntax in rdflib)
* `owl.jsonld`
Non-canonical
* `.owl.xml` - discouraged, but default interpretation is `.owx`
* `.owl.rdf` - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle
* `.owl` - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle(?) (or: RDF/XML, as per OBO)
SKOS
* `.skos.{syntax}`
As per OWL layered on RDF
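The rules listed above could be encoded as a small lookup, sketched here under some assumptions. The names `OWL_NATIVE`, `RDF_SYNTAXES`, `NON_CANONICAL` and `resolve` are hypothetical, and the bare `.owl` default is set to Turtle even though the proposal marks that choice with "(?)" and mentions RDF/XML as the OBO alternative.

```python
from typing import Optional, Tuple

# Suffixes that identify an OWL-specific syntax on their own.
OWL_NATIVE = {"owx", "ofn", "omn"}
# RDF serializations that a model such as owl or skos can be layered on.
RDF_SYNTAXES = {"ttl", "nt", "rdfxml", "jsonld"}
# Discouraged forms mapped to the default interpretation from the proposal.
NON_CANONICAL = {
    ("owl", "xml"): ("owl", "owx"),
    ("owl", "rdf"): ("owl", "ttl"),
    # Open question in the proposal: this could be rdfxml, as per OBO.
    (None, "owl"): ("owl", "ttl"),
}


def resolve(model: Optional[str], syntax: str) -> Tuple[str, str]:
    """Return a canonical (model, syntax) pair or raise ValueError."""
    key = (model, syntax)
    if key in NON_CANONICAL:
        return NON_CANONICAL[key]
    if syntax in OWL_NATIVE:
        # Model is optional here, but if given it MUST be owl.
        if model not in (None, "owl"):
            raise ValueError(f"{syntax!r} requires model 'owl', got {model!r}")
        return "owl", syntax
    if model in ("owl", "skos") and syntax in RDF_SYNTAXES:
        return model, syntax
    # The proposal does not say which model a bare RDF suffix defaults to,
    # so this sketch refuses to guess.
    raise ValueError(f"cannot interpret model={model!r}, syntax={syntax!r}")
```

For example, `resolve(None, "ofn")` yields `("owl", "ofn")`, while `resolve("skos", "ofn")` is rejected because OWL functional syntax cannot carry a SKOS model.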
OBO Format and OBOGraphs
TODO
Aliases
TBD: favor shorter form (i.e. suffix) as the canonical format name?
* `ttl` = `turtle`
* `rdfx` = `rdfxml`
* `nt` = `ntriples`
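If the shorter form were chosen as canonical (still marked TBD above), alias handling could be a single normalization step before any other lookup. This is only a sketch: `ALIASES` and `canonical_syntax` are hypothetical names, and the table contains just the three pairs listed in this issue.

```python
# Long-form alias -> short canonical name, assuming the short form wins.
ALIASES = {
    "turtle": "ttl",
    "rdfxml": "rdfx",
    "ntriples": "nt",
}


def canonical_syntax(name: str) -> str:
    """Normalize a syntax name or suffix to its short canonical form."""
    # Accept input such as '.TTL' or ' turtle ' as well as plain names.
    key = name.strip().lstrip(".").lower()
    # Names without a registered alias are returned unchanged.
    return ALIASES.get(key, key)
```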