SPASE RDF Tools

Toolset to produce SPASE RDF and explore the resulting Knowledge Graph.

The SPASE Knowledge Graph

The SPASE Knowledge Graph is composed of two main parts:

  1. The SPASE ontology, an OWL ontology generated automatically from the SPASE Base Model XSD file available here. The generation algorithm turns every entity in the SPASE XSD file into an OWL class, maps every relationship between entities to an owl:ObjectProperty, and maps every literal property of an entity to an owl:DatatypeProperty. Every property is assigned its corresponding domain and range.
  2. SPASE RDF individuals data, an automatically generated TTL file containing RDF that represents the SPASE resources described in the XML files provided by hpde. This RDF complies with the SPASE ontology.
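
The mapping described in item 1 can be sketched in a few lines of Python. This is only an illustration of the idea, not the repository's generator: the SCHEMA dictionary, the ex: prefix, and every entity and property name below are hypothetical stand-ins for what would be parsed from the XSD file.

```python
# Hypothetical stand-in for entities parsed from an XSD file; not the real SPASE model.
SCHEMA = {
    "Instrument": {
        "relations": {"hasObservatory": "Observatory"},  # entity -> entity
        "literals": {"instrumentType": "xsd:string"},    # entity -> literal
    },
    "Observatory": {
        "relations": {},
        "literals": {"location": "xsd:string"},
    },
}

PREFIX = "ex:"  # hypothetical namespace prefix


def property_triples(prop, kind, domain, range_):
    """Declare one property with its type, domain and range."""
    return [
        (prop, "rdf:type", kind),
        (prop, "rdfs:domain", domain),
        (prop, "rdfs:range", range_),
    ]


def schema_to_triples(schema, prefix=PREFIX):
    """Map entities to classes, relations to object properties,
    and literal fields to datatype properties."""
    triples = []
    for entity, spec in schema.items():
        triples.append((prefix + entity, "rdf:type", "owl:Class"))
        for prop, target in spec["relations"].items():
            triples += property_triples(
                prefix + prop, "owl:ObjectProperty", prefix + entity, prefix + target
            )
        for prop, datatype in spec["literals"].items():
            triples += property_triples(
                prefix + prop, "owl:DatatypeProperty", prefix + entity, datatype
            )
    return triples


triples = schema_to_triples(SCHEMA)
for subject, predicate, obj in triples:
    print(f"{subject} {predicate} {obj} .")
```

Each property gets exactly one domain and one range here because the toy schema uses distinct property names per entity; a real generator would have to decide how to handle a property name shared by several entities.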

RDF Exploration

Running on Docker with Docker compose

Requirements

Get the code

  • Clone this repo and its submodules:
      git clone --recurse-submodules -j8 git@github.com:polyneme/topst-spase-rdf-tools.git
      cd topst-spase-rdf-tools

Decompress pre-processed data

Decompress the pre-processed data in topst-spase-rdf-tools/data/spase.ttl.zip:

  cd data
  unzip spase.ttl.zip
  cd ..

Setup

docker compose build

Execution

docker compose up

Open:

Running (almost) without Docker

Requirements

  • Java
  • Docker (make sure the Docker memory limit is well above 4 GB; more than 16 GB is recommended)
  • Python 3.8+

Setup

  1. Install and run Fuseki:

    • Download the latest version of Jena Fuseki from here. You can use:

      cd ~/Applications/
      curl https://archive.apache.org/dist/jena/binaries/apache-jena-fuseki-<fuseki_version>.zip -o apache-jena-fuseki-<fuseki_version>.zip
      

      Replace <fuseki_version> with the latest version available, for example:

      curl https://archive.apache.org/dist/jena/binaries/apache-jena-fuseki-4.9.0.zip -o apache-jena-fuseki-4.9.0.zip
      
    • Unzip the Jena Fuseki package:

      unzip apache-jena-fuseki-4.9.0.zip
    • Run Fuseki Server:

      cd apache-jena-fuseki-4.9.0
      ./fuseki-server
    • Open http://localhost:3030 in your browser to check that Fuseki is up and running.

  2. Load the RDF data into a Fuseki dataset:

    • Create a dataset by opening Fuseki in a browser and clicking the add one link.
    • Name the dataset spase, select the dataset type (choose persistent if you plan to reuse this dataset in future runs), and click create dataset.
    • Upload the pre-processed data: click add data > select files, select the spase.owl and spase.ttl files under data, then click upload all.
    • Your new Fuseki dataset should be available under http://localhost:3030/spase
  3. Install and run the RDF Exploration Jupyter notebook:

    • Go to this repo directory:
      cd ~/git/spase-rdf-tools/ # replace with the right location
    • Get into the python package for the RDF Tools:
      cd spase_rdf_tools
    • [Optional] Create and activate a virtual environment:
      python3 -m venv venv
      source venv/bin/activate
    • Install Python requirements:
      pip install -r requirements.txt
    • Set up Jupyter extensions for KG exploration:
      jupyter nbextension enable --py --sys-prefix graph_notebook.widgets
      python -m graph_notebook.static_resources.install
      python -m graph_notebook.nbextensions.install
      python -m graph_notebook.ipython_profile.configure_ipython_profile
    • Run the Jupyter notebook with the extensions:
      jupyter notebook --NotebookApp.kernel_manager_class=notebook.services.kernels.kernelmanager.AsyncMappingKernelManager --ip 0.0.0.0 ./
    • Open the Jupyter notebook URL and navigate to the SPASE RDF Exploration notebook.
    • For more information on the graph-notebook please check their repository.
  4. Install and run graph-explorer:

    • Clone the graph-explorer repo (a copy of the repo is included here as a git submodule):
      git clone https://github.com/aws/graph-explorer/
    • Navigate to the graph-explorer directory:
      cd graph-explorer
    • Build the Docker image:
      docker build -t graph-explorer .
    • Start the Docker container:
      docker run -p 80:80 -p 443:443 --env HOST=localhost graph-explorer
    • Open graph-explorer in your browser at https://localhost/explorer (click Advanced > Proceed to localhost if prompted).
    • Add a new connection by clicking the plus sign in the top right corner. Name the connection spase, choose RDF (Resource Description Framework) - SPARQL as the graph type, and set the endpoint value to http://localhost:3030/spase.
    • Synchronise your connection and navigate your graph.
    • For more information on graph-explorer, please check their repository.
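
Once the spase dataset is loaded, it can also be queried directly over HTTP as a quick sanity check. The sketch below uses only the Python standard library and assumes Fuseki's default query service path (/spase/sparql); adjust ENDPOINT if your configuration differs. The query itself is just an example that counts instances per class.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default Fuseki query service for a dataset named "spase".
ENDPOINT = "http://localhost:3030/spase/sparql"

# Example query: the ten classes with the most instances.
QUERY = """
SELECT ?class (COUNT(?s) AS ?count)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?count)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a form-encoded SPARQL POST request asking for JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    try:
        for row in run_query(ENDPOINT, QUERY):
            print(row["class"]["value"], row["count"]["value"])
    except OSError as exc:
        # Covers connection errors and HTTP errors raised by urllib.
        print(f"Could not query {ENDPOINT}: {exc}")
```

If the request fails, check that Fuseki is running and that the dataset name matches the one created in step 2.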

RDF Generation

Requirements

  • Python 3.9.0+

Install dependencies

  • Install Python dependencies:
    pip install -r requirements.txt

Commands available

python3 spase_rdf_tools.py --help

Usage: spase_rdf_tools.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  create-owl             Creates OWL Ontology using python module
  create-python-model    Creates Python model from XSD file using xsdata
  download-hpde          Downloads and decompress HPDE files from GitHub...
  download-spase-schema  Downloads SPASE XSD schema file from spase-group...
  generate-rdf           Creates TTL RDf File using python module to lo...

This is also available as a Jupyter notebook under spase_rdf_tools/SPASE RDF Generation.ipynb.
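
The help output above truncates the command descriptions. A small helper like the following could print the full help text of each subcommand. It assumes that every subcommand accepts its own --help flag, which the usage format suggests but the output above does not confirm, and that it is run from the directory containing spase_rdf_tools.py. The function names are illustrative only.

```python
import subprocess
import sys

SCRIPT = "spase_rdf_tools.py"

# Command names copied from the --help output above.
COMMANDS = [
    "create-owl",
    "create-python-model",
    "download-hpde",
    "download-spase-schema",
    "generate-rdf",
]


def help_argv(command, script=SCRIPT):
    """Return the argument list that asks one subcommand for its help text."""
    return [sys.executable, script, command, "--help"]


def show_all_help():
    """Print the help text of every subcommand (assumes per-command --help)."""
    for command in COMMANDS:
        result = subprocess.run(help_argv(command), capture_output=True, text=True)
        print(f"== {command} ==")
        print(result.stdout or result.stderr)
```

Calling show_all_help() runs each command with --help only, so nothing is downloaded or generated.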