Dataset Cache

A solution for downloading and caching datasets

How does it work?

dataset-cache downloads your datasets, caches them, and checks each one against the hash you provide, so you know the data you are using is the data you expected.
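The check described above boils down to comparing a digest of the downloaded bytes with the expected hash. The following is a minimal sketch of that idea using Node's built-in `crypto` module. The helper name `matchesHash` is hypothetical and is not part of dataset-cache's API; the library's own verification may differ in detail.

```javascript
const crypto = require('crypto')

// Hypothetical helper for illustration; not part of dataset-cache's API.
// Returns true when the SHA-256 digest of `data` equals `expectedHash` (hex).
function matchesHash (data, expectedHash) {
  const actual = crypto.createHash('sha256').update(data).digest('hex')
  return actual === expectedHash.toLowerCase()
}

// SHA-256 digest of empty input, a well-known constant.
const EMPTY_SHA256 = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'

console.log(matchesHash(Buffer.alloc(0), EMPTY_SHA256)) // true
console.log(matchesHash(Buffer.from('tampered'), EMPTY_SHA256)) // false
```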

When a file is downloaded, it is saved to the output directory, hashed using SHA-256, and renamed to its hash for caching.
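The hash-and-rename step can be sketched roughly as follows. This is an illustration under assumptions, not the library's actual implementation: the helper name `cacheByHash` is hypothetical, and the file is read synchronously for brevity.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper for illustration; not part of dataset-cache's API.
// Hashes a file with SHA-256 and renames it to the hex digest in the same directory.
function cacheByHash (filePath) {
  const contents = fs.readFileSync(filePath)
  const hash = crypto.createHash('sha256').update(contents).digest('hex')
  const cachedPath = path.join(path.dirname(filePath), hash)
  fs.renameSync(filePath, cachedPath)
  return { path: cachedPath, hash: hash }
}

// Usage: write a file into a temporary output directory, then cache it.
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-sketch-'))
const downloaded = path.join(outputDir, 'my-file.txt')
fs.writeFileSync(downloaded, 'example contents\n')

const result = cacheByHash(downloaded)
console.log(result.path) // <outputDir>/<64-character hex digest>
```

Because the file name is derived from the contents, a later request for the same hash can be answered by checking whether that path already exists.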

Directories can be provided in the form of a tarball (tar.gz). After downloading, the contents are extracted and the directory is hashed using hash-then. The contents are then saved in a directory named after the hash.
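To give a feel for what hashing a directory involves, here is one possible approach: walk the tree in sorted order and feed each relative path and each file's contents into a single SHA-256 digest. This is only a sketch. The function name `hashDirectory` is hypothetical, the scheme is an assumption and may not match the one the library uses, and it will not produce the same hashes as dataset-cache. Archive extraction is left out.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper for illustration; not the scheme dataset-cache uses.
// Produces a hex digest that depends on relative paths and file contents.
function hashDirectory (root) {
  const hash = crypto.createHash('sha256')
  function walk (current) {
    // Sort entries so the digest does not depend on file system ordering.
    for (const name of fs.readdirSync(current).sort()) {
      const full = path.join(current, name)
      // Normalize separators so the digest is the same across platforms.
      const rel = path.relative(root, full).split(path.sep).join('/')
      if (fs.statSync(full).isDirectory()) {
        hash.update('dir:' + rel + '\n')
        walk(full)
      } else {
        hash.update('file:' + rel + '\n')
        hash.update(fs.readFileSync(full))
      }
    }
  }
  walk(root)
  return hash.digest('hex')
}

// Usage: two directories with identical contents hash the same.
const dirA = fs.mkdtempSync(path.join(os.tmpdir(), 'dir-a-'))
const dirB = fs.mkdtempSync(path.join(os.tmpdir(), 'dir-b-'))
for (const d of [dirA, dirB]) {
  fs.mkdirSync(path.join(d, 'sub'))
  fs.writeFileSync(path.join(d, 'sub', 'data.txt'), 'same contents')
}
const hashA = hashDirectory(dirA)
const sameBefore = hashA === hashDirectory(dirB)
console.log(sameBefore) // true

// Changing one file changes the directory hash.
fs.writeFileSync(path.join(dirB, 'sub', 'data.txt'), 'changed')
const sameAfter = hashA === hashDirectory(dirB)
console.log(sameAfter) // false
```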

Use

As a library

Download and cache a file

dataset.get({
  url: 'https://raw.githubusercontent.com/empiricalci/fixtures/master/my-file.txt',
  hash: '8b4781a921e9f1a1cb5aa3063ca8592cac3ee39276d8e8212b336b6e73999798'
}, '/output_dir').then(function (data) {
  console.log(data) // {path: '..', hash: '..', valid: '..', cached: '..'} 
})

Download and cache a directory

If directory: true is set, the library will extract the .zip or .tar.gz archive.

dataset.get({
  url: 'https://github.com/empiricalci/fixtures/raw/master/my-files.tar.gz',
  hash: '0e4710c220e7ed2d11288bcf3cf111ac01bdd0cb2a4d64f81455c5b31f1a4fbe',
  directory: true
}, '/output_dir').then(function (data) {
  console.log(data) // {path: '..', hash: '..', valid: '..', cached: '..'} 
})

Install multiple datasets at once

dataset.install({
  resource1: {url: '..', hash: '..'},
  dataset2: {url: '..', hash: '..'}
}, function (datasets) {
  console.log(datasets)
})