This repository has been archived by the owner on Jan 10, 2025. It is now read-only.

Define a schema for the different components of the smooshr data model #84

Open
stuartlynn opened this issue Oct 26, 2020 · 0 comments

Comments

@stuartlynn
Contributor

As we move to a different storage system and a different way of representing operations on a dataset, we will need a more robust schema. Currently, the very simple schema we have is:

  • Project: Contains multiple datasets
  • Dataset: Represents the full dataset as a set of summary data and multiple Columns and MetaColumns
  • Column: Represents a column in the original dataset; has a name and a list of unique entries
  • MetaColumn: A simple way of treating two columns as one; this ultimately gets merged into a single column when we run the code output
  • Entry: A unique entry in a column, which has a value and the number of times it occurs in that column
  • Mapping: A collection of entries for a specific column that will be mapped to another value
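
For discussion, the current schema could be sketched roughly as the following Python dataclasses. This is only an illustration of the list above, not the actual smooshr code: every class field name (`value`, `count`, `column_names`, `mapped_value`, and so on) is hypothetical, as is the choice to store mappings on the dataset and the `apply_mapping` helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """A unique value in a column and the number of times it occurs."""
    value: str
    count: int


@dataclass
class Column:
    """A column in the original dataset: a name plus its unique entries."""
    name: str
    entries: List[Entry] = field(default_factory=list)


@dataclass
class MetaColumn:
    """Treats several columns as one (assumption: referenced by name)."""
    name: str
    column_names: List[str] = field(default_factory=list)


@dataclass
class Mapping:
    """A collection of entry values in one column mapped to another value."""
    column_name: str
    entry_values: List[str]
    mapped_value: str


@dataclass
class Dataset:
    """Summary data plus Columns and MetaColumns."""
    name: str
    summary: Dict[str, int] = field(default_factory=dict)
    columns: List[Column] = field(default_factory=list)
    meta_columns: List[MetaColumn] = field(default_factory=list)
    # Assumption: the issue does not say where mappings are stored.
    mappings: List[Mapping] = field(default_factory=list)


@dataclass
class Project:
    """Contains multiple datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


def apply_mapping(column: Column, mapping: Mapping) -> List[Entry]:
    """Hypothetical helper: collapse mapped entries, summing their counts."""
    merged: Dict[str, int] = {}
    for entry in column.entries:
        mapped = entry.value in mapping.entry_values
        value = mapping.mapped_value if mapped else entry.value
        merged[value] = merged.get(value, 0) + entry.count
    return [Entry(value, count) for value, count in merged.items()]


# Made-up example data, purely for illustration.
borough = Column("borough", [Entry("bk", 3), Entry("Brooklyn", 5), Entry("Queens", 2)])
to_brooklyn = Mapping("borough", ["bk", "Brooklyn"], "Brooklyn")
result = apply_mapping(borough, to_brooklyn)
```

Writing the entities out this way makes some open questions visible, for example whether a Mapping should belong to a Column or a Dataset, and how a MetaColumn should refer to the columns it combines.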

We probably want to rethink this schema to make it a lot more robust to the other tasks we want to run in smooshr.
