Skip to content

Goal: to find associations between dna data and visual traits.

License

Notifications You must be signed in to change notification settings

Imageomics/dna-trait-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dna-trait-analysis

Goal: to find associations between dna data and visual traits.

Installation

conda create -n gtp python=3.10
conda activate gtp
pip install -e .

Set Options

To personalize options can either: - Change the configurations in src/gtp/configs/default_configs.yaml - Copy that file and create a new one. Then update DEFAULT_YAML_CONFIG_PATH in src/gtp/configs/loaders.py

Run Tests

To run tests, run the follwing command:

pytest tests

NOTE: these tests depend on the current configs and may need to be adjusted for your development environment.

Preprocess Data

To preprocess the Data, you can either:

  • Run the notebook in notebooks/preprocess_pipeline.ipynb.
  • Or, run the commands:
python -m gtp.pipelines.preprocess --method phenotype
python -m gtp.pipelines.preprocess --method genotype

NOTE: preprocessing genotype data takes a long time. You can add the num-processes argument to the command to help speed this up. 4 by default. Do not be fooled by the initial processing speed as smaller files are processed first.

Create Data Splits

To keep data splits consistent across runs, precompute the splits with the following command:

python -m gtp.pipelines.create_training_splits

Training the Model

Run the following command:

python -m gtp.pipelines.train --chromosome 18 --species erato --color color_3 --wing forewings --epochs 100 --batch_size 32 --exp_name debug

Calculate Attributions

To calculate the attributions of a single chromosome or entire genome (depending on configs: see experiment.genotype_scope), run the following command:

python -m gtp.pipelines.process_attributions --species erato --color color_3 --wing forewings --exp_name debug

NOTE: you have to have trained a model for every chromosome for a species in order for the above command to work on the entire genome. Otherwise, it'll only plot / calculate data for chromosomes with a trained model.

Plot Attributions

To plot the attributions of a single chromosome or entire genome (depending on configs: see experiment.genotype_scope), run the following command:

python -m gtp.pipelines.plot_attributions --species erato --color color_3 --wing forewings --exp_name debug

NOTE: you have to have trained a model for every chromosome for a species in order for the above command to work on the entire genome. Otherwise, it'll only plot / calculate data for chromosomes with a trained model.

About

Goal: to find associations between dna data and visual traits.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published