mgiresi edited this page Mar 23, 2015 · 34 revisions

Workflow vignette community website

Members:

  • Melissa Giresi (team co-lead)
  • Zhian Kamvar (team co-lead)
  • Hilmar Lapp
  • Stéphanie Manel
  • Simone Coughlan
  • Margarita López-Uribe
  • Nik Grunwald

Goals

  • Develop main page skeleton (README, USERS.md, DEVS.md, users/, devs/, .travis.yml, Makefile)
  • The GitHub repository for this project is popgenInfo
  • Goals for Users as target audience
    • Introduction to R (very short)
    • Required packages and Software (booting)
    • Population genetics in R (the state of things)
  • Goals for Developers as target audience
    • Current R packages available
    • Classes available
    • Writing S3 classes
    • Writing S4 classes
  • Use continuous integration with a Makefile to generate the rendered markdown files (Hilmar and Zhian)
  • Name the repository
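
For the developer goals on writing S3 and S4 classes, a minimal sketch of the difference between the two systems might look like the following. The class names (`snp_table`, `SnpTable`) and the generic `nLociToy` are hypothetical and invented for illustration; they are not taken from adegenet, strataG, or any other package.

```r
library(methods)

# S3: the class is just an attribute; the constructor is an ordinary function.
new_snp_table <- function(genotypes, pop) {
  stopifnot(is.matrix(genotypes), length(pop) == nrow(genotypes))
  structure(list(genotypes = genotypes, pop = factor(pop)),
            class = "snp_table")
}

# S3 method: print() dispatches on the class attribute.
print.snp_table <- function(x, ...) {
  cat("<snp_table>", nrow(x$genotypes), "individuals,",
      ncol(x$genotypes), "loci\n")
  invisible(x)
}

# S4: formal class definition with typed slots and a validity check.
setClass("SnpTable",
  representation(genotypes = "matrix", pop = "factor"),
  validity = function(object) {
    if (length(object@pop) != nrow(object@genotypes)) {
      "one population label is required per individual"
    } else {
      TRUE
    }
  }
)

# S4 generic and method (hypothetical name, for illustration only).
setGeneric("nLociToy", function(x) standardGeneric("nLociToy"))
setMethod("nLociToy", "SnpTable", function(x) ncol(x@genotypes))

# Toy data: 3 individuals (rows) by 2 loci (columns), coded as allele counts.
geno <- matrix(c(0L, 1L, 2L, 1L, 0L, 2L), nrow = 3)
s3_obj <- new_snp_table(geno, c("A", "A", "B"))
s4_obj <- new("SnpTable", genotypes = geno, pop = factor(c("A", "A", "B")))
```

The S3 version relies on the constructor to enforce consistency, while the S4 version checks slot types and validity whenever an object is created with `new()`.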

Biological questions

  • Basic population genetics: genetic diversity, HW tests, …
  • Genetic differentiation; population based distances
  • Individual based distances
  • Landscape vs genetic data analysis
  • Parentage analysis
  • Inferring structure from individuals: detecting populations, detecting migrants, hybridization, …
  • Inferring migration rate and route of migration: historic rate of migration; recent gene flow (parentage analysis (?))
  • Inferring demographic history: effective size, time of divergence, coalescence demographic scenarios (Christine!)
  • Detecting the signal of selection: outlier detection methods; SNP/environment; linkage disequilibrium methods; neutrality tests
  • Association study: SNP/phenotype
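
As an illustration of the first item (HW tests), here is a minimal base-R sketch of a chi-square test for Hardy-Weinberg proportions at one biallelic locus. The function name `hw_chisq` and the genotype counts are made up for illustration; the vignettes themselves may rely on dedicated packages instead.

```r
# Chi-square test of Hardy-Weinberg proportions for one biallelic locus.
# n_AA, n_Aa, n_aa are observed genotype counts (hypothetical inputs).
hw_chisq <- function(n_AA, n_Aa, n_aa) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
  q <- 1 - p                         # frequency of allele a
  observed <- c(AA = n_AA, Aa = n_Aa, aa = n_aa)
  expected <- n * c(AA = p^2, Aa = 2 * p * q, aa = q^2)
  chi2 <- sum((observed - expected)^2 / expected)
  # One degree of freedom: 3 genotype classes - 1 - 1 estimated allele frequency.
  p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)
  list(p = p, expected = expected, chi2 = chi2, p_value = p_value)
}

# Made-up counts with a deficit of heterozygotes.
res <- hw_chisq(n_AA = 40, n_Aa = 20, n_aa = 40)
```

The sketch assumes both alleles are present in the sample; a monomorphic locus would give expected counts of zero and an undefined statistic.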

Related work and resources

Bioconductor lists common workflows. These are populated by workflow documents written in Rmarkdown (see How to write a workflow vignette). They appear to be essentially Rmarkdown documents that are compiled to HTML (see RNAseq, user/pwd = readonly/readonly).

Status

Day 2:

  • Found out how Bioconductor does workflow vignettes. Will be following that concept.
  • Have directory structure. Subdirectories will have Rmd files, which will get rendered to HTML.
  • Now creating a Makefile to automate rendering.
  • Developing a spreadsheet of which packages are available and usable for which biological use case.
  • Community contributions would come through pull requests that consist of new Rmd files or changes to existing ones. Will need basic documentation on how to contribute, but the setup should then be very amenable to community contribution.
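
The rendering step described above (Rmd files in subdirectories, each rendered to HTML) could be sketched in R roughly as follows. This is an assumption-laden illustration, not the project's actual Makefile: the `users` source directory comes from the skeleton listed under Goals, while the `_site` output directory and the function names `html_target` and `render_all` are hypothetical.

```r
# Map one or more Rmd paths to the HTML files they would produce.
html_target <- function(rmd, out_dir = "_site") {
  file.path(out_dir, sub("\\.[Rr]md$", ".html", basename(rmd)))
}

# Render every Rmd file found under src_dir into out_dir.
# Requires the rmarkdown package (and pandoc) to be installed.
render_all <- function(src_dir = "users", out_dir = "_site") {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("the rmarkdown package is required for rendering")
  }
  rmds <- list.files(src_dir, pattern = "\\.[Rr]md$",
                     full.names = TRUE, recursive = TRUE)
  for (rmd in rmds) {
    rmarkdown::render(rmd, output_format = "html_document",
                      output_dir = out_dir, quiet = TRUE)
  }
  # Return the expected output paths without printing them.
  invisible(html_target(rmds, out_dir))
}
```

A Makefile rule or a continuous integration job could then call something like `Rscript -e 'source("render.R"); render_all()'`, where `render.R` is a hypothetical file holding the functions above. A render error stops the script, which is what allows a pull request to be flagged as failing.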

Day 3:

  • Working on a number of workflow vignettes written in Rmarkdown. Dividing these by type of input (starting) data.
  • Importing data in R will be next.
  • Wrapping this all into a website. Have spoken with Carl Boettiger about best ways to get this automated through Continuous Integration.
  • Documentation about how to contribute also coming along.
  • Have some ideas, but how best to disseminate is still an open question.

Day 4:

  • Achieved much better clarity on how the website will be organized.
  • Have 3 workflow vignettes nearly done. 4 more are in the works.
  • Have a rendering pipeline that works locally. Working on dockerizing it and adding a deployment step.
  • Working with sequences in R has been quite challenging, especially with independent loci: it is difficult to read data in a generalizable manner.
  • Need suggestions on how best to convert between data structures in R, along with related perspectives from package developers.
  • Also short workflows that involve multiple packages.

Day 5 show-and-tell:

Products

  • Community website
    • Hosted as GitHub Pages. Source is managed as a GitHub repository. Community contributions are through pull requests.
    • Pull requests are automatically tested for successful rendering to HTML, and status is indicated on the pull request page (see NESCent/popgenInfo#14 for an example).
    • Website is automatically rebuilt and deployed through continuous integration upon every commit to the master branch. Current status is indicated by a Circle CI status badge.
  • Docker image for a reproducible and transparent software environment for population genetics in R.
  • 6 workflow vignettes (4 still awaiting posting to GitHub).
    1. Population genetic statistics from sequence data (HTML rendering)
    2. Population structure from sequence data (HTML rendering)
    3. Population genetic statistics from SNP data (coming soon)
    4. Population structure from SNP data (coming soon)
    5. Population genetic statistics from microsatellite data (coming soon)
    6. Population structure from microsatellite data (coming soon)

Plan for follow-up

  • Write Developer vignettes (Zhian)
    • Intro
    • List of basics
    • How to write S4 and S3 classes, with adegenet and strataG as examples, respectively.
  • Table of Contents (Zhian)
    • Fill out markdown files (everyone else)
  • Team will meet via Google Hangouts biweekly (every two weeks) to discuss progress and issues
    • Fill out availability on the Google Drive spreadsheet (everyone)