Team 4


Hilmar Lapp edited this page Mar 23, 2015 · 12 revisions

Identifying and visualizing outliers in multivariate summary statistics space

Members:

Katie Lotterhos (team lead)
Sara Schaal
Daren Card
Liuyang Wang
Caity Collins
Bob Verity

Goals

To create a package for summarizing the multivariate data commonly produced by population genetics analyses.
To construct a Shiny-based visualization medium for multivariate datasets.

Status

Day 2:

Collecting problems that need multivariate summaries; we have several from people here at the event.
Feedback sought: data sets and problems needing Fst statistics, especially sequence data.
Looking into Mahalanobis distance methods, including the one implemented in hclust in R.
Created GitHub repo and package skeleton.
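
For readers unfamiliar with the Mahalanobis distance mentioned above, here is a minimal sketch of the idea: each locus is scored by its distance from the centroid of all loci, scaled by the covariance of the statistics. This is not the package's code. The data values, the choice of two statistics per locus, and all function names are hypothetical and chosen only for illustration. The sketch uses plain Python with the sample covariance (n - 1 denominator).

```python
from statistics import mean


def covariance_matrix(rows):
    """Return (column means, sample covariance matrix) for equal-length rows."""
    n, p = len(rows), len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(p)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_distances(rows):
    """Mahalanobis distance of every row from the centroid of all rows."""
    mu, cov = covariance_matrix(rows)
    distances = []
    for r in rows:
        d = [r[j] - mu[j] for j in range(len(mu))]
        z = solve(cov, d)  # z = cov^{-1} d, without forming the inverse
        distances.append(sum(di * zi for di, zi in zip(d, z)) ** 0.5)
    return distances


# Hypothetical data: one row per locus, two made-up summary statistics each.
# The last row is deliberately extreme in both statistics.
loci = [[0.02, 1.1], [0.03, 0.9], [0.01, 1.0],
        [0.04, 1.2], [0.02, 0.8], [0.35, 4.5]]
dist = mahalanobis_distances(loci)
most_extreme = max(range(len(dist)), key=dist.__getitem__)
```

Here `most_extreme` is the index of the locus furthest from the centroid. Note that an extreme locus also inflates the covariance estimate it is measured against, which is one reason robust variants of this distance exist.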

Day 3:

Now have real and simulated datasets from which sites under selection and neutral sites can be drawn.
Have a working connection to the Shiny app.
Facing the challenge that the statistics from different methods can have vastly different variances.
Working on plots that can be reproduced in Shiny.
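
One common way to handle statistics with very different variances is to rescale each one to zero mean and unit standard deviation before combining them. The sketch below shows that idea only; it is an assumption for illustration, not necessarily the approach the team settled on, and the statistic names and values are made up.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation (z-scores)."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant statistic")
    return [(v - mu) / sd for v in values]


# Hypothetical per-locus statistics on very different scales.
stats_by_method = {
    "fst": [0.02, 0.03, 0.01, 0.04, 0.02, 0.35],            # bounded between 0 and 1
    "test_score": [120.0, 95.0, 110.0, 130.0, 90.0, 900.0],  # unbounded, large variance
}

# After rescaling, every statistic has the same spread, so none dominates
# a combined distance or plot axis purely because of its units.
scaled = {name: standardize(vals) for name, vals in stats_by_method.items()}
```

Because the mean and standard deviation are themselves sensitive to outliers, a median and MAD based rescaling is a possible alternative when extreme loci are expected.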

Day 4:

Working on the statistical side of the multivariate summary.
Lots of progress with the Shiny app. Challenges with circular dependencies.
Some nice charts and plots in the works. Aiming to use ggplot2; however, it produces huge objects because it stores the data internally. No easy solution found so far.

Products

Status: We have a GitHub repository and a complete skeleton package.

R package "MINOTAUR" (still under development)