Hilmar Lapp edited this page Mar 23, 2015 · 12 revisions

Identifying and visualizing outliers in multivariate summary statistics space

Members:

  • Katie Lotterhos (team lead)
  • Sara Schaal
  • Daren Card
  • Liuyang Wang
  • Caity Collins
  • Bob Verity

Goals

  • To create a package for summarizing the multivariate data commonly produced by population genetics analyses.
  • To build a Shiny-based visualization tool for multivariate datasets.

Status

Day 2:

  • Collecting problems that need multivariate summaries; attendees at the event have already contributed several.
  • Feedback sought: datasets and problems that need FST statistics, especially sequence data.
  • Looking into Mahalanobis distance methods, including their use with hierarchical clustering (hclust) in R.
  • Created a GitHub repository and package skeleton.
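
Mahalanobis distance measures how far each site's vector of summary statistics lies from the multivariate mean, scaled by the covariance of those statistics, so statistics with large variances or strong correlations do not dominate the score. Below is a minimal pure-Python sketch of the squared distance D^2 = (x - mu)^T S^-1 (x - mu) for every site, assuming rows are sites and columns are statistics. It is not the project's code; the function names and the toy data are hypothetical.

```python
"""Squared Mahalanobis distance for per-site summary statistics (sketch).

Assumption: each row is one site, each column one summary statistic.
Pure standard library; function names are illustrative only.
"""
from typing import List

Matrix = List[List[float]]


def column_means(data: Matrix) -> List[float]:
    """Mean of each statistic (column) across all sites."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance(data: Matrix, means: List[float]) -> Matrix:
    """Sample covariance matrix of the statistics (n - 1 denominator)."""
    n, p = len(data), len(means)
    cov = [[0.0] * p for _ in range(p)]
    for row in data:
        d = [row[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b]
    return [[v / (n - 1) for v in r] for r in cov]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; raises if singular."""
    p = len(m)
    aug = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mahalanobis_sq(data: Matrix) -> List[float]:
    """Squared Mahalanobis distance of every site from the column means."""
    means = column_means(data)
    inv = invert(covariance(data, means))
    p = len(means)
    out = []
    for row in data:
        d = [row[j] - means[j] for j in range(p)]
        out.append(sum(d[a] * inv[a][b] * d[b] for a in range(p) for b in range(p)))
    return out


if __name__ == "__main__":
    # Hypothetical toy data: four clustered sites and one extreme site.
    toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [10.0, 0.0]]
    scores = mahalanobis_sq(toy)
    print(max(range(len(scores)), key=scores.__getitem__))  # → 4 (the [10.0, 0.0] site)
```

Sites with the largest D^2 are the candidate outliers. Note that an extreme site also inflates the mean and covariance it is measured against, which is one reason robust covariance estimates are often considered for this kind of scan.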

Day 3:

  • Now have real and simulated datasets from which both neutral sites and sites under selection can be drawn.
  • Have a working connection to the Shiny app.
  • Challenge: statistics from different methods can have vastly different variances, so they are not directly comparable.
  • Working on plots that can be reproduced in Shiny.
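
One common way to make statistics with very different variances comparable is to rescale each statistic (column) before combining them. The sketch below shows two options, assuming rows are sites and columns are statistics: classic z-scores (mean and standard deviation) and a robust variant using the median and median absolute deviation (MAD). It is an illustration only, not the project's chosen method; the function names and toy values are hypothetical.

```python
"""Rescale per-site statistics that have very different variances (sketch).

Assumption: each row is one site, each column one summary statistic.
"""
import statistics
from typing import List

Matrix = List[List[float]]


def zscore_columns(data: Matrix) -> Matrix:
    """Centre each column on its mean and divide by its sample standard deviation.

    Raises ZeroDivisionError if a column is constant.
    """
    cols = list(zip(*data))
    mus = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in data]


def robust_scale_columns(data: Matrix) -> Matrix:
    """Centre each column on its median and divide by its MAD.

    Median and MAD are less influenced than mean and SD by the very outliers
    being searched for. Raises ZeroDivisionError if a column's MAD is zero.
    """
    cols = list(zip(*data))
    meds = [statistics.median(c) for c in cols]
    mads = [statistics.median([abs(x - m) for x in c]) for c, m in zip(cols, meds)]
    return [[(x - m) / d for x, m, d in zip(row, meds, mads)] for row in data]


if __name__ == "__main__":
    # Hypothetical toy values: column 0 on a small scale, column 1 in the hundreds.
    sites = [[0.05, 120.0], [0.10, 80.0], [0.08, 300.0], [0.40, 95.0], [0.07, 110.0]]
    for row in robust_scale_columns(sites):
        print([round(v, 2) for v in row])
```

After either rescaling, both columns are unitless and on comparable scales, so a large value in one statistic no longer swamps the other simply because of its units.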

Day 4:

  • Working on the statistical side of the multivariate summary.
  • Lots of progress on the Shiny app; circular dependencies remain a challenge.
  • Several charts and plots are in the works. We aim to use ggplot2, but ggplot2 objects get very large because they store the plotted data internally, and no easy workaround has turned up.

Products

Status: A GitHub repository and package skeleton are complete.