Team 2
Hilmar Lapp edited this page Mar 23, 2015 · 18 revisions
Members:
- Thibaut Jombart (team lead)
- Emmanuel Paradis
- Klaus Schliep
- Jerome Goudet
- goals go here
Day 2:
- Reviewed what is available in R for working with VCF files. There is one package (PopGenome), which is not very easy to use.
- Feedback sought: how big will your data files be in 5 years from now?
  - My guess is 10^6 loci, on hundreds to thousands of individuals (we are already at 10^5 loci, and sequencing costs keep falling) [jerome]
- Plan next to optimize genind code to reduce memory consumption.
- Plan to interface hierfstat with adegenet and pegas by making use of the classes genind and loci.
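To put the projected data sizes in perspective, here is a rough back-of-the-envelope sketch of how much memory a dense genotype matrix would need. The storage widths and the choice of 1000 individuals are illustrative assumptions, not measurements of how genind or loci objects actually store data.

```python
# Back-of-the-envelope memory estimate for a dense genotype matrix.
# Assumptions (not from the notes): one cell per individual x locus,
# and three hypothetical storage widths per genotype.

def genotype_matrix_bytes(n_individuals, n_loci, bits_per_genotype):
    """Return the bytes needed to store one genotype per individual per locus."""
    total_bits = n_individuals * n_loci * bits_per_genotype
    return (total_bits + 7) // 8  # round up to whole bytes


n_loci = 10**6        # the five-year guess quoted above
n_individuals = 1000  # upper end of "hundreds to thousands"

for label, bits in [("32-bit integer", 32), ("8-bit integer", 8), ("2-bit packed", 2)]:
    size = genotype_matrix_bytes(n_individuals, n_loci, bits)
    print(f"{label:>14}: {size / 1e9:.2f} GB")
```

Under these assumptions, 32-bit storage needs about 4 GB, 8-bit storage about 1 GB, and a 2-bit packed encoding about 0.25 GB, which suggests why the storage width per genotype matters at this scale.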
Day 3:
- Able to read a VCF file from the 1000 Genomes Project in less than a minute (not including genotypes).
- reworking package '5' to load data faster, including ploidy
- now looking into how data can be moved faster and more seamlessly into hierfstat from adegenet
- Can perhaps also look into being compatible for individual-lacking VCF files? (Such as those from 1001 Arabidopsis genomes)
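One way to see why reading a VCF file without genotypes can be fast is that only the leading fixed columns of each record need to be parsed. The following is a minimal sketch of that idea, not the team's R implementation; the function name `read_vcf_sites` and the demo records are hypothetical. Because it never touches the sample columns, the same approach also works for VCF files that contain no individuals.

```python
import io


def read_vcf_sites(handle):
    """Yield (chrom, pos, id, ref, alt_alleles) per record, ignoring genotypes."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        # Split on the first five tabs only; the remainder of the line
        # (QUAL, FILTER, INFO, FORMAT, genotypes) is left unparsed.
        fields = line.rstrip("\n").split("\t", 5)
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        chrom, pos, variant_id, ref, alt = fields[:5]
        yield chrom, int(pos), variant_id, ref, alt.split(",")


# Hypothetical two-record file with two sample columns.
demo = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\n"
    "1\t10177\trs1\tA\tAC\t100\tPASS\t.\tGT\t0|1\t1|0\n"
    "1\t10352\t.\tT\tTA,G\t100\tPASS\t.\tGT\t0|0\t0|1\n"
)
sites = list(read_vcf_sites(demo))
for site in sites:
    print(site)
```

Limiting the split to the first five tabs keeps the per-line cost roughly independent of the number of individuals, apart from reading the line itself.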
Day 4:
- Finalizing fast scanning and reading of VCF files. 1M loci in just a few seconds. Cleaning up code.
- Simplified data structure in adegenet. May break some code that depended on earlier versions. Need help for testing. If you find problems, please file an issue on GitHub.
- Added function for genetic distances in hierfstat. Also discovered a numeric type bug that is being fixed now. hierfstat is on GitHub now.
New R packages:
- apex: Extension of the R package ape for multiple genes
Updates to R packages: