Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export to additional population genetics formats #10

Open
lvclark opened this issue Jul 20, 2020 · 3 comments
Open

Export to additional population genetics formats #10

lvclark opened this issue Jul 20, 2020 · 3 comments
Labels
enhancement good first issue Feel free to make a pull request to address this issue! hacktoberfest

Comments

@lvclark
Copy link
Owner

lvclark commented Jul 20, 2020

Overview

There has been some interest in using genotypes called by polyRAD for population genetics analysis. Two pieces of software that have long supported polyploid genotypes are Structure and SPAGeDi. Although they were originally designed for relatively small sets of PCR markers, they both scale up to large sequence-based datasets. They each use a custom format, and it would be nice if polyRAD exported to those formats.

I may eventually get to writing these export functions. If you would like to see them (or others), feel free to comment here! If you have some R experience and would like to write the functions yourself, see below. First-time contributors are welcome!

Contributing

You will need to make your own fork of polyRAD, clone that fork to your computer, commit your changes, push those commits to GitHub, and open a pull request. See the Github documentation if you need help understanding those tasks.

R code

Name the function Export_Structure or Export_SPAGeDi, as appropriate. Add it to the file R/data_export.R.

The first argument for the function should be a RADdata object named object. There should also be an argument called file indicating the path to write to. Another possible argument might indicate a particular ploidy to use for export; see other export functions, and check the documentation for Structure and SPAGeDi to see if variable ploidy is allowed.

The function will need to call GetProbableGenotypes with omit1allelePerLocus = FALSE and multiallelic = "correct". This will get you a list, the first item of which is a matrix with individuals and rows and alleles in columns, showing the copy number of that allele in that individual. This will then need to be converted to a different format, where each allele is assigned an integer, and the individual has a number of alleles up to its ploidy. For example:

Allele_AG Allele_AA Allele_CA
Ind 1 0 3 1
Ind 2 2 2 0

Gets changed to

Allele 1 Allele 2 Allele 3 Allele 4
Ind 1 2 2 2 3
Ind 2 1 1 2 2

Your function will make use of object$alleles2loc in order to group columns of the matrix output by GetProbableGenotypes into loci. A loop in R to process loci might look like:

geno <- GetProbableGenotypes(object, omit1allelePerLocus = FALSE)[[1]]

for(L in seq_len(nLoci(object)){
  theseal <- which(object$alleles2loc == L)
  thesegeno <- geno[, theseal]
  # do some processing here
}

See the MakeGTstrings internal function in src/PrepVCFexport.cpp. For SPAGeDi in particular this function could simply have an argument added to it to start numbering from 1 rather than 0. For Structure, MakeGTstrings should probably not be used directly but can provide some inspiration for what needs to be done.

See the Structure or SPAGeDi documentation for details on file format needed. Your function should export a text file in that format.

Test the function on exampleRAD after running IterateHWE. Install Structure or SPAGeDi on your computer and see if it can import the file.

Documentation and integration

  • Update man/ExportGAPIT.Rd to document your function.
  • Add the function name to the export call in NAMESPACE
  • Update the "Formats supported" section of README.md
  • Update the "Summary of available functions" section of vignettes/polyRADtutorial.Rmd.
  • Run vignettes/render_vignettes.R , with vignettes as the working directory, to recompile the vignette.
  • Add yourself as a contributor to the DESCRIPTION file! Use the "ctb" role code in Authors@R and Author.
  • Run R CMD build and R CMD check to make sure the package builds and passes checks.
@lvclark lvclark added enhancement good first issue Feel free to make a pull request to address this issue! labels Jul 20, 2020
@lvclark
Copy link
Owner Author

lvclark commented Aug 8, 2020

I have written an Export_Structure function since a user was asking for it, but someone can still write a function to export to SPAGeDi if interested.

@lvclark
Copy link
Owner Author

lvclark commented Aug 31, 2020

Another format which might be appreciated by plant breeders is HapMap. To the best of my knowledge, there isn't software that can import a HapMap file with polyploid genotypes, but the format has the advantage of being easily comprehensible to the human eye. A tetraploid SNP genotype could for example be represented as AAAG. The Hap2SNP function should be very helpful for converting polyRAD's haplotype-based alleles back to SNPs. See also the above note about GetProbableGenotypes and MakeGTstrings.

@lvclark
Copy link
Owner Author

lvclark commented Mar 15, 2021

There are also requests for export to GenoDive and to adegenet::genind objects.

@lvclark lvclark changed the title Export to Structure and/or SPAGeDi formats Export to additional population genetics formats Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement good first issue Feel free to make a pull request to address this issue! hacktoberfest
Projects
None yet
Development

No branches or pull requests

1 participant