You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There has been some interest in using genotypes called by polyRAD for population genetics analysis. Two pieces of software that have long supported polyploid genotypes are Structure and SPAGeDi. Although they were originally designed for relatively small sets of PCR markers, they both scale up to large sequence-based datasets. They each use a custom format, and it would be nice if polyRAD exported to those formats.
I may eventually get to writing these export functions. If you would like to see them (or others), feel free to comment here! If you have some R experience and would like to write the functions yourself, see below. First-time contributors are welcome!
Contributing
You will need to make your own fork of polyRAD, clone that fork to your computer, commit your changes, push those commits to GitHub, and open a pull request. See the Github documentation if you need help understanding those tasks.
R code
Name the function Export_Structure or Export_SPAGeDi, as appropriate. Add it to the file R/data_export.R.
The first argument for the function should be a RADdata object named object. There should also be an argument called file indicating the path to write to. Another possible argument might indicate a particular ploidy to use for export; see other export functions, and check the documentation for Structure and SPAGeDi to see if variable ploidy is allowed.
The function will need to call GetProbableGenotypes with omit1allelePerLocus = FALSE and multiallelic = "correct". This will get you a list, the first item of which is a matrix with individuals and rows and alleles in columns, showing the copy number of that allele in that individual. This will then need to be converted to a different format, where each allele is assigned an integer, and the individual has a number of alleles up to its ploidy. For example:
Allele_AG
Allele_AA
Allele_CA
Ind 1
0
3
1
Ind 2
2
2
0
Gets changed to
Allele 1
Allele 2
Allele 3
Allele 4
Ind 1
2
2
2
3
Ind 2
1
1
2
2
Your function will make use of object$alleles2loc in order to group columns of the matrix output by GetProbableGenotypes into loci. A loop in R to process loci might look like:
geno<- GetProbableGenotypes(object, omit1allelePerLocus=FALSE)[[1]]
for(Lin seq_len(nLoci(object)){
theseal<- which(object$alleles2loc==L)
thesegeno<-geno[, theseal]
# do some processing here
}
See the MakeGTstrings internal function in src/PrepVCFexport.cpp. For SPAGeDi in particular this function could simply have an argument added to it to start numbering from 1 rather than 0. For Structure, MakeGTstrings should probably not be used directly but can provide some inspiration for what needs to be done.
See the Structure or SPAGeDi documentation for details on file format needed. Your function should export a text file in that format.
Test the function on exampleRAD after running IterateHWE. Install Structure or SPAGeDi on your computer and see if it can import the file.
Documentation and integration
Update man/ExportGAPIT.Rd to document your function.
Add the function name to the export call in NAMESPACE
Update the "Formats supported" section of README.md
Update the "Summary of available functions" section of vignettes/polyRADtutorial.Rmd.
Run vignettes/render_vignettes.R , with vignettes as the working directory, to recompile the vignette.
Add yourself as a contributor to the DESCRIPTION file! Use the "ctb" role code in Authors@R and Author.
Run R CMD build and R CMD check to make sure the package builds and passes checks.
The text was updated successfully, but these errors were encountered:
Another format which might be appreciated by plant breeders is HapMap. To the best of my knowledge, there isn't software that can import a HapMap file with polyploid genotypes, but the format has the advantage of being easily comprehensible to the human eye. A tetraploid SNP genotype could for example be represented as AAAG. The Hap2SNP function should be very helpful for converting polyRAD's haplotype-based alleles back to SNPs. See also the above note about GetProbableGenotypes and MakeGTstrings.
Overview
There has been some interest in using genotypes called by polyRAD for population genetics analysis. Two pieces of software that have long supported polyploid genotypes are Structure and SPAGeDi. Although they were originally designed for relatively small sets of PCR markers, they both scale up to large sequence-based datasets. They each use a custom format, and it would be nice if polyRAD exported to those formats.
I may eventually get to writing these export functions. If you would like to see them (or others), feel free to comment here! If you have some R experience and would like to write the functions yourself, see below. First-time contributors are welcome!
Contributing
You will need to make your own fork of polyRAD, clone that fork to your computer, commit your changes, push those commits to GitHub, and open a pull request. See the Github documentation if you need help understanding those tasks.
R code
Name the function
Export_Structure
orExport_SPAGeDi
, as appropriate. Add it to the fileR/data_export.R
.The first argument for the function should be a
RADdata
object namedobject
. There should also be an argument calledfile
indicating the path to write to. Another possible argument might indicate a particular ploidy to use for export; see other export functions, and check the documentation for Structure and SPAGeDi to see if variable ploidy is allowed.The function will need to call
GetProbableGenotypes
withomit1allelePerLocus = FALSE
andmultiallelic = "correct"
. This will get you a list, the first item of which is a matrix with individuals and rows and alleles in columns, showing the copy number of that allele in that individual. This will then need to be converted to a different format, where each allele is assigned an integer, and the individual has a number of alleles up to its ploidy. For example:Gets changed to
Your function will make use of
object$alleles2loc
in order to group columns of the matrix output byGetProbableGenotypes
into loci. A loop in R to process loci might look like:See the
MakeGTstrings
internal function insrc/PrepVCFexport.cpp
. For SPAGeDi in particular this function could simply have an argument added to it to start numbering from 1 rather than 0. For Structure,MakeGTstrings
should probably not be used directly but can provide some inspiration for what needs to be done.See the Structure or SPAGeDi documentation for details on file format needed. Your function should export a text file in that format.
Test the function on
exampleRAD
after runningIterateHWE
. Install Structure or SPAGeDi on your computer and see if it can import the file.Documentation and integration
man/ExportGAPIT.Rd
to document your function.export
call inNAMESPACE
README.md
vignettes/polyRADtutorial.Rmd
.vignettes/render_vignettes.R
, withvignettes
as the working directory, to recompile the vignette.DESCRIPTION
file! Use the "ctb" role code inAuthors@R
andAuthor
.R CMD build
andR CMD check
to make sure the package builds and passes checks.The text was updated successfully, but these errors were encountered: