Efficient types and methods for K-mer biosequences

BioJulia/Kmers.jl

Kmers

Latest Release MIT license Documentation

Description

Kmers.jl provides the Kmer <: BioSequence type, which implements the concept of a k-mer: a biological sequence of exactly length k.

K-mers are used frequently in bioinformatics because, when k is small and known at compile time, these sequences can be efficiently represented as integers and stored directly in CPU registers, allowing for much more efficient computation than arbitrary-length sequences.
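To make the "sequences as integers" idea concrete, here is a minimal sketch in plain Julia that packs a short DNA string into a single UInt64 using two bits per base. The names `encode_base`, `pack_kmer` and `revcomp_packed`, and the A=0, C=1, G=2, T=3 encoding, are assumptions chosen for illustration; this is not the Kmers.jl API or necessarily its internal layout.

```julia
# Hypothetical helpers for illustration only; not part of Kmers.jl.

# Map each nucleotide to a 2-bit code: A=0, C=1, G=2, T=3 (assumed encoding).
function encode_base(c::Char)
    c == 'A' && return UInt64(0)
    c == 'C' && return UInt64(1)
    c == 'G' && return UInt64(2)
    c == 'T' && return UInt64(3)
    throw(ArgumentError("unsupported nucleotide: $c"))
end

# Pack up to 32 bases into one UInt64; the last base ends up in the lowest bits.
function pack_kmer(s::AbstractString)
    length(s) <= 32 || throw(ArgumentError("at most 32 bases fit in a UInt64"))
    x = UInt64(0)
    for c in s
        x = (x << 2) | encode_base(c)
    end
    return x
end

# Reverse-complement a packed k-mer of length k.
# With this encoding, the complement of a 2-bit code is 3 minus that code.
function revcomp_packed(x::UInt64, k::Integer)
    y = UInt64(0)
    for _ in 1:k
        y = (y << 2) | (UInt64(3) - (x & UInt64(3)))
        x >>= 2
    end
    return y
end

packed = pack_kmer("AAC")
rc = revcomp_packed(packed, 3)   # packed form of "GTT"
```

Because the whole sequence lives in one machine word, operations such as comparison or hashing reduce to integer operations, which is the kind of efficiency the paragraph above refers to.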

In Kmers.jl, the Kmer type is parameterized by its length, and its data is stored in an NTuple. This makes Kmers bitstypes and highly efficient.

Conceptually, one may use the following analogy:

  • BioSequence is like AbstractVector
  • LongSequence is like Vector
  • Kmer is like SVector from StaticArrays
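The following toy type sketches what "parameterized by length and stored in an NTuple" can look like, and why such a type is a bitstype. `ToyKmer` and its parameters `K` and `N` are hypothetical and deliberately simplified; the real Kmer definition in Kmers.jl may differ.

```julia
# Hypothetical toy type for illustration; NOT the Kmers.jl definition.
# K is the sequence length, N is the number of UInt64 words of storage.
struct ToyKmer{K,N}
    data::NTuple{N,UInt64}
end

# The length is read from the type parameter, not from a stored field.
Base.length(::ToyKmer{K}) where {K} = K

x = ToyKmer{4,1}((UInt64(27),))

length(x)              # the length comes from the type parameter K
isbitstype(typeof(x))  # an immutable struct holding only an NTuple of UInt64
sizeof(x)              # one 8-byte word of storage in this example
```

As with SVector, putting the length in the type lets the compiler know the size of every value at compile time, so values can be stored inline in arrays or kept in registers rather than heap-allocated.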

Kmers.jl is tightly coupled to the BioSequences.jl package, and relies on its internals. Hence, you should expect strict compat bounds on BioSequences.jl.

Usage

⚠️ WARNING ⚠️

Kmers are parameterized by their length. That means any operation on a Kmer that changes its length, such as push, pop, slicing, or masking (logical indexing), will be type unstable and hence slow and memory inefficient, unless you write your code in such a way that the compiler can use constant folding.
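The same effect can be seen with plain tuples, whose length is also part of their type. The sketch below uses two hypothetical helpers, `take_dynamic` and `take_static`, that are not part of Kmers.jl: one receives the new length as a runtime integer, the other receives it in the type domain through `Val`.

```julia
# Hypothetical helpers on plain tuples for illustration; not part of Kmers.jl.

# The new length is a runtime value, so the result's tuple type depends on
# the value of `n` and generally cannot be inferred as one concrete type.
take_dynamic(t::Tuple, n::Int) = ntuple(i -> t[i], n)

# The new length is encoded in the argument's type via Val, so the result
# length is known to the compiler from the method signature alone.
take_static(t::Tuple, ::Val{N}) where {N} = ntuple(i -> t[i], Val(N))

t = (1, 2, 3, 4)
a = take_dynamic(t, 2)
b = take_static(t, Val(2))
```

Both calls return the same value; the difference lies in what the compiler can infer. Inspecting each call with `@code_warntype` from InteractiveUtils is one way to check this. When the length is a literal constant at the call site, constant propagation may recover a concrete type even in the first form, which is what the note about constant folding refers to.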

Further, as Kmers are immutable and their operations are aggressively inlined and unrolled, they become inefficient as they get longer. For example, reverse-complementing a 32-mer takes 26 ns, compared to 102 ns for the equivalent LongSequence. However, for 512-mers, the LongSequence takes 126 ns, and the Kmer 16 μs!

Kmers.jl is intended for high-performance computing. If you do not need the extra performance that register-stored sequences provide, you might consider using LongSequence from BioSequences.jl instead.

Installation

You can install Kmers from the Julia REPL. Press ] to enter pkg mode, and enter the following:

pkg> add Kmers

If you are interested in the cutting edge of development, please check out the master branch to try new features before release.

Contributing

We appreciate contributions from users including reporting bugs, fixing issues, improving performance and adding new features.

Take a look at the contributing files for detailed contributor and maintainer guidelines, and the code of conduct.

Questions?

If you have a question about contributing to or using BioJulia software, come on over and chat with us on the Julia Slack workspace, or try the Bio category of the Julia Discourse site.
