Skip to content

Miscellaneous scripts for quick genome assembly analysis

Notifications You must be signed in to change notification settings

markhilt/genome_analysis_tools

Repository files navigation

genome_analysis_tools

This repo contains scripts that I frequenctly use in genome assembly analysis.

find_telomeres.py and n50.py are used for diagnosis of assemblies.

asm_cov_break.py and convert_mol_cov.py are used to parse output from tigmint molecule to break assemblies in regions where 10X barcode coverage drops.

overlap_scaffoling.py is used to build assembly scaffolds based on linkage information. Two files are needed: a fasta file with assembled sequences to build scaffolds from, and a tab delimited text file that contains the linkage information. This file should look like, e.g.:

contig1f  contig34r
contig2r  contig12f contig4f
contig5r

Each line represents a new scaffold. The contig names need to match the input fasta headers, except the final character that needs to be f or r for forward or reverse, to indicate the orientation that the contig should have in the scaffold. If a line contains a single contig name, only this contig will be put into the scaffold, but will be reverse complemented if the final character is r. overlap_scaffoling.py is overlap-aware, meaning that it will try to find overlaps between the ends of contigs before building the scaffold.

merge_transcripts.py: Use to merge sequences in a multiple sequence alignment in fasta format. Sequences are merged when they have the same header, up until "@". The header should contain e.g. a taxon name followed by @ and a sequence name. Script is useful when there are several representative sequences from a single taxon, e.g. because the gene is fragmented, and you want to have a single sequence per taxon for phylogenomics. Accepts both amino acid and nucleotide sequences. When there are mismatches between the sequences to be merged, a residue is arbitrarily picked, and the script lets out a warning.

About

Miscellaneous scripts for quick genome assembly analysis

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages