
NGS Sequence Assembly & Binning Software Tools



 Assembling | Binning  


 Assembling

Sequence assembly is the alignment and merging of short sequence fragments in order to reconstruct the original sequence. The short fragments, also called reads, are generated by DNA shotgun sequencing with different technologies. Depending on the technology used, read length can vary between 50 and 100,000 nucleotides.

Several factors influence assembly efficiency, including read length, read quality, coverage and the availability of reference sequences. De novo assembly does not use a reference sequence and can therefore be used to sequence unknown genomes.
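To make the idea of "alignment and merging" concrete, here is a minimal toy sketch, assuming error-free reads from one strand and no repeats. It greedily merges the pair of sequences with the longest suffix/prefix overlap until no overlap of at least `min_len` bases remains. The function names are illustrative only; real assemblers use overlap graphs or de Bruijn graphs and must cope with sequencing errors, repeats and both strands.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` bases long); 0 if there is none."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if everything from that position to the end of a matches b
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: the remaining sequences are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]  # hypothetical reads
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is the simplest strategy to explain, but it can make wrong choices in repetitive regions, which is one reason the tools below use graph-based approaches.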

This website introduces DNA sequence assembly software tools for assembling paired-end reads, short reads, long reads, whole genomes and small genomes, for metagenome analysis and de novo assembly, as well as binning software.


ABySS / Trans-ABySS

ABySS is a de novo, parallel, paired-end sequence assembler designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and can assemble larger genomes. Trans-ABySS is used for assembling transcriptome data.
Source: Canada's Michael Smith Genome Sciences Centre

Website

AHA (A Hybrid Assembler)

AHA (A Hybrid Assembler) uses PacBio's exceptionally long reads to improve existing assemblies and fill in gaps.

Website

Allora

Allora (A Long Read Assembler) is a PacBio de novo assembly algorithm optimized for bacterial and BAC assembly using ultra long reads.
Source: PacBio

Website

ALLPATHS-LG

Website

AMOS

The AMOS consortium is committed to the development of open-source whole genome assembly software. The project acronym (AMOS) represents our primary goal: to produce A Modular, Open-Source whole genome assembler. It is open-source so that everyone is welcome to contribute and help build outstanding assembly tools, and modular in nature so that new contributions can easily be inserted into an existing assembly pipeline. This modular design will foster the development of new assembly algorithms and allow the AMOS project to continually grow and improve, in the hope of eventually becoming a widely accepted and deployed assembly infrastructure.
Source: Johns Hopkins University

Website

Anchored Assembly

Anchored Assembly is a bioinformatics method that can be used on regular, short read NGS data to detect longer indels (changes of 30-50bp) and structural variants (changes > 50bp). Using a combination of mapping and assembly of reads, we can accurately detect changes often not detected by other methods.
Source: Spiral Genetics

Website

AssembleViral454

AssembleViral454 (AV454) is an assembler based on the ARACHNE package, designed for small and non-repetitive genomes sequenced at high depth.
Source: Broad Institute

Website

BioEdit

BioEdit is a biological sequence alignment editor written for Windows 95/98/NT/2000/XP/7. An intuitive multiple document interface with convenient features makes alignment and manipulation of sequences relatively easy on your desktop computer.
Source: Mbio

Website

CABOG (Celera Assembler with Best Overlap Graph)

CABOG (Celera Assembler with Best Overlap Graph) is scientific software for DNA research. CABOG has been a critical component of many genome sequencing projects. CABOG operates on small genomes such as bacterial genomes as well as on large genomes such as mammalian genomes. CABOG is an extension of the Celera Assembler software that was originally developed at Celera for the 2001 publication of the first draft human genome sequence. The software was released to the public domain in 2004. Its open-source repository on SourceForge is an internet resource for scientists around the world.
Source: J. Craig Venter Institute

Website

Celera Assembler

Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing.
Source: SourceForge

Website

DNA Baser

DNA Baser Sequence Assembler is bioinformatics software for automatic DNA sequence assembly, DNA sequence analysis, contig editing, file format conversion and mutation detection.

Source: DNA Baser

Website

EULER-SR

The EULER-SR assembly package contains a suite of programs for correcting errors in short reads and assembling them. Our assembler may take as input classical Sanger reads, 454 sequences, and Illumina reads.
Source: University of California

Website

fermi

fermi is a whole-genome shotgun (WGS) de novo assembler for large genomes, based on the FMD-index.

Website

FLASH (Fast Length Adjustment of SHort reads)

FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool for merging paired-end reads from next-generation sequencing experiments. FLASH is designed to merge read pairs when the original DNA fragments are shorter than twice the read length. The resulting longer reads can significantly improve genome assemblies.
Source: Johns Hopkins University

Website
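The core idea of merging an overlapping read pair can be sketched as follows. This is a simplified illustration, not FLASH's implementation: it ignores base quality scores (real tools use them to resolve mismatches), only handles fragments longer than a single read, and the parameter names `min_overlap` and `max_mismatch_density` are hypothetical. Among all candidate overlaps it picks the one with the lowest mismatch density, preferring the longer overlap on ties.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_density: float = 0.1) -> Optional[str]:
    """Merge a read pair whose fragment is shorter than twice the read length.
    Returns the merged sequence, or None if no acceptable overlap exists."""
    r2 = revcomp(read2)  # read 2 is sequenced from the opposite strand
    best = None  # (mismatch density, overlap length) of the best candidate so far
    for olen in range(max(1, min_overlap), min(len(read1), len(r2)) + 1):
        tail, head = read1[-olen:], r2[:olen]
        mismatches = sum(1 for x, y in zip(tail, head) if x != y)
        density = mismatches / olen
        if density > max_mismatch_density:
            continue
        # Prefer the lowest mismatch density; break ties with the longer overlap
        if best is None or density < best[0] or (density == best[0] and olen > best[1]):
            best = (density, olen)
    if best is None:
        return None
    # Simplification: keep read 1's bases inside the overlap region
    return read1 + r2[best[1]:]


fragment = "GATTACACGTTAGCCATGGCAATCGTCAGT"  # hypothetical 30 bp fragment
read1 = fragment[:20]                         # 20 bp read, forward strand
read2 = revcomp(fragment[10:])                # 20 bp read, reverse strand
print(merge_pair(read1, read2) == fragment)   # True
```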

GapFiller

GapFiller is a seed-and-extend local assembler that fills the gap between paired reads.
It can be used for both DNA and RNA and has been tested on Illumina data.
Source: SourceForge

Website

Geneious

The Geneious assembler: no more struggling with confusing online browsers, as sequence assembly and reference mapping follow the same steps.

Website

HGAP (Hierarchical Genome Assembly Process) assembler

The Hierarchical Genome Assembly Process (HGAP) for long single pass reads generated by the PacBio® Single Molecule Real Time (SMRT) sequencer was developed to allow the complete and accurate shotgun assembly of bacterial sized genomes. The process itself relies on a succession of steps to generate de novo assemblies of a genome.
Source: GitHub

Website

IMAGE

IMAGE stands for Iterative Mapping and Assembly for Gap Elimination. It is software designed to close gaps in any draft assembly using Illumina paired-end reads.
Source: SourceForge

Website

MaSuRCA

MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).
Source: University of Maryland

Website
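For readers unfamiliar with the de Bruijn graph approach mentioned above, the following sketch builds a graph whose nodes are (k-1)-mers and whose edges are the distinct k-mers observed in the reads, then spells out a contig by following nodes that have exactly one successor. It illustrates the general technique under simplifying assumptions (error-free reads, one strand, no coverage tracking); it is not MaSuRCA's code, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_simple_path(graph: dict[str, set[str]], start: str) -> str:
    """Spell a contig by following nodes that have exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]  # each step along an edge adds one new base
        seen.add(nxt)
        node = nxt
    return contig


g = de_bruijn_graph(["ACGTC", "GTCAG"], k=3)
print(walk_simple_path(g, "AC"))  # ACGTCAG
```

The two reads share only the k-mer GTC, yet the graph joins them without any explicit pairwise alignment, which is what makes the de Bruijn approach efficient for large numbers of short reads.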

Meta-IDBA

Meta-IDBA is an iterative de Bruijn graph de novo short-read assembler specially designed for de novo metagenomic assembly. One of the most difficult problems in metagenomic assembly is that similar subspecies of the same species mix together, making the de Bruijn graph very complicated and intractable. Meta-IDBA handles this problem by grouping similar regions of similar subspecies, partitioning the graph into components based on its topological structure. Each component represents a similar region between subspecies of the same species, or even of different species. After the components are separated, the contigs in each component are aligned to produce a consensus and a multiple alignment.
Source: University of Hong Kong

Website
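Splitting a graph into components is a standard graph operation. The sketch below splits an assembly graph, given as a list of edges, into connected components with a breadth-first search. This is only the simplest form of partitioning and an assumption for illustration: Meta-IDBA's own partitioning criterion is based on the graph's topological structure in a more specific way that is not reproduced here.

```python
from collections import deque


def connected_components(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group graph nodes into connected components (edge direction ignored)."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components


# Hypothetical (k-1)-mer edges forming two separate pieces of a graph
print(connected_components([("AC", "CG"), ("CG", "GT"), ("TT", "TA")]))
```

Each component can then be processed on its own, which keeps the per-component work small even when the full graph is large.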

MetaVelvet

MetaVelvet: an extension of the Velvet assembler to de novo metagenome assembly from short sequence reads
We modified and extended a single-genome, de Bruijn-graph-based assembler, Velvet, for de novo metagenome assembly. Our fundamental ideas are, first, to decompose the de Bruijn graph constructed from mixed short reads into individual sub-graphs and, second, to build scaffolds from each decomposed de Bruijn sub-graph as if it were an isolated species genome.
Source: MetaVelvet

Website

MIRA

MIRA is a sequence assembler and sequence mapper for whole genome shotgun and EST / RNA-Seq sequencing data. It can use Sanger, 454, Illumina and Ion Torrent data; for PacBio, CCS and error-corrected data can be used, but uncorrected data cannot yet. The MIRA genome fragment assembler is a specialised assembler for sequencing projects classified as 'hard' due to a high number of similar repeats. For EST transcripts, miraEST specialises in reconstructing pristine mRNA transcripts while detecting and classifying single nucleotide polymorphisms (SNPs) occurring in different variants of them.
Source: SourceForge

Website

Newbler

Newbler, the GS Data Analysis Software package, includes the tools to investigate complex genomic variation in samples, including de novo assembly, reference-guided alignment and variant calling, and low-abundance variant identification and quantification. The suite of software is provided with the GS Junior and GS FLX System at no additional cost and allows researchers to begin interpreting sequence data immediately, without the need to invest in complex and expensive third-party solutions. Each of the software tools incorporates flow and signal information into the sequence analysis algorithms, leading to higher-confidence variant calling. Additionally, researchers can interrogate sequence data down to the flow-by-flow signal intensities used in base calling.
This software is primarily available to Roche NGS technology users and is included in their software package.
Source: Roche

Website

QUAST

QUAST is a tool for evaluating the quality of genome assemblies; it computes various metrics, such as N50, and generates reports.

Website
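One of the most commonly reported assembly metrics is N50: the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. Below is a minimal sketch of this standard definition; it is not QUAST's own code, and QUAST reports many further metrics.

```python
def n50(contig_lengths: list[int]) -> int:
    """N50 of a set of contig lengths; 0 for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # reached half of the total assembly length
            return length
    return 0


print(n50([100, 80, 50, 30, 20]))  # 80
```

Here the total is 280 bp, and the two longest contigs (100 + 80 = 180 bp) are the first to cover at least 140 bp, so N50 is 80.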

REAPR

REAPR is a tool that evaluates the accuracy of a genome assembly using mapped paired end reads, without the use of a reference genome for comparison. It can be used in any stage of an assembly pipeline to automatically break incorrect scaffolds and flag other errors in an assembly for manual inspection. It reports mis-assemblies and other warnings, and produces a new broken assembly based on the error calls.
Source: Wellcome Trust Sanger Institute

Website

SGA

SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.

Website
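SGA's compressed read index is built on the Burrows-Wheeler transform (BWT), the basis of the FM-index. The sketch below shows the idea on a single string: a naive BWT built by sorting all rotations, and FM-index-style backward search that counts pattern occurrences using only the transformed string. It assumes `$` as a sentinel that sorts before the bases and runs in quadratic time; production tools build the index with suffix-array algorithms over many reads and store precomputed occurrence tables instead of recounting.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    text += "$"  # sentinel; sorts before A, C, G and T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Backward search: count occurrences of `pattern` using only the BWT."""
    # first[c]: first row of the sorted rotations that starts with c
    first: dict[str, int] = {}
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]; real FM-indexes precompute this
        return bwt_str.count(c, 0, i)

    lo, hi = 0, len(bwt_str)  # range [lo, hi) of rows matching the suffix so far
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana"))                                # annb$aa
print(count_occurrences(bwt("ACGTACGT"), "ACG"))    # 2
```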

SPAdes

SPAdes (version 3.5.0) for genome assembly.

Website

SOAPdenovo

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina short reads.
Source: SourceForge

Website

SeqMan NGen

SeqMan NGen, the keystone of the Lasergene Genomics Suite, is sequence assembly software that can assemble genomes of any size quickly and accurately on a desktop computer.
Source: DNASTAR

Website

StringTie

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus.
Source: Johns Hopkins University

Website

Velvet

Velvet is a sequence assembler for very short reads.

Website

VICUNA

VICUNA is a de novo assembly program targeting populations with high mutation rates.

Website

V-FAT

V-FAT is a tool to perform automated computational finishing and annotation of de novo viral assemblies. V-FAT uses reference and read data to order and merge contigs, correct frameshifts, and produce NCBI-ready annotation files. It also performs a set of quality assurance measurements including coverage computation by gene or amplicon and identification of potential consensus errors.
Source: Broad Institute

Website


Binning

MetaCluster

MetaCluster 5.0 is an unsupervised binning method that can bin samples with low-abundance species, or samples (even those with high-abundance species) containing many extremely-low-abundance species.
Source: University of Hong Kong

Website

PhymmBL

Phymm, a new classification approach for metagenomics data which uses interpolated Markov models (IMMs) to taxonomically classify DNA sequences, can accurately classify reads as short as 100 bp. Its accuracy for short reads represents a significant leap forward over previous composition-based classification methods. PhymmBL (rhymes with "thimble"), the hybrid classifier included in this distribution which combines analysis from both Phymm and BLAST, produces even higher accuracy.
Source: Johns Hopkins University

Website
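Composition-based classifiers such as Phymm score a read against a model trained for each taxon and assign it to the best-scoring one. The sketch below shows that scheme with a simple fixed-order Markov chain and pseudocounts, not an interpolated Markov model (an IMM blends the predictions of several model orders); the taxon names and training sequences are hypothetical.

```python
import math
from collections import defaultdict


class MarkovModel:
    """Fixed-order Markov chain over DNA with add-pseudocount smoothing."""

    def __init__(self, order: int = 3, alphabet: str = "ACGT", pseudocount: float = 1.0):
        self.order = order
        self.alphabet = alphabet
        self.pseudocount = pseudocount
        # counts[context][base]: how often `base` followed `context` in training data
        self.counts: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def train(self, seq: str) -> None:
        for i in range(self.order, len(seq)):
            self.counts[seq[i - self.order:i]][seq[i]] += 1

    def log_prob(self, seq: str) -> float:
        """Log-likelihood of seq (its first `order` bases are not scored)."""
        total = 0.0
        for i in range(self.order, len(seq)):
            ctx = self.counts.get(seq[i - self.order:i], {})
            denom = sum(ctx.values()) + self.pseudocount * len(self.alphabet)
            total += math.log((ctx.get(seq[i], 0.0) + self.pseudocount) / denom)
        return total


def classify(read: str, models: dict[str, MarkovModel]) -> str:
    """Assign the read to the taxon whose model gives the highest log-likelihood."""
    return max(models, key=lambda name: models[name].log_prob(read))


# Hypothetical training data for two made-up taxa
models = {"taxonA": MarkovModel(order=2), "taxonB": MarkovModel(order=2)}
models["taxonA"].train("AC" * 50)
models["taxonB"].train("GT" * 50)
print(classify("ACACACAC", models))  # taxonA
```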

S-GSOM

The S-GSOM is a semi-supervised seeding method for binning metagenomics sequences that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity.
Source: The University of Melbourne

Website

SOrt-ITEMS

SOrt-ITEMS is a similarity-based binning method. The user first performs a similarity search of the input metagenomic sequences (reads) against the nr database using BLAST (blastx). The generated blastx output for the metagenomic reads is then used as input to the SOrt-ITEMS program.
Source: Tata Consultancy Services

Website

TACOA

TACOA is software that can accurately predict the taxonomic origin of genomic fragments from metagenomic data sets by combining the advantages of the k-NN approach with a smoothing kernel function.
Source: Universität Bielefeld

Website
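The combination of k-NN with a smoothing kernel can be illustrated as follows: each fragment is represented by its normalized k-mer frequency profile, the nearest reference fragments are found by Euclidean distance, and each neighbor's vote is weighted by a Gaussian kernel of its distance. The feature type, the Gaussian kernel, `bandwidth` and `n_neighbors` are assumptions chosen for illustration; TACOA's actual features, kernel and parameters may differ.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of all 4**k DNA k-mers (non-ACGT k-mers ignored)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def kernel_knn(query: str, references: list[tuple[str, str]],
               n_neighbors: int = 3, bandwidth: float = 0.1, k: int = 2) -> str:
    """Predict a label from the kernel-weighted votes of the nearest references."""
    q = kmer_profile(query, k)
    scored = []
    for seq, label in references:
        r = kmer_profile(seq, k)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))
        scored.append((dist, label))
    scored.sort(key=lambda t: t[0])
    votes: Counter = Counter()
    for dist, label in scored[:n_neighbors]:
        # Gaussian kernel: close neighbors count far more than distant ones
        votes[label] += math.exp(-dist ** 2 / (2 * bandwidth ** 2))
    return votes.most_common(1)[0][0]


# Hypothetical labelled reference fragments
refs = [("ACACACACACACAC", "taxonA"), ("CACACACACACA", "taxonA"),
        ("GTGTGTGTGTGT", "taxonB"), ("TGTGTGTGTGTG", "taxonB")]
print(kernel_knn("ACACACACAC", refs))  # taxonA
```

Compared with a plain majority vote, the kernel weighting lets a few very close neighbors outvote more distant ones.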

TETRA

The TETRA software operates solely on tetranucleotides. Based on a Markov model, it evaluates the levels of over- and underrepresentation for each of the 256 possible tetranucleotides of a submitted DNA sequence, taking its base composition into account. These values are then normalized via a z-transformation, and correlation coefficients between sequences are calculated.
Source: megx

Website
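The tetranucleotide statistics described above can be sketched as follows, assuming the widely cited maximal-order Markov formulation in which the expected count of a tetranucleotide n1n2n3n4 is N(n1n2n3) * N(n2n3n4) / N(n2n3). The z-score divides the deviation of the observed count from this expectation by the square root of its approximate variance, and two sequences are compared by the Pearson correlation of their 256 z-scores. This sketch counts one strand only; TETRA's exact handling (for example of reverse complements) may differ.

```python
import math
from collections import Counter
from itertools import product


def count_kmers(seq: str, k: int) -> Counter:
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tetra_zscores(seq: str) -> dict[str, float]:
    """z-score of every tetranucleotide versus a maximal-order Markov expectation."""
    n4, n3, n2 = count_kmers(seq, 4), count_kmers(seq, 3), count_kmers(seq, 2)
    z = {}
    for tetra in ("".join(p) for p in product("ACGT", repeat=4)):
        left, right, mid = n3[tetra[:3]], n3[tetra[1:]], n2[tetra[1:3]]
        if mid == 0:
            z[tetra] = 0.0  # no information about this tetranucleotide
            continue
        expected = left * right / mid
        variance = expected * ((mid - left) * (mid - right)) / mid ** 2
        z[tetra] = (n4[tetra] - expected) / math.sqrt(variance) if variance > 0 else 0.0
    return z


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def tetra_correlation(seq_a: str, seq_b: str) -> float:
    """Correlation of the two sequences' tetranucleotide z-score patterns."""
    za, zb = tetra_zscores(seq_a), tetra_zscores(seq_b)
    keys = sorted(za)
    return pearson([za[t] for t in keys], [zb[t] for t in keys])
```

Sequences from the same genome tend to give high correlation coefficients, which is what makes the measure useful for binning.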

 
