NGS Software Tools - Quality Control & Data Trimming



Read Quality Control & Data Trimming

AmpliconNoise

AmpliconNoise removes noise from 454 reads.
Source: Qiime

Best Match Tagger

Best Match Tagger is for removing human reads from metagenomics datasets.

Website

CD-HIT-OTU

CD-HIT-OTU identifies true Operational Taxonomic Units (OTUs) while producing far fewer spurious OTUs.
Source: Weizhongli-Lab

Website

Cutadapt

Cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.
Source: Code Google

Website
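
As an illustration of the idea behind 3' adapter removal, here is a minimal Python sketch. It is not Cutadapt's implementation: Cutadapt tolerates sequencing errors through alignment, whereas this sketch only handles exact matches. The function name `trim_3prime_adapter` and the `min_overlap` parameter are hypothetical names chosen for the example.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove an exactly matching 3' adapter (full or partial) from a read.

    Simplified illustration only: no mismatches or indels are tolerated.
    """
    # Case 1: the whole adapter occurs in the read; cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends inside the adapter, so only an adapter prefix is
    # present. Try the longest possible prefix first, down to min_overlap.
    for overlap in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter found: return the read unchanged.
    return read


if __name__ == "__main__":
    adapter = "TGGAATTCTCGG"             # example adapter sequence (assumption)
    insert = "TAGCTTATCAGACTGATGTTGA"    # a 22-nt insert shorter than the read
    print(trim_3prime_adapter(insert + adapter + "GTGCC", adapter))
    print(trim_3prime_adapter(insert + adapter[:5], adapter))
```

Both calls in the example print the 22-nt insert, once with the full adapter removed and once with a 5-base partial adapter removed. The `min_overlap` threshold guards against trimming very short chance matches at the read end.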

DESeq

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
Source: Bioconductor

Website

ea-utils

Command-line tools for processing biological sequencing data, including barcode demultiplexing and adapter trimming.
Primarily written to support an Illumina-based pipeline, but should work with any FASTQ files.

Website

FastQC

FastQC provides a set of quality control checks on raw sequence data coming from high throughput sequencing pipelines.
Source: Bioinformatics babraham

Website

FastQValidator

FastQValidator validates the format of FASTQ files.

Website
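
To make "validating the format" concrete, the following Python sketch checks the basic four-line FASTQ record structure. It is a simplified illustration and not FastQValidator's own rule set. The function name `validate_fastq`, the allowed base alphabet, and the error messages are assumptions made for the example.

```python
from typing import Iterable, List

VALID_BASES = set("ACGTN")  # assumed alphabet for this sketch


def validate_fastq(lines: Iterable[str]) -> List[str]:
    """Return a list of format problems; an empty list means none were found."""
    rows = [line.rstrip("\r\n") for line in lines]
    errors: List[str] = []
    if len(rows) % 4 != 0:
        errors.append("number of lines is not a multiple of 4")
    complete = len(rows) - len(rows) % 4  # only check complete records
    for i in range(0, complete, 4):
        header, seq, plus, qual = rows[i:i + 4]
        rec = i // 4 + 1
        if not header.startswith("@"):
            errors.append(f"record {rec}: header does not start with '@'")
        if not plus.startswith("+"):
            errors.append(f"record {rec}: separator does not start with '+'")
        if set(seq.upper()) - VALID_BASES:
            errors.append(f"record {rec}: unexpected characters in sequence")
        if any(not 33 <= ord(ch) <= 126 for ch in qual):
            errors.append(f"record {rec}: quality characters outside ASCII 33-126")
        if len(seq) != len(qual):
            errors.append(f"record {rec}: sequence and quality lengths differ")
    return errors


if __name__ == "__main__":
    good = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    bad = ["@read1\n", "ACGT\n", "+\n", "III\n"]
    print(validate_fastq(good))
    print(validate_fastq(bad))
```

For a real file, the function can be called on an open file handle, for example `validate_fastq(open("reads.fastq"))`, where the file name is a placeholder.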

Lucy

Lucy is a program for DNA sequence quality trimming and vector removal. Its purpose is to process DNA sequence data acquired from DNA sequencers to prepare the data for downstream processing applications such as genome assembly.
Source: Sourceforge

Website

Nanocorrect

Experimental pipeline for correcting nanopore reads.
Source: GitHub

Website

Nanopolish

Signal-level algorithms for MinION data.
Source: GitHub

Website

NGSqc Tool Kit

NGSqc Tool Kit is a toolkit for the quality control (QC) of next generation sequencing (NGS) data.

Website

NGS Read Trimmer

NGS Read Trimmer is a Java-based tool for removing adaptor sequences from targeted high-throughput sequencing data generated by sequencing SureSelect QXT libraries on an Illumina platform. It uses paired-end information by examining the 5' overlap of both mates while simultaneously searching for an expected adaptor motif via semi-global alignment. Complete removal of these adaptors decreases the rate of false-positive variant calls, improving accuracy.
Source: Agilent

Website

Picard

A set of Java command-line tools for manipulating high-throughput sequencing (HTS) data and formats.
Source: Broad Institute

Website

PRINSEQ

PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format. It is easily configurable and provides a user-friendly interface.
Source: Sourceforge

Website

Quiver

Quiver is an algorithm for calling highly accurate consensus from multiple PacBio reads, using a pair-HMM exploiting both the basecalls and QV metrics to infer the true underlying DNA sequence.
Source: GitHub

Website

Qualimap 2

Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives, such as feature counts.
Source: Qualimap

Website

RNA-SeQC

RNA-SeQC is a Java program that computes a series of quality control metrics for RNA-seq data. The input can be one or more BAM files. The output consists of HTML reports and tab-delimited files of metrics data. This program can be valuable for comparing sequencing quality across different samples or experiments to evaluate different experimental parameters. It can also be run on individual samples as a means of quality control before continuing with downstream analysis.
Source: Broad Institute

Website

RC454 (ReadClean454)

RC454 (ReadClean454) is a program that takes a set of 454 read and quality files as well as a consensus assembly for those reads and corrects for known 454 error modes such as homopolymer indels and carry forward/incomplete extension (CAFIE). It will also correct for any indel that breaks the reading frame, unless it occurs in more than 25% of the reads. Since the algorithm is aggressive in correcting for errors, it is important to align the reads to their own assembly rather than to an external reference to prevent misalignments as much as possible. RC454 uses Mosaik to align the corrected reads between each step, and as such it is required to run the script.
Source: Broad Institute

Website

SAMTools

SAMTools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format.
Source: Sourceforge

Website

Scythe

Scythe uses a naive Bayesian approach to classify contaminant substrings in sequence reads. It considers quality information, which can make it robust in picking out 3'-end adapters, which often include poor-quality bases.
Source: GitHub

Website

Segminator II

Segminator II provides seamless mapping and alignment of read data to a reference sequence, association of multiple datasets with a single reference sequence, per-site coverage, entropy and base frequencies (along with the ability to query sites based on these), codon frequencies and consensus sequence generation, phylogenetic inference, read visualization, and a Bayesian framework for platform error correction.
Source: Bioinformatics Manchester

Website

Sickle

Sickle is a windowed adaptive trimming tool for FASTQ files.

Website
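
The general idea of windowed quality trimming can be sketched in Python as follows. This is a simplified illustration, not Sickle's exact algorithm: the function names `phred33` and `window_trim`, the default window size, and the default threshold are assumptions chosen for the example, and quality strings are assumed to use Phred+33 encoding.

```python
from typing import List, Tuple


def phred33(quality_string: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


def window_trim(quals: List[int], window: int = 4,
                threshold: float = 20.0) -> Tuple[int, int]:
    """Return the retained region as a half-open interval (start, end).

    A window slides along the read. The kept region starts at the first
    window whose mean quality reaches the threshold and ends where a later
    window's mean first drops below it. (0, 0) means the read is discarded.
    """
    n = len(quals)
    if n == 0:
        return (0, 0)
    window = min(window, n)  # shrink the window for very short reads
    means = [sum(quals[i:i + window]) / window for i in range(n - window + 1)]
    # 5' end: first window with acceptable mean quality.
    start = next((i for i, m in enumerate(means) if m >= threshold), None)
    if start is None:
        return (0, 0)
    # 3' end: first later window whose mean falls below the threshold.
    end = next((i for i in range(start, len(means)) if means[i] < threshold), n)
    return (start, end)


if __name__ == "__main__":
    seq = "ACGTACGTACGT"
    qual = "##??????&&&&"  # low, high, then low qualities in Phred+33
    start, end = window_trim(phred33(qual))
    print(seq[start:end])
```

Because the decision is based on window averages, an isolated low-quality base inside an otherwise good window can be retained; production tools typically refine the cut positions further.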

Sequencing Analysis Viewer (SAV)

The Sequencing Analysis Viewer (SAV) allows you to monitor important quality metrics generated by the Real-Time Analysis (RTA) software on the Illumina sequencing systems in real time.
Source: Illumina

Website

SolexaQA

SolexaQA calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data. Originally developed for the Illumina system (historically known as “Solexa”), SolexaQA now also supports Ion Torrent and 454 data. It runs directly on FASTQ files, including compressed files.
Source: Sourceforge

Website

FASTX-Toolkit

The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
Source: Hannonlab

Website

Trimmomatic

Trimmomatic is a flexible read trimming tool for Illumina NGS data.

Website
