
Genome Annotation


FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.
Source: Sourceforge



Genome Annotation Transfer Utility (GATU) annotates a genome based on a very closely related reference genome. The proteins/mature peptides of the reference genome are BLASTed against the genome to be annotated in order to find the genes/mature peptides in the genome to be annotated.
Source: Viral bioinformatics Resource Center



GeneMark is a family of gene prediction programs developed at the Georgia Institute of Technology, Atlanta, Georgia, USA.
Source: GeneMark



Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.
Source: Johns Hopkins University



HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
Source: HMMER



MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values. MAKER is also easily trainable: outputs of preliminary runs can be used to automatically retrain its gene prediction algorithm, producing higher quality gene models on subsequent runs.
Source: Yandell lab



PASA, acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments.
Source: GitHub



RATT is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.
Source: Sourceforge



System for Automated Bacterial Integrated Annotation - SABIA (sabiá being a very well-known bird in Brazil) is a new tool developed for the assembly and annotation of prokaryote genomes (Bioinformatics. 2004 Nov 1;20(16):2832-3). It performs automatic tasks of assembly analysis, ORF identification/analysis, and extragenic region analysis. The system integrates several public domain and newly developed software programs capable of dealing with several types of databases and is portable to different operating systems.
Source: Sabia


RNA Analysis


Annocript is a pipeline for the annotation of de-novo generated transcriptomes. It executes BLAST analysis with UniProt, NCBI Conserved Domain Database and Nucleotide divisions, Gene Ontology, UniPathways and the Enzyme Commission. It gives information about the longest ORF (using DNA2PEP) and non-coding potential of the sequences.
Source: Github

CIRI CircRNA Identifier

CIRI CircRNA Identifier is a de novo circular RNA identification tool.


CPSS is a computational platform for the analysis of small RNA deep sequencing data.



CoRAL RNA-CODE, Classification of RNAs by Analysis of Length
CoRAL is a machine learning package that can predict the precursor class of small RNAs present in a high-throughput RNA-sequencing dataset. In addition to classification, it also produces information about the features that are most important for discriminating different populations of small non-coding RNAs.
Source: WangLab



DARIO is a free web server for the analysis of short RNAs from high throughput sequencing data.


lncRScan SVM

The lncRScan SVM package is used to classify protein coding and long non-coding RNA (lncRNA) transcripts using support vector machine (SVM).



iSeeRNA: RNA sequencing combined with ab initio assembly promises large-scale discovery of novel transcripts, including protein coding transcripts (mRNAs) as well as non-coding RNAs.



LncRNA2Function enables researchers to browse the lncRNAs associated with a specific functional term, to browse the functional terms associated with a specific lncRNA, or to assign functional terms to a set of human lncRNA genes such as a cluster of co-expressed lncRNAs.
Source: Harbin Institute of Technology


Mix2 RNA-Seq Data Analysis Software

Mix2 RNA-Seq Data Analysis Software is a software tool for the accurate estimation of RNA concentration from RNA-Seq data.



miRanda is an algorithm for the detection of potential microRNA target sites in genomic sequences. miRanda reads RNA sequences (such as microRNAs) from file1 and genomic DNA/RNA sequences from file2. Both of these files should be in FASTA format.
Source: cbio



miRDeep* is an integrated application tool for miRNA identification from RNA sequencing data.



miRspring is a miRNA sequence profiling document: a new way of sharing and analysing sequencing data for small RNA.



Scripture is a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio. The statistical methods to estimate read coverage significance are also applicable to other sequencing data. Scripture also has modules for ChIP-Seq peak calling.
Source: The Broad Institute



The MAP-RSeq workflow integrates a suite of open source bioinformatics tools along with in-house developed methods to analyze paired-end RNA-Seq data. Read alignment is performed with TopHat, which uses Bowtie, a fast, memory-efficient short sequence aligner. TopHat aligns reads to the transcriptome and further to the genome to report both existing and novel junctions. Along with the alignment (BAM) and junction (BED) files, TopHat also provides a list of expressed fusion transcripts using the TopHat-Fusion algorithm. The BAM file is processed using HTSeq to summarize expression at gene level. Exon quantification is obtained with in-house methods that leverage BEDTools. In addition to raw gene and exon expression counts, MAP-RSeq also provides normalized values (RPKM). For accurate variant detection, GATK is used to call SNVs that are further annotated with quality score, coverage and additional criteria using VQSR.
Source: Mayo Clinic
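
The RPKM normalization mentioned in the MAP-RSeq entry can be illustrated with a minimal sketch. It uses the standard definition (reads per kilobase of feature per million mapped reads); the function name, gene names and counts are hypothetical and are not taken from MAP-RSeq itself.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: (raw read count, feature length in bp) per gene.
gene_counts = {"geneA": (500, 2000), "geneB": (500, 1000)}
total_mapped = 10_000_000  # assumed library size

rpkm_values = {
    gene: rpkm(count, length, total_mapped)
    for gene, (count, length) in gene_counts.items()
}
print(rpkm_values)
```

With equal raw counts, the shorter gene receives the higher normalized value, which is the point of the length correction.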



ncPRO-seq (Non-Coding RNA PROfiling in sRNA-seq) is a tool for annotation and profiling of ncRNAs using deep-sequencing data, developed by the Bioinformatics Laboratory of the Institut Curie. This comprehensive and flexible ncRNA analysis pipeline aims to interrogate and perform detailed analysis on small RNAs derived from annotated non-coding regions in miRBase, Rfam and RepeatMasker, and from regions defined by users. The ncPRO-seq pipeline also has a module to identify regions significantly enriched with short reads that cannot be classified as known ncRNA families.
Source: Institut Curie



TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
Source: Johns Hopkins University


Recombination Analysis


Bellerophon is a program for detecting chimeric sequences in a multiple sequence dataset by comparative analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries but can be applied to other gene datasets.



DualBrothers is recombination detection software based on the dual Multiple Change-Point (MCP) model. This model allows for changes in topology and evolutionary rates across sites in a multiple sequence alignment. It uses a Bayesian approach together with MCMC sampling to simulate from the posterior distribution of the dual MCP model parameters.
Source: DualBrothers



DnaSP, DNA Sequence Polymorphism, is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites, or in various sorts of codon positions), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters.
Source: Universitat de Barcelona


GENECONV: Statistical Tests for Detecting Gene Conversion. Given an alignment of DNA or protein sequences, GENECONV finds the most likely candidates for aligned gene conversion events between pairs of sequences in the alignment. The program can also look for gene conversion events from outside of the alignment. Candidate events are ranked by multiple-comparison corrected P-values and listed in a spreadsheet-like output file.
Source: Washington University 


Lamarc is a program for doing Likelihood Analysis with Metropolis Algorithm using Random Coalescence. Lamarc estimates effective population sizes, population exponential growth rates, a recombination rate, past migration rates for one to n populations assuming a migration matrix model with asymmetric migration rates and different subpopulation sizes, and optionally divergence from ancestral populations (the population tree must be known).
Source: Washington University

NCBI Genotyping

NCBI Genotyping helps identify the genotype of a viral sequence. A window is slid along the query sequence and each window is compared by BLAST to each of the reference sequences for a particular virus. This approach is especially useful for the analysis of recombinant sequences.
Source: NCBI
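
The sliding-window idea in the NCBI Genotyping entry can be sketched as follows. This is only an illustration: it swaps BLAST for a simple per-position identity score, and it assumes the query and references are already aligned to the same coordinates. All function names and sequences are hypothetical.

```python
def windows(seq, size, step):
    """Yield (start, subsequence) for each window slid along seq."""
    last_start = max(len(seq) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + size]


def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def genotype_by_window(query, references, size, step):
    """Return [(window_start, best_matching_reference_name), ...].

    Stand-in for a per-window BLAST comparison: each window is scored
    against the same coordinates of every reference sequence.
    """
    calls = []
    for start, win in windows(query, size, step):
        def score(name):
            return identity(win, references[name][start:start + len(win)])
        calls.append((start, max(references, key=score)))
    return calls


# Hypothetical toy data: the query switches from type A to type B halfway.
references = {"A": "AAAAAAAAAAAA", "B": "CCCCCCCCCCCC"}
query = "AAAAAACCCCCC"
calls = genotype_by_window(query, references, size=6, step=6)
print(calls)
```

A change in the best-matching reference from one window to the next is the signal that suggests a recombinant query.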


4SIS: Four-sequence Informative Sites Analysis
Two types of informative sites are distinguished, corresponding to the clustering of the putative recombinant with either of the parental representatives. The optimal breakpoint is located by maximizing a chi-square value. Statistical significance is assessed by performing a set number of permutations and, for each permutation, maximizing the chi-square value. The P value reflects the proportion of permuted sets of informative sites with maximized chi-square values equal to or greater than the observed value.
Source: University of Manchester
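
The maximum chi-square breakpoint search and permutation test described for 4SIS can be sketched as below. This is a minimal illustration under stated assumptions, not the tool's own code: informative sites are encoded as 0 or 1 according to which parent they support, a 2x2 table (left/right of the breakpoint by site type) is scored at each candidate breakpoint, and all names and the example data are hypothetical.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_chi_square(sites):
    """Return (largest chi-square, breakpoint index) over all split points.

    sites is a list of 0/1 labels; breakpoint k splits it into
    sites[:k] and sites[k:].
    """
    total1 = sum(sites)
    total0 = len(sites) - total1
    best, best_k = 0.0, None
    left1 = 0
    for k in range(1, len(sites)):
        left1 += sites[k - 1]
        left0 = k - left1
        chi2 = chi_square_2x2(left0, left1, total0 - left0, total1 - left1)
        if chi2 > best:
            best, best_k = chi2, k
    return best, best_k


def permutation_p_value(sites, n_perm=200, seed=0):
    """Proportion of shuffled site orders whose maximized chi-square
    is at least as large as the observed one."""
    rng = random.Random(seed)
    observed, _ = max_chi_square(sites)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi_square(shuffled)[0] >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical data: ten sites support parent 0, then ten support parent 1.
sites = [0] * 10 + [1] * 10
best_chi2, breakpoint = max_chi_square(sites)
p_value = permutation_p_value(sites)
print(best_chi2, breakpoint, p_value)
```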


Analysis (not specified)


AlleleSeq enables the study of expression and DNA binding differences between pairs of sequence alleles on the maternally- and paternally-derived chromosomes within an individual, phenomena known as allele-specific expression (ASE) and allele-specific binding (ASB).
Source: GersteinLab



AMADEA is the first data transformation tool based on Data Morphing.
Source: ISOFT



AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It can be run on this web server, on a new web server for larger input files, or be downloaded and run locally. It is open source, so you can compile it for your computing platform. You can now run AUGUSTUS on the German MediGRID. This enables you to submit larger sequence files and allows the use of protein homology information in the prediction. The MediGRID requires an instant, easy registration by email for first-time users.
Source: University of Greifswald


CEAS (Cis-regulatory Element Annotation System)

CEAS is a tool designed to characterize genome-wide protein-DNA interaction patterns from ChIP-chip and ChIP-Seq of both sharp and broad binding factors. As a stand-alone extension of the web application CEAS (Cis-regulatory Element Annotation System), it provides statistics on ChIP enrichment at important genome features such as specific chromosomes, promoters, gene bodies, or exons, and infers genes most likely to be regulated by a binding factor. CEAS also enables biologists to visualize the average ChIP enrichment signals over specific genomic features, allowing continuous and broad ChIP enrichment to be perceived which might be too subtle to detect from ChIP peaks alone.
Source: Harvard University



diffReps enables the differential analysis for ChIP-seq with biological replicates.


Enlis Genome Software

The Enlis Genome software platform is a framework for analyzing genomic sequencing data. It brings unparalleled clarity and significant ease of use to the study of genomic data. The platform is being adapted for many different types of genomic analysis, including cancer genomics, clinical genetic testing, scientific research, and personal genomic exploration.
Source: Enlis



eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences. Example applications include transcript-level RNA-Seq quantification, allele-specific/haplotype expression analysis (from RNA-Seq), transcription factor binding quantification in ChIP-Seq, and analysis of metagenomic data. It is based on an online-EM algorithm [1] that results in space (memory) requirements proportional to the total size of the target sequences and time requirements that are proportional to the number of sampled fragments.
Source: Berkeley University of California



FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide and protein sequences. FastTree handles alignments with up to one million sequences within a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7. FastTree is open source software.
Source: microbesonline



GARLI performs heuristic phylogenetic searches under the General Time Reversible (GTR) model of nucleotide substitution and its submodels, with or without gamma distributed rate heterogeneity and a proportion of invariant sites.
Source: University of Texas



GeneSpring NGS software includes data analysis workflows for Methyl-Seq, RNA-Seq, DNA-Seq, ChIP-Seq and Small RNA-Seq data. GeneSpring NGS provides SureSelect customers with an easy to use QC, visualization and reporting tool for Methyl-Seq, RNA-Seq and DNA-Seq applications. GeneSpring NGS also allows the detection of transcriptomic changes like splice variants and gene fusions, and the identification of structural variation from whole genome or target enriched samples. Small RNA-Seq allows for the measurement of expression levels of known small RNA genes and mature miRNA, and the detection of novel genes.
Source: Agilent



GISTIC is a method with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, it improves the estimation of background rates for each category.
Source: Broad Institute



Hotspot is a program for identifying regions of local enrichment of short-read sequence tags mapped to the genome using a binomial distribution model. Regions flagged by the algorithm are called "hotspots." The algorithm utilizes a local background model that automatically normalizes for large regions of elevated tag levels due to, for example, copy number effects. Hotspot is otherwise able to detect regions of enrichment of highly variable size, making it applicable to both broad and highly punctate signals.
Source: uwencode
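
The general idea of scoring local enrichment with a binomial model against a local background can be sketched as follows. This is a generic illustration and not Hotspot's actual scoring procedure: it assumes that, under the null, each tag in the background window falls in the small window with probability equal to the ratio of window sizes. Function names, window sizes and counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


def window_enrichment_p(tags_in_window, tags_in_background,
                        window_bp, background_bp):
    """Upper-tail probability of seeing at least tags_in_window tags in a
    small window, given the tag count of the surrounding background window.

    Assumes tags are uniformly distributed over the background under the
    null, so each lands in the small window with probability
    window_bp / background_bp.
    """
    p = window_bp / background_bp
    return binomial_upper_tail(tags_in_window, tags_in_background, p)


# Hypothetical example: 20 of 100 local tags fall in a 250 bp window
# taken from a 50 kb background region (0.5 tags expected by chance).
p_enriched = window_enrichment_p(20, 100, 250, 50_000)
print(p_enriched)
```

Because the background count comes from the surrounding region rather than the whole genome, a region with uniformly elevated tag density (for example from a copy number gain) raises the expectation and is not flagged by itself.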


LifeScope™ Genomic Analysis

LifeScope™ Genomic Analysis Software leverages years of customer feedback and development for analysis tools for SOLiD™ system data, to enable faster translation of next-generation data to biologically meaningful results. Designed to match the accuracy of the next generation 5500 Genetic Analyzers with Exact Call Chemistry (ECC), LifeScope™ streamlines your data analysis.
Source: LifeTechnologies



Model-based Analysis of ChIP-Seq (MACS) is an algorithm for identifying transcription factor binding sites from ChIP-Seq data.
Source: Python


MetaGenomic ANalyser (MEGAN5)

MetaGenomic ANalyser (MEGAN5) is a tool for taxonomical classification of reads.



METAVIR is a web server designed to annotate viral metagenomic sequences (raw reads or assembled contigs). A set of published viromes, identified as "public projects", is already available, and your own data sets can be processed in a private environment.



Mothur is currently the most cited bioinformatics tool for analyzing 16S rRNA gene sequences. Step inside the wiki and user forum and learn how you can use mothur to process data generated by Sanger, PacBio, IonTorrent, 454, and Illumina (MiSeq/HiSeq). If you would like to contribute code to the project feel free to download the source code and make your own improvements. Alternatively, if you have an idea or a need, but lack the programming expertise, let us know and we'll add it to the queue of features we would like to add. Our current goal is to release a new iteration of the project every couple of months.
Source: Mothur



MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.
Source: Broad Institute


Nexus Copy Number™

Nexus Copy Number™ software offers simple yet powerful tools for copy number (CNV) and sequence variation analysis and visualization from aCGH, SNP array as well as next-gen sequencing (NGS) data.
Source: biodiscovery



NGSengine® is platform-independent NGS HLA typing software. In one single method you determine reference libraries, align reads, phase the sequence data and perform typing.
NGSengine is designed to work with NGSgo® reagents, delivering complete allele separation and full gene coverage. NGSengine gives you genotyping results with minimal editing, no additional effort for new sequences and >95% unambiguous results.
Source: Gendx


Omnomics NGS

Omnomics NGS (Euformatics) is a software platform for clinical analysis of patient NGS data. It gives clinicians and molecular genetics laboratories one place for all the relevant genomic and mutation information, drawn from patient data and external information sources.
Source: Euformatics



PathSeq is a computational tool for the identification and analysis of microbial sequences in high-throughput human sequencing data that is designed to work with large numbers of sequencing reads in a scalable manner. This process is composed of a subtractive phase in which input reads are subtracted by alignment to human reference sequences, and an analytic phase in which the remaining reads are aligned to microbial reference sequences (viral, fungal, bacterial, archaeal) and de novo assembled.
Source: Broad Institute



PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
Source: github



RAxML is a standard tool for Maximum-likelihood based phylogenetic inference.



Recco analyzes alignments of sequences that evolved subject to recombination and mutation. The analysis provides evidence as to whether a dataset contains recombination, which sequence is a recombinant and where the recombination breakpoints are. The analysis is based on explaining one sequence with all other sequences in the alignment using mutation and recombination.
Source: Max Planck Institute



RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked.
Source: Institute for Systems Biology
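
What a masked version of a query sequence looks like can be shown with a minimal sketch. It does not call RepeatMasker; it only applies masking to a sequence given a list of annotated intervals, which are assumed here to be 0-based and half-open. The function name, sequence and interval are hypothetical.

```python
def mask_sequence(seq, intervals, soft=False):
    """Return seq with each (start, end) interval masked.

    Hard masking replaces bases with 'N'; soft masking lowercases them.
    Intervals are assumed to be 0-based, half-open, and are clipped to
    the sequence bounds.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


# Hypothetical example: one annotated repeat covering positions 2..4.
sequence = "ACGTACGTAC"
repeats = [(2, 5)]
hard_masked = mask_sequence(sequence, repeats)
soft_masked = mask_sequence(sequence, repeats, soft=True)
print(hard_masked)
print(soft_masked)
```

Soft masking keeps the underlying bases available to downstream tools, while hard masking removes them from consideration entirely.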



SICER is a clustering approach for identification of enriched domains from histone modification ChIP-Seq data.
Source: George Washington University


SMRT Portal

SMRT Portal is a browser-based application to perform secondary analysis of sequencing data generated by the PacBio RS II instrument.
Source: PacBio



SomaticSniper identifies single nucleotide positions that differ between tumor and normal samples (or, in theory, between any two BAM files).


Genome Browser

The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide. The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways. Blat quickly maps your sequence to the genome. The Table Browser provides convenient access to the underlying database. VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns. Genome Graphs allows you to upload and display genome-wide data sets.
Source: University of California



TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using TopHat and Cufflinks.
Source: Github



'vegan' is a CRAN package for the analysis of ecological communities. It has tools for analysing ecological diversity, and for the multivariate analysis of communities (NMDS, pCCA, pRDA, etc.).
Source: R-Forge



VIROME is a web-application designed for scientific exploration of metagenome sequence data collected from viral assemblages occurring within a number of different environmental contexts.
Source: Virome


ViroScore® Suite

The ViroScore® Suite is a dedicated genotype interpretation, sequence database and analysis tool. ViroScore® allows interpretation and sub-typing of HIV and other viruses using dedicated genetic sequence databases and built-in analysis and reporting tools. ViroScore® has proved an efficient system to categorize strands of HIV using genetic sequencing and link this genetic subtyping with drug resistance and treatment outcomes documented in patients.
Source: Advanced Biological Laboratories


