
NGS Software Tools - Databases & Database Search




Web Resources

Apollo

Apollo is designed to support geographically dispersed researchers, and the work of a distributed community is coordinated through automatic synchronization: all edits in one client are instantly pushed to all other clients, allowing users to see annotation updates from collaborators in real-time during the editing process.
Source: Apollo

Website

BioProject

BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project.
Source: NCBI

Website


Cancer Genetics Databases

Catalogue of somatic mutations in cancer (COSMIC)

All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities, mutations, many of which ultimately confer a growth advantage upon the cells in which they have occurred. There is a vast amount of information available in the published scientific literature about these changes. COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.

There are two types of data in COSMIC: expert manual curation data and systematic screen data. It is useful to understand the differences between these data types and to use them appropriately.

Source: COSMIC

FORCE Genetic Mutation Database

FORCE has created a Genetic Mutation Database to allow users to search for a particular mutation and connect with others who have the same mutation.

Use the Mutation Search form to search by mutation or ethnicity. Use the Submit Mutation form to enter your own information.

Source: FORCE

The Cancer Genome Atlas (TCGA)

The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high-level sequence analysis of the tumor genomes.

Source: TCGA

The Familial Cancer Database (FaCD)

The goal of FaCD is to assist clinicians and genetic counselors in making a genetic differential diagnosis in cancer patients, as well as in becoming aware of the tumor spectrum associated with hereditary disorders that have already been diagnosed in their patients. FaCD is not an expert system, but a tool for experts. It is not a substitute for consulting an expert on the clinical genetics of cancer.

Source: FaCD


Databases and Database Search

16S Biodiversity Tool

The 16S Biodiversity Tool enables the visualization of 16S rRNA amplicons from environmental samples using the RDP database.

Website

Cafe Variome

Cafe Variome should not be thought of as a database, but rather as a "shop window" for what exists in various data sources. It is designed to let users ask "where can certain data be found?" as a first step towards accessing those data.
Source: CafeVariome

Website

COG (Clusters of Orthologous Groups) databases

Phylogenetic classification of proteins encoded in complete genomes
Source: NCBI

Website

Conexio Assign software

The Conexio Assign software assists with the assignment of a human leukocyte antigen (HLA) type. The software is designed to analyze data from libraries prepared with the Illumina TruSight HLA Sequencing Panel for DNA and then sequenced on the MiSeq System. Using Assign, you can import sequence data, perform base calling, edit sequences, and compare a consensus sequence with a library of sequences of HLA alleles.
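The final comparison step can be illustrated with a minimal sketch that ranks candidate alleles by mismatch count. It assumes the consensus and the library sequences are already aligned to the same length; the allele names and sequences are hypothetical placeholders, and this is not how Assign itself is implemented.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def best_matching_alleles(consensus: str, library: dict[str, str]) -> list[tuple[str, int]]:
    """Rank library alleles by mismatches against the consensus (fewest first, then by name)."""
    scored = [(name, hamming(consensus, seq)) for name, seq in library.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Hypothetical toy library; real HLA allele sequences are far longer.
library = {
    "ALLELE_X": "ACGTACGTAC",
    "ALLELE_Y": "ACGTACGAAC",
    "ALLELE_Z": "TCGTACGAAA",
}
print(best_matching_alleles("ACGTACGAAC", library))
```

Plain mismatch counting treats every position equally; a real typing tool also has to handle ambiguous base calls and incomplete allele sequences.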
Source: Illumina

Website

dbSNP Single Nucleotide Polymorphism Database

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive of genetic variation within and across different species, developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI).
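dbSNP can also be searched programmatically through NCBI's Entrez E-utilities. The sketch below builds an `esearch` request against the `snp` database and, optionally, fetches the matching record IDs. The query term `rs7412` is only an illustration, the fetch helper needs network access, and NCBI limits how many requests per second a client may send.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def dbsnp_search_url(term: str, retmax: int = 20) -> str:
    """Build an E-utilities esearch URL against the dbSNP ("snp") database."""
    params = {"db": "snp", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_dbsnp(term: str, retmax: int = 20) -> list[str]:
    """Run the search and return the matching dbSNP record IDs (requires network access)."""
    with urlopen(dbsnp_search_url(term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


print(dbsnp_search_url("rs7412"))
```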

Website

WuXi NextCODE’s unique GOR informatics system

The Exchange is powered by WuXi NextCODE’s unique GOR informatics system. Optimized on whole-genome data from 350,000 people, it standardizes, manages, and queries massive sequence data with unrivalled computational efficiency. You can instantly visualize aligned raw sequence and collaborate and share data – in full compliance with your rules and consents, without transferring big files, straight from your browser.
Source: nextcode

Website

EnsemblGenomes

The Ensembl genome annotation system, developed jointly by the EBI and the Wellcome Trust Sanger Institute, has been used for the annotation, analysis and display of vertebrate genomes since 2000.
Since 2009, the Ensembl site has been complemented by the creation of five new sites, for bacteria, protists, fungi, plants and invertebrate metazoa, enabling users to use a single collection of (interactive and programmatic) interfaces for accessing and comparing genome-scale data from species of scientific interest from across the taxonomy.
Source: EMBL-EBI

Website

Fungene

Fungene is a functional gene pipeline and repository.

Website

GeneInsight Suite®

GeneInsight Suite® is an IT platform developed at Partners HealthCare that streamlines the interpretation and management of vast amounts of data, offering a key step towards the promise of personalized medicine and better patient care.
Source: GeneInsight

Website

Geneticist Assistant

The Geneticist Assistant NGS Interpretative Workbench is a tool for managing, controlling, visualizing and functionally interpreting next-generation sequencing data from whole exomes or targeted gene sets, and for maintaining a historical knowledge base of those data. Its purpose is to identify potentially pathogenic variants associated with specific conditions, such as hereditary colon cancer.
Source: Softgenetics

Website

Genomic Ordered Relations (GOR) Architecture and Database

We use a unique information infrastructure called the Genomic Ordered Relational architecture, or GOR architecture, which was initially developed at deCODE Genetics (a subsidiary of Amgen) to meet the demands of datasets that are orders of magnitude larger than any found elsewhere. This platform simplifies comparisons of variation data from hundreds of thousands of patients with similar signs and symptoms, enabling clients to dynamically retrieve, edit, and annotate their sequencing data on-the-fly without the substantial time lags commonly associated with big data storage and retrieval. The unique data structure of this infrastructure enables fast, efficient performance by our Clinical Sequence Analyzer (CSA).
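The internals of the GOR architecture are not described here. As a generic illustration of why keeping data ordered by genomic position pays off, the sketch below stores hypothetical variant rows sorted by (chromosome, position), so that a region query becomes two binary searches instead of a full scan. It is not GOR's query language or storage format.

```python
from bisect import bisect_left, bisect_right

# Hypothetical variant rows, kept sorted by (chromosome, position).
ROWS = [
    ("chr1", 1_000, "A>G"),
    ("chr1", 5_000, "C>T"),
    ("chr1", 12_000, "G>A"),
    ("chr2", 700, "T>C"),
]
KEYS = [(chrom, pos) for chrom, pos, _ in ROWS]  # parallel list of sort keys


def query_region(chrom, start, end):
    """Rows on `chrom` with start <= position <= end, located by binary search."""
    lo = bisect_left(KEYS, (chrom, start))
    hi = bisect_right(KEYS, (chrom, end))
    return ROWS[lo:hi]


print(query_region("chr1", 2_000, 12_000))
```

Chromosome names are compared as plain strings here, so `chr10` would sort before `chr2`; a real store would fix an explicit contig order.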
Source: NextCode

Website

GenePool

Upload genomics data to GenePool, or connect GenePool to a remote data store. Our interactive data uploader takes you step by step through getting your data into GenePool, whether it's coming from your local machine, an Amazon S3 bucket, Google Cloud Storage, an Illumina BaseSpace account, a DNAnexus account, or a Seven Bridges Genomics account. GenePool supports data generated from a wide variety of genomics assays in common data formats. GenePool makes it simple for you to attach your patient and sample metadata to imported genomics data.
Source: StationX

Website

Genesearch NGS

Genesearch NGS integrates, in a user-friendly software package, a highly specific and selective proprietary algorithm for heterozygote detection, tools for analyzing small insertions and deletions, a simple mutation/variant database, and connections to various public and private bioinformatics tools (BLAT, Splice Predictor, Alamut, ...).
Source: Phenosystems

Website

GenomeStudio

GenomeStudio is a modular software platform that allows you to view and analyze data for a wide range of microarray and sequencing applications.
Source: Illumina

Website

GOLD Genomes

GOLD (Genomes Online Database) is a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects around the world, and their associated metadata.
Source: Joint Genome Institute

Website

greengenes

The greengenes web application provides access to the 2011 version of the greengenes 16S rRNA gene sequence alignment for browsing, blasting, probing, and downloading. The data and tools presented by greengenes can assist the researcher in choosing phylogenetically specific probes, interpreting microarray results, and aligning/annotating novel sequences. If you are an ARB user, you can use greengenes to keep your own local database current.
Source: greengenes

Website

Integrated Microbial Genomes (IMG) database

Website

InterPro

InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium.
Source: EMBL-EBI

Website

KEGG PATHWAY

KEGG PATHWAY is a collection of manually drawn pathway maps representing our knowledge of the molecular interaction and reaction networks.
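KEGG also exposes its data through a REST interface at `rest.kegg.jp`. A minimal sketch, assuming the `list` operation returns two tab-separated columns (entry ID, description), that builds a request URL and parses the response; the fetch helper needs network access, and KEGG's terms of use apply.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def kegg_url(operation: str, *arguments: str) -> str:
    """Build a KEGG REST URL, e.g. kegg_url("list", "pathway", "hsa")."""
    return "/".join([KEGG_REST, operation, *arguments])


def parse_tab_list(text: str) -> dict:
    """Parse two-column, tab-delimited list output into {entry_id: description}."""
    entries = {}
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            entry_id, _, description = line.partition("\t")
            entries[entry_id] = description
    return entries


def list_pathways(organism: str = "hsa") -> dict:
    """Fetch the pathway list for one KEGG organism code (requires network access)."""
    with urlopen(kegg_url("list", "pathway", organism)) as response:
        return parse_tab_list(response.read().decode("utf-8"))


print(kegg_url("list", "pathway", "hsa"))
```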
Source: Kanehisa Laboratories

Website

Kraken

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
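The classification step can be sketched as follows, loosely following the approach described for the original Kraken: each database k-mer maps to the lowest common ancestor (LCA) of the genomes containing it, hit taxa are weighted by how many of the read's k-mers map to them, and the read is assigned to the end of the highest-scoring root-to-leaf path (ties go to the LCA of the tied taxa). The taxonomy, the k-mer table and the tiny k are hypothetical toy values (Kraken's default k-mer length is 31), and the real tool uses a compact precomputed database rather than a Python dict.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); the root's parent is None.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
}


def lineage(taxon):
    """Path from `taxon` up to the root, inclusive."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lowest_common_ancestor(taxa):
    """Deepest node shared by the lineages of all `taxa`."""
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    return next(node for node in lineage(taxa[0]) if node in shared)


def kmers(sequence, k):
    """All overlapping substrings of length k."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))


def classify(read, kmer_to_taxon, k):
    """Assign a read to a taxon from exact k-mer matches (simplified Kraken-style scoring)."""
    # Weight each taxon by how many of the read's k-mers map to it; misses are ignored.
    weights = Counter(kmer_to_taxon[km] for km in kmers(read, k) if km in kmer_to_taxon)
    if not weights:
        return None  # no k-mer found in the database: unclassified
    # Score each hit taxon by summing the weights along its path to the root.
    scores = {taxon: sum(weights[node] for node in lineage(taxon)) for taxon in weights}
    best = max(scores.values())
    tied = [taxon for taxon, score in scores.items() if score == best]
    # Ties are resolved to the lowest common ancestor of the tied taxa.
    return lowest_common_ancestor(tied)


# Hypothetical k-mer database: each k-mer maps to the LCA of the genomes containing it.
db = {"ACGT": "E. coli", "CGTA": "Escherichia", "GTAC": "E. albertii"}
print(classify("ACGTACGT", db, k=4))
```

Because every hit taxon has a positive weight, a taxon with a hit descendant always scores lower than that descendant, so taking the maximum over hit taxa is equivalent to taking it over the leaves of the classification tree.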
Source: Johns Hopkins University

Website

MEME-ChIP

MEME-ChIP performs comprehensive motif analysis (including motif discovery) on large sets of nucleotide sequences (up to 50 MB), such as those identified by ChIP-seq or CLIP-seq experiments.
Source: EMBL Australia

Website

MetaPhlAn

MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes.
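The idea of marker-based profiling can be sketched in three steps: normalize the reads hitting each clade-specific marker by the marker's length, combine each clade's marker coverages with a robust average, then rescale to relative abundances. This is a simplified illustration, not MetaPhlAn's exact estimator; the trimmed mean, the trim fraction and the input dictionaries are assumptions made here.

```python
from statistics import mean


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping the lowest and highest `trim_fraction` of values (fraction < 0.5)."""
    ordered = sorted(values)
    cut = int(len(ordered) * trim_fraction)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return mean(kept)


def relative_abundances(marker_hits, marker_lengths, clade_markers, trim_fraction=0.1):
    """Estimate clade relative abundances from reads mapped to clade-specific markers.

    marker_hits:    {marker_id: number of reads mapped}
    marker_lengths: {marker_id: marker length in bp}
    clade_markers:  {clade: [marker_id, ...]} (each list non-empty)
    """
    clade_coverage = {}
    for clade, markers in clade_markers.items():
        # Length-normalized coverage per marker; markers with no hits count as zero.
        coverages = [marker_hits.get(m, 0) / marker_lengths[m] for m in markers]
        clade_coverage[clade] = trimmed_mean(coverages, trim_fraction)
    total = sum(clade_coverage.values())
    if total == 0:
        return {}
    return {clade: cov / total for clade, cov in clade_coverage.items()}


# Hypothetical inputs: two markers for clade A, one for clade B.
print(relative_abundances(
    {"m1": 10, "m2": 20, "m3": 30},
    {"m1": 100, "m2": 200, "m3": 100},
    {"cladeA": ["m1", "m2"], "cladeB": ["m3"]},
))
```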
Source: The Huttenhower Lab

Website

MG-RAST

MG-RAST (the Metagenomics RAST) server is an automated analysis platform for metagenomes providing quantitative insights into microbial populations based on sequence data. The server primarily provides upload, quality control, automated annotation and analysis for prokaryotic metagenomic shotgun samples. MG-RAST is optimized for Firefox.
Source: Metagenomics

Website

MiniLIMS

MiniLIMS tracks samples, libraries and sequencing results out of the box for the common NGS platforms. Install it on a Linux server running a LAMP stack, walk through the installation wizard, and it is ready to use.
Source: Bioteam

Website

miRBase

The miRBase database is a searchable database of published miRNA sequences and annotation. Each entry in the miRBase Sequence database represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), with information on the location and sequence of the mature miRNA sequence (termed miR).
Source: University of Manchester

Website

WuXi NextCODE Clinical Sequence Analyzer (CSA)

WuXi NextCODE provides clinicians with a proprietary clinical genome interpretation solution, the WuXi NextCODE Clinical Sequence Analyzer (CSA) system, which enables physicians to analyze patients’ genomic information to make a diagnosis or define their patients’ risks. Designed by physicians to be clinically intuitive and easy to use, the CSA enables users to rapidly analyze sequencing data and interpret genomes, exomes or transcriptomes, leading to the discovery of de novo and rare mutations and high-impact genetic variants. Its user-friendly interface lets clinicians manage patient data, generate provisional or final reports, and record diagnoses within the same system, supporting the entire clinical team.
Source: NextCODE

Website

NextBio

NextBio provides a state-of-the-art scientific platform to aggregate and interpret large quantities of genomic data for research and clinical applications. Our products support the entire life cycle of genomic data in the private data center with an intuitive user interface designed especially for biologists and clinicians, backed by a highly scalable, Big Data enterprise platform capable of analyzing petabytes of data in real time. From instrument readouts to data interpretation for research and the clinic, NextBio products empower a paradigm shift in translational research and medicine.
Source: NextBio

Website

NxClinical

NxClinical is a solution for case review and reporting that combines genetic data analysis and data management for clinical labs, designed to improve both productivity and patient care.
Source: BioDiscovery

Website

OCIMUM

OCIMUM offers a range of databases and bioinformatics analysis tools.

Website

OMIM

An Online Catalog of Human Genes and Genetic Disorders
Source: OMIM

Website

PathoScope

PathoScope takes next-generation sequencing reads from a mixture sample and predicts which genomes are present. We use a Bayesian framework combined with an initial reference-based alignment to assign reads to the correct genome of origin.
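Reassigning ambiguously mapped reads can be illustrated with a basic expectation-maximization (EM) loop in which genome proportions are re-estimated from each read's posterior over the genomes it aligns to. PathoScope's published model has additional parameters beyond this plain mixture model, so treat the sketch as an illustration of the general idea only; the read likelihoods are hypothetical and assumed to be positive.

```python
def em_proportions(read_hits, iterations=50):
    """Estimate genome proportions from ambiguous read alignments by EM.

    read_hits: one {genome: alignment_likelihood} dict per read (likelihoods > 0).
    Returns {genome: estimated proportion of reads}.
    """
    genomes = sorted({g for hits in read_hits for g in hits})
    pi = {g: 1.0 / len(genomes) for g in genomes}  # start from uniform proportions
    for _ in range(iterations):
        expected = dict.fromkeys(genomes, 0.0)
        for hits in read_hits:
            # E-step: posterior probability that this read came from each candidate genome.
            weighted = {g: pi[g] * q for g, q in hits.items()}
            norm = sum(weighted.values())
            for g, w in weighted.items():
                expected[g] += w / norm
        # M-step: new proportions are expected read counts divided by the number of reads.
        pi = {g: expected[g] / len(read_hits) for g in genomes}
    return pi


# Hypothetical data: 3 reads unique to A, 1 unique to B, 2 aligning equally well to both.
reads = [{"A": 1.0}] * 3 + [{"B": 1.0}] + [{"A": 1.0, "B": 1.0}] * 2
print(em_proportions(reads))
```

The estimate converges to 0.75 for A and 0.25 for B: the two ambiguous reads are split in proportion to the unique evidence rather than evenly.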
Source: Sourceforge

Website

Pfam database

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Source: EMBL-EBI

Website

PhAnToMe database

PhAnToMe (Phage Annotation Tools and Methods) is a platform that we are currently developing for phage genome annotations. PhAnToMe will extend the SEED database to handle the nuances of both phages and prophages, establish a consistent nomenclature for phage genes, and develop a new tool for the identification of prophages. This new resource is expected to provide high quality annotations to over 1,000 existing phage and prophage genomes and dozens of existing phage metagenomes.
Source: PhAnToMe

Website

Phenomizer

Phenomizer is a freely available tool for clinical genetics.

Website

phyloseq

The phyloseq package is a tool to import, store, analyze, and graphically display complex phylogenetic sequencing data that has already been clustered into Operational Taxonomic Units (OTUs), especially when there is associated sample data, phylogenetic tree, and/or taxonomic assignment of the OTUs.
Source: GitHub

Website

PICRUSt

PICRUSt is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
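For one sample, the prediction can be sketched in two steps: divide each OTU's 16S count by its predicted 16S copy number to approximate organism abundance, then multiply by the OTU's predicted gene-family content and sum over OTUs. PICRUSt infers the copy numbers and gene tables from reference genomes; here they are supplied as hypothetical dictionaries, and the gene-family labels are placeholders.

```python
def predict_metagenome(otu_counts, copy_numbers, gene_content):
    """Simplified PICRUSt-style prediction for one sample.

    otu_counts:   {otu: observed 16S read count}
    copy_numbers: {otu: predicted 16S rRNA gene copies per genome}
    gene_content: {otu: {gene_family: predicted copies per genome}}
    """
    predicted = {}
    for otu, count in otu_counts.items():
        # Correct for 16S copy-number variation to approximate organism abundance.
        organisms = count / copy_numbers[otu]
        for gene, copies in gene_content[otu].items():
            predicted[gene] = predicted.get(gene, 0.0) + organisms * copies
    return predicted


# Hypothetical inputs for two OTUs and two gene families.
print(predict_metagenome(
    {"otu1": 40, "otu2": 10},
    {"otu1": 4, "otu2": 1},
    {"otu1": {"gene_A": 1, "gene_B": 2}, "otu2": {"gene_A": 3}},
))
```

Without the copy-number step, OTUs with many 16S copies would be over-represented, which is why the normalization comes first.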
Source: GitHub

Website

piRNABank

piRNABank is a web analysis system and resource that provides comprehensive information on piRNAs in three widely studied mammals (human, mouse and rat) and in the fruit fly Drosophila.
Source: Institute of Bioinformatics and Applied Biotechnology (IBAB)

Website

PolyPhen

PolyPhen-2 (Polymorphism Phenotyping v2) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein, using straightforward physical and comparative considerations.
Source: PolyPhen

Website

RAST (Rapid Annotation using Subsystem Technology)

RAST (Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree.
Source: National Microbial Pathogen Data Resource 

Website

RDP (Ribosomal Database)

RDP (Ribosomal Database) provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences, and Fungal 28S rRNA sequences, and a suite of analysis tools to the scientific community.
Source: Michigan State University

Website

READSCAN

READSCAN is a highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets.
Source: King Abdullah University of Science and Technology

Website

Rfam

The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
Source: EMBL-EBI

Website

SIFT

SIFT predicts whether an amino acid substitution affects protein function. SIFT prediction is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST. SIFT can be applied to naturally occurring nonsynonymous polymorphisms or laboratory-induced missense mutations.
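The conservation idea can be sketched for a single alignment column: estimate residue probabilities, scale them so the most likely residue scores 1.0, and call substitutions scoring below 0.05 deleterious (the cutoff commonly used with SIFT). SIFT itself weights sequences and uses more elaborate pseudocounts, so the flat pseudocount here is a simplifying assumption, and the column is a hypothetical example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scaled_probability(column, residue, pseudocount=0.1):
    """Probability of `residue` in an alignment column, scaled so the most likely residue scores 1.0."""
    observed = [aa for aa in column if aa in AMINO_ACIDS]  # ignore gaps and unknown symbols
    total = len(observed) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (observed.count(aa) + pseudocount) / total for aa in AMINO_ACIDS}
    return probs[residue] / max(probs.values())


def predict(column, residue, cutoff=0.05):
    """Label a substitution 'deleterious' if its scaled probability falls below the cutoff."""
    score = scaled_probability(column, residue)
    return ("deleterious" if score < cutoff else "tolerated"), score


# Hypothetical alignment column at one position: mostly leucine, one isoleucine.
column = "LLLLLLLLLI"
print(predict(column, "P"))
print(predict(column, "I"))
```

A residue never seen at a highly conserved position (proline here) falls far below the cutoff, while one already observed among the related sequences (isoleucine) is tolerated.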
Source: J. Craig Venter Institute

Website

SILVA

SILVA provides comprehensive, quality-checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya). The SILVA databases are the official databases of the ARB software package.
Source: Silva

Website

SolveBio

SolveBio is a platform for clinical genomics professionals who need up-to-date reference data and tools for managing, curating, and reporting on genomic variation.
Source: NextBio

Website

SuperPhy

SuperPhy is an integrated platform for predictive genomic analyses of Escherichia coli.
Source: SuperPhy

Website

SURPI™

SURPI™ is a computational pipeline for pathogen identification from complex metagenomic next-generation sequencing (NGS) data generated from clinical samples.

Source: University of California

Website

The Animal Genome Size Database

The Animal Genome Size Database, Release 2.0, is a comprehensive catalogue of animal genome size data. Haploid DNA contents (C-values, in picograms) are currently available for 5635 species (3731 vertebrates and 1904 non-vertebrates) based on 7286 records from 683 published sources.
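C-values are given as DNA mass, but genome sizes are often needed in base pairs. A small conversion sketch using the widely used average factor of 1 pg ≈ 978 Mbp (0.978 × 10^9 bp); the factor is an approximation and the example value is illustrative.

```python
BP_PER_PG = 0.978e9  # widely used average: 1 pg of DNA ≈ 978 Mbp


def c_value_to_bp(picograms: float) -> float:
    """Convert a haploid DNA content (C-value, in pg) to genome size in base pairs."""
    return picograms * BP_PER_PG


def c_value_to_mbp(picograms: float) -> float:
    """The same conversion, expressed in megabase pairs."""
    return c_value_to_bp(picograms) / 1e6


print(c_value_to_mbp(3.5))  # a hypothetical 3.5 pg genome
```

The same conversion applies to the plant C-values listed below, since both databases report haploid DNA content in picograms.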
Source: Animal Genome Size Database

Website

The Ensembl project

The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.

Website

The Human Phenotype Ontology (HPO)

The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains approximately 11,000 terms and over 115,000 annotations to hereditary diseases.
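Because HPO terms form a directed acyclic graph of is_a relations, a disease annotated to a specific term is implicitly annotated to all of that term's ancestors. A minimal sketch of that propagation, using a hypothetical five-term fragment with made-up identifiers rather than real HPO IDs:

```python
# Hypothetical ontology fragment: term -> list of is_a parents (a term may have several).
IS_A = {
    "phenotypic_abnormality": [],
    "cardiovascular_abnormality": ["phenotypic_abnormality"],
    "cardiac_septum_abnormality": ["cardiovascular_abnormality"],
    "cardiac_atrium_abnormality": ["cardiovascular_abnormality"],
    "atrial_septal_defect": ["cardiac_septum_abnormality", "cardiac_atrium_abnormality"],
}


def ancestors(term):
    """Every term reachable from `term` through is_a edges, including `term` itself."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A[current])
    return seen


def propagate(annotations):
    """Expand each disease's directly annotated terms with all of their ancestors."""
    return {
        disease: set().union(*(ancestors(term) for term in terms))
        for disease, terms in annotations.items()
    }


print(sorted(ancestors("atrial_septal_defect")))
```

Propagated annotation sets like these are what ontology-based comparisons of patient phenotypes against disease annotations operate on.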
Source: GitHub

Website

The Plant DNA C-values Database

The Plant DNA C-values Database currently contains data for 8510 plant species. It combines data from the Angiosperm DNA C-values Database (release 8.0, Dec. 2012), the Gymnosperm DNA C-values Database (release 5.0, Dec. 2012), the Pteridophyte DNA C-values Database (release 5.0, Dec. 2012) and the Bryophyte DNA C-values Database (release 3.0, Dec. 2010), together with the Algae DNA C-values Database (release 1.0, Dec. 2004).
Source: Kew Royal Botanic Gardens

Website

UNITE

UNITE is an rDNA sequence database designed to provide a stable and reliable platform for sequence-based identification of ectomycorrhizal asco- and basidiomycetes. It has many of the characteristics of other sequence databases, but what sets UNITE apart from them is sequence reliability. We aim to include only high-quality sequences of well-identified fungi, initially sacrificing quantity for quality.
Source: NordForsk

Website

VectorBase

VectorBase is a Bioinformatics Resource Center (BRC) focused on invertebrate vectors of human disease. VectorBase is one of four Bioinformatics Resource Centers funded by NIAID to provide web-based resources to the scientific community conducting basic and applied research on organisms considered potential agents of biowarfare or bioterrorism, or that cause emerging or re-emerging diseases.
Source: VectorBase

Website
