General Genetic Databases

This website presents a number of general genetic databases. These databases contain sequence data from different classes of organisms, including animals, plants, bacteria, archaea, vertebrates, mammals, and eukaryotes. This website also introduces phylogenetic databases of protein-encoding regions, gene-protein interactions, and genomic structural variants. Furthermore, you will find databases of eukaryotic pathogens and of genetic information related to drug development and targeting.

This list of databases was established to help you find the best reference database for your sequencing project. Specific databases are listed in the corresponding categories.


AgBase

AgBase is a curated, open-source, Web-accessible resource for functional analysis of agricultural plant and animal gene products. Our long-term goal is to serve the needs of the agricultural research communities by facilitating post-genome biology for agriculture researchers and for those researchers primarily using agricultural species as biomedical models.

Source: Mississippi State University

Barcode of Life Data Systems

The Barcode of Life Data Systems is designed to support the generation and application of DNA barcode data. The platform consists of four main modules: a data portal, a database of barcode clusters, an educational portal, and a data collection workbench.

Source: BOLD Systems

Bgee: Gene Expression Evolution


COG databases

Phylogenetic classification of proteins encoded in complete genomes
Source: NCBI


CTD - Comparative Toxicogenomics Database

CTD is a robust, publicly available database that aims to advance understanding about how environmental exposures affect human health. It provides manually curated information about chemical–gene/protein interactions, chemical–disease and gene–disease relationships. These data are integrated with functional and pathway data to aid in development of hypotheses about the mechanisms underlying environmentally influenced diseases.

We also have additional ongoing projects involving manual curation of exposome data and chemical–phenotype relationships to help identify pre-disease biomarkers resulting from environmental exposures.

Source: MDI Biological Laboratory & NC State University

Database of Genomic Variants archive (DGVa)

The Database of Genomic Variants archive (DGVa) is a repository that provides archiving, accessioning and distribution of publicly available genomic structural variants in all species.

Source: EMBL-EBI

dbSNP Single Nucleotide Polymorphism Database

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species, developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI).



dbVar

dbVar is NCBI's database of genomic structural variation. It contains insertions, deletions, duplications, inversions, multinucleotide substitutions, mobile element insertions, translocations, and complex chromosomal rearrangements.

Source: NCBI


Search small variations in dbSNP or large structural variations in dbVar.

Source: NCBI

DDBJ - DNA Data Bank of Japan

The principal purpose of DDBJ operations is to improve the quality of the International Nucleotide Sequence Database (INSD) as a public-domain resource. When researchers release their data to the public through INSD so that it is shared worldwide, we at the DDBJ Center strive to describe the data as richly as possible, following the unified INSD rules, while keeping submission through DDBJ as effortless as possible.

Nucleotide sequences record organismic evolution more directly than other biological materials and are thus invaluable not only for research in the life sciences but also for human welfare in general. The database is, so to speak, a common treasure of humankind. With this in mind, we make the database accessible online to anyone in the world.

Source: DDBJ


Drug2Gene

Drug2Gene reports relations between genes/proteins and drugs/compounds, including bioactivity data where available. The data has been collected from 22 public databases and integrated to provide a 'one-stop shop' for identifying tool compounds for genes or finding all known targets of a drug.

Source: Metalife AG


The Ensembl genome annotation system, developed jointly by the EBI and the Wellcome Trust Sanger Institute, has been used for the annotation, analysis and display of vertebrate genomes since 2000.
Since 2009, the Ensembl site has been complemented by five new sites, for bacteria, protists, fungi, plants and invertebrate metazoa, enabling users to use a single collection of (interactive and programmatic) interfaces for accessing and comparing genome-scale data from species of scientific interest across the taxonomic range.
Source: EMBL-EBI



euGenes

euGenes provides a common summary of gene and genomic information from eukaryotic organism databases.


Exome Aggregation Consortium (ExAC)

The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.

The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies.

Source: Broad Institute

EuPathDB (formerly ApiDB)

EuPathDB (formerly ApiDB) is an integrated database covering the eukaryotic pathogens in the genera listed in our Data Summary page.
While these organisms are supported by a taxon-specific database/website built upon the same infrastructure, the EuPathDB portal offers an entry point to all of these resources, and the opportunity to leverage orthology for searches across genera.

The EuPathDB databases are funded by NIAID, which has strict guidelines as to which organisms we can support. EuPathDB is one of four NIH-funded bioinformatics resource centers. Our main remit is to incorporate any eukaryotic pathogen that is deemed emerging or re-emerging (see list). We also host some organisms that are not on the list: those added in the past at the behest of the community with additional funding (such as TrichDB), organisms related to emerging or re-emerging pathogens (such as Plasmodium or Neospora), and the trypanosomatids (funded as a pilot project by the Bill and Melinda Gates Foundation).

Source: EuPathDB


GenBank

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2013 Jan;41(D1):D36-42). GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.

Source: NCBI


GeneDB

The GeneDB project is a core part of the Sanger Institute's Pathogen Genomics activities.
GeneDB currently provides access to more than 40 genomes, at various stages of completion, from early access to partial genomes with automatic annotation through to complete genomes with extensive manual curation.

Source: Sanger Institute


GenePool

Upload genomics data to GenePool, or connect GenePool to a remote data store. Our interactive data uploader takes you step-by-step through getting your data into GenePool, whether it's coming from your local machine, Amazon S3 bucket, Google Cloud Storage, Illumina BaseSpace account, DNAnexus account, or Seven Bridges Genomics account. GenePool supports data generated from a wide variety of genomics assays in common data formats. GenePool makes it simple for you to attach your patient & sample metadata to imported genomics data.
Source: StationX


Genome - NCBI

This resource organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.

Source: NCBI

GOLD Genomes

GOLD (Genomes Online Database) is a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects, and their associated metadata, around the world.
Source: Joint Genome Institute



HOGENOM

Database of Complete Genome Homologous Genes Families
HOGENOM is a database of homologous genes from fully sequenced organisms (bacteria, archaea and eukarya), structured under the ACNUC sequence database management system.
It allows users to select sets of homologous genes among species and to visualize multiple alignments and phylogenetic trees.
It is also possible to search for orthologous genes in a wide range of taxa.
HOGENOM is thus particularly useful for comparative sequence analysis, phylogeny and molecular evolution studies. More generally, HOGENOM gives an overall view of what is known about a particular gene family. Note that HOGENOM is split into two databases: HOGENOM contains the protein sequences, while HOGENOMDNA contains the nucleotide sequences. The protein sequences of HOGENOM were generated by translating the CDS of HOGENOMDNA and using the associated cross-references to generate the annotations.
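
As a rough illustration of what translating a CDS into a protein sequence involves, the following minimal Python sketch applies the standard genetic code to a nucleotide string. It is not HOGENOM's actual pipeline: the function name `translate_cds` is hypothetical, only the standard code is used (real databases may apply organism-specific genetic codes), and unknown codons are simply written as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_cds(cds):
    """Translate a coding sequence, stopping at the first stop codon.

    Hypothetical helper for illustration only; assumes the sequence
    starts in frame and uses the standard genetic code.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X = ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


example_protein = translate_cds("ATGGCCTGGTAA")
```

In practice an established library routine would normally be preferred over a hand-written table; the sketch only shows the idea.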



Interferome

This database will enable the reliable identification of an individual interferon-regulated gene (IRG) or IRG signatures from high-throughput data sets (e.g. microarray or proteomic data). It will also assist in identifying regulatory elements, chromosomal location and tissue expression of IRGs in human and mouse. This upgraded version, Interferome v2.0, has quantitative data, more detailed annotation and search capabilities, and can be queried for one gene or thousands, as in a gene list from a microarray experiment.

Source: Monash University

JGI Genome Portal

The JGI Genome Portal is a database that provides genomic information on Bacteria, Archaea and Eukaryota.


KEGG: Kyoto Encyclopedia of Genes and Genomes

KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

Source: Kanehisa Laboratories


A Gene Annotation & Analysis Resource



OCIMUM offers different databases and bioinformatics analysis tools



OrthoMaM

The EnsEMBL database was used to decide on a set of 1-to-1 orthologous markers from those mammalian genomes available. Exons of reasonable length for further amplification from genomic DNA and sequencing in additional species were then selected. For phylogenomic purposes, CoDing Sequences (CDSs) were also collected. The phylogenetic utility and the evolutionary characteristics of these candidate markers were then evaluated using a homemade bioinformatics pipeline. The resulting OrthoMaM database can be interrogated through this website.


Pfam database

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Source: EMBL-EBI



PhenomicDB

PhenomicDB is a multi-organism phenotype-genotype database including human, mouse, fruit fly, C. elegans, and other model organisms.
The inclusion of gene indices (NCBI Gene) and orthologues (same gene in different organisms) from HomoloGene makes it possible to compare phenotypes of a given gene over many organisms simultaneously.



PHI-base

PHI-base is a web-accessible database that catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts. PHI-base is therefore an invaluable resource in the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. In collaboration with the FRAC team, PHI-base also includes antifungal compounds and their target genes.

Source: Phibase


Repbase Update

Repbase Update (RU) is a database of prototypic sequences representing repetitive DNA from different eukaryotic species. RU is being used in genome sequencing projects worldwide as a reference collection for masking and annotation of repetitive DNA (e.g. by RepeatMasker or CENSOR).

Source: Genetic Information Research Institute


SOURCE

The mission of SOURCE is to provide a unique scientific resource that pools publicly available data commonly sought after for any clone, GenBank accession number, or gene. SOURCE is specifically designed to facilitate the analysis of large sets of data that biologists can now produce using genome-scale experimental approaches.

Source: Princeton University

The Ensembl project

The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.


The Eukaryotic Promoter Database

The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. This database contains 4806 promoters from several species.

Source: EPD

The Gene Ontology Project

The Gene Ontology (GO) project is a major bioinformatics initiative to develop a computational representation of our evolving knowledge of how genes encode biological functions at the molecular, cellular and tissue system levels. Biological systems are so complex that we need to rely on computers to represent this knowledge. The project has developed formal ontologies that represent over 40,000 biological concepts, and are constantly being revised to reflect new discoveries. To date, these concepts have been used to "annotate" gene functions based on experiments reported in over 100,000 peer-reviewed scientific papers.

The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members, as well as tools to access and process these data. Read more about the Gene Ontology.

Source: Gene Ontology Consortium

The RIKEN integrated database of mammals

The integrated database of mammals in RIKEN has been developed to secure the sustainability, utility and public availability of data produced by multiple large-scale RIKEN programs using mammals, such as FANTOM, the ENU mutagenesis program and the RIKEN Cerebellar Development Transcriptome Database (CDT-DB), using the fundamental data incubation system SciNeS. In this database, imported data are connected with public information such as genes and ontologies using semantic-web technology. Data and metadata are available in multiple formats, such as RDF and OWL, for reuse in other databases.

Source: RIKEN

The TDR Targets Database

The TDR Targets project seeks to exploit the availability of diverse datasets to facilitate the identification and prioritization of drugs and drug targets in neglected disease pathogens. This database functions both as a website where researchers can look for information on targets of interest, and as a tool for prioritization of targets in whole genomes. Using the TDRtargets database as a tool, researchers can quickly prioritize genes of interest by running simple queries (such as looking for small enzymes, or proteins with high quality structural models), assigning numerical weights to each query (in the history page), and combining these results to produce a ranked list of candidate targets. The name of the database includes the initialism 'TDR' for Tropical Disease Research, a special programme within the World Health Organization.

Source: The TDR Drug Targets Network
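
The weighted-query prioritization described for TDR Targets can be pictured with a small Python sketch: each query returns a set of gene identifiers, each query carries a numerical weight, and a gene's score is the sum of the weights of the queries it matched. This is only an illustration of the general idea, not the TDR Targets implementation; the function `rank_targets`, the gene identifiers and the weights are all made up for the example.

```python
from collections import defaultdict


def rank_targets(weighted_queries):
    """Combine weighted query results into a ranked list of genes.

    weighted_queries: iterable of (weight, set_of_gene_ids) pairs.
    Hypothetical helper; assumes a gene's score is the sum of the
    weights of every query that returned it.
    """
    scores = defaultdict(float)
    for weight, hits in weighted_queries:
        for gene in hits:
            scores[gene] += weight
    # Highest score first; ties are broken alphabetically by gene id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up query results standing in for "small enzymes" and
# "proteins with high-quality structural models".
small_enzymes = {"geneA", "geneB", "geneC"}
good_structural_models = {"geneB", "geneD"}

ranked = rank_targets([
    (2.0, small_enzymes),
    (5.0, good_structural_models),
])

for gene, score in ranked:
    print(gene, score)
```

Summing weights is the simplest way to combine queries; other schemes (for example, negative weights to penalize unwanted properties) would fit the same structure.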

UCSC Genome Browser website

Welcome to the UCSC Genome Browser website. This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to ENCODE data at UCSC (2003 to 2012) and to the Neandertal project. Download or purchase the Genome Browser source code, or the Genome Browser in a Box (GBiB) at our online store.

We encourage you to explore these sequences with our tools. The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide. The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways. Blat quickly maps your sequence to the genome. The Table Browser provides convenient access to the underlying database. VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns. Genome Graphs allows you to upload and display genome-wide data sets.

Source: UCSC Genome Bioinformatics
