

Human Genetic Databases



1000 Genomes Project

The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalogue of human variation and genotype data. As the project ended, the Data Coordination Centre at EMBL-EBI has received continued funding from the Wellcome Trust to maintain and expand the resource.

Source: EMBL-EBI


AlzGene is a collection of published Alzheimer's disease genetic association studies, with random-effects meta-analyses for polymorphisms with genotype data in at least three case-control samples.
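As a rough illustration of what a random-effects meta-analysis over case-control samples involves, the sketch below pools log odds ratios with the DerSimonian-Laird estimator. The choice of estimator is an assumption (the description above does not say which one AlzGene uses), and the function names and the 2x2 count layout are hypothetical. Only the three-sample minimum comes from the description.

```python
import math


def log_odds_ratio(case_carriers, case_noncarriers,
                   control_carriers, control_noncarriers):
    """Return (log odds ratio, variance) for one case-control sample.

    Assumes all four counts are positive; zero cells would need a
    continuity correction, which this sketch does not apply.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    effect = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return effect, variance


def random_effects_pool(studies, min_studies=3):
    """Pool 2x2 count tuples with the DerSimonian-Laird estimator (assumed).

    Returns (pooled log odds ratio, standard error, tau squared).
    """
    if len(studies) < min_studies:
        raise ValueError("need at least %d case-control samples" % min_studies)

    effects, variances = zip(*(log_odds_ratio(*s) for s in studies))

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Between-study variance tau^2, truncated at zero.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2
```

Calling `random_effects_pool` with three or more `(case_carriers, case_noncarriers, control_carriers, control_noncarriers)` tuples returns the pooled estimate on the log scale; `math.exp` converts it back to an odds ratio.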

The Healthy EXomes database is a repository of benign genetic variability. It contains whole-exome sequencing data from individuals over the age of 65 who were confirmed postmortem to be neuropathologically normal. Coming soon!

The mutations database provides a list of rare variants reported in the three genes known to cause autosomal-dominant familial Alzheimer’s disease (APP, presenilin-1 and presenilin-2), as well as those reported in MAPT (tau). Classifying variants as "pathogenic" or "not pathogenic" is an ongoing challenge in the field, and these classifications must be interpreted with caution because they are subject to change as new data become available.



ClinVar aggregates information about genomic variation and its relationship to human health.

Source: NCBI


ConsensusPathDB-human integrates interaction networks in Homo sapiens including binary and complex protein-protein, genetic, metabolic, signaling, gene regulatory and drug-target interactions, as well as biochemical pathways. Data currently originate from 32 public interaction resources and from interactions that we have curated from the literature. The interaction data are integrated in a complementary manner (avoiding redundancies), resulting in a seamless interaction network containing different types of interactions.

Source: Max Planck Institute

DASHR – Database of small human noncoding RNAs

The DASHR database provides the most comprehensive information to date on human small non-coding RNA (sncRNA) genes, precursor and mature sncRNA annotations, sequence, expression levels and RNA processing information across 42 normal tissues and cell types in human. The content of the database derives from integrating annotation data with curation, annotation, and computational analysis of 187 small-RNA (smRNA-seq) deep sequencing datasets with over 2.5 billion reads from over 30 independent studies. DASHR contains information on over 48,000 precursor and mature sncRNA annotations in the human genome, of which 82% are expressed in one or more of the curated tissues and cell types.

Source: Wang Lab

Database of Genomic Variants

The objective of the Database of Genomic Variants is to provide a comprehensive summary of structural variation in the human genome. We define structural variation as genomic alterations that involve segments of DNA larger than 50 bp. The database contains only structural variation identified in healthy control samples.

The Database of Genomic Variants provides a useful catalog of control data for studies aiming to correlate genomic variation with phenotypic data. The database is continuously updated with new data from peer-reviewed research studies. We always welcome suggestions and comments regarding the database from the research community.

Source: Centre for Applied Genomics


DisGeNET is a discovery platform integrating information on gene-disease associations (GDAs) from several public data sources and the literature (Piñero et al., 2015). The current version (DisGeNET v3.0) contains 429111 associations between 17181 genes and 14619 diseases, disorders and clinical or abnormal human phenotypes. Given the large number of GDAs compiled in DisGeNET, we have also developed a score to rank the associations based on the supporting evidence. Tools have also been created to explore and analyze the data contained in DisGeNET. DisGeNET can be queried through the Search and Browse functionalities available from this web interface, or through a plugin created for Cytoscape to query and analyze a network representation of the data. DisGeNET data can also be queried by downloading the SQLite database to your local repository. An RDF (Resource Description Framework) representation of the DisGeNET database is also available.

Source: Integrative Biomedical Informatics Group

ENCODE: Encyclopedia of DNA Elements

The ENCODE (Encyclopedia of DNA Elements) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

Source: Stanford University


EpimiRBase was established in 2015 in order to provide complete and up-to-date information on all publications relating to microRNA and epilepsy. The fully-searchable database includes information on up- and down-regulated microRNAs in the brain and blood, as well as functional studies, and covers both experimental models and human epilepsy. We hope you find this a useful resource for your research.


Exome Variant Server

The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders.



Welcome to FINDbase worldwide, an online resource documenting frequencies of pathogenic genetic variations leading to inherited disorders in various populations worldwide. The initial data came from previously published reports as well as from unpublished information contributed by individual researchers prior to publication.

Since 2008, FINDbase has undergone a major upgrade and a substantial content update with the documentation of additional inherited disorders and a completely new set of pharmacogenomic markers. This information is available in two separate modules, namely Causative mutations and Pharmacogenomic markers.


GeneCards®: The Human Gene Database

GeneCards is a searchable, integrative database that provides comprehensive, user-friendly information on all annotated and predicted human genes. It automatically integrates gene-centric data from ~125 web sources, including genomic, transcriptomic, proteomic, genetic, clinical and functional information.

Source: Weizmann Institute of Science


GeneLoc shows a gene in a genomic context, with a list of genes and markers in its scalable genomic neighborhood. The GeneLoc algorithm creates an integrated map of the human genome. GeneLoc unifies gene collections, eliminates redundancies, and assigns each gene a meaningful location-based identifier, which also serves as its GeneCards ID.

GeneLoc currently uses gene sets from NCBI and Ensembl. It compares these collections, deciding which entries should be consolidated and which are discrete. Since the gene annotations use the same assembly and coordinate scheme, GeneLoc effects this gene integration by comparing genomic locations.

Source: Weizmann Institute of Science
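The location-based consolidation described for GeneLoc can be sketched roughly as follows. This is an illustrative sketch, not GeneLoc's actual algorithm: the `GeneRecord` type, the overlap measure, and the 0.8 threshold are all hypothetical choices made for the example. It only relies on the point made above, that both gene sets share one assembly and coordinate scheme, so entries can be compared by position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical gene entry; coordinates are 1-based and inclusive."""
    source: str   # e.g. "NCBI" or "Ensembl"
    symbol: str
    chrom: str
    start: int
    end: int
    strand: str


def overlap_fraction(a, b):
    """Fraction of the shorter gene covered by the overlap of a and b."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return overlap / shorter


def consolidate(ncbi, ensembl, min_fraction=0.8):
    """Pair entries whose locations agree; keep the rest as discrete genes.

    The 0.8 threshold is an assumption made for this sketch.
    Returns a list of tuples: (ncbi, ensembl) pairs or single-entry tuples.
    """
    merged = []
    matched = set()
    for n in ncbi:
        partner = None
        for i, e in enumerate(ensembl):
            if i not in matched and overlap_fraction(n, e) >= min_fraction:
                partner = i
                break
        if partner is None:
            merged.append((n,))
        else:
            matched.add(partner)
            merged.append((n, ensembl[partner]))
    # Ensembl entries with no NCBI counterpart remain discrete.
    merged.extend((e,) for i, e in enumerate(ensembl) if i not in matched)
    return merged
```

A greedy first-match pairing keeps the example short; a real integration would also have to handle one-to-many overlaps and nested genes.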

Gene Ontology Annotation (UniProt-GOA) Database

The UniProt GO annotation program aims to provide high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB). The assignment of GO terms to UniProt records is an integral part of UniProt biocuration. UniProt manual and electronic GO annotations are supplemented with manual annotations supplied by external collaborating GO Consortium groups, to ensure a comprehensive GO annotation dataset is supplied to users.

Source: EMBL-EBI


This Environmental Genome Project web resource integrates gene, sequence and polymorphism data into individually annotated gene models. The human genes included are related to DNA repair, cell cycle control, cell signaling, cell division, homeostasis and metabolism, and are thought to play a role in susceptibility to environmental exposure.

Source: University of Utah Genome Center

Genome Reference Consortium

The GRC is working hard to provide the best possible reference assembly for human. We do this by generating multiple representations (alternate loci) for regions that are too complex to be represented by a single path. Additionally, we release regional fixes known as patches. This allows users who are interested in a specific locus to get an improved representation without affecting users who need chromosome coordinate stability.

Source: NCBI

Genome Variation Server (GVS)

The Genome Variation Server (GVS), fed by a local database, enables rapid access to human genotype data found in dbSNP, and provides tools for analysis of genotype data. The current release of genotype data found in the GVS database is that of dbSNP build 144 (May 2015). The variation locations are mapped to the human genome reference sequence of December 2013 (UCSC hg38, NCBI build 38). This GVS database contains 11.7 million variations with corresponding genotype data.


GWAS Central

GWAS Central (previously the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary level findings from genetic association studies, both large and small. We actively gather datasets from public domain projects, and encourage direct data submission from the community.
GWAS Central is built upon a basal layer of Markers that comprises all known SNPs and other variants from public databases such as dbSNP and the DBGV. Allele and genotype frequency data, plus genetic association significance findings, are added on top of the Marker data, and organised the same way that investigations are reported in typical journal manuscripts.

Source: GWAS Central


GWIPS-viz aims to provide on-line tools for the analysis and visualization of ribo-seq data obtained with the ribosome profiling technique, see Ingolia et al (2009) Science.

GWIPS-viz is based on the UCSC Genome Browser, developed by the Genome Informatics Group, Center for Biomolecular Science and Engineering, University of California, Santa Cruz.

Source: University of California, Santa Cruz

HapMap Project

The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States. All of the information generated by the Project will be released into the public domain.

Source: NCBI

H-Invitational Database (H-InvDB)

H-Invitational Database (H-InvDB) is an integrated database of human genes and transcripts.
Through extensive analyses of all human transcripts, we provide curated annotations of human genes and transcripts that include gene structures, alternative splicing variants, non-coding functional RNAs, protein functions, functional domains, sub-cellular localizations, metabolic pathways, protein 3D structure, genetic polymorphisms (SNPs, indels and microsatellite repeats), relation with diseases, gene expression profiling, molecular evolutionary features, protein-protein interactions (PPIs) and gene families/groups.
H-InvDB was produced based upon the annotation technology established in the H-Invitational Project for annotation of human full-length cDNAs (2004), was updated by the "Genome Information Integration Project" (2005-2008) and "METI integrated database project" (2008-2011) as a key integrated database of human genes, and then updated by AIST and Tokai University School of Medicine with support from JSPS KAKENHI, Grant-in-Aid for Publication of Scientific Research Results.

Source: Tokai University School of Medicine

HMDD (the Human microRNA Disease Database)

HMDD (the Human microRNA Disease Database) is a database that curates experiment-supported evidence for human microRNA (miRNA) and disease associations. miRNAs are an important class of regulatory RNAs, which mainly repress gene expression at the post-transcriptional level. A growing number of reports show that miRNAs play important roles in various critical biological processes.

As of 29 February 2016, HMDD has collected 10368 entries covering 572 miRNA genes and 378 diseases from 3511 papers.


Human Epigenome Atlas

The Human Epigenome Atlas includes human reference epigenomes and the results of their integrative and comparative analyses. Human Epigenome Atlas provides detailed insights into locus-specific epigenomic states like histone marks and DNA methylation across tissues and cell types, developmental stages, physiological conditions, genotypes, and disease states.

Source: Baylor College of Medicine

Human Gene Mutation Database (HGMD®)

The Human Gene Mutation Database (HGMD®) represents an attempt to collate known (published) gene lesions responsible for human inherited disease. It is maintained in Cardiff by D.N. Cooper, E.V. Ball, P.D. Stenson, A.D. Phillips, K. Howells, S. Heywood, M.J. Hayden, M.E. Mort and M.P. Horan.

Source: Cardiff University 

Locus Reference Genomic

The new system, founded on the RefSeqGene project, was named Locus Reference Genomic (LRG). As of October 2013, over 700 LRGs have been created, of which over 400 are public and in use by the community. The aim of the project is to create an LRG for every locus with clinical implications.

Source: EMBL-EBI

LOVD - Leiden Open (source) Variation Database

LOVD stands for Leiden Open (source) Variation Database.
LOVD's purpose is to provide a flexible, freely available tool for gene-centered collection and display of DNA variations. LOVD 3.0 extends this idea to also provide patient-centered data storage and storage of NGS data, even of variants outside of genes. LOVD is open source, released under the GPL license, and is actively being improved.

Source: Leiden University Medical Center


A human mitochondrial genome database.
A compendium of polymorphisms and mutations in human mitochondrial DNA.
MITOMAP reports published and unpublished data on human mitochondrial DNA variation. Currently our variant tables report frequencies from 30589 human mitochondrial DNA sequences.


mtDB - Human Mitochondrial Genome Database


Neuroscience Information Framework - NIF

The Neuroscience Information Framework is a dynamic inventory of Web-based neuroscience resources: data, materials, and tools accessible via any computer connected to the Internet. An initiative of the NIH Blueprint for Neuroscience Research, NIF advances neuroscience research by enabling discovery and access to public research data and tools worldwide through an open source, networked environment.

Source: NIF

NHGRI Structural Variation Project

The sequence-based Survey of Human Structural Variation aims to characterize common structural variants that are larger than SNPs, for example, multi-base insertions/deletions, inversions, translocations, and duplications. The approach entails sequencing the ends of fosmids and BACs from multiple individuals. This strategy can be efficiently scaled with current technology and is complementary to efforts to obtain human structural variation information by other technologies.

Source: NCBI


An Online Catalog of Human Genes and Genetic Disorders


Personal Genome Project

The Personal Genome Project was founded in 2005 and is dedicated to creating public genome, health, and trait data. Sharing data is critical to scientific progress, but has been hampered by traditional research practices—our approach is to invite willing participants to publicly share their personal data for the greater good.

Source: PersonalGenomes


SolveBio is the most powerful and flexible platform for clinical genomics professionals who need up-to-date reference data and tools for managing, curating, and reporting on genomic variation.
Source: NextBio


The Human Phenotype Ontology (HPO)

The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains approximately 11,000 terms and over 115,000 annotations to hereditary diseases.
Source: GitHub


The Human Variome Project

The Human Variome Project is an international non-governmental organisation that is working to ensure that all information on genetic variation and its effect on human health can be collected, curated, interpreted and shared freely and openly.

Source: Human Variome Project International Limited

The Singapore Human Mutation And Polymorphism Database

This website has been created to provide clinicians and scientists access to a central genetic database for the local population. Population- and country-specific databases are valuable for the study of population history, genetic testing, and disease association studies. The availability of allele and genotype frequencies for specific ethnic groups in our population would be useful for the selection of informative genetic markers in association studies involving complex traits or phenotypes. These needs led to the creation of this database, which contains the allele and genotype frequencies. The data catalogued in the database include mutations in Mendelian diseases identified in Singapore, and also frequencies of polymorphisms which had been investigated in either controls or samples associated with specific phenotypes or diseases. Polymorphisms captured include single nucleotide polymorphisms, variable number of tandem repeats and insertions/deletions, but not microsatellite markers.

Source: BioInformatics Institute


The TIARA genome database contains personal genomic information obtained from heterogeneous technologies including next generation sequencing (NGS) and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels, and structural variants (SVs).

Moreover, TIARA provides the genomic variants between whole genome sequencing and transcriptome sequencing for matched samples as well as the features of allele specific gene expression and transcriptional base modifications (TBM), or RNA editing.

Source: Seoul National University College of Medicine

Vega Genome Browser

This site presents data from the manual annotation of the human genome by the Havana group at the Wellcome Trust Sanger Institute. A first pass annotation of the whole genome has been completed as part of the Gencode project. Vega also shows Loss Of Function (LoF) loci.

