Genomic annotations

In the Knowledge Portals, the term "annotations" refers to analytical distillations of experimental data that characterize specific genomic regions in specific tissues. Raw data are processed using an appropriate method to generate annotations, which may then be further analyzed for enrichment of genetic association signals. The aim of these studies is to provide evidence that may support the role of sequence variation in disease risk, which can help researchers identify the genes and pathways directly involved in the disease process. Since the majority of significant genetic associations are located outside of protein-coding regions, determining whether they are located in genomic regions with the potential to affect gene regulation, such as open chromatin, promoters, or enhancers, can suggest hypotheses about how these variants impact disease.

Raw data are loaded into the Common Metabolic Diseases Genome Atlas (CMDGA), processed by the CMDGA team to create annotations, and then transferred to the Human Genetics Amplifier (HuGeAMP) platform, which powers the Knowledge Portals, for further analysis and display. Click here to see a searchable list of all genomic annotations currently displayed on the Knowledge Portals.

Would you like to contribute data to CMDGA? Please see this page for a description of the data types and formats that are currently accepted.


Annotation classification

Annotations are classified at CMDGA into annotation categories and annotation types. Some annotation categories and types that are stored at CMDGA are not yet represented in the Knowledge Portals. The annotations currently represented in the Knowledge Portals are:

Annotation category     | Annotation type                           | Experimental techniques
cis-regulatory elements | Accessible chromatin                      | ATAC-seq, snATAC-seq, DNase-seq
cis-regulatory elements | Chromatin states                          | ChIP-seq and ChromHMM analysis
cis-regulatory elements | Binding sites                             | ChIP-seq
cis-regulatory elements | Candidate regulatory elements from ENCODE | ChIP-seq, DNase-seq
Target gene links       | Target gene predictions                   | Cicero, ABC, CHiCAGO

 

To simplify visualizations in the Knowledge Portals, we have grouped chromatin states into broad categories:

Chromatin state in Knowledge Portals | Chromatin state from ChromHMM
Active enhancer                      | Active_enhancer_1, Active_enhancer_2, EnhA1, EnhA2
Enhancer                             | Genic_enhancer, Weak_enhancer, EnhG, EnhG1, EnhG2, EnhWk, EnhBiv, Enh
Active promoter                      | Active_TSS, TssA
Promoter                             | Weak_TSS, Flanking_TSS, Bivalent/poised_TSS, TssAFlnk, TssFlnk, TssBiv, BivFlnk, TssFlnkU, TssFlnkD
Other                                | Strong_transcription, Repressed_polycomb, Weak_repressed_polycomb, Quiescent/low_signal, Weak_transcription, Tx, Txn, ReprPC, ReprPCWk, Quies, TxWk, Het, ZNF/Rpts, TxFlnk, Ctcf
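The grouping above can be expressed as a simple lookup. The following Python sketch is illustrative only: the names BROAD_STATE_GROUPS, STATE_TO_GROUP, and broad_chromatin_state are hypothetical and are not part of any Knowledge Portal or CMDGA interface, and the assignment of states to groups reflects our reading of the table above.

```python
# Hypothetical lookup built from the chromatin state table above.
# None of these names come from the Knowledge Portals themselves.
BROAD_STATE_GROUPS = {
    "Active enhancer": [
        "Active_enhancer_1", "Active_enhancer_2", "EnhA1", "EnhA2",
    ],
    "Enhancer": [
        "Genic_enhancer", "Weak_enhancer", "EnhG", "EnhG1", "EnhG2",
        "EnhWk", "EnhBiv", "Enh",
    ],
    "Active promoter": ["Active_TSS", "TssA"],
    "Promoter": [
        "Weak_TSS", "Flanking_TSS", "Bivalent/poised_TSS", "TssAFlnk",
        "TssFlnk", "TssBiv", "BivFlnk", "TssFlnkU", "TssFlnkD",
    ],
    "Other": [
        "Strong_transcription", "Repressed_polycomb",
        "Weak_repressed_polycomb", "Quiescent/low_signal",
        "Weak_transcription", "Tx", "Txn", "ReprPC", "ReprPCWk", "Quies",
        "TxWk", "Het", "ZNF/Rpts", "TxFlnk", "Ctcf",
    ],
}

# Invert the table so each ChromHMM label points at its broad category.
STATE_TO_GROUP = {
    state: group
    for group, states in BROAD_STATE_GROUPS.items()
    for state in states
}


def broad_chromatin_state(chromhmm_state: str) -> str:
    """Return the broad category for a ChromHMM state label.

    Raises KeyError for labels that do not appear in the table.
    """
    return STATE_TO_GROUP[chromhmm_state]
```

For example, broad_chromatin_state("EnhA1") returns "Active enhancer", and broad_chromatin_state("TssBiv") returns "Promoter".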

 

Tissues are also grouped into broad categories. Each annotation is derived from data generated in a specific tissue or cell line, termed the "biosample". Each biosample is mapped to a broad tissue group representing the organ or organ system from which it was derived.

Annotation post-processing methods

We apply stratified LD score regression (S-LDSC; Bulik-Sullivan et al., 2015; GitHub repository) to the bottom-line ancestry-specific genetic associations in the Knowledge Portal database to calculate global enrichment of genetic associations within the tissue-specific epigenomic annotations described above. For each annotation and ancestry, the method generates a p-value, representing the significance of the association between a trait and a tissue, and a fold enrichment value. Significant enrichment of genetic association signals for a disease in annotated regions of a particular type in a specific tissue suggests that the tissue may be relevant for the disease. We previously used the GREGOR method for this calculation, but have replaced it with S-LDSC because S-LDSC is more robust to potential confounders and produces more accurate and specific enrichments.
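To make the fold enrichment value concrete: S-LDSC conventionally defines the enrichment of an annotation as the proportion of SNP heritability attributed to the annotation divided by the proportion of SNPs that fall within it. The Python sketch below computes that ratio from already-estimated quantities. It does not run S-LDSC itself, the function and parameter names are hypothetical, and we assume (but have not confirmed) that the value shown in the Knowledge Portals follows this convention.

```python
def fold_enrichment(h2_annotation: float, h2_total: float,
                    snps_annotation: int, snps_total: int) -> float:
    """Proportion of heritability in an annotation over proportion of SNPs.

    Illustrative helper only: the inputs are assumed to come from a
    separate S-LDSC run, and all names here are hypothetical.
    """
    if h2_total == 0:
        raise ValueError("total heritability must be non-zero")
    if snps_total <= 0 or snps_annotation <= 0:
        raise ValueError("SNP counts must be positive")
    prop_h2 = h2_annotation / h2_total        # share of heritability
    prop_snps = snps_annotation / snps_total  # share of SNPs
    return prop_h2 / prop_snps


# Made-up numbers: the annotation covers 6.25% of SNPs but accounts
# for 25% of the estimated heritability.
example = fold_enrichment(h2_annotation=0.1, h2_total=0.4,
                          snps_annotation=62_500, snps_total=1_000_000)
```

With these made-up inputs the ratio is 4, meaning SNPs in the annotation carry four times the heritability expected from their number alone. A value near 1 would indicate no enrichment.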

Viewing annotations in the Knowledge Portals

Although different Knowledge Portals focus on different diseases and traits, all genomic annotations are accessible via all portals. Genomic annotations are displayed on these pages and interfaces:

  • the Genomic Region Miner module on the Region page (view an example) allows you to visualize all annotations for a tissue group or for an annotation type in that region
  • the Globally enriched annotations table on the Phenotype page (view an example) displays tissue-specific annotation types in which genetic associations for a phenotype are enriched
  • the Variant Sifter tool allows you to filter variants by their location within regions annotated as tissue-specific enhancers, promoters, binding sites, accessible chromatin regions, or regions linked to specific genes
  • the Non-coding Genetic Association Interactive Tool (NC-GAIT) allows you to perform custom aggregation tests on rare variant associations located within epigenomically annotated regions
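The idea behind filtering variants by their location within annotated regions can be sketched in a few lines of Python. This is not the Variant Sifter implementation: the Region and Variant classes, the variants_in_annotation function, the 1-based inclusive coordinates, and all of the example data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int       # assumed 1-based, inclusive
    end: int         # assumed inclusive
    annotation: str  # e.g. "Accessible chromatin"
    tissue: str      # broad tissue group of the biosample


@dataclass(frozen=True)
class Variant:
    var_id: str
    chrom: str
    pos: int


def variants_in_annotation(variants, regions, annotation, tissue):
    """Keep variants that fall inside any region of the requested
    annotation type and tissue (simple linear scan, fine for small inputs)."""
    selected = [r for r in regions
                if r.annotation == annotation and r.tissue == tissue]
    return [v for v in variants
            if any(r.chrom == v.chrom and r.start <= v.pos <= r.end
                   for r in selected)]


# Made-up example data, not taken from the Knowledge Portals.
regions = [
    Region("1", 100, 200, "Accessible chromatin", "pancreas"),
    Region("1", 450, 550, "Accessible chromatin", "liver"),
    Region("2", 100, 200, "Binding sites", "pancreas"),
]
variants = [
    Variant("demo_variant_1", "1", 150),
    Variant("demo_variant_2", "1", 500),
    Variant("demo_variant_3", "2", 150),
]
hits = variants_in_annotation(variants, regions,
                              "Accessible chromatin", "pancreas")
```

Here only demo_variant_1 is retained, because it is the only variant lying in a pancreas accessible chromatin region. For genome-scale inputs, a sorted-interval or interval-tree approach would be a more suitable choice than this linear scan.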