Genomic annotations | Knowledge Portal Network

In the Knowledge Portals, the term "annotations" refers to analytical distillations of experimental data that characterize specific genomic regions in specific tissues. Raw data are processed using an appropriate method to generate annotations, which may then be further analyzed for enrichment of genetic association signals. The aim of these studies is to provide evidence that may support the role of sequence variation in disease risk, which can help researchers identify the genes and pathways directly involved in the disease process. Since the majority of significant genetic associations are located outside of protein-coding regions, determining whether they are located in genomic regions with the potential to affect gene regulation, such as open chromatin, promoters, or enhancers, can suggest hypotheses about how these variants impact disease.

Raw data are loaded into the Common Metabolic Diseases Genome Atlas (CMDGA), are processed by the CMDGA team to create annotations, and are then transferred to the Human Genetics Amplifier (HuGeAMP) platform powering the Knowledge Portals for further analysis and display on the Portals. Click here to see a searchable list of all genomic annotations currently displayed on the Knowledge Portals.

Would you like to contribute data to CMDGA? Please see this page for a description of the data types and formats that are currently accepted.

Annotation classification

At CMDGA, annotations are classified into annotation categories and annotation types. Some categories and types stored at CMDGA are not yet represented in the Knowledge Portals. Those currently represented are:

Annotation category	Annotation type	Experimental techniques
cis-regulatory elements	Accessible chromatin	ATAC-seq, snATAC-seq, DNase-seq
cis-regulatory elements	Chromatin states	ChIP-seq with ChromHMM analysis
cis-regulatory elements	Binding sites	ChIP-seq
cis-regulatory elements	Candidate regulatory elements from ENCODE	ChIP-seq, DNase-seq
Target gene links	Target gene predictions	Cicero, ABC, CHiCAGO

To simplify visualizations in the Knowledge Portals, we have grouped chromatin states into broad categories:

Chromatin state in Knowledge Portals	Chromatin state from ChromHMM
Active enhancer	Active_enhancer_1, Active_enhancer_2, EnhA1, EnhA2
Enhancer	Genic_enhancer, Weak_enhancer, EnhG, EnhG1, EnhG2, EnhWk, EnhBiv, Enh
Active promoter	Active_TSS, TssA
Promoter	Weak_TSS, Flanking_TSS, Bivalent/poised_TSS, TssAFlnk, TssFlnk, TssBiv, BivFlnk, TssFlnkU, TssFlnkD
Other	Strong_transcription, Repressed_polycomb, Weak_repressed_polycomb, Quiescent/low_signal, Weak_transcription, Txn, ReprPC, ReprPCWk, Quies, TxWk, Het, ZNF/Rpts, TxFlnk, Ctcf
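
The grouping above amounts to a lookup from a ChromHMM state label to a broad category. The following is a minimal Python sketch of that lookup, not the Portals' actual implementation. The names BROAD_CATEGORIES, STATE_TO_CATEGORY, and broad_category are hypothetical, and returning None for a label that is not listed is an assumption, since this page does not say how unlisted states are handled.

```python
from typing import Optional

# Broad Knowledge Portal categories and the ChromHMM state labels grouped
# under each, transcribed from the grouping listed on this page.
BROAD_CATEGORIES = {
    "Active enhancer": ["Active_enhancer_1", "Active_enhancer_2", "EnhA1", "EnhA2"],
    "Enhancer": ["Genic_enhancer", "Weak_enhancer", "EnhG", "EnhG1", "EnhG2",
                 "EnhWk", "EnhBiv", "Enh"],
    "Active promoter": ["Active_TSS", "TssA"],
    "Promoter": ["Weak_TSS", "Flanking_TSS", "Bivalent/poised_TSS", "TssAFlnk",
                 "TssFlnk", "TssBiv", "BivFlnk", "TssFlnkU", "TssFlnkD"],
    "Other": ["Strong_transcription", "Repressed_polycomb",
              "Weak_repressed_polycomb", "Quiescent/low_signal",
              "Weak_transcription", "Txn", "ReprPC", "ReprPCWk", "Quies",
              "TxWk", "Het", "ZNF/Rpts", "TxFlnk", "Ctcf"],
}

# Invert the grouping so each ChromHMM label points at its broad category.
STATE_TO_CATEGORY = {
    state: category
    for category, states in BROAD_CATEGORIES.items()
    for state in states
}


def broad_category(chromhmm_state: str) -> Optional[str]:
    """Return the broad category for a ChromHMM state label.

    Returns None for labels that are not in the grouping (an assumption).
    """
    return STATE_TO_CATEGORY.get(chromhmm_state)
```

Keeping the grouping as category-to-states and deriving the inverse lookup means the mapping only has to be maintained in one place, in the same orientation as the table.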

Tissues are also grouped into broad categories. Each annotation is derived from data generated in a specific tissue or cell line, termed the "biosample". Each biosample is mapped to a broad tissue group representing the organ or organ system from which it was derived.

Annotation post-processing methods

We apply stratified LD score regression (S-LDSC; Bulik-Sullivan et al., 2015; GitHub repository) to the bottom-line ancestry-specific genetic associations in the Knowledge Portal database to calculate global enrichment of genetic associations within the tissue-specific epigenomic annotations described above. Significant enrichment of genetic association signals for a disease in annotated regions of a particular type in a specific tissue suggests that the tissue may be relevant for the disease. We previously used the GREGOR method for this calculation, but have replaced it with S-LDSC because S-LDSC is more robust to potential confounders and produces more accurate and specific enrichments. The method generates p-values for each annotation and ancestry, representing the significance of association between a trait and a tissue, and also assigns a fold enrichment value.
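
To make the two outputs concrete, here is a small Python sketch of how per-annotation results could be represented and filtered. It assumes the usual S-LDSC definition of enrichment (the proportion of heritability attributed to SNPs in an annotation divided by the proportion of SNPs in that annotation); whether the fold enrichment shown in the Portals is computed exactly this way is not stated here. The class EnrichmentResult, its field names, the function significant_results, and the Bonferroni threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EnrichmentResult:
    """One enrichment result for a trait (hypothetical record layout)."""
    annotation: str   # annotation type, e.g. "Accessible chromatin"
    tissue: str       # tissue the annotation was derived from
    ancestry: str     # ancestry of the association dataset
    prop_h2: float    # proportion of heritability in annotated SNPs
    prop_snps: float  # proportion of all SNPs that are annotated
    p_value: float    # significance of the enrichment

    @property
    def fold_enrichment(self) -> float:
        # Assumed definition: share of heritability over share of SNPs.
        return self.prop_h2 / self.prop_snps


def significant_results(results: List[EnrichmentResult],
                        alpha: float = 0.05) -> List[EnrichmentResult]:
    """Keep results passing a Bonferroni-corrected threshold, best first.

    The correction and the default alpha are illustrative assumptions.
    """
    if not results:
        return []
    threshold = alpha / len(results)
    hits = [r for r in results if r.p_value < threshold]
    return sorted(hits, key=lambda r: r.p_value)
```

A fold enrichment above 1 under this definition means the annotated regions carry more of the trait's heritability than their share of SNPs would predict.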

Viewing annotations in the Knowledge Portals

Although different Knowledge Portals focus on different diseases and traits, all genomic annotations are accessible via all portals. Genomic annotations are displayed on these pages and interfaces:

The Genomic Region Miner module on the Region page (view an example) allows you to visualize all annotations for a tissue group or for an annotation type in that region.
The Globally enriched annotations table on the Phenotype page (view an example) displays tissue-specific annotation types in which genetic associations for a phenotype are enriched.
The Variant Sifter tool allows you to filter variants by their location within regions annotated as tissue-specific enhancers, promoters, binding sites, accessible chromatin regions, or regions linked to specific genes.
The Non-coding Genetic Association Interactive Tool (NC-GAIT) allows you to perform custom aggregation tests on rare variant associations located within epigenomically annotated regions.
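
Filtering variants by their location within annotated regions, as the Variant Sifter does, is at its core an interval-overlap test. The Python sketch below illustrates that idea only and does not describe how the tool is implemented. The Region and Variant types, the filter_variants function, and the coordinate conventions (BED-style 0-based half-open regions, 1-based variant positions) are assumptions made for this example.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int       # 0-based, inclusive (BED-style; assumed convention)
    end: int         # exclusive
    annotation: str  # e.g. "Enhancer", "Accessible chromatin"
    tissue: str


class Variant(NamedTuple):
    variant_id: str
    chrom: str
    pos: int         # 1-based position (assumed convention)


def in_region(variant: Variant, region: Region) -> bool:
    """True if the variant lies inside the region.

    A 1-based position p falls in the half-open interval [start, end)
    exactly when start < p <= end.
    """
    return (variant.chrom == region.chrom
            and region.start < variant.pos <= region.end)


def filter_variants(variants: List[Variant],
                    regions: List[Region],
                    annotation: Optional[str] = None,
                    tissue: Optional[str] = None) -> List[Variant]:
    """Keep variants inside at least one region matching the filters.

    Passing None for annotation or tissue leaves that filter off.
    """
    selected = [
        r for r in regions
        if (annotation is None or r.annotation == annotation)
        and (tissue is None or r.tissue == tissue)
    ]
    return [v for v in variants if any(in_region(v, r) for r in selected)]
```

This linear scan is adequate for a handful of regions; for genome-wide annotation sets, an interval index or sorted sweep would be the more practical choice.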