In the Knowledge Portals, the term "annotations" refers to analytical distillations of experimental data that characterize specific genomic regions in specific tissues. Raw data are processed using an appropriate method to generate annotations, which may then be further analyzed for enrichment of genetic association signals. The aim of these studies is to provide evidence that may support the role of sequence variation in disease risk, which can help researchers identify the genes and pathways directly involved in the disease process. Since the majority of significant genetic associations are located outside of protein-coding regions, determining whether they are located in genomic regions with the potential to affect gene regulation, such as open chromatin, promoters, or enhancers, can suggest hypotheses about how these variants impact disease.
Raw data are loaded into the Common Metabolic Diseases Genome Atlas (CMDGA) and processed by the CMDGA team to create annotations. The annotations are then transferred to the Human Genetics Amplifier (HuGeAMP) platform, which powers the Knowledge Portals, for further analysis and display. A searchable list of all genomic annotations currently displayed on the Knowledge Portals is also available.
Would you like to contribute data to CMDGA? Please see this page for a description of the data types and formats that are currently accepted.
Annotation classification
Annotations are classified at CMDGA into annotation categories and annotation types. Some annotation categories and types stored at CMDGA are not yet represented in the Knowledge Portals. The annotations currently represented in the Knowledge Portals are:
Annotation category | Annotation type | Experimental techniques and analysis methods |
---|---|---|
cis-regulatory elements | Accessible chromatin | ATAC-seq, snATAC-seq, DNase-seq |
cis-regulatory elements | Chromatin states | ChIP-seq and ChromHMM analysis |
cis-regulatory elements | Binding sites | ChIP-seq |
cis-regulatory elements | Candidate regulatory elements from ENCODE | ChIP-seq, DNase-seq |
Target gene links | Target gene predictions | Cicero, ABC, CHiCAGO |
To simplify visualizations in the Knowledge Portals, we have grouped chromatin states into broad categories:
Chromatin state in Knowledge Portals | Chromatin state from ChromHMM |
---|---|
Active enhancer | |
Enhancer | |
Active promoter | |
Promoter | |
Other | |
Tissues are also grouped into broad categories. Each annotation is derived from data generated in a specific tissue or cell line, termed the "biosample". Each biosample is mapped to a broad tissue group representing the organ or organ system from which it was derived.
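The following is a hedged illustration of the data model described above. Each annotation can be treated as a record tied to one biosample, and a separate lookup assigns each biosample to a tissue group, so annotations can then be bucketed by tissue group and annotation type. The record fields, biosample names, tissue assignments and the "unmapped" fallback are all hypothetical. The actual CMDGA schema and biosample-to-tissue mapping are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated genomic region (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    annotation_type: str   # e.g. "Accessible chromatin", "Binding sites"
    biosample: str         # tissue or cell line the data came from


# Hypothetical biosample -> tissue group lookup (not the real CMDGA mapping).
BIOSAMPLE_TO_TISSUE_GROUP = {
    "pancreatic_islet_donor_1": "pancreas",
    "hepatocyte_sample_2": "liver",
    "subcutaneous_adipose_sample_3": "adipose tissue",
}


def group_by_tissue(annotations):
    """Bucket annotations by (tissue group, annotation type)."""
    groups = defaultdict(list)
    for ann in annotations:
        # Biosamples missing from the lookup go to a catch-all group (an assumption of this sketch).
        tissue = BIOSAMPLE_TO_TISSUE_GROUP.get(ann.biosample, "unmapped")
        groups[(tissue, ann.annotation_type)].append(ann)
    return dict(groups)


anns = [
    Annotation("chr1", 100, 500, "Accessible chromatin", "pancreatic_islet_donor_1"),
    Annotation("chr1", 900, 1200, "Accessible chromatin", "hepatocyte_sample_2"),
    Annotation("chr2", 50, 300, "Binding sites", "pancreatic_islet_donor_1"),
]
grouped = group_by_tissue(anns)
for key, members in grouped.items():
    print(key, len(members))
```

Keeping the biosample as the primary key and deriving the tissue group from a lookup preserves the finer-grained source of each annotation while still allowing broad, tissue-level views.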
Annotation post-processing methods
We apply stratified LD score regression (S-LDSC; Bulik-Sullivan et al., 2015; GitHub repository) to the bottom-line, ancestry-specific genetic associations in the Knowledge Portal database. This calculates the global enrichment of genetic associations within the tissue-specific epigenomic annotations described above. Significant enrichment of association signals for a disease within annotated regions of a particular type in a specific tissue suggests that the tissue may be relevant to the disease. We previously used the GREGOR method for this calculation but replaced it with S-LDSC, which is more robust to potential confounders and produces more accurate and specific enrichments. For each trait, annotation and ancestry, S-LDSC reports two values. The first is a p-value for the association between the trait and the tissue annotation. The second is a fold enrichment value: the proportion of trait heritability explained by variants in the annotation, divided by the proportion of variants that fall in the annotation.
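To make the enrichment calculation concrete, here is a simplified sketch of the S-LDSC model. S-LDSC regresses each variant's association chi-square statistic on its LD scores with each annotation: E[chi2_j] = N * sum_c tau_c * l(j, c) + N * a + 1. Here l(j, c) sums the squared LD (r^2) between variant j and the variants in annotation c, N is the sample size, and a absorbs confounding. The fitted coefficients tau_c give each variant's expected share of heritability, and the fold enrichment of an annotation follows from those shares.

This is a toy built under stated assumptions. It is not the Knowledge Portals pipeline or the ldsc software:

- It uses unweighted least squares.
- It omits the block jackknife that ldsc uses for standard errors and p-values.
- It uses a made-up banded r^2 matrix and one hypothetical tissue annotation.
- The function names are illustrative.

```python
import numpy as np


def ld_scores(annot, r2):
    """l(j, c) = sum_k a_c(k) * r2[j, k]: LD score of variant j with annotation c."""
    return r2 @ annot


def fit_tau(chi2, ldsc, n):
    """Least-squares fit of E[chi2_j] = N * sum_c tau_c * l(j, c) + N * a + 1.

    Simplification: real S-LDSC uses regression weights and a block jackknife;
    both are omitted here.
    """
    y = chi2 - 1.0
    x = np.column_stack([n * ldsc, np.full(len(chi2), float(n))])
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef[:-1], coef[-1]  # (tau per annotation, confounding term a)


def fold_enrichment(annot, tau, c):
    """Share of heritability in binary annotation c divided by its share of variants."""
    per_variant_h2 = annot @ tau           # Var(beta_j) = sum_c tau_c * a_c(j)
    in_c = annot[:, c] > 0
    prop_h2 = per_variant_h2[in_c].sum() / per_variant_h2.sum()
    prop_variants = in_c.mean()
    return prop_h2 / prop_variants


rng = np.random.default_rng(0)
m, n = 500, 50_000
# Column 0: base annotation (all variants); column 1: hypothetical tissue annotation (~20% of variants).
annot = np.column_stack([np.ones(m), rng.random(m) < 0.2]).astype(float)

# Toy banded r^2 matrix: each variant is in partial LD with up to 3 neighbours on each side.
r2 = np.eye(m)
for d in (1, 2, 3):
    vals = rng.uniform(0.0, 0.5, size=m - d)
    r2 += np.diag(vals, k=d) + np.diag(vals, k=-d)

tau_true = np.array([1e-6, 4e-6])
a_true = 2e-6
l = ld_scores(annot, r2)
chi2 = n * l @ tau_true + n * a_true + 1.0   # noise-free expected chi-square statistics

tau_hat, a_hat = fit_tau(chi2, l, n)
enrichment = fold_enrichment(annot, tau_hat, c=1)
print(tau_hat, a_hat, enrichment)
```

With noise-free synthetic statistics, the fit recovers the simulated coefficients. The enrichment then equals 5 / (1 + 4p) for an annotation covering a fraction p of variants, because each variant inside it carries five times the heritability of a variant outside it.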
Viewing annotations in the Knowledge Portals
Although different Knowledge Portals focus on different diseases and traits, all genomic annotations are accessible via all portals. Genomic annotations are displayed on these pages and interfaces:
- the Genomic Region Miner module on the Region page allows you to visualize all annotations for a tissue group or for an annotation type in the region being viewed
- the Globally enriched annotations table on the Phenotype page displays tissue-specific annotation types in which genetic associations for a phenotype are enriched
- the Variant Sifter tool allows you to filter variants by their location within regions annotated as tissue-specific enhancers, promoters, binding sites, accessible chromatin regions, or regions linked to specific genes
- the Non-coding Genetic Association Interactive Tool (NC-GAIT) allows you to perform custom aggregation tests on rare variant associations located within epigenomically annotated regions
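The Variant Sifter and NC-GAIT both rely on one basic operation: deciding whether a variant lies inside an annotated region. The sketch below shows one way to do this, using a sorted, merged interval index per chromosome with binary search. It is not the portals' implementation. The following details are hypothetical:

- the variant record layout (`chrom`, `pos` and `maf` keys)
- the class and function names
- the `max_maf` rare-variant cutoff

All coordinates are assumed to share one half-open convention, as in BED files. VCF positions are 1-based and would need converting.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


class RegionIndex:
    """Per-chromosome index answering "is this position inside an annotated region?"."""

    def __init__(self, regions):
        # regions: iterable of (chrom, start, end), e.g. enhancers annotated in one tissue
        by_chrom = {}
        for chrom, start, end in regions:
            by_chrom.setdefault(chrom, []).append((start, end))
        self._starts, self._ends = {}, {}
        for chrom, ivs in by_chrom.items():
            merged = merge_intervals(ivs)
            self._starts[chrom] = [s for s, _ in merged]
            self._ends[chrom] = [e for _, e in merged]

    def contains(self, chrom, pos):
        starts = self._starts.get(chrom)
        if not starts:
            return False
        i = bisect_right(starts, pos) - 1   # last merged region starting at or before pos
        return i >= 0 and pos < self._ends[chrom][i]


def filter_variants(variants, index, max_maf=None):
    """Keep variants inside annotated regions; if max_maf is set, keep only those with MAF < max_maf."""
    kept = []
    for v in variants:
        if not index.contains(v["chrom"], v["pos"]):
            continue
        if max_maf is not None and v["maf"] >= max_maf:
            continue
        kept.append(v)
    return kept


enhancers = RegionIndex([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 20)])
variants = [
    {"id": "v1", "chrom": "chr1", "pos": 120, "maf": 0.30},
    {"id": "v2", "chrom": "chr1", "pos": 250, "maf": 0.002},
    {"id": "v3", "chrom": "chr1", "pos": 300, "maf": 0.001},
    {"id": "v4", "chrom": "chr3", "pos": 15, "maf": 0.004},
]
in_enhancers = filter_variants(variants, enhancers)            # Variant Sifter-style filter
rare_mask = filter_variants(variants, enhancers, max_maf=0.01)  # candidate mask for an aggregation test
print([v["id"] for v in in_enhancers], [v["id"] for v in rare_mask])
```

Merging overlapping regions first lets a single binary search answer each lookup. The rare-variant subset is the kind of mask an aggregation test would then be run on; the test itself is not shown here.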