Filtering variants with masks

A mask is a set of criteria used to filter a set of variants. Masks select variants by the categories or scores assigned by tools commonly used to estimate variant deleteriousness, and some masks also use allele frequency as a criterion. Filtering with a mask yields a set of potentially impactful variants that can be used for gene burden testing.
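
Conceptually, a mask can be treated as a yes/no test applied to each variant's annotations. The sketch below shows this idea. It assumes a hypothetical `Variant` record that carries one category, one score and one MAF. The field names, the `example_mask` function and its thresholds are illustrative only and do not reproduce any specific portal mask.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Variant:
    # Hypothetical, minimal annotation record; real pipelines carry many more fields.
    variant_id: str
    gene: str
    category: str  # an annotation category, e.g. "missense" (illustrative)
    score: float   # a deleteriousness score from some prediction tool (illustrative)
    maf: float     # minor allele frequency


# A mask is modeled as a predicate: True means the variant is retained.
Mask = Callable[[Variant], bool]


def apply_mask(variants: Iterable[Variant], gene: str, mask: Mask) -> List[Variant]:
    """Return the variants in `gene` that pass `mask`."""
    return [v for v in variants if v.gene == gene and mask(v)]


def example_mask(v: Variant) -> bool:
    """Hypothetical mask: missense variants scoring above 0.9 that are also rare (MAF < 1%)."""
    return v.category == "missense" and v.score > 0.9 and v.maf < 0.01


variants = [
    Variant("v1", "GENE1", "missense", 0.95, 0.002),
    Variant("v2", "GENE1", "missense", 0.40, 0.001),
    Variant("v3", "GENE1", "synonymous", 0.97, 0.004),
    Variant("v4", "GENE2", "missense", 0.99, 0.003),
]

selected = apply_mask(variants, "GENE1", example_mask)
print([v.variant_id for v in selected])  # ['v1']
```

In this sketch, `v2` fails the score threshold, `v3` is not missense and `v4` belongs to another gene, so only `v1` is retained.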

This page documents:

  • Masks available in the GAIT tool
  • Masks available on the Gene page

Masks available in the GAIT tool

In the Knowledge Portals, the GAIT tool lets you choose one of 7 masks. The mask assembles a list of the variants most likely to affect a gene product, and the tool then runs an aggregation test on that set of variants. These masks were defined and used in Flannick et al., 2019.
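
An aggregation (burden) test typically collapses the variants retained by a mask into a single value per individual, and then tests that value against the phenotype. The sketch below shows only this generic collapsing step and omits the association test. It assumes genotypes are available as alternate-allele counts keyed by hypothetical sample and variant IDs. It is not the GAIT tool's implementation.

```python
from typing import Dict, Set

# genotypes[sample][variant_id] = number of alternate alleles carried (0, 1, or 2).
Genotypes = Dict[str, Dict[str, int]]


def burden_scores(genotypes: Genotypes, mask_variants: Set[str]) -> Dict[str, int]:
    """Sum each sample's alternate alleles over the variants retained by a mask."""
    return {
        sample: sum(n for vid, n in calls.items() if vid in mask_variants)
        for sample, calls in genotypes.items()
    }


def carrier_status(scores: Dict[str, int]) -> Dict[str, int]:
    """Collapse scores to 1 if the sample carries any qualifying allele, else 0."""
    return {sample: int(n > 0) for sample, n in scores.items()}


genotypes = {
    "sample1": {"v1": 1, "v2": 0, "v3": 1},
    "sample2": {"v1": 0, "v2": 2, "v3": 0},
    "sample3": {"v1": 0, "v2": 0, "v3": 1},
}
mask_variants = {"v1", "v2"}  # e.g. the variants a mask retained for one gene

scores = burden_scores(genotypes, mask_variants)
print(scores)                  # {'sample1': 1, 'sample2': 2, 'sample3': 0}
print(carrier_status(scores))  # {'sample1': 1, 'sample2': 1, 'sample3': 0}
```

Either the allele count or the carrier indicator could serve as the collapsed predictor. Which one a given test uses depends on the method, and the text above does not specify it.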

In decreasing order of stringency (the most stringent mask retrieves the fewest variants, those with the most severe predicted impact), the masks are:

  • LofTee 
  • 16/16
  • 11/11
  • 5/5
  • 5/5 + LofTee LC 1%
  • 5/5 + 1/5 1%
  • 5/5 + 0/5 1%

The methods used to predict variant deleteriousness and their categories or thresholds are:

  • LofTee HC: Variants predicted with high confidence by the Loss-Of-Function Transcript Effect Estimator (LofTee; Karczewski et al., 2020) to cause a loss of function of the encoded protein
  • VEST3>90%: Variants predicted with > 90% probability to be deleterious by the Variant Effect Scoring Tool VEST 3.0 (Carter et al., 2013)
  • CADD>90%: Variants predicted with > 90% probability to be deleterious by the Combined Annotation Dependent Depletion (CADD) tool (Rentzsch et al., 2019)
  • DANN>90%: Variants predicted with > 90% probability to be deleterious by the Deleterious Annotation of genetic variants using Neural Networks (DANN) tool (Quang et al., 2015)
  • Eigen-raw>90%: Variants predicted with > 90% probability to be deleterious by the Eigen method (Ionita-Laza et al., 2016)
  • Eigen-PC-raw>90%: Variants predicted with > 90% probability to be deleterious by the Eigen-PC method (Ionita-Laza et al., 2016)
  • FATHMM pred=D: Variants predicted to be deleterious by the Functional Analysis through Hidden Markov Models (FATHMM) method (Shihab et al., 2013).
  • FATHMM-MKL pred=D: Variants predicted to be deleterious by the Functional Analysis through Hidden Markov Models (FATHMM-MKL) method (Shihab et al., 2013).
  • PROVEAN pred=D: Variants predicted to be deleterious by the Protein Variation Effect Analyzer (PROVEAN) method (Choi and Chan, 2015).
  • MetaSVM pred=D: Variants predicted to be deleterious by the MetaSVM method (Dong et al., 2015)
  • MetaLR pred=D: Variants predicted to be deleterious by the MetaLR method (Dong et al., 2015)
  • MCAP>0.025: Variants with a Mendelian Clinically Applicable Pathogenicity (M-CAP) (Jagadeesh et al., 2016) score over 0.025.
  • PolyPhen HDIV pred=D: Variants predicted to be deleterious by the PolyPhen HDIV method (Adzhubei et al., 2010)
  • PolyPhen HVAR pred=D: Variants predicted to be deleterious by the PolyPhen HVAR method (Adzhubei et al., 2010)
  • SIFT pred=del: Variants predicted to be deleterious by the Sorting Intolerant From Tolerant (SIFT) algorithm (Kumar et al., 2009)
  • LRT pred=D: Variants predicted to be deleterious by the Likelihood Ratio Test (LRT; Chun and Fay, 2009).
  • MutTaster pred=(D or A): Variants predicted to be probably deleterious (D) or variants known to be deleterious (A) by the MutationTaster method (Schwarz et al., 2010)
  • VEP impact=HIGH: Variants predicted to have high impact by the Ensembl Variant Effect Predictor (VEP)
  • VEP impact=MOD: Variants predicted to have moderate impact by the Ensembl Variant Effect Predictor (VEP)
  • LofTee LC: Variants predicted with low confidence by the Loss-Of-Function Transcript Effect Estimator (LofTee; Karczewski et al., 2020) to cause a loss of function of the encoded protein
  • Max MAF<1%: Variants with minor allele frequency less than 1%
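
The Max MAF<1% criterion compares a minor allele frequency against a 1% threshold. The sketch below computes MAF from allele counts. It assumes, based only on the criterion's name, that "Max" refers to the largest MAF observed across several groups, for example separate cohorts or ancestry groups. The text above does not state this, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def minor_allele_frequency(allele_count: int, allele_number: int) -> float:
    """MAF from an alternate-allele count (AC) and the number of called alleles (AN)."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    af = allele_count / allele_number
    # The minor allele is whichever allele is rarer, so fold the frequency at 0.5.
    return min(af, 1.0 - af)


def max_maf(groups: Iterable[Tuple[int, int]]) -> float:
    """Largest MAF across groups given as (AC, AN) pairs (assumed meaning of "Max")."""
    return max(minor_allele_frequency(ac, an) for ac, an in groups)


def passes_max_maf(groups: Iterable[Tuple[int, int]], threshold: float = 0.01) -> bool:
    """True if the variant's maximum MAF is below the threshold (1% by default)."""
    return max_maf(groups) < threshold


print(passes_max_maf([(30, 10_000), (12, 4_000)]))   # True: both MAFs are 0.003
print(passes_max_maf([(30, 10_000), (120, 4_000)]))  # False: second group has MAF 0.03
```
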

In the table below, each column is a mask, and an "X" indicates that variants meeting the criterion in that row are included in the mask. For example, the mask named "16/16" includes all variants within the selected gene that LofTee predicts with high confidence to cause loss of function. It also includes variants with scores of > 90% from any of the VEST3, CADD, DANN, Eigen-raw, and Eigen-PC-raw methods.

Criterion                LofTee  16/16  11/11  5/5  5/5 + LofTee LC 1%  5/5 + 1/5 1%  5/5 + 0/5 1%
LofTee HC                X       X      X      X    X                   X             X
VEST3>90%                        X      X      X    X                   X             X
CADD>90%                         X      X      X    X                   X             X
DANN>90%                         X      X      X    X                   X             X
Eigen-raw>90%                    X      X      X    X                   X             X
Eigen-PC-raw>90%                 X      X      X    X                   X             X
FATHMM pred=D                           X      X    X                   X             X
FATHMM-MKL pred=D                       X      X    X                   X             X
PROVEAN pred=D                          X      X    X                   X             X
MetaSVM pred=D                          X      X    X                   X             X
MetaLR pred=D                           X      X    X                   X             X
MCAP>0.025                              X      X    X                   X             X
PolyPhen HDIV pred=D                           X    X                   X             X
PolyPhen HVAR pred=D                           X    X                   X             X
SIFT pred=del                                  X    X                   X             X
LRT pred=D                                     X    X                   X             X
MutTaster pred=(D or A)                        X    X                   X             X
VEP impact=HIGH                                     X
VEP impact=MOD                                                          X             X
LofTee LC                                           X
Max MAF<1%                                          X                   X             X
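
To look up which criteria carry an X for a given mask, the table can be transcribed into a small lookup structure, as in the sketch below. It reproduces only the X marks. It does not model how the marked criteria are combined into a filter, because the table alone does not spell that out. For example, the table does not say whether a criterion such as Max MAF<1% stands on its own or restricts other criteria. The names `MASKS`, `TABLE` and `criteria_for` are hypothetical.

```python
from typing import Dict, List

MASKS: List[str] = [
    "LofTee", "16/16", "11/11", "5/5",
    "5/5 + LofTee LC 1%", "5/5 + 1/5 1%", "5/5 + 0/5 1%",
]

# One string per criterion; each character is a mask column, "1" where the table has an X.
TABLE: Dict[str, str] = {
    "LofTee HC":               "1111111",
    "VEST3>90%":               "0111111",
    "CADD>90%":                "0111111",
    "DANN>90%":                "0111111",
    "Eigen-raw>90%":           "0111111",
    "Eigen-PC-raw>90%":        "0111111",
    "FATHMM pred=D":           "0011111",
    "FATHMM-MKL pred=D":       "0011111",
    "PROVEAN pred=D":          "0011111",
    "MetaSVM pred=D":          "0011111",
    "MetaLR pred=D":           "0011111",
    "MCAP>0.025":              "0011111",
    "PolyPhen HDIV pred=D":    "0001111",
    "PolyPhen HVAR pred=D":    "0001111",
    "SIFT pred=del":           "0001111",
    "LRT pred=D":              "0001111",
    "MutTaster pred=(D or A)": "0001111",
    "VEP impact=HIGH":         "0000100",
    "VEP impact=MOD":          "0000011",
    "LofTee LC":               "0000100",
    "Max MAF<1%":              "0000111",
}

# Sanity check: every row has exactly one flag per mask column.
assert all(len(flags) == len(MASKS) for flags in TABLE.values())


def criteria_for(mask: str) -> List[str]:
    """Criteria marked with an X in the given mask's column, in table order."""
    col = MASKS.index(mask)
    return [crit for crit, flags in TABLE.items() if flags[col] == "1"]


for mask in MASKS:
    print(f"{mask}: {len(criteria_for(mask))} criteria marked")
```

Transcribed this way, the 5/5 + 1/5 1% and 5/5 + 0/5 1% columns carry identical marks. The difference between those two masks is therefore conveyed by their names rather than by the table.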

 

Masks available on the Gene page

On the Gene page, precomputed rare-variant gene-level association scores are available from three data sources, each of which used its own set of masks:

  • AMP T2D-GENES T2D exome sequence analysis (T2D associations), published in Flannick et al., 2019, and AMP T2D-GENES quantitative trait exome sequence analysis (23 cardiometabolic traits), published in Dornbos et al., 2022. Gene-level scores from these analyses are available for the 7 masks described above.
  • A rare coding variant analysis (Jurgens, Wang, et al.) of exome and whole-genome sequence data covers 601 diseases in more than 750,000 individuals. It provides gene-level association scores calculated using 9 masks, which are documented separately.
  • Gene-level association scores, derived from the Genebass resource and based on exome sequencing data from the UK Biobank, are available for the following masks:
    • synonymous: all synonymous variants in the gene
    • missense|LC: missense variants, in-frame insertions and deletions, and low-confidence LoF variants (filtered out by LOFTEE)
    • pLoF: high-confidence LoF variants (as indicated by LOFTEE), including stop-gained, essential splice, and frameshift variants
    • missense+LoF: variants retrieved by either the missense|LC or pLoF masks
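
The missense+LoF mask retrieves variants from either of two other masks, so it corresponds to a set union. The sketch below uses made-up variant IDs to show that a variant retrieved by both masks (here `v3`) appears only once. It illustrates the set logic only and is not Genebass code.

```python
from typing import Dict, Set

# Hypothetical per-gene results: mask name -> IDs of the variants that mask retrieved.
mask_variants: Dict[str, Set[str]] = {
    "synonymous": {"v10", "v11"},
    "missense|LC": {"v1", "v2", "v3"},
    "pLoF": {"v3", "v4"},
}

# missense+LoF keeps any variant retrieved by missense|LC or pLoF (a set union),
# so a variant present in both masks is counted once.
mask_variants["missense+LoF"] = mask_variants["missense|LC"] | mask_variants["pLoF"]

print(sorted(mask_variants["missense+LoF"]))  # ['v1', 'v2', 'v3', 'v4']
```
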