T2DKP now offers a T2D-specific exome sequence collection of unprecedented size

The largest known exome sequence analysis specific to a complex disease was published today in Nature, and all of the results are now freely available in the Type 2 Diabetes Knowledge Portal (T2DKP) to support researchers worldwide as they make decisions about how to prioritize potential T2D drug targets for investigation. The paper, “Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls” (Flannick et al.), describes a multi-ancestry analysis of both variant-level and gene-level genetic associations for type 2 diabetes.

The paper is the culmination of years of work from a global collaboration to generate exome sequences across five ancestry groups. The project began as an effort by the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) consortium to perform exome sequencing and T2D association analysis for about 13,000 samples, and evolved into a consortium of consortia—about 30 international sites in all, including the GoT2D, ESP, SIGMA, LuCAMP, and ProDIGY consortia—that partnered to design a study including as many exomes as possible. The Accelerating Medicines Partnership in Type 2 Diabetes grew out of this effort, and today supports a wide range of genetic association and other studies aimed at elucidating the mechanisms behind T2D, as well as supporting the T2DKP to serve these results to the world.

The study included participants of African American, East Asian, European, Hispanic/Latino, and South Asian ancestry. The researchers sequenced exomes (the protein-coding regions of the genome) from these participants and performed gene-level association analysis in order to detect rare variants and uncover allelic series within genes. They also performed single-variant association analysis for a subset of the samples using genome-wide arrays and imputation. A comparison of the two methods confirmed that the strength of exome sequencing is its ability to identify informative, often rare, alleles that may yield clues to disease mechanisms, while array-based GWAS provides a more comprehensive picture of strongly associated loci.

The researchers found exome-wide significant gene-level T2D associations for three genes (MC4R, PAM, and SLC30A8). Replication of the gene-level associations in a meta-analysis of three independent exome sequencing datasets confirmed the significance of these associations and found exome-wide significance for a fourth gene, UBE2NL. The variant alleles uncovered in these genes are effectively “experiments of nature” that may subtly alter the structure, function, or stability of the gene products and could be very helpful in suggesting further research directions to discover the roles of these proteins in T2D risk.

But what of the other genes whose gene-level associations didn’t meet exome-wide significance? Suspecting that these associations could still provide valuable information, the authors decided to test whether these association scores were meaningful. They created sets of genes that were known or likely to have a role in T2D risk: for example, genes known to be T2D drug targets, genes in which mutations cause maturity-onset diabetes of the young (MODY), or genes whose mouse homologs confer glycemic phenotypes when knocked out. In each case, the genes in the set had more significant gene-level T2D associations than would be expected by chance, suggesting that their scores were meaningful despite relatively low statistical significance. Analysis of additional sets of genes, for example those located in strongly T2D-associated GWAS loci, supported this conclusion.
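
For readers who want a concrete picture of this kind of gene-set test, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure used in the paper: it uses a simple one-sided permutation test on mean -log10(p), the function name `gene_set_enrichment_p` is hypothetical, and the gene names and p-values are invented toy data.

```python
import random
from math import log10
from statistics import mean


def gene_set_enrichment_p(gene_pvalues, gene_set, n_permutations=10000, seed=0):
    """One-sided permutation test (illustrative only).

    Asks whether the genes in `gene_set` have stronger gene-level
    associations than randomly drawn gene sets of the same size.
    """
    rng = random.Random(seed)
    # Work with -log10(p) so that larger scores mean stronger association.
    scores = {gene: -log10(p) for gene, p in gene_pvalues.items()}
    members = sorted(set(g for g in gene_set if g in scores))
    if not members:
        raise ValueError("no gene in the set has an association score")
    observed = mean(scores[g] for g in members)

    all_genes = sorted(scores)
    hits = 0
    for _ in range(n_permutations):
        # Draw a random gene set of the same size from all scored genes.
        sample = rng.sample(all_genes, len(members))
        if mean(scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy data (invented): 200 hypothetical genes, five of which have
# stronger associations than the rest.
toy_pvalues = {
    f"GENE{i}": (0.001 if i < 5 else (i + 1) / 201) for i in range(200)
}
strong_set = [f"GENE{i}" for i in range(5)]
weak_set = [f"GENE{i}" for i in range(195, 200)]

p_enriched = gene_set_enrichment_p(toy_pvalues, strong_set, n_permutations=1000)
p_null = gene_set_enrichment_p(toy_pvalues, weak_set, n_permutations=1000)
print(p_enriched, p_null)
```

A small permutation p-value for a curated set suggests that its genes carry more association signal than chance would predict, even when no single gene reaches exome-wide significance. A real analysis would likely also match comparison genes on properties such as gene length or number of variants, which this sketch does not do.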

Thus, although future studies with larger sample sizes will be needed to uncover strongly significant gene-level associations, the associations generated from this study can still provide evidence to support prioritization of research effort and resources. For example, the gene-level scores could help suggest which gene in a T2D-associated locus is most likely to be relevant to T2D. The series of variant alleles in individual genes that were identified in this study could help indicate whether it is gain or loss of protein function that affects T2D risk, an important piece of information for drug development.

So that researchers worldwide could benefit from these results, they were made available in the T2DKP, with the agreement of all of the authors, when the preprint of the paper was posted to bioRxiv. “A main message of the paper is that rare variants potentially provide a much more valuable resource for drug development than previously thought,” said Jason Flannick, first author on the paper. “We can actually detect evidence of their disease association in many genes that could be targeted by new medications or studied to understand the fundamental processes underlying disease. But because there is so much more information than just the variants in the genes cited in the paper, making all of the results available to everybody is critical for them to have the largest impact.”

In the T2DKP, this dataset is termed the AMP T2D-GENES exome sequence analysis set and is described on the Data page. The single-variant T2D associations may be browsed and searched throughout the T2DKP: on Gene and Variant pages, in Interactive Manhattan plots, and via the Variant Finder tool. The Genetic Association Interactive Tool (GAIT) for single variants and the custom burden test for genes provide secure interaction with the individual-level data from this set, allowing the user to filter samples and set custom parameters before performing on-the-fly association analysis.
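
To make the idea of a burden test concrete: it collapses the rare variants in a gene into a single per-sample indicator and then tests that indicator for association with disease. The sketch below is a toy illustration in Python, not the T2DKP implementation. The function names, the frequency threshold, the annotation mask, and all of the data are hypothetical, and it uses a simple carrier-versus-non-carrier Fisher's exact test, whereas a production analysis would typically use regression with covariates.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def collapsing_burden_test(genotypes, is_case, variant_info,
                           max_maf=0.001, allowed_annotations=("LoF",)):
    """Collapse qualifying rare variants per sample, then test carriers.

    genotypes:    {sample_id: {variant_id: alternate_allele_count}}
    is_case:      {sample_id: bool}
    variant_info: {variant_id: (minor_allele_frequency, annotation)}
    """
    # Variant filter: keep only rare variants with an allowed annotation.
    qualifying = {
        v for v, (maf, annotation) in variant_info.items()
        if maf <= max_maf and annotation in allowed_annotations
    }
    case_carriers = control_carriers = n_cases = n_controls = 0
    for sample, calls in genotypes.items():
        carrier = any(calls.get(v, 0) > 0 for v in qualifying)
        if is_case[sample]:
            n_cases += 1
            case_carriers += carrier
        else:
            n_controls += 1
            control_carriers += carrier
    p = fisher_exact_two_sided(case_carriers, control_carriers,
                               n_cases - case_carriers,
                               n_controls - control_carriers)
    return case_carriers, control_carriers, p


# Toy data (invented): 20 cases and 20 controls for one hypothetical gene.
variant_info = {
    "var_rare_lof": (0.0005, "LoF"),
    "var_rare_missense": (0.0008, "missense"),
    "var_common_lof": (0.20, "LoF"),
}
genotypes, is_case = {}, {}
for i in range(40):
    sample = f"S{i}"
    is_case[sample] = i < 20
    calls = {"var_common_lof": 1}       # common variant, filtered out by max_maf
    if i < 8 or i == 20:
        calls["var_rare_lof"] = 1       # 8 case carriers, 1 control carrier
    if i == 30:
        calls["var_rare_missense"] = 1  # excluded by the LoF-only mask
    genotypes[sample] = calls

case_carriers, control_carriers, p_value = collapsing_burden_test(
    genotypes, is_case, variant_info)
print(case_carriers, control_carriers, p_value)
```

Changing `max_maf` or `allowed_annotations` here is loosely analogous to setting custom parameters before running an analysis: different variant filters change which alleles count toward the gene-level test, and therefore the result.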

The gene-level association scores are displayed in the T2DKP via two avenues. A new page lists genes with their association scores and other information, such as the number of variants used to calculate each score. The variants that make up the scores may be filtered by any of seven different categories, and the results of two different aggregation test methods are also available. Gene-level scores are also shown in the Gene Prioritization Toolkit on Gene pages. See our recent blog post for a description of this interface.

Beyond the sheer volume of these exome sequencing results, their open availability in the T2DKP is itself a remarkable milestone for the diabetes genetics research community. “I believe the T2D genetics community is setting examples both for human genetics, in data aggregation and joint analysis, and in its commitment to sharing of these results on an open platform enabling non-experts to make direct use of the results,” said Noël Burtt, Director of Operations and Development for Knowledge Portals and Diabetes at the Broad Institute. The T2DKP team is proud to be a part of this collaborative effort.

Read the press release