We are interested in incorporating many different genetic and genomic data types in disease areas that are relevant to the existing Knowledge Portals:
Click on a link above to see details about submitting each data type. We are also interested in discussing new data types for the portals and collaborations on methods, tools, and visualizations. Please contact us.
Genetic association data
We welcome submissions of genetic association summary statistics for traits that are relevant to the Knowledge Portals. We have analyzed the privacy risks inherent in sharing summary statistics in the Knowledge Portals and found that they are extremely low (read our white paper).
Upon receipt of your summary statistics we will integrate them into the Knowledge Portal database, making them available for browsing and querying via the interfaces, tools, and APIs of the relevant Portal(s). At your request, we are also able to provide the summary statistic files for public download from the Portal. If you would like us to provide these files, please let us know when submitting your dataset.
We can also accept raw, individual-level data for analysis or extended processing. For more information on submitting such data, see these detailed instructions written for the Type 2 Diabetes Knowledge Portal.
Summary statistic file formats
Our minimum file format includes:
- variant rsID or chromosome and position (hg19; if your results are in a different genome build, we can perform LiftOver in either direction)
- reference allele
- effect allele
- effect size or odds ratio
- p-value
Our preferred format includes the above values plus:
- effect allele frequency, or if not available, minor allele frequency
- sample size per variant, or effective sample size for binary traits
Please submit files in .tsv format, compressed if necessary.
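As a pre-submission check, the minimum format above can be sketched as a small header validator. This is only an illustration: the text above does not prescribe header labels, so the column names used here (rsid, chromosome, position, reference_allele, effect_allele, beta, odds_ratio, p_value) and the function names are assumptions, and you should substitute the labels used in your own files.

```python
import csv
import gzip

# Hypothetical column names; adjust to match your own header labels.
REQUIRED = {"reference_allele", "effect_allele", "p_value"}
VARIANT_ID = [{"rsid"}, {"chromosome", "position"}]  # either group is enough
EFFECT = [{"beta"}, {"odds_ratio"}]                  # either group is enough


def open_text(path):
    """Open a plain or gzip-compressed TSV file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def missing_fields(header):
    """Return a list describing minimum-format fields absent from a header row."""
    cols = {c.strip().lower() for c in header}
    problems = sorted(REQUIRED - cols)
    if not any(group <= cols for group in VARIANT_ID):
        problems.append("rsid or chromosome+position")
    if not any(group <= cols for group in EFFECT):
        problems.append("beta or odds_ratio")
    return problems


def check_file(path):
    """Read only the first row of a TSV file and report missing fields."""
    with open_text(path) as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return missing_fields(header)
```

Calling check_file on a file path returns an empty list when all minimum fields are present. The sketch inspects only the header row; it does not check genome build, allele coding, or value ranges.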
Accompanying information
So that we can document the dataset appropriately, we would also like to have:
- a brief description of the dataset
- the total sample size, and sample size for each phenotype
- definitions of the phenotypes assayed
- ancestry of the participants
- reference to a publication describing the study, if available
- image file(s) for logo(s) of consortia involved, if you would like us to display them in the dataset documentation
- a README file to accompany the summary statistic files, if you would like us to provide them for download
When you are ready to submit data, please contact us. Data files may be transferred via email, Dropbox, any other means of file transfer that you use, or via an Aspera site that we can set up. Let us know whether you would like us to provide the files for download as well as integrate the results into the Portal.
Credible sets
We are interested in receiving investigator-generated credible sets for as many Knowledge Portal traits as possible. Submitted credible set files should include:
- Variant ID
- Credible set ID, or, if IDs have not been assigned, a description of the researchers' strategy for assigning variants to credible sets (e.g., considering variants within 500 kb of a lead variant)
- Posterior probability (preferred), or log(Bayes factor) and a suggested method for converting log(BF) to posterior probability
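For submitters who have log(BF) values rather than posterior probabilities, one commonly used conversion is sketched below. It is not a method prescribed by the Portals. It assumes natural-log Bayes factors, a single causal variant per credible set, and equal prior probability for every variant, so that each posterior is the variant's Bayes factor divided by the sum of Bayes factors in its set. The function name and the row layout are hypothetical.

```python
import math
from collections import defaultdict


def posteriors_from_log_bf(rows):
    """Convert log Bayes factors to posterior probabilities per credible set.

    rows: iterable of (variant_id, credible_set_id, natural-log Bayes factor).
    Returns a dict keyed by (credible_set_id, variant_id).
    Assumes one causal variant per set and equal priors across variants.
    """
    by_set = defaultdict(list)
    for variant_id, set_id, log_bf in rows:
        by_set[set_id].append((variant_id, log_bf))

    result = {}
    for set_id, members in by_set.items():
        # Subtract the largest log(BF) before exponentiating to avoid overflow.
        peak = max(log_bf for _, log_bf in members)
        weights = [(v, math.exp(log_bf - peak)) for v, log_bf in members]
        total = sum(w for _, w in weights)
        for v, w in weights:
            result[(set_id, v)] = w / total
    return result
```

Note that if the variants listed under a credible set ID are a truncated set (for example, a 95% credible set) rather than every variant in the region, normalizing within the set will slightly overstate each posterior. In that case, describe the normalization you used when submitting.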
Genomic data
Genomic data come to the Knowledge Portals via our sister resource, the Common Metabolic Diseases Genome Atlas (CMDGA). See this page for more information on the genomic data types currently displayed on the Knowledge Portals. To inquire about genomic dataset submission, please contact the CMDGA team.
Curated knowledge
Currently, the curated knowledge available in the Knowledge Portals is in the form of expert-curated sets of predicted effector genes for various traits and diseases. Submitted files should be in comma-separated values (.csv) format, but since the methods for generating these predictions vary widely, there is as yet no standard specification for the file content. Please contact us if you have generated a list of predicted effector genes that you would like to have displayed in the Knowledge Portals.