What kinds of analyses can you do using the bottom-line genetic association files from the Knowledge Portal Network? Here are some example use cases.
1. Identify significant "lead SNPs" for a trait, using the clumping algorithm of the popular PLINK software package. The output will be a list of independent SNPs associated with the trait, and other SNPs in linkage disequilibrium with them.
Download instructions as a plain text file
#Download PLINK1.9 from https://www.cog-genomics.org/plink/1.9/
plink
#Download genotype information for calculating LD from https://cncr.nl/research/magma/
g1000_eur.{bed,bim,fam}
#Download bottom line results for a trait
T2D.sumstats.gz
#Columns
zcat T2D.sumstats.gz | head
CHROM POS P N
6 33025792 0.5319 825254
6 31228974 0.3039 123309
17 37868463 0.9932 2630
12 128927440 0.8391 535350
5 143346236 0.6235 8319
1 43912702 0.3543 6658
2 84658819 0.4396 10392
#add rsIDs
zcat T2D.sumstats.gz | awk -F"\t" -v OFS="\t" 'FNR == NR {m[$1":"$4] = $2} FNR != NR {if (FNR == 1) {print "SNP",$0} if (m[$1":"$2]) {print m[$1":"$2],$0}}' g1000_eur.bim - > T2D.annot.sumstats
#run clumping
plink --bfile g1000_eur \
--clump T2D.annot.sumstats \
--clump-p1 1e-5 \
--clump-p2 1e-2 \
--clump-r2 0.1 \
--clump-kb 250 \
--clump-snp-field SNP \
--out T2D
#output file
head T2D.clump
2. Identify significant gene-level associations for a trait, using the popular MAGMA algorithm. The output will be a list of gene-level associations and their statistical significance.
Download instructions as a plain text file
#Download MAGMA from https://cncr.nl/research/magma/
magma
#Download genotype information and auxiliary files from https://cncr.nl/research/magma/
g1000_eur.{bed,bim,fam}
#download synonym file
g1000_eur.synonyms
#download gene loc file
NCBI37.3.gene.loc
#download bottom line results for a trait
T2D.sumstats.gz
#add rsIDs
zcat T2D.sumstats.gz | awk -F"\t" -v OFS="\t" 'FNR == NR {m[$1":"$4] = $2} FNR != NR {if (FNR == 1) {print "SNP",$0} if (m[$1":"$2]) {print m[$1":"$2],$0}}' g1000_eur.bim - > T2D.annot.sumstats
#make snploc and p-value files
cat T2D.annot.sumstats | cut -f1,2,3 > T2D.snploc
cat T2D.annot.sumstats | cut -f1,4,5 > T2D.pval
#run annotation magma --annotate window=50 --snp-loc T2D.snploc --gene-loc NCBI37.3.gene.loc --out T2D #produces T2D.genes.annot
magma --bfile g1000_eur synonyms=g1000_eur.synonyms --gene-annot T2D.genes.annot --pval T2D.pval ncol=N --out T2D #produces T2D.genes.out
3. Conduct a PheWAS for a variant. The result will be p-values for the variant across all traits.
Download instructions as a plain text file
#download summary stat files for traits of interest trait1.sumstats.gz
trait2.sumstats.gz
…
traitN.sumstats.gz
#variant of interest
chrom=11
pos=11324143
#conduct the phewas
for file in *.sumstats.gz; do
zcat $file | \
awk -v OFS="\t" -v fname="$file" '{print fname, $0}'; done \
| (sed -u 1q; awk -v chrom="$chrom" -v pos="$pos" '$2 == chrom && $3 == pos') \
| sed '1 s/^\S\S*/Trait/' > phewas.out