Tutorial: Use cases for the Knowledge Portal Network bottom-line genetic associations

What kinds of analyses can you do using the bottom-line genetic association files from the Knowledge Portal Network? Here are some example use cases.

 

1. Identify significant "lead SNPs" for a trait, using the clumping algorithm of the popular PLINK software package. The output will be a list of independent SNPs associated with the trait, and other SNPs in linkage disequilibrium with them. 

Download instructions as a plain text file

 

#Download PLINK1.9 from https://www.cog-genomics.org/plink/1.9/
plink

 

#Download genotype information for calculating LD from https://cncr.nl/research/magma/
g1000_eur.{bed,bim,fam}

 

#Download bottom line results for a trait
T2D.sumstats.gz

 

#Columns 
zcat T2D.sumstats.gz | head 
CHROM POS P N
6 33025792 0.5319 825254
6 31228974 0.3039 123309
17 37868463 0.9932 2630
12 128927440 0.8391 535350
5 143346236 0.6235 8319
1 43912702 0.3543 6658
2 84658819 0.4396 10392
 

#add rsIDs
zcat T2D.sumstats.gz | awk -F"\t" -v OFS="\t" 'FNR == NR {m[$1":"$4] = $2} FNR != NR {if (FNR == 1) {print "SNP",$0} if (m[$1":"$2]) {print m[$1":"$2],$0}}' g1000_eur.bim - > T2D.annot.sumstats

 

#run clumping
plink --bfile g1000_eur \
--clump T2D.annot.sumstats \
--clump-p1 1e-5 \
--clump-p2 1e-2 \
--clump-r2 0.1 \
--clump-kb 250 \
--clump-snp-field SNP \
--out T2D
 

#output file
head T2D.clump


 

2. Identify significant gene-level associations for a trait, using the popular MAGMA algorithm. The output will be a list of gene-level associations and their statistical significance.

Download instructions as a plain text file

 

#Download MAGMA from https://cncr.nl/research/magma/
magma

 

#Download genotype information and auxiliary files from https://cncr.nl/research/magma/
g1000_eur.{bed,bim,fam}
#download synonym file
g1000_eur.synonyms

 

#download gene loc file
NCBI37.3.gene.loc

 

#download bottom line results for a trait
T2D.sumstats.gz

 

#add rsIDs
zcat T2D.sumstats.gz | awk -F"\t" -v OFS="\t" 'FNR == NR {m[$1":"$4] = $2} FNR != NR {if (FNR == 1) {print "SNP",$0} if (m[$1":"$2]) {print m[$1":"$2],$0}}' g1000_eur.bim - > T2D.annot.sumstats

 

#make snploc and p-value files
cat T2D.annot.sumstats | cut -f1,2,3 > T2D.snploc
cat T2D.annot.sumstats | cut -f1,4,5 > T2D.pval

 

#run annotation magma --annotate window=50 --snp-loc T2D.snploc --gene-loc NCBI37.3.gene.loc --out T2D #produces T2D.genes.annot

 

magma --bfile g1000_eur synonyms=g1000_eur.synonyms --gene-annot T2D.genes.annot --pval T2D.pval ncol=N --out T2D #produces T2D.genes.out


3. Conduct a PheWAS for a variant. The result will be p-values for the variant across all traits.

Download instructions as a plain text file

 

#download summary stat files for traits of interest trait1.sumstats.gz
trait2.sumstats.gz

traitN.sumstats.gz

 

#variant of interest
chrom=11
pos=11324143

 

#conduct the phewas
for file in *.sumstats.gz; do 
  zcat $file | \
  awk -v OFS="\t" -v fname="$file" '{print fname, $0}'; done \
| (sed -u 1q; awk -v chrom="$chrom" -v pos="$pos" '$2 == chrom && $3 == pos') \
| sed '1 s/^\S\S*/Trait/' > phewas.out