APOL1 differential expression methods and supporting data

Submitted by mariacos on

Tutorial for the Differential Expression View

We conducted the differential expression analysis with the R package DESeq2. We first filtered for protein-coding genes with more than 10 counts in at least 14 samples (the size of our smaller group). We then compared 14 low-risk (LR) samples to 16 high-risk (HR) samples, adjusting for age, sex, and RNA-Seq batch. Cook’s distance was used to flag associations influenced by outlying samples.
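As an illustration of the count filter described above, here is a minimal Python sketch. It is not the code used in the analysis, which was run in R with DESeq2; the function name `filter_genes`, the toy gene names, and the toy counts are all hypothetical. The protein-coding restriction is not shown.

```python
def filter_genes(counts, min_count=10, min_samples=14):
    """Keep genes with more than `min_count` counts in at least `min_samples` samples.

    `counts` maps a gene name to a list of raw counts, one per sample.
    """
    kept = {}
    for gene, sample_counts in counts.items():
        # Number of samples in which this gene exceeds the count threshold.
        n_passing = sum(1 for c in sample_counts if c > min_count)
        if n_passing >= min_samples:
            kept[gene] = sample_counts
    return kept


# Hypothetical toy data: 30 samples per gene (14 LR + 16 HR).
toy_counts = {
    "GENE_A": [25] * 30,              # above threshold in every sample
    "GENE_B": [50] * 13 + [0] * 17,   # above threshold in only 13 samples
    "GENE_C": [11] * 14 + [10] * 16,  # above threshold in exactly 14 samples
}

filtered = filter_genes(toy_counts)
print(sorted(filtered))
```

Using the smaller group size as the sample threshold means a gene expressed in only one of the two groups can still pass the filter.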

Gene expression was variance stabilized (DESeq2), batch corrected (ComBat), and quantile normalized. Because gene expression was not normalized by gene length, we can compare correlation trends but cannot compare expression levels between different genes.

Glomerular tissue is composed of multiple kidney and infiltrating immune cell types. To ensure our differential expression results were not biased by cell-type variation between samples, we compared the quantile normalized expression of cell-type-specific genes between groups. Using the genes unique to a cell type, we approximated that cell type's abundance in each sample as the mean quantile normalized expression of its cell-type-specific genes. The difference in mean cell-type expression between groups was tested with a Wilcoxon test at a Bonferroni-adjusted significance threshold of 0.05/24 ≈ 0.002.
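The abundance approximation and the adjusted threshold above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `cell_type_score` and `bonferroni_threshold`, the marker gene names, and the expression values are hypothetical and do not come from the analysis. The group comparison itself is not shown; a rank-sum test such as `scipy.stats.mannwhitneyu` could be applied to the LR and HR scores.

```python
from statistics import mean


def cell_type_score(expression, marker_genes):
    """Per-sample abundance proxy: mean expression of a cell type's marker genes.

    `expression` maps a gene name to a list of normalized values, one per sample.
    """
    n_samples = len(next(iter(expression.values())))
    return [
        mean(expression[gene][i] for gene in marker_genes)
        for i in range(n_samples)
    ]


def bonferroni_threshold(alpha, n_tests):
    """Significance threshold after Bonferroni correction for `n_tests` tests."""
    return alpha / n_tests


# Hypothetical normalized expression for four samples.
expression = {
    "MARKER_1": [1.0, 2.0, 3.0, 4.0],
    "MARKER_2": [3.0, 4.0, 5.0, 6.0],
    "OTHER": [9.0, 9.0, 9.0, 9.0],  # not a marker, so it is ignored
}

scores = cell_type_score(expression, ["MARKER_1", "MARKER_2"])
threshold = bonferroni_threshold(0.05, 24)

print(scores)
print(round(threshold, 3))
```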

Column descriptions

Gene Symbol: The gene symbol is hyperlinked to the gene page, which provides information for this gene from all analyses.

Gene Name: The descriptive name of the gene product.

Ensembl Gene ID: The Ensembl ID is hyperlinked to the gene page, which provides information for this gene from all analyses.

View Plot: Mouse over to view a box plot of normalized gene expression. Each point is a sample; points are jittered for viewing purposes.

Log2 Fold Change: Log2(normalized counts in HR / normalized counts in LR). Positive values reflect genes with higher expression in HR, as compared to LR. For example, a log2 fold change of 1 indicates a doubling of expression in HR, while -1 indicates expression in HR is halved compared to LR.

P-value: P-value from the Wald test of the null hypothesis that gene expression is the same in the high-risk and low-risk groups.

Adjusted P-value: P-value adjusted for multiple testing with the Benjamini-Hochberg false discovery rate (FDR) method. Genes with low mean normalized counts were excluded from the FDR adjustment; they are indicated with “NA” and should be interpreted with caution.

Mean Normalized Counts: Mean of normalized counts across all samples. Note: this value is not normalized by gene length, so it cannot be used to compare expression levels between genes.

Outlier: Genes with outlying samples detected by Cook’s distance. Although these genes may be differentially expressed, their fold change is influenced by outlying samples, so results should be interpreted with caution.
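To make the Log2 Fold Change and Adjusted P-value columns described above more concrete, here is a small Python sketch. It is an illustration only. The reported fold changes come from the DESeq2 model, which adjusts for covariates, not from a simple ratio, and the function names and input values below are hypothetical. The Benjamini-Hochberg function is a generic textbook implementation in which `None` stands in for the “NA” entries that are left out of the adjustment.

```python
import math


def log2_fold_change(hr, lr):
    """Log2 ratio of HR to LR expression; positive means higher in HR."""
    return math.log2(hr / lr)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values.

    Entries that are None are excluded from the adjustment and stay None.
    """
    indexed = [(p, i) for i, p in enumerate(p_values) if p is not None]
    m = len(indexed)
    adjusted = [None] * len(p_values)
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank, (p, i) in zip(range(m, 0, -1), sorted(indexed, reverse=True)):
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted


# A doubling in HR gives +1; a halving gives -1.
doubled = log2_fold_change(200.0, 100.0)
halved = log2_fold_change(50.0, 100.0)
print(doubled, halved)

# Hypothetical p-values; the third gene was excluded from adjustment.
p_values = [0.01, 0.04, None, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
print(adjusted)
```

Because excluded genes do not count toward the number of tests, the remaining genes are adjusted for four tests here, not five.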
