Perform a statistical test of module expression
tmodUtest.Rd
Usage
tmodUtest(
l,
modules = NULL,
qval = 0.05,
order.by = "pval",
filter = FALSE,
mset = "all",
cols = "Title",
useR = FALSE,
nodups = TRUE
)
tmodGeneSetTest(
l,
x,
modules = NULL,
qval = 0.05,
order.by = "pval",
filter = FALSE,
mset = "all",
cols = "Title",
Nsim = 1000,
nodups = TRUE
)
tmodCERNOtest(
l,
modules = NULL,
qval = 0.05,
order.by = "pval",
filter = FALSE,
mset = "all",
cols = "Title",
nodups = TRUE
)
tmodPLAGEtest(
l,
x,
group,
modules = NULL,
qval = 0.05,
order.by = "pval",
mset = "all",
cols = "Title",
filter = FALSE,
nodups = TRUE
)
tmodZtest(
l,
modules = NULL,
qval = 0.05,
order.by = "pval",
filter = FALSE,
mset = "all",
cols = "Title",
nodups = TRUE
)
tmodHGtest(
fg,
bg,
modules = NULL,
qval = 0.05,
order.by = "pval",
filter = FALSE,
mset = "all",
cols = "Title",
nodups = TRUE
)
Arguments
- l
sorted list of HGNC gene identifiers
- modules
optional list of modules for which to make the test
- qval
Threshold FDR value to report
- order.by
Order by P value ("pval") or none ("none")
- filter
Remove gene names which have no module assignments
- mset
Which module set to use. Either a character vector ("LI", "DC" or "all", default: all) or an object of class tmod (see "Custom module definitions" below)
- cols
Which columns from the MODULES data frame should be included in results
- useR
use the R wilcox.test function; slow, but with exact p-values for small samples
- nodups
Remove duplicate gene names in l and corresponding rows from ranks
- x
Expression matrix for the tmodPLAGEtest; a vector for tmodGeneSetTest
- Nsim
for tmodGeneSetTest, number of replicates for the randomization test
- group
group assignments for the tmodPLAGEtest
- fg
foreground gene set for the HG test
- bg
background gene set for the HG test
Value
The statistical tests return a data frame with module names, an additional statistic (e.g. enrichment or AUC, depending on the test), P value and FDR q-value (P value corrected for multiple testing using the p.adjust function with the Benjamini-Hochberg correction). The data frame has class 'colorDF' (see package colorDF for details) but, apart from being printed in color on the terminal, behaves just like an ordinary data.frame. To strip the coloring, use colorDF::uncolor().
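The relationship between P values and q-values can be illustrated with base R alone. This is a minimal sketch with made-up P values and hypothetical module IDs; it only shows what p.adjust does with the Benjamini-Hochberg method and how a qval threshold would then select rows, not how tmod builds its result internally.

```r
# Hypothetical per-module P values (not from a real analysis)
pvals <- c(M1 = 0.001, M2 = 0.01, M3 = 0.03, M4 = 0.2)

# Benjamini-Hochberg correction, as named in the text above
qvals <- p.adjust(pvals, method = "BH")

# Keep only modules below the FDR threshold (default qval = 0.05)
reported <- names(qvals)[qvals < 0.05]
```

With these inputs, the first three modules pass the 0.05 threshold and the fourth does not.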
Details
Performs a test either on an ordered list of genes (tmodUtest, tmodCERNOtest, tmodZtest) or on two groups of genes (tmodHGtest). tmodUtest is a U test on ranks of genes that are contained in a module.
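The idea of a U test on gene ranks can be sketched in base R. The gene list and the module below are hypothetical, and the code is only an illustration of the principle using wilcox.test; it is not the tmodUtest implementation.

```r
# Hypothetical ordered gene list (most interesting gene first) and one module
l <- paste0("G", 1:200)
module <- c("G2", "G5", "G9", "G14", "G30", "G41")

ranks <- seq_along(l)
in_module <- l %in% module

# One-sided U test: are module genes ranked closer to the top than the rest?
u <- wilcox.test(ranks[in_module], ranks[!in_module], alternative = "less")

# AUC as an effect size: the U statistic rescaled by the number of gene pairs
n1 <- sum(in_module)
n2 <- sum(!in_module)
auc <- 1 - unname(u$statistic) / (n1 * n2)
```

An AUC near 1 means that the module genes sit near the top of the list; an AUC near 0.5 means no enrichment.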
tmodCERNOtest is also a nonparametric test working on gene ranks, but it originates from Fisher's combined probability test. This test weights genes with lower ranks more heavily, so the resulting p-values correspond better to the observed effect size. As a result, modules with a small effect but many genes get higher p-values than with the U test.
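A Fisher-style combination of ranks can be sketched as follows. The assumption here, not stated in the text above, is that each rank divided by the list length is treated as a p-value and that the combined statistic is compared to a chi-squared distribution with two degrees of freedom per gene. The gene list and module are hypothetical.

```r
# Hypothetical ordered gene list and module
l <- paste0("G", 1:200)
module <- c("G2", "G5", "G9", "G14", "G30", "G41")

N <- length(l)
r <- match(module, l)   # ranks of the module genes in the list
r <- r[!is.na(r)]       # drop module genes absent from the list
k <- length(r)

# Fisher's method applied to rank-derived p-values r / N
cerno_stat <- -2 * sum(log(r / N))
cerno_p <- pchisq(cerno_stat, df = 2 * k, lower.tail = FALSE)
```

Because of the logarithm, a gene at rank 2 contributes far more to the statistic than a gene at rank 41, which is the weighting described above.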
tmodPLAGEtest is based on PLAGE ("Pathway level analysis of gene expression"), published by Tomfohr, Lu and Kepler (2005), doi 10.1186/1471-2105-6-225. In essence it is a t-test run on module eigengenes, but it performs very well. The approach can be used with any complex linear model; for this, use the function eigengene(). See the user's guide for details.
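The "t-test on an eigengene" idea can be sketched in base R with simulated data. Everything below is an assumption made for illustration: the expression matrix is random, the eigengene is taken as the first right singular vector of the row-standardized matrix, and the code does not call tmodPLAGEtest or eigengene().

```r
set.seed(42)

# Simulated expression of 8 hypothetical module genes in 20 samples
n_samples <- 20
group <- rep(c("control", "case"), each = n_samples / 2)
x <- matrix(rnorm(8 * n_samples), nrow = 8,
            dimnames = list(paste0("G", 1:8), NULL))

# Add a shared shift to the "case" samples so the module carries a signal
x[, group == "case"] <- x[, group == "case"] + 2

# Standardize each gene, then take the first right singular vector
# as a one-number-per-sample summary of the module (the eigengene)
xs <- t(scale(t(x)))
eig <- svd(xs)$v[, 1]

# Two-sample t-test of the eigengene between the groups
res <- t.test(eig ~ group)
```

The sign of a singular vector is arbitrary, so only the two-sided p-value in res$p.value is meaningful here, not the direction of the difference.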
tmodZtest works very much like tmodCERNOtest, but instead of combining the rank-derived p-values using Fisher's method, it uses the Stouffer method (known also as the Z-transform test).
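The Stouffer combination can be sketched in the same way. The rank-derived p-values below use a 0.5 offset so that none of them equals exactly 1; that offset is an assumption of this sketch and may differ from what tmodZtest does. The gene list and module are hypothetical.

```r
# Hypothetical ordered gene list and module
l <- paste0("G", 1:200)
module <- c("G2", "G5", "G9", "G14", "G30", "G41")

N <- length(l)
r <- match(module, l)
r <- r[!is.na(r)]
k <- length(r)

# Rank-derived p-values, offset by 0.5 to stay strictly inside (0, 1)
p <- (r - 0.5) / N

# Stouffer's method: sum the normal quantiles and rescale by sqrt(k)
z <- sum(qnorm(p, lower.tail = FALSE)) / sqrt(k)
z_p <- pnorm(z, lower.tail = FALSE)
```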
tmodGeneSetTest is an implementation of the function geneSetTest from the limma package (note that tmodUtest is equivalent to limma's wilcoxGST function).
For a discussion of the above three methods, see M. C. Whitlock, "Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach", J. Evol. Biol. 2005 (doi: 10.1111/j.1420-9101.2005.00917.x).
tmodHGtest is simply a hypergeometric test.
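A hypergeometric enrichment test for a single module can be written with phyper. The foreground, background and module below are hypothetical, and the enrichment formula (fraction of module genes in the foreground divided by their fraction in the background) is one common definition, assumed here for illustration.

```r
# Hypothetical background, foreground and module
bg <- paste0("G", 1:200)
fg <- paste0("G", 1:20)
module <- c("G2", "G5", "G9", "G14", "G30", "G41")

m <- sum(bg %in% module)   # module genes in the background
n <- length(bg) - m        # non-module genes in the background
k <- length(fg)            # size of the foreground
b <- sum(fg %in% module)   # module genes in the foreground

# P(X >= b) under the hypergeometric distribution
p <- phyper(b - 1, m, n, k, lower.tail = FALSE)

# Enrichment: observed fraction in fg relative to the fraction in bg
enrichment <- (b / k) / (m / length(bg))
```

Here four of the six module genes fall into a foreground of 20 genes, which is far more than the 0.6 expected by chance.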
In tmod, two module sets can be used: "LI" (from Li et al. 2013) or "DC" (from Chaussabel et al. 2008). The module set can be selected with the parameter "mset"; if mset is "all", both sets are used.
Custom module definitions
Custom and arbitrary module, gene set or pathway definitions can also be provided through the mset option, if the parameter is a list rather than a character vector. The list parameter to mset must contain the following members: "MODULES", "MODULES2GENES" and "GENES".
"MODULES" and "GENES" are data frames. It is required that MODULES contains the following columns: "ID", specifying a unique identifier of a module, and "Title", containing the description of the module. The data frame "GENES" must contain the column "ID".
The list MODULES2GENES is a mapping between modules and genes. The names of the list must correspond to the ID column of the MODULES data frame. The members of the list are character vectors, and the values of these vectors must correspond to the ID column of the GENES data frame.
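The structure described above can be assembled and checked in base R. The module IDs, titles and gene IDs are hypothetical, and the consistency check is only a sketch of the stated requirements, not a validation routine from tmod.

```r
# Hypothetical module annotation with the required "ID" and "Title" columns
MODULES <- data.frame(
  ID = c("M1", "M2"),
  Title = c("Hypothetical module one", "Hypothetical module two"),
  stringsAsFactors = FALSE
)

# Hypothetical gene annotation with the required "ID" column
GENES <- data.frame(ID = c("G1", "G2", "G3", "G4"), stringsAsFactors = FALSE)

# Mapping: list names are module IDs, values are gene IDs
MODULES2GENES <- list(
  M1 = c("G1", "G2"),
  M2 = c("G2", "G3", "G4")
)

mset <- list(MODULES = MODULES, MODULES2GENES = MODULES2GENES, GENES = GENES)

# Check the requirements listed in this section
ok <- all(c("ID", "Title") %in% colnames(mset$MODULES)) &&
  "ID" %in% colnames(mset$GENES) &&
  !anyDuplicated(mset$MODULES$ID) &&
  setequal(names(mset$MODULES2GENES), mset$MODULES$ID) &&
  all(unlist(mset$MODULES2GENES) %in% mset$GENES$ID)
```

A list that passes this check has the shape described above and could then be supplied through the mset argument, for example tmodHGtest(fg, bg, mset = mset).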
Examples
data(tmod)
fg <- tmod$MODULES2GENES[["LI.M127"]]
bg <- tmod$GENES$ID
result <- tmodHGtest( fg, bg )
#> Warning: No genes in bg match any of the genes in the GENES
#> Warning: No genes in fg match any of the genes in the GENES
## A more sophisticated example
## Gene set enrichment in TB patients compared to
## healthy controls (Egambia data set)
if (FALSE) {
data(Egambia)
library(limma)
design <- cbind(Intercept=rep(1, 30), TB=rep(c(0,1), each= 15))
fit <- eBayes( lmFit(Egambia[,-c(1:3)], design))
tt <- topTable(fit, coef=2, number=Inf, genelist=Egambia[,1:3] )
tmodUtest(tt$GENE_SYMBOL)
tmodCERNOtest(tt$GENE_SYMBOL)
}