Upset plot
upset.Rd
Upset plots help to interpret the results of gene set enrichment.
Usage
upset(
modules,
mset = NULL,
min.size = 2,
min.overlap = 2,
max.comb = 4,
min.group = 2,
value = "number",
cutoff = NULL,
labels = NULL,
group.stat = "jaccard",
group.cutoff = 0.1,
group = TRUE,
pal = brewer.pal(8, "Dark2"),
lab.cex = 1
)
Arguments
- modules
modules to show on the plot (for example, a character vector of module IDs such as the `ID` column of a tmod test result)
- mset
Which module set to use. Either a character vector ("LI", "DC" or "all", default: all) or an object of class tmod (see "Custom module definitions" below)
- min.size
minimal number of modules in a comparison to show
- min.overlap
smallest overlap (number of elements) between two modules to plot
- max.comb
Maximum number of combinations to show (i.e., number of dots on every vertical segment in the upset plot)
- min.group
Minimum number of modules in a group. Groups with fewer members are ignored. Set this value to 1 to also show modules that could not be grouped.
- value
what to show on the plot: "number" (number of common elements; default), "soerensen" (Sørensen–Dice coefficient), "overlap" (Szymkiewicz–Simpson coefficient) or "jaccard" (Jaccard index)
- cutoff
Combinations with the `value` below cutoff will not be shown.
- labels
Labels for the modules. Character vector with the same length as `modules`
- group.stat
Statistic used for finding groups (can be "number", "overlap", "soerensen" or "jaccard"; see function modOverlaps)
- group.cutoff
cutoff for group statistics
- group
Should the modules be grouped by the overlap?
- pal
Color palette to show the groups.
- lab.cex
Initial cex (font size) for labels
Details
The plot consists of three parts. The main part shows the overlaps between the different modules (a module can be a gene set, for example). Each row corresponds to one module. Each column corresponds to an intersection of one or more gene sets. Dots show which gene sets are in that combination. Which combinations are shown depends on three parameters: `min.overlap`, the smallest overlap between modules that is still plotted; `min.group`, the minimum number of modules in a group; and `max.comb`, the maximum number of combinations tested (too many combinations clutter the plot). In addition, combinations for which the measure selected by `value` falls below `cutoff` are not shown.
Above the intersections, you see a plot showing a similarity measure of the intersected gene sets. By default it is the number of module members (genes in case of a gene set), but several other measures (e.g. the Jaccard index) are also implemented.
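The four measures named for the `value` parameter can be illustrated on two toy sets. The following is a minimal sketch in base R using the textbook definitions of these coefficients; the helper `set_similarity` and the sets `m1` and `m2` are hypothetical and are not part of tmod, and the sketch assumes each set contains unique elements.

```r
# Toy modules (hypothetical gene sets, not taken from tmod)
m1 <- c("A", "B", "C", "D")
m2 <- c("C", "D", "E", "F", "G", "H")

# Hypothetical helper: similarity of two sets under one of the four measures
set_similarity <- function(a, b,
                           value = c("number", "jaccard", "soerensen", "overlap")) {
  value <- match.arg(value)
  n <- length(intersect(a, b))                     # number of common elements
  switch(value,
    number    = n,
    jaccard   = n / length(union(a, b)),           # Jaccard index
    soerensen = 2 * n / (length(a) + length(b)),   # Sørensen–Dice coefficient
    overlap   = n / min(length(a), length(b))      # Szymkiewicz–Simpson coefficient
  )
}

# Compute all four measures for the same pair of sets
sims <- sapply(c("number", "jaccard", "soerensen", "overlap"),
               function(v) set_similarity(m1, m2, v))
print(sims)
```

The three coefficients all range from 0 to 1 but normalize the same overlap differently, so a `cutoff` chosen for one measure does not carry over directly to another.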
To the left are the module descriptions (parameter `labels`; if `labels` is not given, the labels are taken from the mset object provided or, if that is NULL, from the default tmod module set). The function attempts to scale the text in such a way that all labels are visible.
By default, upset attempts to group the modules. This is done by defining a similarity measure (by default the Jaccard index, parameter `group.stat`) and a cutoff threshold (parameter `group.cutoff`).
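One way to picture grouping by a similarity measure and a cutoff is sketched below in base R. This is an illustration under stated assumptions, not necessarily the procedure `upset` uses internally: it assumes that two modules belong to the same group when they are linked by a chain of pairs whose Jaccard index is at least `group.cutoff`, which is implemented here with single-linkage clustering. The toy modules and the `jaccard` helper are hypothetical.

```r
# Toy modules (hypothetical; not taken from tmod)
mods <- list(
  M1 = c("A", "B", "C", "D"),
  M2 = c("B", "C", "D", "E"),
  M3 = c("D", "E", "F"),
  M4 = c("X", "Y", "Z")
)

# Hypothetical helper: Jaccard index of two sets
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise similarity matrix between all modules
sim <- sapply(mods, function(a) sapply(mods, function(b) jaccard(a, b)))

# Assumption: modules are grouped by single linkage on 1 - similarity,
# cutting the tree at 1 - group.cutoff
group.cutoff <- 0.1
hc <- hclust(as.dist(1 - sim), method = "single")
groups <- cutree(hc, h = 1 - group.cutoff)

# Drop groups with fewer than min.group members
min.group <- 2
kept <- names(groups)[ave(groups, groups, FUN = length) >= min.group]

print(groups)
print(kept)
```

In this toy case M1, M2 and M3 end up in one group, while M4 shares no elements with the others and forms a group of its own, which is then dropped because it has fewer than `min.group` members.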
Examples
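if (FALSE) {
library(tmod)
library(limma)
data(Egambia)
# Design matrix: first 15 samples coded 0, last 15 coded 1 (TB)
design <- cbind(Intercept = rep(1, 30), TB = rep(c(0, 1), each = 15))
# Columns 1-3 are passed on as gene annotation; the rest is used for the fit
fit <- eBayes(lmFit(Egambia[, -c(1:3)], design))
tt <- topTable(fit, coef = 2, number = Inf, genelist = Egambia[, 1:3])
# Enrichment test on the gene symbols in the order returned by topTable
res <- tmodCERNOtest(tt$GENE_SYMBOL)
# Upset plot of the tested modules, showing the Jaccard index
upset(res$ID, group.cutoff = 0.1, value = "jaccard")
}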