Upset plot

Upset plots help to interpret the results of gene set enrichment.

Usage

upset(
  modules,
  mset = NULL,
  min.size = 2,
  min.overlap = 2,
  max.comb = 4,
  min.group = 2,
  value = "number",
  cutoff = NULL,
  labels = NULL,
  group.stat = "jaccard",
  group.cutoff = 0.1,
  group = TRUE,
  pal = brewer.pal(8, "Dark2"),
  lab.cex = 1
)

Arguments

modules: character vector of module IDs to show on the plot (for example, the `ID` column of a tmod test result)
mset: Which module set to use. Either a character vector ("LI", "DC" or "all", default: all) or an object of class tmod (see "Custom module definitions" below)
min.size: minimal number of modules in a comparison to show
min.overlap: smallest overlap (number of elements) between two modules to plot
max.comb: Maximum number of combinations to show (i.e., number of dots on every vertical segment in the upset plot)
min.group: Minimum number of modules in a group. Groups with fewer members are ignored. Set this value to 1 to also see modules that could not be grouped.
value: what to show on the plot: "number" (number of common elements; default), "soerensen" (Sørensen–Dice coefficient), "overlap" (Szymkiewicz–Simpson coefficient) or "jaccard" (Jaccard index)
cutoff: Combinations with the `value` below cutoff will not be shown.
labels: Labels for the modules. Character vector with the same length as `modules`
group.stat: Statistic used to find groups (can be "number", "overlap", "soerensen" or "jaccard"; see function modOverlaps)
group.cutoff: cutoff for group statistics
group: Should the modules be grouped by the overlap?
pal: Color palette to show the groups.
lab.cex: Initial cex (font size) for labels

Value

upset invisibly returns the identified module groups as a list of character vectors.

Details

The plot consists of three parts. The main part shows the overlaps between the different modules (a module can be a gene set, for example). Each row corresponds to one module. Each column corresponds to an intersection of one or more modules. Dots show which modules are in that combination. Which combinations are shown depends on three parameters: `min.overlap` (the cutoff for the similarity measure specified by the `value` parameter), `min.group` (the minimum number of modules in a group) and `max.comb` (the maximum number of combinations tested; too many combinations clutter the plot).

Above the intersections is a plot showing a similarity measure for the intersected modules. By default this is the number of shared module members (genes in the case of a gene set), but several other measures (e.g. the Jaccard index) are also implemented.
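As a rough illustration of the four measures accepted by `value`, the sketch below computes them for two sets in base R using their standard textbook definitions. The gene sets `a` and `b` are made up for the example and are not tmod modules; how upset extends these measures to combinations of more than two modules is not shown here.

```r
# Two hypothetical gene sets (illustrative only, not taken from tmod)
a <- c("TP53", "STAT1", "IRF7", "OAS1")
b <- c("STAT1", "IRF7", "MX1")

n_common <- length(intersect(a, b))  # size of the intersection
n_union  <- length(union(a, b))      # size of the union

sim <- c(
  number    = n_common,                                  # shared elements
  jaccard   = n_common / n_union,                        # Jaccard index
  soerensen = 2 * n_common / (length(a) + length(b)),    # Sørensen–Dice
  overlap   = n_common / min(length(a), length(b))       # Szymkiewicz–Simpson
)
print(sim)
```

Note that "number" depends on set size, while the other three measures are scaled to the range from 0 to 1, which matters when choosing a value for `cutoff`.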

To the left are the module descriptions (parameter `labels`; if `labels` is NULL, the labels are taken from the mset object provided or, if that is also NULL, from the default tmod module set). The function attempts to scale the text so that all labels are visible.

By default, upset attempts to group the modules. This is done by defining a similarity measure (by default the Jaccard index, parameter `group.stat`) and a cutoff threshold (parameter `group.cutoff`).
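The exact grouping procedure is not described here. One plausible reading, sketched below purely as an assumption, is that modules whose pairwise similarity reaches the cutoff are linked and each connected set of modules forms a group. The sketch uses single-linkage clustering from base R to get that behaviour; the module memberships and the names `sets`, `group_cutoff` and `min_group` are hypothetical and only mirror the `group.cutoff` and `min.group` parameters.

```r
# Hypothetical module memberships (illustrative only)
sets <- list(
  M1 = c("A", "B", "C", "D"),
  M2 = c("B", "C", "D", "E"),
  M3 = c("D", "E", "F"),
  M4 = c("X", "Y", "Z")
)

jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise Jaccard similarity matrix between all modules
sim <- sapply(sets, function(a) sapply(sets, function(b) jaccard(a, b)))

group_cutoff <- 0.1  # plays the role of group.cutoff
min_group    <- 2    # plays the role of min.group

# Assumption: single linkage on 1 - similarity, cut at 1 - cutoff, so that
# modules connected by a chain of similarities >= cutoff share a group
hc <- hclust(as.dist(1 - sim), method = "single")
cl <- cutree(hc, h = 1 - group_cutoff)

groups <- split(names(cl), cl)
groups <- groups[lengths(groups) >= min_group]  # drop groups that are too small
print(groups)
```

In this toy example M4 shares no members with the other modules, so it forms a group of size one and is dropped; setting `min_group` to 1 would keep it.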

Examples

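if (FALSE) {
library(tmod)
library(limma)

# Example data set shipped with tmod; the first three columns are gene annotation
data(Egambia)

# Two groups of 15 samples each: controls (0) and TB patients (1)
design <- cbind(Intercept = rep(1, 30), TB = rep(c(0, 1), each = 15))

# Differential expression analysis; genes ordered by evidence for the TB effect
fit <- eBayes(lmFit(Egambia[, -c(1:3)], design))
tt <- topTable(fit, coef = 2, number = Inf, genelist = Egambia[, 1:3])

# Gene set enrichment on the ordered gene list
res <- tmodCERNOtest(tt$GENE_SYMBOL)

# Upset plot of the enriched modules, showing the Jaccard index
upset(res$ID, group.cutoff = 0.1, value = "jaccard")
}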
