Upset plots help to interpret the results of gene set enrichment.

Usage

upset(
  modules,
  mset = NULL,
  min.size = 2,
  min.overlap = 2,
  max.comb = 4,
  min.group = 2,
  value = "number",
  cutoff = NULL,
  labels = NULL,
  group.stat = "jaccard",
  group.cutoff = 0.1,
  group = TRUE,
  pal = brewer.pal(8, "Dark2"),
  lab.cex = 1
)

Arguments

modules

Modules to show on the plot, e.g. a character vector of module IDs such as `res$ID` in the example below

mset

Which module set to use. Either a character vector ("LI", "DC" or "all") or an object of class tmod (see "Custom module definitions"). If NULL (the default), the default tmod module set is used.

min.size

Minimum number of modules in a comparison for it to be shown

min.overlap

smallest overlap (number of elements) between two modules to plot

max.comb

Maximum number of combinations to show (i.e., number of dots on every vertical segment in the upset plot)

min.group

Minimum number of modules in a group. Groups with fewer members are ignored. Set this value to 1 to also show modules that could not be grouped.

value

what to show on the plot: "number" (number of common elements; default), "soerensen" (Sørensen–Dice coefficient), "overlap" (Szymkiewicz–Simpson coefficient) or "jaccard" (Jaccard index)

cutoff

Combinations for which the statistic selected by `value` is below `cutoff` are not shown.

labels

Labels for the modules. Character vector with the same length as `modules`

group.stat

Statistic used for finding groups (can be "number", "overlap", "soerensen" or "jaccard"; see function modOverlaps)

group.cutoff

cutoff for group statistics

group

Should the modules be grouped by the overlap?

pal

Color palette to show the groups.

lab.cex

Initial cex (font size) for labels

Value

upset invisibly returns the identified module groups: a list of character vectors.

Details

The plot consists of three parts. The main part shows the overlaps between the different modules (a module can be, for example, a gene set). Each row corresponds to one module. Each column corresponds to an intersection of one or more gene sets. Dots show which gene sets are in that combination. Which combinations are shown depends on the parameters `min.overlap` (the smallest overlap, in number of elements, between two modules), `cutoff` (the threshold for the similarity measure specified by the `value` parameter), `min.group` (the minimum number of modules in a group) and `max.comb` (the maximum number of combinations; too many combinations clutter the plot).

Above the intersections, you see a plot showing a similarity measure of the intersected gene sets. By default it is the number of module members (genes in case of a gene set), but several other measures (e.g. the Jaccard index) are also implemented.
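As a rough illustration of the measures named by the `value` parameter, the sketch below computes them for two gene sets in base R, using the standard textbook definitions. The gene sets `a` and `b` are hypothetical, and the code is not taken from tmod; the package's own implementation (see modOverlaps) may differ in detail.

```r
# Two hypothetical gene sets (illustrative gene symbols only)
a <- c("STAT1", "IRF7", "GBP1", "GBP5", "OAS1")
b <- c("STAT1", "GBP1", "GBP5", "CXCL10")

n_common <- length(intersect(a, b))  # "number": shared elements
n_union  <- length(union(a, b))      # elements present in either set

# Standard definitions of the four measures (assumed, not copied from tmod)
similarity <- c(
  number    = n_common,
  jaccard   = n_common / n_union,
  soerensen = 2 * n_common / (length(a) + length(b)),
  overlap   = n_common / min(length(a), length(b))
)
print(similarity)
```

All measures except "number" lie between 0 and 1, so a `cutoff` chosen for one measure generally does not carry over to another.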

To the left are the module descriptions (parameter `labels`; if `labels` is not provided, the labels are taken from the mset object provided or, if that is NULL, from the default tmod module set). The function attempts to scale the text in such a way that all labels are visible.

By default, upset attempts to group the modules. This is done by defining a similarity measure (by default the Jaccard index, parameter `group.stat`) and a cutoff threshold (parameter `group.cutoff`).
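One possible reading of grouping by a similarity measure and a cutoff is sketched below in base R: modules are linked whenever their pairwise Jaccard index reaches the cutoff, and linked modules form a group. This is an assumption made for illustration only; the module list `sets`, the helper `jaccard` and the use of single-linkage clustering are all hypothetical, and modGroups may use a different procedure.

```r
# Hypothetical modules as a named list of gene vectors
sets <- list(
  M1 = c("g1", "g2", "g3", "g4"),
  M2 = c("g2", "g3", "g4", "g5"),
  M3 = c("g4", "g5", "g6"),
  M4 = c("g7", "g8")
)

# Jaccard index of two sets (helper defined here, not a tmod function)
jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))

# Pairwise similarity matrix with module names as dimnames
sim <- sapply(sets, function(x) sapply(sets, function(y) jaccard(x, y)))

group_cutoff <- 0.1  # plays the role of group.cutoff
min_group    <- 2    # plays the role of min.group

# Single-linkage clustering on 1 - similarity, cut at 1 - cutoff, joins
# modules connected by a chain of similarities at or above the cutoff
hc <- hclust(as.dist(1 - sim), method = "single")
membership <- cutree(hc, h = 1 - group_cutoff)
groups <- split(names(sets), membership)

# Drop groups with fewer than min_group members
kept <- Filter(function(g) length(g) >= min_group, groups)
print(kept)
```

In this toy example M1, M2 and M3 end up in one group, while M4 shares no genes with the others and is dropped unless `min_group` is set to 1.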

See also

[modGroups()], [modOverlaps()]

Examples

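if (FALSE) {
  library(tmod)
  library(limma)

  # Example expression data; the first three columns hold gene annotation
  data(Egambia)

  # Two groups of 15 samples each: controls (0) and TB (1)
  design <- cbind(Intercept = rep(1, 30), TB = rep(c(0, 1), each = 15))

  # Fit a linear model per gene and rank all genes for the TB coefficient
  fit <- eBayes(lmFit(Egambia[, -c(1:3)], design))
  tt <- topTable(fit, coef = 2, number = Inf, genelist = Egambia[, 1:3])

  # Gene set enrichment on the ranked list of gene symbols
  res <- tmodCERNOtest(tt$GENE_SYMBOL)

  # Upset plot of the enriched modules, showing the Jaccard index
  upset(res$ID, group.cutoff = 0.1, value = "jaccard")
}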