Multiple Data Types

In this example we will use the Bioconductor package regioneR to create different sets of genomic regions and positions representing different data types: a small set of large regions, a number of point positions in the genome representing some kind of numeric value (expression, methylation…), a small number of positions with labels to be used as markers and a set of overlapping medium regions to be plotted as sequencing reads. In addition to regioneR and rakyoploteR we will use the zoo package to compute the rolling meand and sd of the data points.

Data Creation

We will use different calls to createRandomRegions from regioneR to create the random positions and regions in the genome and calls to runif and rnorm to create the data values:

library(karyoploteR)
library(regioneR)
library(zoo)

set.seed(1234)

#Parameters
  data.points.colors <- c("#FFBD07", "#00A6ED",  "#FF3366", "#8EA604", "#C200FB")

  num.data.points <- 3000
  num.big.regions.up <- 30
  num.big.regions.down <- 30

  num.mid.regions <- 6000
  
  num.marks <- 90

#Create the random fake data  

  #Big regions
  big.regs.up <- joinRegions(createRandomRegions(nregions = num.big.regions.up, length.mean = 20000000, length.sd = 10000000, non.overlapping = TRUE, mask=NA), min.dist = 1)
  big.regs.down <- joinRegions(createRandomRegions(nregions = num.big.regions.down, length.mean = 10000000, length.sd = 5000000, non.overlapping = TRUE, mask=big.regs.up), min.dist = 1)

  #Data points
  data.points <- createRandomRegions(nregions = num.data.points, length.mean = 1, length.sd = 0, non.overlapping = TRUE, mask=NA)
  mcols(data.points) <- data.frame(y=rnorm(n = num.data.points, 0.5, sd = 0.1))
  dp.colors <- sample(head(data.points.colors, 2), size = num.data.points, replace = TRUE)

  #and move the data points with the big regions
  data.points[overlapsAny(data.points, big.regs.up)]$y <- data.points[overlapsAny(data.points, big.regs.up)]$y + runif(n=numOverlaps(data.points, big.regs.up), min = 0.1, max=0.3)
  data.points[overlapsAny(data.points, big.regs.down)]$y <- data.points[overlapsAny(data.points, big.regs.down)]$y - runif(n=numOverlaps(data.points, big.regs.down), min = 0.1, max=0.3)

  #markers
  marks <- createRandomRegions(nregions = num.marks, length.mean = 1, length.sd = 0)
  mcols(marks) <- data.frame(labels=paste0("rs", floor(runif(num.marks, min = 10000, max=99999))))

  #medium regions
  mid.regs <- createRandomRegions(nregions = num.mid.regions, length.mean = 5000000, length.sd = 1000000, non.overlapping = FALSE)

Plotting

Once we have all the data available we can start plotting it. We will create a Karyoplot of the human genome with 2 data panels, one over and one below the ideogram. In order to better see the data we will create a karyoplot with only 3 chromosomes.

Data Panel 1

In the data panel 1, the top one, we will plot the big regions and the data points going from the ideogram r0=0 to 80% of the panel r1=0.8 and we’ll use the top 20% (r0=0.8, r1=1) to plot the markers.

So we’ll plot the big regions as rectangles using kpRect. We want them to span all the vertical space (y0=0, y1=1) in the first 80% of the panel (r0=0, r1=0.8) and to have no border (border=NA).

After that we will plot the data points with kpPoints in the exact same region as the rectangles. In addition, we will add an axis.

Once the data points are plotted, we will add the mean and sd of the data points. We’ll do that chromosome per chromosome since the rollmean and rollaply functions from zoo do not understand about chromosome separation. The mean will be plotted with kpLines and the sd around the mean with a call to kpPlotRibbon, that plots a variable width polygon from y0 to y1.

Finally, the markers will be plotted with a call to kpPlotMarkers, setting r1=1.1 so the tip the marker lines end slightly over the top of the big regions. With this function the marker labels will be moved as necessary to avoid label overlapping.

Data Panel 2

In the data panel 2 we’ll plot the medium regions in the 80% of the data panel furthest from the ideogram (r0=0.2, r1=1) and the coverage of these regions in the closest 20%.

To plot the regions we’ll use the kpPlotRegions function, that automatically piles up overlapping regions.

To plot the coverage we’ll user the kpPlotCoverage that given a set of regions computes the coverage and plots it as an area plot. By default the coverage would have had the base on the ideogram side, since all data panels are oriented from the ideogram out. Since we want the coverage to be inverted, we can simply flip the r0 and r1 values (r0=0.2, r1=0) and the plot will be flipped. This is a very useful trick to invert the plotting coordinates.

Finally, we add a horizontal bar depicting a fake 30x level with kpAbline and add a text label outside the data panel margins with a simple kpText.

  kp <- plotKaryotype(plot.type = 2, chromosomes = c("chr1", "chr2", "chr3"))

  ### Data Panel 1 ###

  #Big regions
  kpRect(kp, data = big.regs.up, y0=0, y1=1, col="#FFDDDD", border=NA, r0=0, r1=0.8)
  kpRect(kp, data = big.regs.down, y0=0, y1=1, col="#DDFFDD", border=NA, r0=0, r1=0.8)
  
  #Data points
  kpAxis(kp, ymin = 0, ymax = 1, r0=0, r1=0.8, numticks = 5, col="#666666", cex=0.5)
  kpPoints(kp, data=data.points, pch=16, cex=0.5, col=dp.colors, r0=0, r1=0.8)
  
  #Mean and sd of the data points.  
  for(chr in seqlevels(kp$genome)) {
    chr.dp <- sort(keepSeqlevels(x = data.points, value = chr, pruning.mode = "coarse"))
    rmean <- rollmean(chr.dp$y, k = 6, align = "center")  
    rsd <- rollapply(data = chr.dp$y, FUN=sd, width=6)
    kpLines(kp, chr = chr, x=start(chr.dp)[3:(length(chr.dp)-3)], y=rmean, col=data.points.colors[3], r0=0, r1=0.8)
    kpPlotRibbon(kp, chr=chr, data=chr.dp[3:(length(chr.dp)-3)], y0=rmean-rsd, y1=rmean+rsd, r0=0, r1=0.8, col="#FF336633", border=NA)
  }
    
  #Markers
  kpPlotMarkers(kp, data=marks, label.color = "#333333", r1=1.1, cex=0.5, label.margin = 5)

  ### Data Panel 2 ###
    
  #medium regions and their coverage
    
    kpPlotRegions(kp, data = mid.regs, r0 = 0.2, r1=1, border=NA, data.panel=2)
    kpPlotCoverage(kp, data=mid.regs, r0=0.2, r1=0, col=data.points.colors[2], data.panel = 2)
    kpPlotCoverage(kp, data=mid.regs, r0=0.2, r1=0.12, col=data.points.colors[1], data.panel = 2)
    
    kpText(kp, chr=seqlevels(kp$genome), y=0.4, x=0, data.panel = 2, r0=0.2, r1=0, col="#444444", label="30x", cex=0.8, pos=2)
    kpAbline(kp, h=0.4, data.panel = 2, r0=0.2, r1=0, col=data.points.colors[3])

plot of chunk Figure