Rainfall Plots
Rainfall plots show the distances between consecutive genomic features in a logarithmic scale, so regions with a concentration of such features can be identified as a drop in the distance between features. Rainfall plots were first introduced in Nik-Zainal et al., 2012 and are usually used to plot the distances between cancer somatic mutations and to identify regions of the genome with a higher than usual mutation rate.
For this example we’ll use WGS somatic mutation data for pancreas cancer from Alexandrov et al., 2012 that can be downloaded from the Sanger Institute FTP server at ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl/. There’s data for multiple other cancer types and data is in a tabular format.
somatic.mutations <- read.table(file="ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl/somatic_mutation_data/Pancreas/Pancreas_raw_mutations_data.txt", header=FALSE, sep="\t", stringsAsFactors=FALSE)
somatic.mutations <- setNames(somatic.mutations, c("sample", "mut.type", "chr", "start", "end", "ref", "alt", "origin"))
head(somatic.mutations)
## sample mut.type chr start end ref alt origin
## 1 8014741 indel 11 108150257 108150257 G - ORIGINAL-DATA
## 2 8014741 indel 6 47645613 47645618 GGCCAC - ORIGINAL-DATA
## 3 8014741 subs 11 102663435 102663435 C T ORIGINAL-DATA
## 4 8014741 subs 1 157095639 157095639 C T ORIGINAL-DATA
## 5 8014741 subs 12 126929530 126929530 T C ORIGINAL-DATA
## 6 8014741 subs 12 25398283 25398283 A C ORIGINAL-DATA
We’ll split the data by sample and work with the data from a single sample:
APGI_1992. Once we have the data for the single sample, we will transform it
to a genomic ranges object using
regioneR’s toGRanges
function
and convert it to use UCSC chromosome names so it’s compatible with the
standard genomes in Bioconductor.
library(regioneR)
somatic.mutations <- split(somatic.mutations, somatic.mutations$sample)
sm <- somatic.mutations[["APGI_1992"]]
sm.gr <- toGRanges(sm[,c("chr", "start", "end", "mut.type", "ref", "alt")])
seqlevelsStyle(sm.gr) <- "UCSC"
sm.gr
## GRanges object with 10170 ranges and 3 metadata columns:
## seqnames ranges strand | mut.type ref alt
## <Rle> <IRanges> <Rle> | <character> <character> <character>
## [1] chr10 100455463 * | subs T C
## [2] chr10 100463182 * | subs C T
## [3] chr10 100922694 * | subs C T
## [4] chr10 101137819 * | subs C A
## [5] chr10 101200589 * | subs G C
## ... ... ... ... . ... ... ...
## [10166] chrX 99576784 * | subs T C
## [10167] chrX 99762728 * | subs A G
## [10168] chrY 13269826 * | subs T C
## [10169] chrY 13417920 * | subs G C
## [10170] chrY 59020275 * | subs C A
## -------
## seqinfo: 24 sequences from an unspecified genome; no seqlengths
Once we have a GRanges with the variants (or any other genomic feature), we
can create the rainfall plot with the function kpPlotRainfall
. This function
needs a KaryoPlot
object and a GRanges. We’ll use the plot type number 4,
with all chromosomes in a single line, since rainfall plots are usually plotted
this way.
library(karyoploteR)
kp <- plotKaryotype(plot.type=4)
kpPlotRainfall(kp, data = sm.gr)
In addition to the distance between consecutive mutations we can plot the
different nucleotide substitutions in different colors. To assign a color
to each variant we’ll use the getVariantsColors
function, giving it the
reference and alternative base for each variant.
variant.colors <- getVariantsColors(sm.gr$ref, sm.gr$alt)
kp <- plotKaryotype(plot.type=4)
kpPlotRainfall(kp, data = sm.gr, col=variant.colors)
It is now quite similar the rainfall plots we can see in the original publication but we can further refine by changing the ideograms to a lighter version, adjust some of the margins and rotate the chromosome labels so they don’t overlap. We can also add a main title and a y axis.
pp <- getDefaultPlotParams(plot.type = 4)
pp$data1inmargin <- 0
pp$bottommargin <- 20
kp <- plotKaryotype(plot.type=4, ideogram.plotter = NULL,
labels.plotter = NULL, plot.params = pp)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt=45)
kpAddMainTitle(kp, main="Somatic Mutations - APGI_1992", cex=1.2)
kpAxis(kp, ymax = 7, tick.pos = 1:7)
kpPlotRainfall(kp, data = sm.gr, col=variant.colors)
kpAddLabels(kp, labels = c("Distance between mutations (log10)"), srt=90, pos=1, label.margin = 0.04)
With this, we have a nicely finished rainfall plot, but since we are working with karyoploteR we can easily add additional information combined with the rainfall plot. For example, we can add a density plot, as done in Domanska et al..
kp <- plotKaryotype(plot.type=4, ideogram.plotter = NULL,
labels.plotter = NULL, plot.params = pp)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt=45)
kpAddMainTitle(kp, main="Somatic Mutations - APGI_1992", cex=1.2)
kpPlotRainfall(kp, data = sm.gr, col=variant.colors, r0=0, r1=0.7)
kpAxis(kp, ymax = 7, tick.pos = 1:7, r0=0, r1=0.7)
kpAddLabels(kp, labels = c("Distance between mutations (log10)"), srt=90, pos=1, label.margin = 0.04, r0=0, r1=0.7)
kpPlotDensity(kp, data = sm.gr, r0=0.72, r1=1)
kpAddLabels(kp, labels = c("Density"), srt=90, pos=1, label.margin = 0.04, r0=0.71, r1=1)
Or show a single chromosome and add a few genes in the kataegis region. In this cas we’ll change the order in which the different parts are plotted, so the genes are above the density but below the rainfall plot.
kata.genes <- toGRanges(data.frame(chr=c("chr11"),
start=c(34460472,34642588),
end=c(34460561,34684834),
labels=c("CAT","EHF"),
stringsAsFactors = FALSE))
kp <- plotKaryotype(plot.type=4, plot.params = pp,
chromosomes="chr11")
kpAddMainTitle(kp, main="Somatic Mutations - APGI_1992 - Chromosome 11", cex=1.2)
kpAddBaseNumbers(kp)
kpPlotDensity(kp, data = sm.gr, window.size = 10e5, r0=0.62, r1=0.8)
kpAddLabels(kp, labels = c("Density"), srt=90, pos=1, label.margin = 0.04, r0=0.62, r1=0.8)
kpPlotMarkers(kp, data = kata.genes, text.orientation = "horizontal", r1=1.1, line.color = "#AAAAAA")
kpPlotRainfall(kp, data = sm.gr, col=variant.colors, r0=0, r1=0.6)
kpAxis(kp, ymax = 7, tick.pos = 1:7, r0=0, r1=0.6)
kpAddLabels(kp, labels = c("Distance between mutations (log10)"), srt=90, pos=1, label.margin = 0.04, r0=0, r1=0.6)