Swarchal .

Not another heatmap tutorial

Less of a tutorial, more notes for myself so I remember how to do this.

Making heatmaps in R sucks. The gplots::heatmap.2() function is OK if you don’t mind spending three hours reading about par() and trialling every possible combination of margins, and it has some strange defaults: when has anyone ever wanted a trace on their heatmap?

‘No one uses ggplot for heatmaps’

This was said at a recent R users meeting during a demonstration of various obscure alternative plotting libraries used to produce an ugly heatmap, and it’s wrong (even if it’s just me). Though I can understand why you would think this. First off, there’s no geom_heatmap(), so that will throw some of the more easily panicked users. Secondly, ggplot is pretty reliant on the idea of ‘tidy data’, in which rows are observations and columns are variables, and that doesn’t play nicely with a heatmap, which is essentially just an image of a matrix. Fortunately another of Hadley Wickham’s packages comes in useful here: reshape2::melt() transforms a matrix into a tidy data frame with one row per cell, which can then be used easily with ggplot.

Example: Mouse protein expression dataset

To demonstrate, I’m using a dataset which contains expression levels of 77 proteins measured from the cerebral cortex of mice [link]. The actual data is stratified into various treatment groups, and the mice are either controls or trisomic (Down syndrome); 34 of them are trisomic.

The data is structured as you would expect: one row per mouse and one column per protein, with a normalised value indicating the expression level.

MouseID  Protein1  Protein2  Treatment  Trisomy
micky1   0.3       0.7       Saline     1
pinky2   0.2       0.8       Memantine  0

To get a heatmap of this data:
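Something along these lines does the job. This is a minimal sketch rather than the exact code from my analysis: the matrix `mat` is made-up stand-in data (6 mice, 4 proteins, one missing value), and the column names `MouseID`, `Protein` and `Expression` are just labels I picked for the example. It assumes ggplot2 and reshape2 are installed.

```r
library(ggplot2)
library(reshape2)

# Made-up stand-in for the real data: 6 mice x 4 proteins (assumption, not the actual dataset)
set.seed(1)
mat <- matrix(runif(24), nrow = 6,
              dimnames = list(paste0("mouse", 1:6), paste0("Protein", 1:4)))
mat[2, 3] <- NA  # one missing measurement, to show how it is drawn

# melt() on a matrix returns one row per cell: row name, column name, value
long <- melt(mat, varnames = c("MouseID", "Protein"), value.name = "Expression")

# There is no geom_heatmap(); geom_tile() draws one coloured tile per row of `long`
p <- ggplot(long, aes(x = Protein, y = MouseID, fill = Expression)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "steelblue", na.value = "grey50")
print(p)
```

The missing cell is kept by melt() (na.rm defaults to FALSE) and simply gets filled with the na.value colour.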

[Figure: ggplot heatmap]

This produces a working heatmap that summarises the data and sensibly handles missing values without any fuss. If we want to cluster the data, we can use hclust(), the standard hierarchical clustering function in R, and re-order the matrix manually before passing it to melt().
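A rough sketch of the re-ordering step, again on made-up data (the matrix `mat` and the column names are assumptions for illustration). It uses the defaults for dist() and hclust(), i.e. Euclidean distance and complete linkage, which may not be what you want for real data. The example matrix has no missing values, since NAs can complicate the distance calculation.

```r
library(ggplot2)
library(reshape2)

# Made-up stand-in data: 8 mice x 5 proteins, no missing values
set.seed(1)
mat <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste0("mouse", 1:8), paste0("Protein", 1:5)))

# Cluster rows (mice) and columns (proteins) separately and take the leaf order
row_order <- hclust(dist(mat))$order
col_order <- hclust(dist(t(mat)))$order

# Re-order the matrix itself before melting
mat_ordered <- mat[row_order, col_order]
long <- melt(mat_ordered, varnames = c("MouseID", "Protein"), value.name = "Expression")

# Belt and braces: pin the factor levels to the clustered order so ggplot
# does not fall back to alphabetical ordering of the axes
long$MouseID <- factor(long$MouseID, levels = rownames(mat_ordered))
long$Protein <- factor(long$Protein, levels = colnames(mat_ordered))

p <- ggplot(long, aes(x = Protein, y = MouseID, fill = Expression)) +
  geom_tile()
print(p)
```

The axis order in ggplot follows the factor levels, not the row order of the data frame, which is why the levels are what matter here.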

[Figure: heatmap clustered]

Unfortunately there isn’t an easy way to add dendrograms to these plots, though I would argue that most of the time they’re not much use.

And since we’re using ggplot, we have access to all its nice functionality, so we can split our heatmap into small multiples by some variable with a single line of code. It’s probably not the best idea to do this, but the option is always there.
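For example, splitting by treatment might look like the sketch below. The data frame `df` is invented for illustration (6 mice, 3 proteins, 2 treatments), and here melt() is called on a data frame with id.vars rather than on a matrix, so that the Treatment column survives into the long format.

```r
library(ggplot2)
library(reshape2)

# Made-up stand-in data with a grouping column (assumption, not the actual dataset)
set.seed(1)
df <- data.frame(
  MouseID   = paste0("mouse", 1:6),
  Treatment = rep(c("Saline", "Memantine"), each = 3),
  Protein1  = runif(6),
  Protein2  = runif(6),
  Protein3  = runif(6)
)

# Keep MouseID and Treatment as identifiers; every other column is melted
long <- melt(df, id.vars = c("MouseID", "Treatment"),
             variable.name = "Protein", value.name = "Expression")

p <- ggplot(long, aes(x = Protein, y = MouseID, fill = Expression)) +
  geom_tile() +
  facet_wrap(~ Treatment, scales = "free_y")  # the single line that makes the multiples
print(p)
```

The scales = "free_y" argument stops each panel from leaving empty rows for mice that belong to the other treatment group.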

[Figure: heatmap multiples]