Not another heatmap tutorial
Less of a tutorial, more notes for myself so I remember how to do this.
Making heatmaps in R sucks. The gplots::heatmap.2() function is OK if you don’t mind spending three hours reading about par() and trialling every possible combination of margins, and it has some strange defaults: when has anyone ever wanted a trace on their heatmap?
‘No one uses ggplot for heatmaps’
This was said at a recent R users meeting, during a demonstration of various obscure alternative plotting libraries that produced an ugly heatmap, and it’s wrong (even if it’s just me). Though I can understand why you would think this. First off, there’s no geom_heatmap(), so that will throw some of the more easily panicked users. Secondly, ggplot is pretty reliant on the idea of ‘tidy data’, in which rows are observations and columns are variables, and that doesn’t play nicely with a heatmap, which is essentially just an image of a matrix. Fortunately another of Hadley Wickham’s packages comes in useful here: reshape2::melt() transforms a matrix into a tidy data frame, which can then be easily used with ggplot.
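As a rough sketch of what that transformation looks like, here is melt() applied to a tiny made-up matrix. The matrix, its dimnames and the column names passed to varnames and value.name are all assumptions for illustration, not part of the original data.

```r
library(reshape2)

# Toy stand-in matrix (made up): 3 mice as rows, 2 proteins as columns.
m <- matrix(c(0.5, 0.7, NA, 1.2, 1.1, 0.9), nrow = 3,
            dimnames = list(c("m1", "m2", "m3"), c("protA", "protB")))

# melt() gives one row per matrix cell: row name, column name, cell value.
tidy <- melt(m, varnames = c("mouse", "protein"), value.name = "expression")
print(tidy)
```

Each cell of the matrix becomes an observation, which is the shape ggplot expects.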
Example: Mouse protein expression dataset
To demonstrate, I’m using a dataset which contains expression levels of 77 proteins measured from the cerebral cortex of mice [link]. The actual data is stratified into various treatment groups, and the mice are either controls or one of the 34 trisomic mice (Down syndrome).
The data is structured as you would expect: one row per mouse and one column per protein, with a normalised value indicating the expression level.
To get a heatmap of this data:
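The original code isn't reproduced here, so the following is only a minimal sketch of the melt-then-plot approach. It uses a simulated matrix (with a few missing values) as a stand-in for the real expression data; the object names, colour scale and theme tweaks are my own assumptions.

```r
library(ggplot2)
library(reshape2)

# Simulated stand-in for the expression matrix: 20 mice by 8 proteins.
set.seed(1)
expr <- matrix(rnorm(20 * 8), nrow = 20,
               dimnames = list(paste0("mouse_", 1:20), paste0("protein_", 1:8)))
expr[sample(length(expr), 5)] <- NA  # sprinkle in some missing values

# Matrix -> tidy data frame, one row per (mouse, protein) cell.
tidy <- melt(expr, varnames = c("mouse", "protein"), value.name = "expression")

# geom_tile() draws one coloured tile per row of the data frame.
p <- ggplot(tidy, aes(x = protein, y = mouse, fill = expression)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       na.value = "grey80") +  # missing cells drawn in grey
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```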
This produces a working heatmap that summarises the data and also sensibly handles missing values without any fuss. If we want to cluster the data we can use hclust(), the standard hierarchical clustering function in R, and re-order the matrix manually before passing it on to be plotted.
Unfortunately there isn’t an easy way to add dendrograms to these plots, though I would argue that most of the time they’re not much use.
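The manual re-ordering mentioned above could look something like this. Again this is a sketch on simulated data rather than the original code: it assumes a complete matrix (missing values would need handling before dist() and hclust()), and it uses the default Euclidean distance and complete linkage, which may not be what you want for real data.

```r
library(ggplot2)
library(reshape2)

# Simulated stand-in matrix with no missing values.
set.seed(1)
expr <- matrix(rnorm(20 * 8), nrow = 20,
               dimnames = list(paste0("mouse_", 1:20), paste0("protein_", 1:8)))

# Cluster rows and columns separately and take the dendrogram leaf order.
row_order <- hclust(dist(expr))$order
col_order <- hclust(dist(t(expr)))$order
expr_ord <- expr[row_order, col_order]

tidy <- melt(expr_ord, varnames = c("mouse", "protein"), value.name = "expression")

# Fix the factor levels explicitly so ggplot keeps the clustered order
# rather than sorting the labels alphabetically.
tidy$mouse <- factor(tidy$mouse, levels = rownames(expr_ord))
tidy$protein <- factor(tidy$protein, levels = colnames(expr_ord))

p <- ggplot(tidy, aes(x = protein, y = mouse, fill = expression)) +
  geom_tile()
print(p)
```

Setting the factor levels by hand is a bit belt-and-braces, but it means the plot order doesn't depend on how melt() chooses to encode the row and column names.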
And since we’re using ggplot, we have access to all its nice functionality, so we can split our heatmap into small multiples by some variable with a single line of code. It’s probably not the best idea to do this, but the option is always there.
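For example, assuming the tidy data frame carries a grouping column, the single line in question would be a facet_wrap() call. The genotype column below is invented for illustration (the first ten simulated mice are arbitrarily labelled control and the rest trisomic), so treat this as a sketch rather than the real grouping.

```r
library(ggplot2)
library(reshape2)

# Simulated stand-in matrix: 20 mice by 8 proteins.
set.seed(1)
expr <- matrix(rnorm(20 * 8), nrow = 20,
               dimnames = list(paste0("mouse_", 1:20), paste0("protein_", 1:8)))
tidy <- melt(expr, varnames = c("mouse", "protein"), value.name = "expression")

# Hypothetical grouping variable, looked up by mouse name.
genotype <- rep(c("control", "trisomic"), each = 10)
names(genotype) <- rownames(expr)
tidy$genotype <- unname(genotype[as.character(tidy$mouse)])

p <- ggplot(tidy, aes(x = protein, y = mouse, fill = expression)) +
  geom_tile() +
  facet_wrap(~ genotype, scales = "free_y")  # one panel per group
print(p)
```

The scales = "free_y" argument stops each panel from reserving empty rows for mice that belong to the other group.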