Axes (ggplot2)

Problem

You want to change the order or direction of the axes.

Solution

Note: In the examples below, functions such as scale_y_continuous and ylim operate on the y axis. To operate on the x axis instead, replace the y in the function name with x (and likewise replace x with y in functions such as scale_x_continuous).

This is the basic boxplot that we will work with, using the built-in PlantGrowth data set.

library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x=group, y=weight)) + geom_boxplot()
bp

Swapping X and Y axes

Swap x and y axes (make x vertical, y horizontal):

bp + coord_flip()

Discrete axis

Changing the order of items

# Manually set the order of a discrete-valued axis
bp + scale_x_discrete(limits=c("trt1","trt2","ctrl"))

# Reverse the order of a discrete-valued axis
# Get the levels of the factor
flevels <- levels(PlantGrowth$group)
# "ctrl" "trt1" "trt2"
# Reverse the order
flevels <- rev(flevels)
# "trt2" "trt1" "ctrl"
bp + scale_x_discrete(limits=flevels)

# Or it can be done in one line:
bp + scale_x_discrete(limits = rev(levels(PlantGrowth$group)) )

Setting tick mark labels

For discrete variables, the tick mark labels are taken directly from levels of the factor. However, sometimes the factor levels have short names that aren't suitable for presentation.

bp + scale_x_discrete(breaks=c("ctrl", "trt1", "trt2"), labels=c("Control", "Treat 1", "Treat 2"))

# Hide x tick marks, labels, and grid lines
bp + scale_x_discrete(breaks=NA)

# Hide all tick marks and labels (on X axis), but keep the gridlines
bp + opts(axis.ticks = theme_blank(), axis.text.x = theme_blank())
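
The opts() and theme_blank() calls above belong to the older theme interface. As a rough sketch, assuming ggplot2 0.9.2 or later (where theme() and element_blank() take their place, and breaks=NULL is used rather than breaks=NA), the same two plots might be written as follows. The names p_nobreaks and p_noticks are only for illustration.

```r
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) + geom_boxplot()

# Hide x tick marks, labels, and grid lines (NULL instead of NA)
p_nobreaks <- bp + scale_x_discrete(breaks = NULL)

# Hide all tick marks and the x axis labels, but keep the gridlines
p_noticks <- bp + theme(axis.ticks  = element_blank(),
                        axis.text.x = element_blank())
```

Type p_nobreaks or p_noticks at the console to draw either plot.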

Removing the grid lines while keeping the tick marks and labels requires a bit of a hack; see "Hiding only horizontal or vertical gridlines" below.

Continuous axis

Setting range and reversing direction of an axis

Note that if any scale_y_continuous command is used, it overrides any ylim command, and the ylim will be ignored.

# Set the range of a continuous-valued axis
# These are equivalent
bp + ylim(0,8)
bp + scale_y_continuous(limits=c(0,8))

If the y range is reduced using the method above, the data outside the range is ignored. This might be OK for a scatterplot, but it can be problematic for the box plots used here. For bar graphs, if the range does not include 0, the bars will not show at all!

To avoid this problem, use coord_cartesian instead. Rather than setting the limits of the data, it sets the viewing area, so no data is dropped. This introduces a different problem: the tick marks may not be appropriate for the new range. That can be fixed by specifying them with scale_y_continuous.

# These two do the same thing; all data points outside the graphing range are dropped,
# resulting in a misleading box plot
bp + ylim(5,7.5)
bp + scale_y_continuous(limits=c(5,7.5))

# Using coord_cartesian "zooms" into the area, but the tick marks might not
# be right because they are set for the "natural" window
bp + coord_cartesian(ylim=c(5,7.5))

# Specify tick marks directly
bp + coord_cartesian(ylim=c(5,7.5)) + 
    scale_y_continuous(breaks=seq(0, 10, 0.5))  # Ticks from 0-10, every .5
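
To make the difference concrete, the following base R sketch counts the observations that a limits-based range of 5 to 7.5 would discard and compares the per-group medians before and after. It assumes, as described above, that values outside the limits are dropped before the box plot statistics are computed; the variable names are only for illustration.

```r
w <- PlantGrowth$weight
g <- PlantGrowth$group

# Observations that fall inside the limits c(5, 7.5)
inside <- w >= 5 & w <= 7.5

# How many observations each group keeps (TRUE) and loses (FALSE)
table(g, inside)

# Medians of the full data versus the data left after dropping
full_medians <- tapply(w, g, median)
kept_medians <- tapply(w[inside], g[inside], median)
rbind(full = full_medians, kept = kept_medians)
```

Because coord_cartesian only changes the viewing area, the boxes it draws are still based on the full-data statistics.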

Reversing the direction of an axis

# Reverse order of a continuous-valued axis
bp + ylim(max(PlantGrowth$weight),min(PlantGrowth$weight))
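
An alternative sketch uses scale_y_reverse(), the y counterpart of the scale_x_reverse scale listed under "Scaling axes". It avoids computing the minimum and maximum by hand. The name p_rev is only for illustration.

```r
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) + geom_boxplot()

# Reverse the direction of the continuous y axis
p_rev <- bp + scale_y_reverse()
```

Type p_rev at the console to draw the plot.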

Setting and hiding tick markers

# Setting the tick marks on an axis
# This will place tick marks at every 0.25 from 1 to 10
# The scale will show only the ones that are within range (3.50-6.25 in this case)
bp + scale_y_continuous(breaks=seq(1,10,1/4))

# The breaks can be different sizes
bp + scale_y_continuous(breaks=c(4, 4.25, 4.5, 5, 6,8))

# Suppress tick marks
bp + scale_y_continuous(breaks=NA)

# Hide tick marks and labels (on Y axis), but keep the gridlines
bp + opts(axis.ticks = theme_blank(), axis.text.y = theme_blank())

Scaling axes

By default, the axes are linearly scaled. It is possible to scale the axes with log, power, roots, and so on.

# Create some noisy exponentially-distributed data
set.seed(204)
n <- 100
df <- data.frame(xval = 2^((1:n+rnorm(n,sd=5))/20), yval = 2*2^((1:n+rnorm(n,sd=5))/20))

# A scatterplot with regular (linear) axis scaling
sp <- ggplot(df, aes(xval, yval)) + geom_point()
sp

# log2 scaling of the axes, with visually-equal spacing
sp + scale_x_log2() + scale_y_log2()

# log2 transformation of axes, with visually-diminishing spacing
sp + coord_trans(x="log2", y="log2")
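
scale_x_log2() and scale_y_log2() do not appear to be available in later ggplot2 releases. The following is a sketch of the same plot, assuming a version from 0.9.0 onward in which continuous scales accept a transformation by name. The argument is written as trans here; ggplot2 3.5.0 and later prefer the name transform. The name p_log2 is only for illustration.

```r
library(ggplot2)

# Same noisy data as above
set.seed(204)
n <- 100
df <- data.frame(xval = 2^((1:n + rnorm(n, sd = 5)) / 20),
                 yval = 2 * 2^((1:n + rnorm(n, sd = 5)) / 20))

sp <- ggplot(df, aes(xval, yval)) + geom_point()

# log2 scaling of both axes, specified through the trans argument
p_log2 <- sp +
    scale_x_continuous(trans = "log2") +
    scale_y_continuous(trans = "log2")
```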

Other scaling methods include the following. These can be used with coord_trans by passing just the last portion of the name (for example, "log10" for scale_x_log10):

scale_x_asn      scale_x_log10  scale_x_prob
scale_x_atanh    scale_x_log2   scale_x_probit
scale_x_exp      scale_x_logit  scale_x_reverse
scale_x_inverse  scale_x_pow    scale_x_sqrt
scale_x_log      scale_x_pow10

It is also possible to force the scaling of the axes to be equal, with one visual unit representing the same numeric unit on both axes. When the scales are equal, it is also possible to control the relative length of the axes; for example, so that one axis is 3 times longer than the other.

# Force equal scaling
sp + coord_equal()

# Equal scaling, with y axis 3 times longer than x axis
sp + coord_equal(ratio=3)

Axis labels and text formatting

To set and hide the axis labels:

bp + opts(axis.title.x = theme_blank()) +   # Remove x-axis label
     ylab("Weight (Kg)")                    # Set y-axis label

To change the fonts, and rotate tick mark labels:

# Change font options:
# X-axis label: bold, red, and 20 points
# X-axis tick mark labels: rotate 90 degrees CCW, move to the left a bit (hjust), and 16 points
bp + opts(axis.title.x = theme_text(face="bold", colour="#990000", size=20),
          axis.text.x  = theme_text(angle=90, hjust=1.2, size=16))
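
For ggplot2 0.9.2 or later, a hedged equivalent of the two examples in this section uses theme() with element_blank() and element_text() in place of opts(), theme_blank(), and theme_text(). The vjust value below is an assumption chosen to centre the rotated labels on their ticks, not a value from the original example; p_labels and p_fonts are illustrative names.

```r
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) + geom_boxplot()

# Remove the x-axis label and set the y-axis label
p_labels <- bp + theme(axis.title.x = element_blank()) +
    ylab("Weight (Kg)")

# X-axis label: bold, red, 20 points
# X-axis tick mark labels: rotated 90 degrees, 16 points
p_fonts <- bp + theme(
    axis.title.x = element_text(face = "bold", colour = "#990000", size = 20),
    axis.text.x  = element_text(angle = 90, vjust = 0.5, size = 16))
```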

Tick mark label text formatters

You may want to display your values as percents, or dollars, or in scientific notation. To do this you can use a formatter:

# Label formatters
bp + scale_y_continuous(formatter="percent") +
     scale_x_discrete(formatter="abbreviate")  # In this particular case, it has no effect

(Note: In the upcoming version of ggplot2, 0.9.0, the parameter is labels instead of formatter.)

Useful formatters for continuous scales include percent, dollar, and scientific. For discrete scales, abbreviate will remove vowels and spaces and shorten to four characters.

Sometimes you may need to create your own formatting function. This one will display numeric minutes in HH:MM:SS format.

# Self-defined formatting function for times.
timeHMS_formatter <- function(x) {
    h <- floor(x/60)
    m <- floor(x %% 60)
    s <- round(60*(x %% 1))                   # Round to nearest second
    lab <- sprintf('%02d:%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
    lab <- gsub('^00:', '', lab)              # Remove leading 00: if present
    lab <- gsub('^0', '', lab)                # Remove leading 0 if present
    lab                                       # Return the labels visibly
}

bp + scale_y_continuous(formatter=timeHMS_formatter)
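
As a quick check of what the formatter produces, and as a sketch of how it would be passed in ggplot2 0.9.0 or later (where the parameter is labels rather than formatter), the function can be called directly on a few values in minutes. The test values and the name p_time are only for illustration.

```r
library(ggplot2)

# Same formatter as above: numeric minutes to HH:MM:SS
timeHMS_formatter <- function(x) {
    h <- floor(x/60)
    m <- floor(x %% 60)
    s <- round(60*(x %% 1))                   # Round to nearest second
    lab <- sprintf('%02d:%02d:%02d', h, m, s) # Format the strings as HH:MM:SS
    lab <- gsub('^00:', '', lab)              # Remove leading 00: if present
    lab <- gsub('^0', '', lab)                # Remove leading 0 if present
    lab
}

# Half a minute, 90 and a quarter minutes, and 600 minutes
timeHMS_formatter(c(0.5, 90.25, 600))
# "0:30" "1:30:15" "10:00:00"

bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) + geom_boxplot()

# Pass the function through the labels parameter
p_time <- bp + scale_y_continuous(labels = timeHMS_formatter)
```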

Hiding gridlines

To hide all gridlines, both vertical and horizontal:

# Hide all the gridlines
bp + opts(panel.grid.minor=theme_blank(), panel.grid.major=theme_blank())

# Hide just the minor gridlines
bp + opts(panel.grid.minor=theme_blank())

Hiding only horizontal or vertical gridlines

Hiding just the vertical or horizontal gridlines with ggplot2 requires a bit of a hack. An internal function called guide_grid must be redefined to not draw the vertical or horizontal grid lines.

# Save the original definition of guide_grid (an internal function, hence :::)
guide_grid_orig <- ggplot2:::guide_grid

# Create the replacement function
guide_grid_no_vline <- function(theme, x.minor, x.major, y.minor, y.major) {
  ggname("grill", grobTree(
    theme_render(theme, "panel.background"),
    theme_render(
      theme, "panel.grid.minor", name = "y",
      x = rep(0:1, length(y.minor)), y = rep(y.minor, each=2),
      id.lengths = rep(2, length(y.minor))
    ),
    theme_render(
      theme, "panel.grid.major", name = "y",
      x = rep(0:1, length(y.major)), y = rep(y.major, each=2),
      id.lengths = rep(2, length(y.major))
    )
  ))
}

# Assign the function inside ggplot2
assignInNamespace("guide_grid", guide_grid_no_vline, pos="package:ggplot2")

# Draw the graph
bp

# Restore the original guide_grid function so that it will draw all gridlines again
assignInNamespace("guide_grid", guide_grid_orig, pos="package:ggplot2")

This will hide vertical grid lines -- even if the x and y axes have been swapped with coord_flip(), it will still hide the vertical lines.

Unfortunately, the replacement guide_grid does not get stored as part of the definition of the plot -- the call to assignInNamespace() must be done just before outputting the plot. This can be annoying if you define the plots in one place and output them in another.

This will hide horizontal grid lines:

# Save the original definition of guide_grid
guide_grid_orig <- ggplot2:::guide_grid

# Create the replacement function
guide_grid_no_hline <- function(theme, x.minor, x.major, y.minor, y.major) {
  ggname("grill", grobTree(
    theme_render(theme, "panel.background"),
    theme_render(
      theme, "panel.grid.minor", name = "x",
      x = rep(x.minor, each=2), y = rep(0:1, length(x.minor)),
      id.lengths = rep(2, length(x.minor))
    ),
    theme_render(
      theme, "panel.grid.major", name = "x",
      x = rep(x.major, each=2), y = rep(0:1, length(x.major)),
      id.lengths = rep(2, length(x.major))
    )
  ))
}

# Assign the function inside ggplot2
assignInNamespace("guide_grid", guide_grid_no_hline, pos="package:ggplot2")

# Draw the graph
bp

# Restore the original guide_grid function so that it will draw all gridlines again
assignInNamespace("guide_grid", guide_grid_orig, pos="package:ggplot2")
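
In ggplot2 0.9.2 or later, the theme system provides separate grid elements for each direction, so the guide_grid replacement should not be needed there. The following is a minimal sketch under that assumption; p_no_vlines and p_no_hlines are illustrative names.

```r
library(ggplot2)

bp <- ggplot(PlantGrowth, aes(x = group, y = weight)) + geom_boxplot()

# Hide the vertical gridlines (those positioned along the x axis)
p_no_vlines <- bp + theme(panel.grid.major.x = element_blank(),
                          panel.grid.minor.x = element_blank())

# Hide the horizontal gridlines (those positioned along the y axis)
p_no_hlines <- bp + theme(panel.grid.major.y = element_blank(),
                          panel.grid.minor.y = element_blank())
```

Because these settings are stored in the plot object, they stay with the plot when it is defined in one place and output in another.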