Are you an experienced user wanting to fully master the grammar of ggplot? Start with my rstudio::conf(2022) talk on {ggtrace}!
Are you an aspiring developer wanting to extend ggplot? Start with my useR! 2022 talk on {ggtrace}!
There’s a lot that can be said about ggplot internals, and what you can get out of using {ggtrace} depends on how comfortable you are with ggplot and ggplot internals. Whether you’re here because you want to become a better ggplot user or because you’re an aspiring extension developer, {ggtrace} has you covered for all stages of your ggplot journey!
Traditionally, the ggplot community has been thought of as split between users (people who use ggplot to make plots) and developers (people who write extension packages and contribute to {ggplot2}). I believe that this binary distinction is outdated for many reasons, a recent one being that the capacity of the user is ever-expanding and encroaching on the “internals” territory. While the distinction between user-facing code and ggplot internals is clear, it doesn’t map neatly onto the user-developer dichotomy.
Here is one attempt to address that issue. The following outlines the five “stages” of the ggplot2 journey and where {ggtrace} fits in.
You are someone who is comfortable with using ggplot but has not heard about ggplot internals before. If you are wondering why you’d even bother learning about ggplot internals, see part 1 of my blog post series on delayed aesthetic evaluation, a case study of a set of somewhat niche {ggplot2} functions that lie at the intersection of user-facing code and ggplot internals. You might also want to start with another blog post of mine on stat_*() layers, for extra scaffolding. These will give you a practical background on ggplot internals, just to get started. Hopefully they will get you excited about the internals too.
Learning objectives:
Every layer has a stat and a geom. The stat_*() and geom_*() layer functions are two sides of the same coin.
The job of users ends with the ggplot code we write. The job of the internals is to spell out the assumptions behind the concise code that we write as users to make the figure.
Each layer has an underlying dataframe representation that contains only the kind of information that’s relevant for drawing that layer.
We can use {ggtrace} to reference snapshots of a layer’s data taken from the internals, both for declaring more complex aesthetic mappings and for using unconventional stat-geom pairs in a layer.
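To make the first and third objectives concrete, here is a minimal sketch using only exported {ggplot2} functions (the mpg dataset and the variable names are just illustrative choices). It builds the same bar layer from the geom side and from the stat side, pulls out each layer's underlying dataframe with layer_data(), and then uses after_stat() to map an aesthetic to a variable that only exists once the stat has run.

```r
library(ggplot2)

# Two spellings of the same layer: a count stat drawn with a bar geom.
p_geom <- ggplot(mpg, aes(class)) + geom_bar()
p_stat <- ggplot(mpg, aes(class)) + stat_count(geom = "bar")

# layer_data() returns the drawing-ready dataframe behind a layer.
d_geom <- layer_data(p_geom, 1)
d_stat <- layer_data(p_stat, 1)
head(d_geom[, c("x", "count", "ymin", "ymax")])

# Delayed aesthetic evaluation: `count` is not a column of mpg. It is
# computed by the stat, so the mapping has to wait until after the stat.
p_prop <- ggplot(mpg, aes(class)) +
  geom_bar(aes(y = after_stat(count / sum(count))))
d_prop <- layer_data(p_prop, 1)
```

Both layers should carry one row per class with the same counts, and in the last plot the bar heights are proportions rather than raw counts.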
You are someone who recognizes the division of labor between the user and the internals, and feels empowered by this knowledge. To continue to get a better sense of what the internals do, I recommend watching my rstudio::conf(2022) talk on {ggtrace} and ggplot internals. If that felt a little too fast/dense, you can watch a slower (hour-long), broken-down version of me covering the same content. This is where I showcase the inspect family of workflow functions in {ggtrace}, and it is also where you will be introduced to ggproto.
The talks cover the content from Chapter 20.2 of the ggplot2 book, but with emphasis on practicality for the user. I’d actually recommend reading that entire book chapter anyways (or read it with a recording of me going over Part 1 and Part 2 of the chapter), as it covers many fundamental concepts that take some time to digest. As you read through it, I also highly recommend referencing Emi Tanaka’s awesome slides on ggplot2 internals and Bob Rudis’s short chapter on demystifying ggplot2 as companion guides.
Learning objectives:
A ggplot object is not the ggplot figure itself. A ggplot object merely contains the instructions for plotting the figure. The figure is what you get as a result of executing those instructions in the internals, by print()/plot()-ing the ggplot object.
The process of making a figure in the internals happens in steps, by first making each layer’s data drawing-ready (ggplot_build()) and then drawing the plot (ggplot_gtable()).
The internals are implemented in the ggproto object-oriented system, which is difficult to grok, but a lot of it is just data wrangling. We can get pretty far in understanding the internals by using {ggtrace} to just focus on the part of the internals that takes the user-supplied data and turns it into “drawing-ready” data.
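The two steps above can be walked through by hand. This is a minimal sketch using exported {ggplot2} functions only; the mtcars scatterplot and the variable names are illustrative choices, not anything prescribed by the talks or the book chapter.

```r
library(ggplot2)

# A ggplot object: instructions for a figure, not the figure itself.
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Step 1: make each layer's data drawing-ready.
built <- ggplot_build(p)

# Step 2: turn the built plot into a table of graphical objects (a gtable).
gt <- ggplot_gtable(built)

# The drawing-ready dataframe for the first (and only) layer.
point_data <- layer_data(p, 1)
head(point_data)
```

Nothing has been drawn at this point. Passing gt to grid::grid.draw() (or simply printing p) is what actually renders the figure.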
You are someone who is aware of the existence of ggproto, and you are interested in knowing more about the implementation details of the internals. You can gear up for a deep dive into the internals by first watching Thomas Lin Pedersen’s rstudio::conf(2020) talk on extending ggplot, which gives a nice overview of what kinds of ggproto objects and methods exist, which ones are most relevant, and what they do. In case you want more comprehensive documentation on ggproto (though it is not necessary to read it at this stage), check out the package vignette on ggproto and Brodie Gaslam’s even more comprehensive unofficial reference for ggplot internals.
You can follow up on Thomas’s talk by watching my useR! 2022 talk on {ggtrace} and ggplot internals, which showcases all workflows in {ggtrace} (inspect, capture, highjack) for interacting with ggproto methods. Part 2 of my blog post series on delayed aesthetic evaluation picks up where I left off in that talk to expand on the possible extension points to different Stat methods.
That leads nicely into the case study chapter of the ggplot2 book. It’s a huge chapter, so if you want some diversity you can also reference Emi’s slides on writing ggplot2 extensions and read the package vignette on extending ggplot side-by-side, which touches on the same topics but with different examples. In the process, you’ll inevitably encounter {grid}, which is itself a scary beast. There are a lot of resources on {grid}, most notably the R Graphics (3rd edition) book, but there are also some resources written with ggplot in mind, like yet another one of Emi’s slides on {grid}, and functions for interacting with ggplot’s {gtable} graphical objects from {lemon} and {gridExtra}.
Learning objectives:
ggproto methods are called step-by-step in the internals to execute the instructions for plotting.
The ggproto objects Stat and Geom do a lot of the work, and offer powerful extension points.
ggprotos are mostly stateless and ggproto methods are essentially functions, though they defy common expectations about what a function should look like and how it should behave. {ggtrace} allows you to interact with these ggproto methods as if they are stand-alone functions, so you can learn their behavior through trial and error.
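As a rough illustration of these points, the sketch below defines a tiny Stat and then calls its compute_group method directly, like an ordinary function. StatCentroid is a hypothetical stat made up for this example; it is not part of {ggplot2} or {ggtrace}. The final step is guarded because it assumes {ggtrace} is installed and that ggtrace_inspect_return() takes a plot and a ggproto method, so treat it as an assumption about your installed version.

```r
library(ggplot2)

# Hypothetical stat for illustration: collapses each group to its centroid.
StatCentroid <- ggproto("StatCentroid", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data.frame(x = mean(data$x), y = mean(data$y))
  }
)

# Used in a plot, the internals call compute_group for us.
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_point(stat = StatCentroid, colour = "red", size = 4)
centroid_in_plot <- layer_data(p, 2)

# Called directly, the method behaves like a plain function of a dataframe.
centroid_direct <- StatCentroid$compute_group(
  data = data.frame(x = mtcars$wt, y = mtcars$mpg),
  scales = NULL
)

# With {ggtrace} (if installed), a built-in method can be inspected in the
# context of a real plot instead of being fed hand-made inputs.
if (requireNamespace("ggtrace", quietly = TRUE)) {
  bar <- ggplot(mpg, aes(class)) + geom_bar()
  first_group <- ggtrace::ggtrace_inspect_return(
    x = bar,
    method = StatCount$compute_group
  )
}
```

The direct call and the plotted layer should agree on the centroid. The only difference is who supplies the method's inputs: you, or the internals.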
You understand the role that ggproto objects and methods play in the internals and you are excited about writing your own extensions. At this point you are now a developer - your training wheels are off and you’re in the territory of figuring things out for yourself.
Being a developer requires a new skill - debugging. A few people have written on the topic of debugging ggplot internals, including Hiroaki Yutani’s blog post on using browser() for debugging ggproto methods and Dewey Dunnington’s {ggdebug} package which gives you freakishly powerful control over the internals. People have also written packages for less “intrusive” ways of debugging and interacting with the internals, including {gginnards} by Pedro J. Aphalo and {gggrid} by Paul Murrell.
Standing on the shoulders of these giants, {ggtrace} aims to offer the best of both worlds for developers, with high-level workflow functions in the form of ggtrace_{action}_{value}(), the low-level functions ggtrace() and with_ggtrace(), and the interactive debugging functions ggedit() and ggdebugonce(). There’s not a whole lot of new stuff that ggtrace offers in this space (the package isn’t even that much code) but it embodies a transformative reframing of ggplot internals as functional programming, the kind that we’re familiar with as R users. By treating ggproto methods like functions, we can leverage our existing debugging skills for understanding and extending ggplot internals.
Learning objectives:
Only a small subset of ggprotos are exported by {ggplot2} and available for subclassing, and only a handful of ggproto methods are productive extension points. A big part of developing an extension is locating the appropriate extension point.
{ggtrace} allows developers to work backwards from getting the desired output to work first, then identifying the implementation details that need to be changed to produce that output. In other words, you can find a hack that works first through trial and error, then develop a principled way of implementing the solution by following best practices for extending ggplot.
In the process of writing new ggproto objects and ggproto methods, developers need to debug frequently. {ggtrace} functions implement different strategies of debugging, spanning the spectrum from interactive to programmatic.
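One low-tech way to start looking for an extension point, before reaching for any tracing tools, is to list what {ggplot2} exports and what a given ggproto object defines for itself. This is a sketch under the assumption that ggproto objects can be listed like environments with ls(); the exact members you see may differ between {ggplot2} versions.

```r
library(ggplot2)

# Exported Stat ggprotos: candidates for subclassing.
stat_exports <- grep("^Stat", getNamespaceExports("ggplot2"), value = TRUE)
head(sort(stat_exports))

# Members that StatCount defines itself (inherited ones are not listed),
# which hints at where this particular stat customises the pipeline.
own_members <- ls(StatCount)
own_members

# StatCount is a subclass of the Stat parent ggproto.
is_stat <- inherits(StatCount, "Stat")
```

Seeing compute_group among StatCount's own members is a hint that the per-group computation is where this stat does its work, which makes it a natural first place to look when prototyping a similar stat.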
Lastly, {ggplot2} is always evolving, and you can take part in the process! It might not be obvious as users, but the internals are undergoing constant change. So at some point you might also want to start keeping an eye on the GitHub issues. It’s also a nice place to be because you can eavesdrop on the thoughts and insights from the core developers (e.g., should users be able to specify pieces of a scale_*() modularly?, how should the guides system be converted to ggproto?, should layers get “state”?), which can help you understand the motivation behind how and why things in the internals are designed that way. If reading those discussions inspires new ideas or strong feelings, you can submit an issue or PR to make your voice heard.