--- title: "Getting Started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Short version Are you an **experienced user** wanting to fully master the grammar of ggplot? Start with my [rstudio::conf(2022) talk on {ggtrace}](https://www.rstudio.com/conference/2022/talks/cracking-open-ggplot-internals-ggtrace/)! Are you an **aspiring developer** wanting to extend ggplot? Start with my [useR! 2022 talk on {ggtrace}](https://www.youtube.com/watch?v=2JX8zu4QxMg&t=2959s)! ## Long version There's a lot that can be said about ggplot internals, and what you can get out of using `{ggtrace}` depends on how comfortable you are with ggplot and ggplot internals. Whether you're here because you want to become a better ggplot user or because you're an aspiring extension developer, `{ggtrace}` has you covered for all stages of your ggplot journey! Traditionally, the ggplot community has been thought to split between **users** (people who use ggplot to make plots) and **developers** (people who write extension packages and contribute to `{ggplot2}`). I believe that this binary distinction is outdated for many reasons, a recent one being that the capacity of the user is ever-expanding and encroaching on the "internals" territory. While the distinction between _user-facing code_ and _ggplot internals_ is clear, that doesn't map neatly onto the user-developer dichotomy. Here is one attempt at trying to address that issue. The following outlines the five "stages" of the ggplot2 journey, and where `{ggtrace}` fits in. ### Stage 1 (experienced user) **You are someone who is comfortable with using ggplot but have not heard about ggplot internals before.** If you are wondering why you'd even bother learning about ggplot internals, see [part 1 of my blog post series on delayed aesthetic evaluation](https://yjunechoe.github.io/posts/2022-03-10-ggplot2-delayed-aes-1/), a case study of [a set of somewhat niche {ggplot2} functions](https://ggplot2.tidyverse.org/reference/aes_eval.html) that lie at the intersection of user-facing code and ggplot internals. You might also want to start with another blog post of mine on [stat_*() layers](https://yjunechoe.github.io/posts/2020-09-26-demystifying-stat-layers-ggplot2/), for extra scaffolding. These will give you a _practical_ background on ggplot internals, just to get started. Hopefully this will also get you excited about the internals too. **Learning objectives:** - Every layer has a `stat` and a `geom`. The `stat_*()` and `geom_*()` layer functions are two sides of the same coin. - The job of _users_ ends with the ggplot code we write. The job of the _internals_ is to spell out the assumptions behind the concise code that we write as users to make the figure. - Each layer has an underlying dataframe representation that contains only the kind of information that's relevant for drawing that layer. - We can use `{ggtrace}` to reference internals snapshots of a layer's data for declaring more complex aesthetic mappings and using unconventional stat-geom pairs in a layer. ### Stage 2 (curious user) **You are someone who recognizes the division of labor between the user and the internals, and feel empowered by this knowledge.** To continue to get a better sense of what the internals does, I recommend watching my [rstudio::conf(2022) talk on {ggtrace} and ggplot internals](https://www.rstudio.com/conference/2022/talks/cracking-open-ggplot-internals-ggtrace/). If that felt a little too fast/dense, you can watch a [slower (hour-long), broken down version](https://www.youtube.com/watch?v=NhMI-GvppMo&list=PL3x6DOfs2NGhUxGvu46tXQtBl0qyTcDet&index=28) of me covering the same content. This is where I showcase the _inspect_ family of workflow functions in `{ggtrace}` and it is also where you will be introduced to [`ggproto`](https://ggplot2-book.org/internals.html#introducing-ggproto). The talks cover the content from [Chapter 20.2 of the ggplot2 book](https://ggplot2-book.org/internals.html#ggplotbuild), but with emphasis on practicality for the user. I'd actually recommend reading that entire book chapter anyways (or read it with a recording of me going over [Part 1](https://www.youtube.com/watch?v=__yndiht_gw&list=PL3x6DOfs2NGhUxGvu46tXQtBl0qyTcDet&index=24) and [Part 2](https://www.youtube.com/watch?v=JFBROPy_BeU&list=PL3x6DOfs2NGhUxGvu46tXQtBl0qyTcDet&index=26) of the chapter), as it covers many fundamental concepts that take some time to digest. As you read through it, I also highly recommend referencing [Emi Tanaka](https://twitter.com/statsgen)'s awesome [slides on ggplot2 internals](https://emitanaka.org/datavis-adv-workshop/slides/day1-session2.html) and [Bob Rudis](https://twitter.com/hrbrmstr)'s [short chapter on demystifying ggplot2](https://rud.is/books/creating-ggplot2-extensions/demystifying-ggplot2.html) as companion guides. **Learning objectives:** - A ggplot _object_ is not the ggplot _figure_ itself. A ggplot object merely contains the instructions for plotting the figure. The figure is what you get as a result of executing those instructions in the internals, by `print()`/`plot()`-ing the ggplot object. - The the process of making a figure in the internals happens in steps, by first making each layer's data drawing-ready (`ggplot_build()`) and then drawing the plot (`ggplot_gtable()`). - The internals are implemented in the ggproto object oriented system which is difficult to grok, but a lot of it is just data wrangling. We can get pretty far in understanding the internals by using `{ggtrace}` to just focus on the part of the internals that takes the user-supplied data and turns it into a "drawing-ready" data. ### Stage 3 (aspiring developer) **You are someone who is aware of the existence of ggproto, and you are interested in knowing more about the implementational details of the internals.** You can gear up for a deep dive into the internals by first watching [Thomas Lin Pedersen](https://twitter.com/thomasp85)'s [rstudio::conf(2020) talk on extending ggplot](https://www.rstudio.com/resources/rstudioconf-2020/extending-your-ability-to-extend-ggplot2/), which gives a nice overview of what kind of ggproto objects and methods exist and what the most relevant ones are / what they do. In case you want a more comprehensive documentation on ggproto (though not necessary to read them at this stage), check out the [package vignette on ggproto](https://ggplot2.tidyverse.org/reference/ggplot2-ggproto.html) and [Brodie Gaslam](https://twitter.com/BrodieGaslam)'s even more comprehensive [unofficial reference for ggplot internals](https://htmlpreview.github.io/?https://raw.githubusercontent.com/brodieG/ggbg/development/inst/doc/extensions.html#ggproto-method). You can follow up on Thomas's talk by watching my [useR! 2022 talk on {ggtrace} and ggplot internals](https://www.youtube.com/watch?v=2JX8zu4QxMg&t=2959s), which showcases all workflows in `{ggtrace}` (_inspect_, _capture_, _highjack_) for interacting with ggproto methods. The [part 2 of my blog post series on delayed aesthetic evaluation](https://yjunechoe.github.io/posts/2022-07-06-ggplot2-delayed-aes-2/) picks up where I left off in that talk to expand on the possible extension points to different `Stat` methods. That leads nicely into the [case study chapter of the ggplot2 book](https://ggplot2-book.org/spring1.html). It's a huge chapter, so if you want some diversity you can also reference Emi's [slides on writing ggplot2 extensions ggplot](https://emitanaka.org/datavis-adv-workshop/slides/day1-session3.html) and read the [package vignette on extending ggplot](https://ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-your-own-theme) side-by-side, which touches on the same topics but with different examples. In the process, you'll inevitably encounter `{grid}`, which is itself a scary beast. There are a lot of resources on `{grid}`, most notably the [R Graphics (3rd edition)](https://www.routledge.com/R-Graphics-Third-Edition/Murrell/p/book/9780367780692) book, but there are also some resources written with ggplot in mind, like yet another one of Emi's [slides on {grid}](https://emitanaka.org/datavis-adv-workshop/slides/day1-session1.html), and functions for interacting with ggplot's `{gtable}` graphical objects from [`{lemon}`](https://cran.r-project.org/web/packages/lemon/vignettes/gtable_show_lemonade.html) and [`{gridExtra}`](https://cran.r-project.org/web/packages/gridExtra/vignettes/gtable.html). **Learning objectives:** - ggproto methods are called step-by-step in the internals to the execute instructions for plotting. - The ggproto objects `Stat` and `Geom` do a lof of the work, and offer powerful extension points. - ggprotos are mostly stateless and ggproto methods are essentially functions, though they defy common expectations about how a function should look like and behave. `{ggtrace}` allows you to interact with these ggproto methods as if they are stand-alone functions, so you can learn their behavior through trial and error. ### Stage 4 (developer) **You understand the role that ggproto objects and methods play in the internals and you are excited about writing your own extensions.** At this point you are now a _developer_ - your training wheels are off and you're in the territory of figuring things out for yourself. Being a developer requires a new skill - **debugging**. A few people have written on the topic of debugging ggplot internals, including [Hiroaki Yutani](https://twitter.com/yutannihilat_en)'s blog post on [using browser() for debugging ggproto methods](https://yutani.rbind.io/post/a-tip-to-debug-ggplot2/) and [Dewey Dunnington](https://twitter.com/paleolimbot)'s [{ggdebug}](https://github.com/paleolimbot/ggdebug) package which gives you freakishly powerful control over the internals. People have also written packages for less "intrusive" ways of debugging and interacting with the internals, including [{gginnards}](https://cran.r-project.org/web/packages/gginnards/vignettes/user-guide-1.html) by [Pedro J. Aphalo](https://github.com/aphalo) and [{gggrid}](https://www.stat.auckland.ac.nz/~paul/Reports/gggrid/gggrid.html) by [Paul Murrell](https://www.stat.auckland.ac.nz/~paul/). Standing on the shoulders of these giants, `{ggtrace}` aims to offer the best of both worlds for developers, with high-level workflow functions in the form of `ggtrace_{action}_{value}()`, the low-level functions `ggtrace()` and `with_ggtrace()`, and the interactive debugging functions `ggedit()` and `ggdebugonce()`. There's not a whole lot of new stuff that ggtrace offers in this space (the package isn't even _that_ much code) but it embodies a transformative reframing of ggplot internals as functional programming, the kind that we're familiar with as R users. By treating ggproto methods like functions, we can leverage our existing debugging skills for understanding the extending ggplot internals. **Learning objectives:** - Only a small subset of ggprotos are exported by `{ggplot2}` and available for subclassing, and only a handful of ggproto methods are productive extension points. A big part of developing an extension is locating the appropriate extension point. - `{ggtrace}` allows developers to work backwards from getting the desired output to work first, then identifying the implementational details that need to be changed to produce that output. In other words, you can find a hack that works first through trial and error, then develop on a principled way of implementing the solution by following best practices for extending ggplot. - In the process of writing new ggproto objects and ggproto methods, developers need to debug frequently. `{ggtrace}` functions implement different strategies of debugging, spanning the spectrum of interactive to programmatic. ### Stage 5 (contributor) Lastly, `{ggplot2}` is always evolving, and you can take part in the process! It might not be obvious as users, but the _internals_ are undergoing constant change. So at some point you might also want to start keeping an eye on the [Github issues](https://github.com/tidyverse/ggplot2/issues). It's also a nice place to be because you can eavesdrop on the thoughts and insights from the core developers (e.g., [should users be able to specify pieces of a `scale_*()` modularly?](https://github.com/tidyverse/ggplot2/issues/4269), [how should the guides system be converted to ggproto?](https://github.com/tidyverse/ggplot2/issues/3329), [should layers get "state"?](https://github.com/tidyverse/ggplot2/issues/3175)), which can help you understand the motivation behind how and why things in the internals are designed that way. If reading those discussions inspire new ideas or strong feelings, you can submit an issue or PR to make your voice heard.