In literate programming, the typical paradigm of source code is
reversed: instead of a wall of code with the occasional comment, the
user writes human-readable text (like this paragraph) with
source code interspersed. In the R language, this is primarily done with
the rmarkdown package, which takes a plain text R Markdown file (.Rmd)
containing code “chunks” and executes that code when converting to a
regular markdown file (.md) and then possibly some other format (.html,
.pdf, etc.).
Markdown is a lightweight plain-text language used to format text.
Let’s look at the original description of markdown from the website of
John Gruber, the creator of the markdown standard. Using the rvest
package, we can programmatically scrape Gruber’s blog, extract the HTML
paragraph tags, and convert those tags to a character vector.
library(rvest)

# scrape every paragraph of Gruber's markdown page into a character vector
markdown_blog <-
  read_html("https://daringfireball.net/projects/markdown/") %>%
  html_elements("p") %>%
  html_text()
Gruber first explains what exactly his markdown language is.
Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
He continues by outlining why markdown was created, his rationale for its format, and some inspiration for its syntax.
The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.
This entire vignette was written in markdown and converted to HTML
using pandoc. However, as you may have noticed, we haven’t exactly been
conforming to this original desire for markdown to be readable as is. We
didn’t copy the text from his blog and paste it as text into this
vignette. This is where the gluedown package comes in.
The gluedown package helps ease the transition between the incredibly
powerful vector support in R and the readability of markdown. Since this
vignette was written in R Markdown (.Rmd), we are able to (1) use the
power of packages like rvest to collect, process, and/or analyze some
kind of data and then (2) transition that result to the human-readable
markdown format.
When writing this vignette, three kinds of files are used.
1. The .Rmd file containing source code is a programming environment.
2. The .md file created by rmarkdown is a human-readable plain text version of that input code also containing the output text.
3. The .html format of this vignette created by pandoc is the final presentation format.
In the rest of this vignette, we will see some of the various use
cases for gluedown. We will see how easy it is to transition between R
vectors and readable results in markdown/HTML.
Printing vectors as markdown lists was the initial inspiration for
the package. In R, atomic vectors are the fundamental object type that
composes more complex objects like lists and data frames. The state.name
vector built into base R is a character vector of all 50 state names.
If we as users want to use those state names as text in our markdown
document, we can use the cat() function and tell rmarkdown to print the
results of that function “as is” (rather than as code output).
Alabama Alaska Arizona
That output obviously isn’t very appealing. We could tweak our use of
cat() a little to separate them on new lines.
Alabama
Alaska
Arizona
This is more readable, but with some more work, we can use cat() to
print an ordered list.
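The three cat() variants described above can be sketched in base R as follows. This is a minimal illustration rather than the vignette's original chunks; in an R Markdown document the chunk would presumably also need the knitr option results = "asis" so that the text is emitted as markdown instead of code output.

```r
# First three state names from the built-in character vector
states <- state.name[1:3]

# Space-separated on one line (the default behaviour of cat())
cat(states)
cat("\n")

# One state per line
cat(states, sep = "\n")
cat("\n")

# Ordered list: prefix each element with its index and a period
ordered <- paste0(seq_along(states), ". ", states)
cat(ordered, sep = "\n")
cat("\n")
```

Each step needs a little more string handling than the last, which is the tedium the next paragraphs address.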
This workflow gets tiresome, although it’s made slightly simpler with
the fantastic glue package from Jim Hester.
This is the technique used in this package. Vector inputs are passed
to glue::glue() and the appropriate markdown syntax is implemented.
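As a rough sketch of that technique (not the package's actual source), a vectorised call to glue() can build ordered list items directly. The variable names here are our own, and the glue package is assumed to be installed.

```r
library(glue)

states <- state.name[1:3]

# glue() is vectorised: each {expression} is evaluated as R code and the
# template is filled in once per element of the vector
ordered <- glue("{seq_along(states)}. {states}")

# A bullet list needs only a different template
bullets <- glue("* {states}")

cat(ordered, sep = "\n")
cat("\n")
cat(bullets, sep = "\n")
cat("\n")
```

Because the markdown syntax lives in the template string, changing the list style means changing one string rather than rewriting paste() calls.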
The md_order() function simplifies the glue::glue() workflow and allows
users to more easily customize the appearance of the list in markdown
format.
# markdown only cares about the first number
md_order(state.name[1:3], seq = FALSE)
#> 1. Alabama
#> 1. Alaska
#> 1. Arizona
# markdown ignores padding and allows for use of parentheses
md_order(state.name[1:10], seq = TRUE, pad = TRUE, marker = ")")
#> 01) Alabama
#> 02) Alaska
#> 03) Arizona
#> 04) Arkansas
#> 05) California
#> 06) Colorado
#> 07) Connecticut
#> 08) Delaware
#> 09) Florida
#> 10) Georgia
All these different options, however, are rendered as the same kind of
HTML <ol> fragment.
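One way to check this claim is to render two differently numbered lists and compare the HTML. The sketch below assumes the commonmark package is installed; it is not used in the original vignette, and the two input vectors are written by hand rather than produced by md_order().

```r
library(commonmark)

# The same three items, numbered in two different ways
all_ones <- c("1. Alabama", "1. Alaska", "1. Arizona")
padded   <- c("01) Alabama", "02) Alaska", "03) Arizona")

# Collapse each vector into one markdown document and convert it to HTML
html_ones   <- markdown_html(paste(all_ones, collapse = "\n"))
html_padded <- markdown_html(paste(padded, collapse = "\n"))

cat(html_ones)

# TRUE when both inputs produce the same HTML fragment
same_html <- identical(html_ones, html_padded)
print(same_html)
```

The numbering style is a matter of readability in the plain text file; the rendered list does not record it.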
This ordered list is a markdown container block. As described in the GitHub Flavored Markdown specification:
We can think of a document as a sequence of blocks—structural elements like paragraphs, block quotations, lists, headings, rules, and code blocks. Some blocks (like block quotes and list items) contain other blocks; others (like headings and paragraphs) contain inline content—text, links, emphasized text, images, code spans, and so on.
We can nest md_*() functions to create inline content within a
container block. Let’s use some inline functions to create a new vector
named inlines with five states, each formatted in another syntax. We’ll
take a look at what that vector really looks like with a simple str().
inlines <- c(
  md_bold(state.name[4]),
  md_code(state.name[5]),
  md_link(state.name[6], "https://Colorado.gov"),
  md_italic(state.name[7]),
  md_strike(state.name[8])
)
str(inlines, vec.len = 3)
#> chr [1:5] "**Arkansas**" "`California`" "[Colorado](https://Colorado.gov)" ...
Using md_bullet(), we will print that vector as a bullet point list
container block, and each list item will be rendered as a separate
inline.
* Arkansas
* California
* Colorado
* Connecticut
* Delaware
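To make the nesting concrete, here is a base R sketch in which small hypothetical helpers stand in for the inline functions and the bullet function. The helper names and the choice of underscores for italics are our assumptions, not the package's definitions.

```r
# Hypothetical stand-ins for the inline helpers, built with paste0()
bold   <- function(x) paste0("**", x, "**")
code   <- function(x) paste0("`", x, "`")
link   <- function(x, url) paste0("[", x, "](", url, ")")
italic <- function(x) paste0("_", x, "_")
strike <- function(x) paste0("~~", x, "~~")

# Hypothetical stand-in for the bullet list container
bullet <- function(x, marker = "*") paste(marker, x)

inlines <- c(
  bold(state.name[4]),
  code(state.name[5]),
  link(state.name[6], "https://Colorado.gov"),
  italic(state.name[7]),
  strike(state.name[8])
)

# Each element is already inline markdown; the container only adds the marker
items <- bullet(inlines)
cat(items, sep = "\n")
cat("\n")
```

The inline helpers never need to know which container they end up in, which is what makes the two layers easy to combine.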
These functions demonstrate how gluedown can be used to transition
between R vectors, simply formatted markdown text, and beautifully
formatted HTML text.
Aside from container blocks and inlines, there is a third type of markdown content: leaf blocks, which cannot contain other blocks. The thematic break is an example of a leaf block.
Code blocks are another type of leaf block. The code we’ve been
writing so far is contained within rmarkdown chunks, which execute the
code within. By default, those code chunks are then displayed as regular
code blocks in the intermediary .md file. Sometimes we might want to use
code blocks to display other types of text. Perhaps we want to show the
content of a function. The md_fence() function creates a new code fence
from the lines created by deparse().
lines <- deparse(md_bullet)
md_fence(lines)
function (x, marker = c("*", "-", "+"))
{
    marker <- match.arg(marker)
    glue::glue("{marker} {x}")
}
Or perhaps we want to display some code from another language that isn’t supposed to be executed.
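Both uses of a code fence can be sketched in base R. The fence_lines() helper and its info argument are hypothetical, written only for this illustration, and the Python snippet is an arbitrary example of text that is displayed but never run.

```r
# Hypothetical helper: wrap lines of text in a fenced code block.
# `info` is the optional language label placed after the opening fence.
fence_lines <- function(lines, info = "") {
  fence <- strrep("`", 3)  # three backticks
  c(paste0(fence, info), lines, fence)
}

# Show the source of an R function, one element per line
add_one <- function(x) x + 1
r_block <- fence_lines(deparse(add_one), info = "r")
cat(r_block, sep = "\n")
cat("\n")

# Display code from another language without executing it
py_lines <- c("def greet(name):", "    print('Hello, ' + name)")
py_block <- fence_lines(py_lines, info = "python")
cat(py_block, sep = "\n")
cat("\n")
```

Since the fenced text is just a character vector, nothing inside it is evaluated by R.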
The package has been designed to fit well in a traditional R workflow
so users can seamlessly create content with their code and display that
content with gluedown. In that spirit, all functions are designed to fit
within the tidyverse ecosystem by working with pipes. Pipes allow users
to pass the results of one function into the beginning of the next. By
ending this “pipeline” with md_quote(), we chain together five coding
steps:
1. Read the HTML of the web page.
2. Extract the <blockquote> tag.
3. Convert that tag to trimmed text.
4. Remove the bracketed text.
5. Format the result as a markdown block quote.
library(rvest)
library(stringr)
library(gluedown)

read_html("https://w.wiki/A58") %>% # 1
  html_element("blockquote") %>% # 2
  html_text(trim = TRUE) %>% # 3
  str_remove("\\[(.*)\\]") %>% # 4
  md_quote() # 5
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
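The last three steps of that pipeline can be imitated offline in base R. In this sketch the input string is invented for illustration, sub() stands in for stringr::str_remove(), and quote_lines() is a hypothetical helper rather than the package's md_quote().

```r
# Hypothetical stand-in for a block quote helper: prefix each line with ">"
quote_lines <- function(x) paste(">", x)

# Invented text with surrounding whitespace and a bracketed footnote marker
raw <- "  An example sentence with a footnote marker.[1]  "

trimmed <- trimws(raw)                      # step 3: trimmed text
cleaned <- sub("\\[(.*)\\]", "", trimmed)   # step 4: drop the bracketed text
quoted  <- quote_lines(cleaned)             # step 5: markdown block quote

cat(quoted, sep = "\n")
cat("\n")
```

Each step takes a character vector and returns one, which is why the same operations chain so naturally with pipes.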
The package primarily uses GitHub Flavored Markdown (GFM), a site-specific version of the CommonMark specification, an unambiguous implementation of John Gruber’s original Markdown. With this flavor, some useful extensions like task lists are supported on GitHub. Elsewhere, like this HTML vignette, a task list will just render as a bullet list. You can learn more about how GFM is implemented in this package’s other vignette.
legislation <- c("House passes", "Senate concurs", "President signs")
md_task(legislation, check = 1:2)
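The GFM task list syntax itself is simple enough to sketch in base R. The task_list() helper below is hypothetical, and its hyphen marker is our choice; the package's own output may use a different bullet character.

```r
# Hypothetical helper: build GFM task list items, ticking the positions
# named in `check`
task_list <- function(x, check = integer(0), marker = "-") {
  box <- ifelse(seq_along(x) %in% check, "[x]", "[ ]")
  paste(marker, box, x)
}

legislation <- c("House passes", "Senate concurs", "President signs")
tasks <- task_list(legislation, check = 1:2)

cat(tasks, sep = "\n")
cat("\n")
```

A renderer without the task list extension treats the bracketed box as ordinary text, which is why such a list falls back to plain bullets outside GitHub.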
Markdown tables are another extremely useful extension. The md_table()
function wraps around the much more powerful knitr::kable() function,
which allows data frames to be printed in a number of alternative
formats. Printing data frames is a very typical use case when
documenting the process of data science. With small summary tables like
the one below, a markdown table is much more readable than the plain
text tibble or data frame printed by default.
print(head(state.x77))
#> Population Income Illiteracy Life Exp Murder HS Grad Frost Area
#> Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
#> Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
#> Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
#> Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
#> California 21198 5114 1.1 71.71 10.3 62.6 20 156361
#> Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
|            | Population | Income | Illiteracy | Life Exp | Murder | HS Grad | Frost |   Area |
|------------|------------|--------|------------|----------|--------|---------|-------|--------|
| Alabama    |       3615 |   3624 |        2.1 |    69.05 |   15.1 |    41.3 |    20 |  50708 |
| Alaska     |        365 |   6315 |        1.5 |    69.31 |   11.3 |    66.7 |   152 | 566432 |
| Arizona    |       2212 |   4530 |        1.8 |    70.55 |    7.8 |    58.1 |    15 | 113417 |
| Arkansas   |       2110 |   3378 |        1.9 |    70.66 |   10.1 |    39.9 |    65 |  51945 |
| California |      21198 |   5114 |        1.1 |    71.71 |   10.3 |    62.6 |    20 | 156361 |
| Colorado   |       2541 |   4884 |        0.7 |    72.06 |    6.8 |    63.9 |   166 | 103766 |
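A table like the one above can be produced by calling knitr::kable() directly, the function that md_table() is described as wrapping. This sketch assumes a knitr version that accepts format = "pipe"; the exact column alignment it prints may differ from the table shown here.

```r
library(knitr)

# Convert the first six rows of the built-in state.x77 data to a pipe table;
# the result is a character vector with one element per table line
tbl <- kable(head(state.x77), format = "pipe")

cat(tbl, sep = "\n")
cat("\n")
```

The header and separator lines come first, followed by one line per row of data.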
You can also use gluedown to format R inline code results. First, use R
to calculate a result.
rand <- sample(state.name, 1)
# `r md_bold(rand)`
var <- sample(colnames(state.x77), 1)
# `r md_code(var)`
Then, you can easily print that result in the middle of regular text
with markdown formatting. In this case, our randomly selected state is…
South Dakota, and the Area variable was randomly selected from the
state.x77 dataset. Calculating results and using those calculations in
the body of a text document increases reproducibility.
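In an .Rmd file, the substitution happens when knitr evaluates an inline expression written between backticks and starting with r. The sketch below performs the same substitution by hand in base R to show what ends up in the sentence; the seed is hypothetical and only keeps repeated runs consistent.

```r
set.seed(1)  # hypothetical seed, only so repeated runs agree

rand <- sample(state.name, 1)
var  <- sample(colnames(state.x77), 1)

# Wrap the computed values in bold and code syntax, then splice them
# into ordinary prose
sentence <- paste0(
  "Our randomly selected state is **", rand, "** and the `", var,
  "` variable was randomly selected."
)

cat(sentence)
cat("\n")
```

Because the values are computed rather than typed, the sentence cannot drift out of sync with the analysis.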
In a meta-study of psychology journals, researchers found that “around
15% of the articles contained at least one statistical conclusion that
proved, upon recalculation, to be incorrect.” These errors can be
mitigated by using inline printing of results like we did above. With
the gluedown package, programmers can emphasize those results without
worry.