library(ddplot)
D3.js is a famous JavaScript library that allows one to create extremely flexible SVG graphics. However, D3 has (at least according to me) a pretty steep learning curve. Further, in order to understand some core concepts, one needs some basics in HTML, CSS and JavaScript.
ddplot aims to simplify the process with a set of functions that render several graphics through a simple R API. Finally, ddplot is built upon the amazing r2d3 package, which makes it a breeze to interface D3.js with R, so a big thanks to its developers.
scatterPlot()
Let’s work with the mpg
data frame from the
ggplot2
package.
library(ggplot2) # needed for the mpg data frame
scatterPlot(
data = mpg,
x = "hwy",
y = "cty",
xtitle = "hwy variable",
ytitle = "cty variable",
title = "cty and hwy relationship",
titleFontSize = 20
)
In comparison to ggplot2, graphics customization in ddplot is limited. Nonetheless, you get a fully vectorized SVG, which is cool.
scatterPlot(
data = mpg,
x = "displ",
y = "cty",
col = "tomato",
bgcol = "pink",
size = 3,
stroke = "royalblue",
strokeWidth = 1,
xtitle = "displ variable",
ytitle = "cty variable",
xticks = 3,
yticks = 3)
histogram()
The histogram()
function allows you to visualize the
distribution of a vector of data:
histogram(
x = mpg$hwy,
bins = 20,
fill = "crimson",
stroke = "white",
strokeWidth = 1,
title = "Distribution of the hwy variable",
width = "20",
height = "10"
)
animatedHistogram()
This function allows you to create a one-click histogram animation. Useful for presentation purposes. Click on the following empty plot and see what happens:
animatedHistogram(
x = mpg$hwy,
duration = 2000,
delay = 100,
fill = "lime",
stroke = "white",
bgcol = "white"
)
Note that you can customize the animation using the two parameters
duration
and delay
.
barChart()
The barChart() function allows you to create bar charts; however, you need to do the aggregation beforehand. In the following example, we plot the average cty for each manufacturer using the dplyr package.
library(dplyr)
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
barChart(
x = "manufacturer",
y = "mean_cty",
xFontSize = 10,
yFontSize = 10,
fill = "orange",
strokeWidth = 2,
ytitle = "average cty value",
title = "Average City Miles per Gallon by manufacturer"
)
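If you would rather not depend on dplyr for the aggregation step, base R can produce the same kind of summary table. The sketch below is only an illustration: it uses the built-in mtcars data set (where mpg is a column, not the ggplot2 data frame used above), and the column name mean_mpg is made up for the example.

```r
# Base R alternative to group_by() + summarise().
# mtcars ships with R, so this runs without extra packages.
cars <- mtcars
cars$cyl <- as.character(cars$cyl)  # treat the cylinder count as a category

# One row per category, one column holding the aggregated value
mean_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
names(mean_mpg)[2] <- "mean_mpg"    # hypothetical column name for the example
mean_mpg
```

The resulting data frame has the shape used in the calls above (one label column, one value column), so it can be piped into barChart() with x = "cyl" and y = "mean_mpg".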
The bars can be easily sorted in ascending
or
descending
order using the sort
parameter:
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
barChart(
x = "manufacturer",
y = "mean_cty",
sort = "ascending",
xFontSize = 10,
yFontSize = 10,
fill = "orange",
strokeWidth = 1,
ytitle = "average cty value",
title = "Average City Miles per Gallon by manufacturer",
titleFontSize = 16
)
horzBarChart()
If you have many categories, it might be a good idea to go for a horizontal bar chart. It has the same parameters as the barChart() function, except that the x-axis parameter is named value and the y-axis parameter is named label. This naming convention aims to avoid the confusion that can arise.
If we want to replicate the above graphic in a horizontal way, we can do:
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
horzBarChart(
label = "manufacturer",
value = "mean_cty",
sort = "ascending",
labelFontSize = 10,
valueFontSize = 10,
fill = "orange",
stroke = "crimson",
strokeWidth = 1,
valueTitle = "average cty value",
title = "Average City Miles per Gallon by manufacturer",
titleFontSize = 16
)
As in barChart(), we can also sort in descending order:
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
horzBarChart(
label = "manufacturer",
value = "mean_cty",
sort = "descending",
labelFontSize = 10,
valueFontSize = 10,
bgcol = "black",
axisCol = "white",
fill = "white",
stroke = "white",
strokeWidth = 1,
valueTitle = "average cty value",
labelTitle = "Manufacturers",
title = "Average City Miles per Gallon by manufacturer",
titleFontSize = 16
)
lollipopChart()
A lollipop chart follows the same behavior as a bar chart, but instead of bars you get lollipops, hence the name. Below is an example of a lollipop chart with ddplot:
mpg %>% group_by(drv) %>%
summarise(median_cty = median(cty)) %>%
lollipopChart(
x = "drv",
y = "median_cty",
sort = "ascending",
xtitle = "drv variable",
ytitle = "median cty",
title = "Median cty per drv",
xFontSize = 20
)
It’s possible to grasp the distribution of some variable according to a specific categorical variable using the same function:
mpg %>% filter(year == 2008) %>%
lollipopChart(
x = "manufacturer",
y = "hwy",
circleFill = 'red',
circleStroke = 'orange',
circleRadius = 5,
sort = "none",
xFontSize = 10
)
From the chart above, it’s quite easy to notice that although Toyota has two cars with high highway miles per gallon (hwy), it also produces many other vehicles with poor hwy.
horzLollipop()
As with bar charts, if you have a variable with many categorical values, you can work with the horizontal version of lollipopChart(), which is horzLollipop():
mpg %>% group_by(manufacturer) %>%
summarise(median_cty = median(cty)) %>%
horzLollipop(
label = "manufacturer",
value = "median_cty",
sort = "descending")
You can also do:
mpg %>% filter(year == 2008) %>%
horzLollipop(
label = "manufacturer",
value = "hwy",
circleFill = 'red',
circleStroke = 'orange',
circleRadius = 5,
sort = "none"
)
pieChart()
Pie charts and donut charts are pretty straightforward to set up.
We’ll use a sample from the starwars
data frame to plot a
simple pie chart.
# starwars is a data frame that ships with the dplyr package
mini_starwars <- starwars %>% tidyr::drop_na(mass) %>%
sample_n(size = 5) # getting 5 random values
pieChart(
data = mini_starwars,
value = "mass",
label = "name"
)
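Because the sample is random, the five rows (and therefore the pie chart) change on every run. A minimal sketch of one way to make the draw reproducible is to fix the random seed first. The helper sample_complete_rows() below is hypothetical and written in base R, and it uses the built-in airquality data set so that it runs without dplyr or tidyr.

```r
# Hypothetical helper: drop rows with a missing value in `column`,
# then draw `n` rows with a fixed seed so the result is reproducible.
sample_complete_rows <- function(df, column, n, seed) {
  complete <- df[!is.na(df[[column]]), , drop = FALSE]
  set.seed(seed)
  complete[sample(nrow(complete), n), , drop = FALSE]
}

first_draw  <- sample_complete_rows(airquality, "Ozone", n = 5, seed = 2024)
second_draw <- sample_complete_rows(airquality, "Ozone", n = 5, seed = 2024)

identical(first_draw, second_draw)  # TRUE: same seed, same rows
```

With dplyr, calling set.seed() right before sample_n() has the same effect, since sample_n() draws from R's random number generator.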
Using the padRadius, padAngle and cornerRadius parameters, one can get fancier pie charts:
pieChart(
data = mini_starwars,
value = "mass",
label = "name",
padRadius = 200,
padAngle = 0.1,
cornerRadius = 50,
innerRadius = 10
)
If you need a donut chart, you just need to play with the
innerRadius
parameter:
pieChart(
data = mini_starwars,
value = "mass",
label = "name",
innerRadius = 120,
cornerRadius = 20,
title = "5 Starwars characters ranked by their mass",
titleFontSize = 16,
bgcol = "yellow"
)
lineChart()
The lineChart() function is used to plot time series data. The user must provide a date variable in the yyyy-mm-dd format. In the following example, we’ll use the built-in AirPassengers ts data and convert it to a classical data frame:
# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
passengers = as.matrix(AirPassengers),
date= zoo::as.Date(time(AirPassengers))
)
# 2. plotting the line chart
lineChart(
data = airpassengers,
x = "date",
y = "passengers"
)
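The conversion above relies on the zoo package. If zoo is not available, a monthly ts object can also be converted with base R only. The function below is a sketch under that assumption (monthly data); ts_to_data_frame() and its value_name argument are names made up for this example.

```r
# Hypothetical helper: turn a monthly ts object into a data frame
# with a Date column (yyyy-mm-dd) and a value column.
ts_to_data_frame <- function(x, value_name = "value") {
  stopifnot(is.ts(x), frequency(x) == 12)  # monthly series only
  first <- start(x)                        # c(year, month) of the first observation
  first_date <- as.Date(
    sprintf("%d-%02d-01", as.integer(first[1]), as.integer(first[2]))
  )
  out <- data.frame(
    date  = seq(first_date, by = "month", length.out = length(x)),
    value = as.numeric(x)
  )
  names(out)[2] <- value_name
  out
}

airpassengers <- ts_to_data_frame(AirPassengers, value_name = "passengers")
range(airpassengers$date)  # "1949-01-01" "1960-12-01"
```

The result has the same two columns (date and passengers) as the data frame built with zoo, so it can be used in the lineChart() calls as-is.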
You can modify the line interpolation using the curve
parameter:
lineChart(
data = airpassengers,
x = "date",
y = "passengers",
curve = "curveStep"
)
lineChart(
data = airpassengers,
x = "date",
y = "passengers",
curve = "curveCardinal"
)
lineChart(
data = airpassengers,
x = "date",
y = "passengers",
curve = "curveBasis"
)
animLineChart()
Heavily inspired by Jure Stabuc’s example, the animLineChart() function creates an empty SVG; each time you click on it, a line chart animation starts. Note that the line remains visible after the animation ends. Go ahead, click on the empty graphic below:
animLineChart(
data = airpassengers,
x = "date",
y = "passengers",
duration = 10000, # in milliseconds (10 seconds)
curve = "curveCardinal"
)
areaChart()
areaChart()
works similarly except that instead of a
line you get an area.
# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
passengers = as.matrix(AirPassengers),
date= zoo::as.Date(time(AirPassengers))
)
# 2. plotting the area chart
areaChart(
data = airpassengers,
x = "date",
y = "passengers",
fill = "purple",
bgcol = "white"
)
areaBand()
areaBand()
lets you plot a filled area between two
y-values. For the sake of the example, let’s create an additional column
passengers_upper
that has an additional 40 passengers for
each observation:
airpassengers <- data.frame(
passengers_lower = as.matrix(AirPassengers),
passengers_upper = as.matrix(AirPassengers) + 40,
date= zoo::as.Date(time(AirPassengers))
)
areaBand(
data = airpassengers,
x = "date",
yLower = "passengers_lower",
yUpper = "passengers_upper",
fill = "yellow",
stroke = "black"
)
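The band does not have to be a constant offset. As a sketch, the lower and upper columns can be derived from the series itself, here as an arbitrary plus or minus 10% around each observation. The 10% figure is only an illustration, not a statistical interval, and the dates are built with base R instead of zoo.

```r
# Build a band that scales with the series: 90% and 110% of each value.
passengers <- as.numeric(AirPassengers)

band <- data.frame(
  date = seq(as.Date("1949-01-01"), by = "month", length.out = length(passengers)),
  passengers_lower = passengers * 0.9,  # arbitrary lower bound
  passengers_upper = passengers * 1.1   # arbitrary upper bound
)

head(band, 3)
```

This data frame can be passed to areaBand() with x = "date", yLower = "passengers_lower" and yUpper = "passengers_upper", exactly as in the call above.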
stackedAreaChart()
This function allows you to create a stacked area chart. You need two components: a data frame in wide format (if yours is in long format, pivot_wider() from the tidyr package can make it wider) and a date variable in the yyyy-mm-dd format that will be plotted on the x-axis.
Let’s work with the following data frame (shortened) provided by Mike Bostock in his stacked area chart example:
data <- data.frame(
date = c(
"2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01",
"2000-05-01", "2000-06-01", "2000-07-01",
"2000-08-01", "2000-09-01", "2000-10-01"
),
Trade = c(
2000,1023, 983, 2793, 1821, 1837, 1792, 1853, 791, 739
),
Manufacturing = c(
734, 694, 739, 736, 685, 621, 708, 685, 667, 693
),
Leisure = c(
1782, 1779, 1789, 658, 675, 833, 786, 675, 636, 691
),
Agriculture = c(
655, 587,623, 517, 561, 2545, 636, 584, 559, 2504
)
)
data
#> date Trade Manufacturing Leisure Agriculture
#> 1 2000-01-01 2000 734 1782 655
#> 2 2000-02-01 1023 694 1779 587
#> 3 2000-03-01 983 739 1789 623
#> 4 2000-04-01 2793 736 658 517
#> 5 2000-05-01 1821 685 675 561
#> 6 2000-06-01 1837 621 833 2545
#> 7 2000-07-01 1792 708 786 636
#> 8 2000-08-01 1853 685 675 584
#> 9 2000-09-01 791 667 636 559
#> 10 2000-10-01 739 693 691 2504
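If your data starts out in long format (one row per date and category), it has to be widened first. tidyr's pivot_wider() is the usual tool. The sketch below shows the same reshaping with base R's reshape() so that it runs without extra packages; the long data frame and its column names (industry, value) are made up for the example.

```r
# A small long-format data frame: one row per (date, industry) pair
long <- data.frame(
  date     = rep(c("2000-01-01", "2000-02-01"), times = 2),
  industry = rep(c("Trade", "Leisure"), each = 2),
  value    = c(2000, 1023, 1782, 1779)
)

# Widen: one row per date, one column per industry
wide <- reshape(long, idvar = "date", timevar = "industry", direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

The result has a date column followed by one column per category, which is the layout of the data frame shown above.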
Note that when running stackedAreaChart(), all the variables available in the data frame will be plotted. If you want to restrict the plot to specific variables, drop the unneeded columns beforehand. The following call plots every variable:
stackedAreaChart(
data = data,
x = "date",
legendTextSize = 14
)
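To restrict the plot to a few variables, select the date column plus the categories you want before calling the function. A minimal base R sketch, using the first three rows of the data frame above (the names data_full, keep and data_subset are only for illustration):

```r
# First three rows of the example data, rebuilt so the snippet runs on its own
data_full <- data.frame(
  date          = c("2000-01-01", "2000-02-01", "2000-03-01"),
  Trade         = c(2000, 1023, 983),
  Manufacturing = c(734, 694, 739),
  Leisure       = c(1782, 1779, 1789),
  Agriculture   = c(655, 587, 623)
)

# Keep the date column and only the categories to be plotted
keep <- c("date", "Trade", "Leisure")
data_subset <- data_full[, keep]
data_subset
```

Passing data_subset instead of the full data frame to stackedAreaChart() would then stack only Trade and Leisure.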
You can modify the color scheme using the colorCategory
parameter:
stackedAreaChart(
data = data,
x = "date",
legendTextSize = 14,
curve = "curveCardinal",
colorCategory = "Accent",
bgcol = "white",
stroke = "black",
strokeWidth = 1
)
stackedAreaChart(
data = data,
x = "date",
legendTextSize = 14,
curve = "curveBasis",
colorCategory = "Set3",
bgcol = "black",
axisCol = "white",
xticks = 4,
stroke = "black"
)
The available names are those of the D3 categorical color schemes, which are listed in the d3-scale-chromatic documentation.
Finally, if you hover over the chart, you’ll notice a tooltip that identifies the different area categories.
barChartRace()
This function allows you to create an animated bar chart race.
barChartRace()
is similar to barChart()
but
takes a third variable mapped to the time dimension, with options for
styling transitions.
Let’s make a bar chart race of population growth among various
countries using a subset of the gapminder
dataset from the
{gapminder}
package:
gapminder_subset <- gapminder::gapminder %>%
  select(country, year, pop) %>%
  filter(country %in% c("Japan", "Mexico", "Germany", "Brazil", "Philippines", "Vietnam")) %>%
  mutate(pop = pop/1e6) # population in millions
gapminder_subset %>%
slice_sample(n = 10)
#> year pop country
#> 1 2007 91.07729 Philippines
#> 2 1997 76.04900 Vietnam
#> 3 1972 107.18827 Japan
#> 4 1967 39.46391 Vietnam
#> 5 1952 30.14432 Mexico
#> 6 1987 142.93808 Brazil
#> 7 1997 168.54672 Brazil
#> 8 1962 41.12148 Mexico
#> 9 1952 69.14595 Germany
#> 10 1957 91.56301 Japan
In this example, we simply call barChartRace() like barChart(), but with an additional variable mapped to the time dimension, specified with time = "year":
gapminder_subset %>%
barChartRace(
x = "pop",
y = "country",
time = "year",
ytitle = "Country",
xtitle = "Population (in millions)",
title = "Bar chart race of country populations"
)
You can also stylize transitions with the frameDur
,
transitionDur
, and ease
arguments. For
example, setting the time spent pausing on each frame to zero with
frameDur = 0
will create a smooth animation:
gapminder_subset %>%
barChartRace(
x = "pop",
y = "country",
time = "year",
transitionDur = 1000,
frameDur = 0,
ytitle = "Country",
xtitle = "Population (in millions)",
title = "Bar chart race of country populations"
)
As you might have noticed, the value of the column passed to the
time
argument is automatically labelled at the bottom-right
corner of the plot panel. We can stylize this with a list of options
passed to the timeLabelOpts
argument (or turn it off with
timeLabel = FALSE
). We also give the bars a little bounce
here with ease = "BackInOut"
for fun.
gapminder_subset %>%
barChartRace(
x = "pop",
y = "country",
time = "year",
ease = "BackInOut",
ytitle = "Country",
xtitle = "Population (in millions)",
title = "Bar chart race of country populations",
timeLabelOpts = list(
size = 40,
prefix = "Year: ",
xOffset = 0.2
)
)