R-Object to R-Object Hash Maps

Introduction

Hash tables are among the most useful data structures for efficient coding. An hash table can be abstractly thought as a map from a set of unique keys (which may be strings, numbers, or even more complicated objects) to a set of values. From a computational viewpoint, its most distinctive feature is that its read/write operations (i.e. storing, retrieving or deleting a particular key or key-value pair) have average \(O(1)\) time complexity, independent of the table size.

Many programming languages come with their own native implementation of hash tables (for instance std::unordered_map/sets in C++, or dicts and sets in Python); in base R, the objects which come closest to hash tables are environments. These, however, can be somewhat cumbersome to handle from the user point of view, and only support string type keys. The purpose of {r2r} is to provide a more flexible implementation of hash tables in R, building on top of base R environments.

In particular, r2r hash tables support:

arbitrary R objects as keys and values,
arbitrary key comparison and hash functions,
customizable behaviour (throw or return a default value) on missing key exceptions.

This document provides a quick hands-on introduction to r2r hash tables.

library(r2r)

Basic Manipulations

We create an empty hash map with:

m <- hashmap()

We can insert key-value pairs in m in several different ways:

m[["key"]] <- "value"
m[c(1, 2, 3)] <- c("a", "b", "c") # Vectorized over keys and values
m[[c(4, 5, 6)]] <- c("d", "e", "f") # Not vectorized

The following queries explain the differences between the [[ and [ operator mentioned in the comments above:

m[["key"]]
#> [1] "value"

m[c(1, 2, 3)]
#> [[1]]
#> [1] "a"
#> 
#> [[2]]
#> [1] "b"
#> 
#> [[3]]
#> [1] "c"
m[[c(1, 2, 3)]]
#> NULL

m[c(4, 5, 6)]
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL
m[[c(4, 5, 6)]]
#> [1] "d" "e" "f"

Single element insertions and queries can also be performed through the generics insert() and query()

insert(m, "user", "vgherard") # Modifies `m` in place
query(m, "user")
#> [1] "vgherard"

Sets

In addition to hash maps, we can also create hash sets, which simply store keys:

s <- hashset()
insert(s, 1)
s[[2]] <- T # equivalent to insert(s, 2)
s[c(1, 2, 3)]
#> [[1]]
#> [1] TRUE
#> 
#> [[2]]
#> [1] TRUE
#> 
#> [[3]]
#> [1] FALSE

Key and value types

There is no restriction on the type of object you can use as keys and values. For instance:

m[[ lm(wt ~ mpg, mtcars) ]] <- list("This is my fit!", 840)
m[[ lm(wt ~ mpg, mtcars) ]]
#> [[1]]
#> [1] "This is my fit!"
#> 
#> [[2]]
#> [1] 840
m[[ lm(cyl ~ mpg, mtcars) ]]
#> NULL

Setting default values

You can set default values for missing keys. For instance:

m <- hashmap(default = 0)

which is useful for creating a counter:

objects <- list(1, 1, "1", FALSE, "1", 1)
for (object in objects)
    m[[object]] <- m[[object]] + 1
m[["1"]]
#> [1] 2

Alternatively, you may throw an exception upon querying a missing key:

m <- hashmap(on_missing_key = "throw")
tryCatch(m[["Missing key"]], error = function(cnd) "Oops!")
#> [1] "Oops!"

Using custom key comparison and hash functions

hashmaps and hashmaps use by default base::identical() to compare keys. For instance:

m <- hashmap()
m[[1]] <- "double"
m[["1"]] <- "character"
m[[1]]
#> [1] "double"

This behavior can be changed by explicitly providing a key comparison function. For this to work correctly, one must also explicitly provide an hash function which produces the same hashes for equivalent keys. A simple way to do this is to apply a preprocessing function to keys, as illustrated by the following example.

We assume that keys are length one complex numbers, and consider two keys equivalent when they have the same direction in the complex plane. The direction of a complex vector can be found applying the R function Arg(), which is thus a sensible key preprocessing function. We can instruct an hashmap to preprocess its keys in this way through the constructor’s key_preproc_fn argument:

m <- hashmap(key_preproc_fn = Arg)

Let us check that everything works as intended:

m[list(1, 1 + 1i, 1i)] <- list("EAST", "NORTH-EAST", "NORTH")
m[[10]]
#> [1] "EAST"
m[[100i]]
#> [1] "NORTH"
m[[2 + 2i]]
#> [1] "NORTH-EAST"