RcppSimdJson: Rcpp Bindings for the simdjson Header Library


Motivation

simdjson by Daniel Lemire (with contributions by Geoff Langdale, John Keiser and many others) is an engineering marvel. Through very clever use of SIMD instructions, it manages to parse JSON files faster than disk access. Wut? Yes, you read that right: parallel processing with so little overhead that the net throughput is limited only by disk speed.

Moreover, it is implemented in neat modern C++ and can be accessed as a header-only library. (Well, one library in two files, really.) Which makes R packaging easy and convenient and compelling. So here we are.

For further introduction, see the arXiv paper by Langdale and Lemire (out/to appear in VLDB Journal 28(6) as well) and/or the video of the recent talk by Daniel Lemire at QCon (voted best talk).

Example

jsonfile <- system.file("jsonexamples", "twitter.json", package="RcppSimdJson")
library(RcppSimdJson)
validateJSON(jsonfile)                  # validate a JSON file
res <- fload(jsonfile)                  # parse a JSON file
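
JSON held in a character string can be parsed as well. The following is a minimal sketch using fparse(); the JSON text itself is made up for illustration and is not part of the package.

```r
library(RcppSimdJson)

# Illustrative JSON string (hypothetical data, not shipped with the package)
json <- '{"name": "simdjson", "tags": ["fast", "simd"], "stars": 3}'

parsed <- fparse(json)                  # parse a JSON string into an R object
parsed$name                             # scalar string
parsed$tags                             # JSON array of strings
parsed$stars                            # JSON number
```

The result is an ordinary R list, so the usual `$` and `[[` accessors apply.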

Comparison

A simple file-oriented parsing benchmark against the other R-accessible JSON parsers:

> print(res)
Unit: microseconds
     expr       min        lq      mean   median        uq        max neval   cld
  yyjsonr   312.267   347.683   405.177   390.11   425.827    926.776   100 a
 simdjson   274.367   323.998   447.691   467.79   526.237    773.070   100 a
  jsonify  2727.874  2813.681  2952.804  2896.84  2972.852   7442.755   100  b
 jsonlite  4237.538  4435.683  4587.428  4552.38  4668.345   7082.673   100   c
  RJSONIO  9131.864  9425.515  9707.274  9599.48  9845.006  13516.616   100    d
   ndjson 91668.822 92628.357 95386.212 93192.37 94507.484 152179.095   100     e
>
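
The script that produced these timings is not reproduced here. As a rough sketch of how such a comparison could be set up, assuming the microbenchmark and jsonlite packages are installed, and limited to two of the parsers for brevity:

```r
library(RcppSimdJson)
library(microbenchmark)

# Example file shipped with RcppSimdJson (same file as in the Example section)
jsonfile <- system.file("jsonexamples", "twitter.json", package = "RcppSimdJson")

# Time file-based parsing; only two parsers are compared in this sketch
res <- microbenchmark(
    simdjson = RcppSimdJson::fload(jsonfile),
    jsonlite = jsonlite::fromJSON(jsonfile),
    times = 100L
)
print(res)
```

Absolute timings depend on hardware and package versions, so the numbers will differ from the table above.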

Or in chart form, also including the second benchmark parsing strings:

Status

All three major operating systems are supported, and JSON can be parsed from file and string under a variety of settings. A C++17 compiler is required for ease of setup (though the upstream library can fall back to older compilers; one can edit src/Makevars accordingly if need be).

Contributing

Any problems, bug reports, or feature requests for the package can be submitted and handled most conveniently as GitHub issues in the repository.

Before submitting pull requests, it is frequently preferable to first discuss need and scope in such an issue ticket. See the file Contributing.md (in the Rcpp repo) for a brief discussion.

See Also

For standard JSON work on R, as well as for other nicely done C++ libraries, consider these:

Author

For the R package, Dirk Eddelbuettel and Brendan Knapp.

For everything pertaining to simdjson, Daniel Lemire (and many contributors).