Column Text Format (CTF) is a new tabular data format designed for simplicity and performance. CTF is the simplest column store you can imagine: it represents each column in a table as a plain text file. Because the underlying storage is plain text, the data is human-readable and familiar to programmers, unlike specialized binary formats. CTF is faster than row-oriented formats like CSV when loading a subset of the columns in a table, since only the files for the requested columns need to be read. This package provides functions to read and write CTF data from R.
What is CTF good for?
What are the alternatives to CTF?
If CTF isn’t exactly what you need, then you will be better off with a more established and stable data format. CSV works fine in many cases. If you need better performance, then consider existing columnar storage technologies such as HDF5 or Apache Parquet.
Anything else?
We created CTF in 2021, and we expect the metadata file associated with it to evolve significantly. Until version 1.0 is ready, anything could change at any time, and we make no promises about compatibility.
library(ctf)
The following examples use R’s built-in iris dataset.
First, let’s save iris in CTF format inside iris_ctf_data, a subdirectory of our temporary directory.
d <- file.path(tempdir(), "iris_ctf_data")
write.ctf(iris, d)
The code above created the directory iris_ctf_data inside a temporary directory, and wrote one file for each column in iris, plus one file for the metadata.
list.files(d)
## [1] "Petal.Length" "Petal.Width" "Sepal.Length"
## [4] "Sepal.Width" "Species" "iris-metadata.json"
colnames(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
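As a quick check of the layout described above, the sketch below compares the file names in a freshly written CTF directory against the column names of iris. It uses only `write.ctf()` as shown in this document; the directory name `iris_ctf_files` and the variable names are made up for illustration.

```r
library(ctf)

# Hypothetical directory name, chosen so it does not clash with the example above.
d_files <- file.path(tempdir(), "iris_ctf_files")
write.ctf(iris, d_files)

files <- list.files(d_files)

# Columns that have no matching file (empty if every column got its own file).
missing_cols <- setdiff(colnames(iris), files)
missing_cols

# Files that are not columns (this is where the metadata file shows up).
extra_files <- setdiff(files, colnames(iris))
extra_files

unlink(d_files, recursive = TRUE)
```

Note that `list.files()` returns names in sorted order, so the files do not appear in the same order as the columns of iris.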
The column files are just plain text. Let’s verify by reading the first 5 lines.
pl_file <- file.path(d, "Petal.Length")
readLines(pl_file, n = 5L)
## [1] "1.4" "1.4" "1.3" "1.5" "1.4"
iris[1:5, "Petal.Length"]
## [1] 1.4 1.4 1.3 1.5 1.4
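To illustrate why a column-per-file layout helps when only some columns are needed, here is a minimal base R sketch. It is a hand-rolled imitation of the idea, not the ctf package's own writer or reader, and the directory name `column_sketch` is hypothetical.

```r
# Base R only: a hand-rolled column-per-file layout, not the ctf package's code.
cols_dir <- file.path(tempdir(), "column_sketch")  # hypothetical directory name
dir.create(cols_dir, showWarnings = FALSE)

# Write each numeric column of iris to its own plain text file, one value per line.
numeric_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]
for (col in numeric_cols) {
  writeLines(as.character(iris[[col]]), file.path(cols_dir, col))
}

# Load a single column; the files for the other columns are never opened.
petal_length <- scan(file.path(cols_dir, "Petal.Length"),
                     what = numeric(), quiet = TRUE)

# Compare the values read from disk with the in-memory column.
all.equal(petal_length, iris$Petal.Length)

unlink(cols_dir, recursive = TRUE)
```

With a row-oriented file such as CSV, every line has to be read and split even when only one field per row is wanted; here the cost is limited to the one file that is read.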
We can read the data saved in CTF format back into R as iris2, and make sure it matches our original iris data.
iris2 <- read.ctf(d)
head(iris2)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
# Same thing:
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
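Comparing `head()` output by eye only covers the first six rows. A programmatic comparison over the whole table could look like the sketch below, which uses only `write.ctf()` and `read.ctf()` as shown above; the directory name `iris_ctf_check` and the variable names are made up for illustration. No output is shown for `all.equal()`, because whether attributes such as factor levels round-trip exactly is an assumption this document does not confirm.

```r
library(ctf)

# Hypothetical directory name, separate from the one used in the main example.
d_check <- file.path(tempdir(), "iris_ctf_check")
write.ctf(iris, d_check)
iris_back <- read.ctf(d_check)

# Structural checks: same number of rows and columns, same column names and order.
same_shape <- identical(dim(iris_back), dim(iris))
same_names <- identical(colnames(iris_back), colnames(iris))

# all.equal() returns TRUE on a match, or a character vector describing differences.
comparison <- all.equal(iris_back, iris)
comparison

unlink(d_check, recursive = TRUE)
```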
Clean up:
unlink(d, recursive = TRUE)