Speed-up the computations on a LAScatalog

What takes time when processing a LAScatalog is not necessarily the computation itself but the time required to read the files. In fact the read time (i.e the time needed to load the data in R) might be much longer than the actual computation time. This vignette explains why and how to speed-up the computation by a factor of 2 to 8 by carefully preparing the catalog.

Generic considerations on `LAScatalog` processing

When processing a LAScatalog the area covered is divided into chunks that are then processed sequentially. In the following examples we present the case where chunks are equal to tiles, i.e. when processing each file sequentially. This is the common way to process data but not the only one. In any case, the explanation remains valid even when chunks are not equal to tiles.

So each file is processed sequentially. For a given processed file, the content is read and loaded into R. In the following figure chunk number 42 is currently read and processed (in blue):

But the current processed file is not the only one that needs to be read. To properly process the catalog without edge artifacts we need to load an extra buffer around the processed file (in red).

To load a buffer the processing engine must not only read the processed file but also the 8 neighbouring tiles (in red) to selectively read a small buffer around the processed file. Thus, for each processed file it is not one file that is read but nine. This is one of the reasons why the read time is far from negligible compared to the actual computation time.

To process a LAScatalog faster we need to read the files faster.

Read a las file vs read a laz file

A laz file is a strongly compressed las file. Reading a laz file is thus slower than reading a las file because it must be un-compressed on the fly at reading time. The following graph shows a benchmark of read time for a single file.

t1 = system.time(readLAS("file.laz"))
t2 = system.time(readLAS("file.las"))

So let’s assume that the total computation time is 1 unit of time divided into 0.25 units of actual processing time and 0.75 units of read time (which is a fairly reasonable ratio). We can divide the read time by 3, and thus have 0.25 units of read time and 0.25 units of computing time, which gives a computation time of 0.5 instead of 1. We can therefore speed-up the computation time by a factor of 2 by using the las format instead of laz. Obviously the gain is less significant for more computationally demanding processes.

So for faster computation users can opt for las files instead of laz files. Obviously, there are good reasons to use laz files instead of las files. The strong compression brought by the laz format has a lot of advantages for storing data. It is up to the user to choose a format by considering the trade-offs between space and computation time. This section explains how it works only to help users make a decision that best suits their needs.

Indexation of the points with lax files

Another way to speed-up the total computation time is to avoid reading all 8 neighbouring tiles to load a buffer. Instead, we can read only parts of the neighbouring tiles. The gain comes from the fact that we read only a small portion of the neighbouring files to extract the buffer, skipping most of the file contents. Indeed, the buffer usually corresponds to only a very small percentage of the actual contents of a file (equivalent to a few thousands square meters).

This is made possible by indexing the las or laz files with lax files. A lax file is a tiny file associated with a las or laz file that spatially indexes the points to make faster spatial queries. This file type was created by Martin Isenburg in LAStools. For a better understanding of how it works one can refer to a talk given by Martin Isenburg about lasindex.

By adding lax files along with your las/laz files the buffer can be added around the processing file by only partially reading the 8 neigbouring files (in red)

The best way to create a lax file is to use laxindex from LAStools. It is a free and open-source part of LAStools. If you cannot or do not want to use LAStools the lidR package has an (undocumented) internal function that creates lax files using the rlas package:

lidR:::catalog_laxindex(ctg)

The gain is really significant and allows an additional 2- to 3-fold saving in terms of read time, which significantly speeds up the computation time. Changing from laz to las format has a cost because it implies storing more data. However, using lax files provides a significant gain for free, so there is no incentive not to create lax files.

Load only attributes of interest

We demonstrated the importance of decreasing the time taken to read files to improve the overall computation time. The faster you read the files the faster you perform the computation because the read time is non-negligible. When reading a las file the computation time can be roughly split in 3 equal parts. (1) 1/3 is actual reading, (2) 1/3 is data storage in C++ data structures, (3) 1/3 is copy of the data into R objects. When reading only attributes of interest we can speed-up the steps (2) and (3). This is not as significant for laz files because step (1) is more than 1/3.

system.time(readLAS(file.las))
system.time(readLAS(file.las, select = "xyz"))
system.time(readLAS(file.laz))
system.time(readLAS(file.laz, select = "xyz"))

Changing the hardware

For a fast reading, opting for a fast SSD disk instead of a slow HDD disk may significantly speed-up the computation time independently of the power of your processor. Hardware matters!

Benchmarks

The following are benchmarks for some functions

Simple canopy height model

A simple point-to-raster canopy height model using 25 files of 150 x 150 m with 30 pts/m² on a laptop with an SSD and an intel core i7 processor.

chm <- rasterize_canopy(ctg, 1, p2r())

Format	Runtime
laz	40 sec
laz + lax	20 sec
las	10 sec
las + lax	7 sec

Here we found an almost 8-fold increase in speed simply by changing the file types.

Area-based Approach

Computation of a single metric on 360 files of 1 x 1 km with 3 pts/m² (~300 km² and 900 millions points) on a laptop with an SSD and an intel core i7 processor.

hmean <- pixel_metric(ctg, mean(Z), 20)

Format	Runtime
laz	45 min
las	15 min
las + lax	8 min

Here we found an almost 6-fold speed-up by changing only the file types.

Clip ground inventories

Extraction of 140 plots of 12 m radius from 1 x 1 km files with 3 pts/m² on a laptop with an SSD and an intel core i7 processor.

plots <- clip_roi(ctg, shapefile, radius = 12)

Format	Runtime
las	45 sec
las + lax	4 sec