Configuration options are essentially global variables the user can set. They are used to alter the default behavior of certain raster format drivers, and in some cases the GDAL core. A large number of configuration options are available. An overall discussion, along with a full list of available options and where they apply, is in the GDAL documentation at https://gdal.org/user/configoptions.html.
This quick reference covers a small subset of configuration options that may be useful in common scenarios, with links to topic-specific documentation provided by the GDAL project. Options can be set from R with gdalraster::set_config_option(). Note that specific usage is context dependent. Passing value = "" (empty string) will unset a value previously set by set_config_option():
library(gdalraster)
set_config_option("GDAL_NUM_THREADS", "ALL_CPUS")
# unset:
set_config_option("GDAL_NUM_THREADS", "")
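When experimenting, it can be useful to check which value is currently in effect and to put it back afterwards. The following is a minimal sketch, assuming that gdalraster::get_config_option() is available in the installed version of the package and returns an empty string when the option is not set. The thread count of 2 is an arbitrary illustrative value.

```r
library(gdalraster)

# Remember the value currently in effect ("" means the option is not set).
old_value <- get_config_option("GDAL_NUM_THREADS")

# Set a new value and confirm that GDAL sees it.
set_config_option("GDAL_NUM_THREADS", "2")
get_config_option("GDAL_NUM_THREADS")

# Restore the previous state. If old_value is "", this unsets the option.
set_config_option("GDAL_NUM_THREADS", old_value)
```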
GDAL doc: https://gdal.org/user/configoptions.html#general-options
GDAL_RASTERIO_RESAMPLING
The $read() method of a GDALRaster object will perform automatic resampling if the specified output size (out_xsize * out_ysize) is different than the size of the source region being read (xsize * ysize). In that case, resampling can be configured to override the default NEAR to one of BILINEAR, CUBIC, CUBICSPLINE, LANCZOS, AVERAGE, MODE, RMS, or GAUSS:
# bilinear interpolation (2x2 neighborhood of pixels)
set_config_option("GDAL_RASTERIO_RESAMPLING", "BILINEAR")
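As an illustration, the sketch below reads a full band at roughly half resolution so that resampling is triggered. It assumes the sample file storml_elev.tif that ships with gdalraster; the output dimensions are arbitrary and chosen only to differ from the source size.

```r
library(gdalraster)

# Sample raster shipped with gdalraster (assumed to be installed).
elev_file <- system.file("extdata/storml_elev.tif", package = "gdalraster")
ds <- new(GDALRaster, elev_file, read_only = TRUE)
ncols <- ds$getRasterXSize()
nrows <- ds$getRasterYSize()

# Use bilinear interpolation instead of the default nearest neighbor.
set_config_option("GDAL_RASTERIO_RESAMPLING", "BILINEAR")

# Read the full extent of band 1 into a buffer about half the size in each
# dimension. The output size differs from the source size, so $read() resamples.
out_ncols <- ncols %/% 2
out_nrows <- nrows %/% 2
v <- ds$read(band = 1, xoff = 0, yoff = 0, xsize = ncols, ysize = nrows,
             out_xsize = out_ncols, out_ysize = out_nrows)

# One value is returned per output pixel.
length(v) == out_ncols * out_nrows

# Unset the option and close the dataset.
set_config_option("GDAL_RASTERIO_RESAMPLING", "")
ds$close()
```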
CPL_TMPDIR
By default, temporary files are written into the current working directory. This can be changed with:
set_config_option("CPL_TMPDIR", "<dirname>") # tmpdir to use
GDAL doc: https://gdal.org/user/configoptions.html#performance-and-caching
GDAL_NUM_THREADS
Sets the number of worker threads to be used by GDAL operations that support multithreading. This affects several different parts of GDAL, including multithreaded compression for GeoTIFF and SOZip, and multithreaded computation during warp() (see topics below).
GDAL_CACHEMAX
The size limit of the block cache is set upon first use (first I/O). Setting GDAL_CACHEMAX after that point will not resize the cache. It is a per-session setting. If GDAL_CACHEMAX has not been set upon first use of the cache, then the default cache size (5% of physical RAM) will be in effect for the current session. See also GDAL Block Cache.
# set to a specific size in MB
set_config_option("GDAL_CACHEMAX", "800")
# or percent of physical RAM
set_config_option("GDAL_CACHEMAX", "10%")
GDAL_MAX_DATASET_POOL_SIZE
The default number of datasets that can be opened simultaneously by the GDALProxyPool mechanism (used by VRT for example) is 100. This can be increased to get better random I/O performance with VRT mosaics made of numerous underlying raster files. Note: on Linux systems, the number of file handles that can be opened by a process is generally limited to 1024. The value of GDAL_MAX_DATASET_POOL_SIZE is currently clamped between 2 and 1000. Also note that gdalwarp increases the pool size to 450:
# default is 100
set_config_option("GDAL_MAX_DATASET_POOL_SIZE", "450")
PG_USE_COPY
This configures PostgreSQL/PostGIS to use COPY for inserting data, which is significantly faster than INSERT. This can increase performance substantially when using gdalraster::polygonize() to write polygons to a PostGIS vector layer. See also GDAL configuration options for PostgreSQL.
# use COPY for inserting to PostGIS
set_config_option("PG_USE_COPY", "YES")
SQLITE_USE_OGR_VFS
For the SQLite-based formats GeoPackage (.gpkg) and Spatialite (.sqlite), setting SQLITE_USE_OGR_VFS enables extra buffering/caching by the GDAL/OGR I/O layer and can speed up I/O. Be aware that no file locking will occur if this option is activated, so concurrent edits may lead to database corruption. This setting may increase performance substantially when using gdalraster::polygonize() to write polygons to a vector layer in these formats. Additional configuration and performance hints for SQLite databases are in the driver documentation at: https://gdal.org/drivers/vector/sqlite.html#configuration-options.
# SQLite: GPKG (.gpkg) and Spatialite (.sqlite)
# enable extra buffering/caching by the GDAL/OGR I/O layer
set_config_option("SQLITE_USE_OGR_VFS", "YES")
OGR_SQLITE_JOURNAL
SQLite is a transactional DBMS. When many INSERT statements are executed in close sequence, application code may group them into large batches within transactions in order to get optimal performance. By default, if no transaction is explicitly started, SQLite will autocommit on every statement, which will be slow.
The OGR_SQLITE_JOURNAL option configures operation of the rollback journal that implements transactions in SQLite. The SQLite documentation describes the default operation:
The DELETE journaling mode is the normal behavior. In the DELETE mode, the rollback journal is deleted at the conclusion of each transaction. Indeed, the delete operation is the action that causes the transaction to commit.
The DELETE mode requires file system I/O so performance is degraded if many INSERTs are autocommitted individually. Using MEMORY journaling mode (or even OFF) can be much faster in this case:
The MEMORY journaling mode stores the rollback journal in volatile RAM. This saves disk I/O but at the expense of database safety and integrity. If the application using SQLite crashes in the middle of a transaction when the MEMORY journaling mode is set, then the database file will very likely go corrupt.
See the SQLite documentation for all available journal modes. This setting also applies when using gdalraster::polygonize() to write polygons to a vector layer in GeoPackage (.gpkg) or Spatialite (.sqlite) formats (see SQLITE_USE_OGR_VFS above).
# configure SQLite to store the rollback journal in RAM
set_config_option("OGR_SQLITE_JOURNAL", "MEMORY")
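The two SQLite-related options can be combined when polygonizing to GeoPackage. The sketch below assumes the sample raster storml_evt.tif that ships with gdalraster; the output file, layer name and field name are illustrative. The output is written to a temporary directory, where the reduced safety of MEMORY journaling and the lack of file locking are of little concern.

```r
library(gdalraster)

# Sample integer raster shipped with gdalraster (assumed to be installed).
evt_file <- system.file("extdata/storml_evt.tif", package = "gdalraster")

# Illustrative output location, layer name and attribute field name.
dsn <- file.path(tempdir(), "storm_lake.gpkg")
layer <- "lf_evt"
fld <- "evt_value"

# Trade database safety for speed while writing a disposable output file.
set_config_option("SQLITE_USE_OGR_VFS", "YES")
set_config_option("OGR_SQLITE_JOURNAL", "MEMORY")

polygonize(evt_file, dsn, layer, fld)

# Unset both options so that later writes use the default behavior.
set_config_option("SQLITE_USE_OGR_VFS", "")
set_config_option("OGR_SQLITE_JOURNAL", "")
```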
GDAL doc: https://gdal.org/user/configoptions.html#networking-options
CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE
Whether to use a local temporary file to support random writes in certain virtual file systems. The temporary file will be located in CPL_TMPDIR (see above).
# YES|NO to use a temp file
set_config_option("CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE", "YES")
GDAL doc: https://gdal.org/user/configoptions.html#proj-options
OSR_DEFAULT_AXIS_MAPPING_STRATEGY
This option can be set to either TRADITIONAL_GIS_ORDER or AUTHORITY_COMPLIANT. GDAL >= 3.5 defaults to AUTHORITY_COMPLIANT. It determines whether to honor the declared axis mapping of a CRS or override it with the traditional GIS ordering (x = longitude, y = latitude).
OSR_WKT_FORMAT
As of GDAL 3.0, the default format for exporting a spatial reference definition to Well Known Text is WKT 1. This can be overridden with:
# SFSQL/WKT1_SIMPLE/WKT1/WKT1_GDAL/WKT1_ESRI/WKT2_2015/WKT2_2018/WKT2/DEFAULT
set_config_option("OSR_WKT_FORMAT", "WKT2")
GDAL doc: https://gdal.org/programs/gdalwarp.html#memory-usage
The performance and caching topic above generally applies to processing with gdalraster::warp() (reproject/resample/crop/mosaic).
GDAL_NUM_THREADS
Multithreaded computation in warp() can be enabled with:
# note this also affects several other parts of GDAL
set_config_option("GDAL_NUM_THREADS", "4") # number of threads or ALL_CPUS
Increasing the memory available to warp() may also increase performance (i.e., the options passed in cl_arg include a value like c("-wm", "1000")). The warp memory specified by "-wm" is shared among all threads. It is especially beneficial to increase this value when running warp() with multithreading enabled.
Multithreading could also be enabled by including a GDAL warp option in cl_arg with c("-wo", "NUM_THREADS=<value>") greater than 1, which is equivalent to setting the GDAL_NUM_THREADS configuration option as shown above.
This option can be combined with the -multi command-line argument passed to warp() in cl_arg. With -multi, two threads will be used to process chunks of the raster and perform input/output operations simultaneously, whereas the GDAL_NUM_THREADS configuration option affects computation separately.
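The pieces described above can be put together in a single call. The following is a sketch only, assuming the sample file storml_elev.tif that ships with gdalraster. The target CRS, output resolution, resampling method, thread count and warp memory value are arbitrary illustrative choices, not recommendations.

```r
library(gdalraster)

# Sample raster shipped with gdalraster (assumed to be installed).
elev_file <- system.file("extdata/storml_elev.tif", package = "gdalraster")
dst_file <- file.path(tempdir(), "storml_elev_5070.tif")

# Command-line style arguments passed through to the warper:
#   -tr, -r : output resolution and resampling method (illustrative values)
#   -multi  : overlap I/O and computation using two threads
#   -wm     : warp memory, shared among all threads
#   -wo NUM_THREADS : computation threads, as an alternative to setting the
#                     GDAL_NUM_THREADS configuration option
args <- c("-tr", "30", "30", "-r", "bilinear")
args <- c(args, "-multi", "-wm", "1000", "-wo", "NUM_THREADS=2")

warp(elev_file, dst_file, t_srs = "EPSG:5070", cl_arg = args)
```

Passing NUM_THREADS as a warp option keeps the setting local to this call, whereas the GDAL_NUM_THREADS configuration option also affects other parts of GDAL.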
GDAL_CACHEMAX
Increasing the size of the I/O block cache may also help. This can be done by setting GDAL_CACHEMAX as described in the performance and caching topic above.
GDAL doc: https://gdal.org/drivers/raster/gtiff.html#configuration-options
The behavior of the GTiff driver is highly configurable, including with respect to overview creation. For full discussion, see the link above and also the documentation for the gdaladdo command-line utility.
GDAL_NUM_THREADS
The GTiff driver supports multi-threaded compression (default is compression in the main thread). GDAL documentation states that it is worth it for slow compression algorithms such as DEFLATE or LZMA. Starting with GDAL 3.6, this option also enables multi-threaded decoding when read requests intersect several tiles/strips:
# specify the number of worker threads or ALL_CPUS
# note this also affects several other parts of GDAL
set_config_option("GDAL_NUM_THREADS", "ALL_CPUS")
COMPRESS_OVERVIEW
Raster overviews (a.k.a. pyramids) can be built with the $buildOverviews() method of a GDALRaster object. It may be desirable to compress the overviews when building:
# applies to external overviews (.ovr), and internal overviews if GDAL >= 3.6
# LZW is a good default but several other compression algorithms are available
set_config_option("COMPRESS_OVERVIEW", "LZW")
PREDICTOR_OVERVIEW
Sets the predictor to use for overviews with LZW, DEFLATE and ZSTD compression. The default is 1 (no predictor), 2 is horizontal differencing and 3 is floating point prediction. PREDICTOR=2 is only supported for 8, 16, 32 and 64 bit samples (support for 64 bit was added in libtiff > 4.3.0). PREDICTOR=3 is only supported for 16, 32 and 64 bit floating-point data.
# horizontal differencing
set_config_option("PREDICTOR_OVERVIEW", "2")
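The overview options take effect at the time the overviews are built. The sketch below works on a temporary copy of the sample file storml_elev.tif (assumed to ship with gdalraster) and opens it read-only, which for GTiff is understood to result in external overviews (.ovr). The resampling method and overview levels are illustrative.

```r
library(gdalraster)

# Work on a temporary copy so the installed sample data is left untouched.
elev_file <- system.file("extdata/storml_elev.tif", package = "gdalraster")
tmp_file <- file.path(tempdir(), "storml_elev_ovr.tif")
file.copy(elev_file, tmp_file, overwrite = TRUE)

# Compress the overviews with LZW and horizontal differencing.
set_config_option("COMPRESS_OVERVIEW", "LZW")
set_config_option("PREDICTOR_OVERVIEW", "2")

# Opened read-only, so the overviews are written to an external .ovr file.
ds <- new(GDALRaster, tmp_file, read_only = TRUE)
ds$buildOverviews("BILINEAR", levels = c(2, 4), bands = c(1))
ds$close()

# Unset the options once the overviews have been built.
set_config_option("COMPRESS_OVERVIEW", "")
set_config_option("PREDICTOR_OVERVIEW", "")

file.exists(paste0(tmp_file, ".ovr"))
```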
GDAL doc: /vsicurl/ (HTTP/HTTPS random access)
GDAL_HTTP_CONNECTTIMEOUT
Maximum delay for connection to be established before being aborted.
# max delay for connection establishment in seconds
set_config_option("GDAL_HTTP_CONNECTTIMEOUT", "<seconds>")
GDAL_HTTP_TIMEOUT
Maximum delay for the whole request to complete before being aborted.
# max delay for whole request completion in seconds
set_config_option("GDAL_HTTP_TIMEOUT", "<seconds>")
CPL_VSIL_CURL_CHUNK_SIZE
Partial downloads (requires the HTTP server to support random reading) are done with a 16 KB granularity by default. The chunk size can be configured with this option.
If the driver detects sequential reading, it will progressively increase the chunk size up to 128 times CPL_VSIL_CURL_CHUNK_SIZE (so 2 MB by default) to improve download performance. When increasing the value of CPL_VSIL_CURL_CHUNK_SIZE to optimize sequential reading, it is recommended to increase CPL_VSIL_CURL_CACHE_SIZE as well to 128 times the value of CPL_VSIL_CURL_CHUNK_SIZE.
# chunk size in bytes
set_config_option("CPL_VSIL_CURL_CHUNK_SIZE", "<bytes>")
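To make the 128x relationship concrete, the following sketch derives the cache size from the chunk size. The chunk size of 256 KB is an arbitrary illustrative value; format() with scientific = FALSE is used so that the byte counts are passed as plain integer strings.

```r
library(gdalraster)

chunk_size <- 256 * 1024        # 256 KB expressed in bytes
cache_size <- 128 * chunk_size  # 32 MB expressed in bytes

# Configuration option values are passed as character strings.
set_config_option("CPL_VSIL_CURL_CHUNK_SIZE",
                  format(chunk_size, scientific = FALSE))
set_config_option("CPL_VSIL_CURL_CACHE_SIZE",
                  format(cache_size, scientific = FALSE))
```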
CPL_VSIL_CURL_CACHE_SIZE
A global least-recently-used cache of 16 MB shared among all downloaded content is used, and content in it may be reused after a file handle has been closed and reopened, during the lifetime of the process or until vsi_curl_clear_cache() is called. The size of this global LRU cache can be modified with:
# size in bytes defaults to 16 MB
set_config_option("CPL_VSIL_CURL_CACHE_SIZE", "<bytes>")
GDAL doc: /vsis3/ (AWS S3 file system handler)
AWS_NO_SIGN_REQUEST
Request signing can be disabled for public buckets that do not require an AWS account:
# public bucket no AWS account required
set_config_option("AWS_NO_SIGN_REQUEST", "YES")
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
AWS_REQUEST_PAYER
If authentication is required, configure credentials with:
set_config_option("AWS_ACCESS_KEY_ID", "<value>") # key ID
set_config_option("AWS_SECRET_ACCESS_KEY", "<value>") # secret access key
# used for validation if using temporary credentials:
set_config_option("AWS_SESSION_TOKEN", "<value>") # session token
# if requester pays:
set_config_option("AWS_REQUEST_PAYER", "<value>") # requester
AWS_REGION
Sets the AWS region to which requests should be sent. Defaults to us-east-1.
# specify region
set_config_option("AWS_REGION", "us-west-2")
GDAL doc: /vsigs/ (Google Cloud Storage files)
GDAL doc: /vsiaz/ (Microsoft Azure Blob files)
Recognized filenames are of the form /vsiaz/container/key, where container is the name of the container and key is the object “key”, i.e. a filename potentially containing subdirectories.
AZURE_NO_SIGN_REQUEST
Controls whether requests are signed.
# public access
set_config_option("AZURE_NO_SIGN_REQUEST", "YES")
AZURE_STORAGE_CONNECTION_STRING
Credential string provided in the Access Keys section of the administrative interface, containing both the account name and a secret key.
set_config_option("AZURE_STORAGE_CONNECTION_STRING", "<my_connection_string>")
Several other authentication methods are possible for Azure. See the GDAL documentation for details.
GDAL doc: /vsiadls/ (Microsoft Azure Data Lake Storage Gen2)
GDAL doc: /vsizip/ (Seek-Optimized ZIP files, GDAL >= 3.7)
The function gdalraster::addFilesInZip() can be used to create new or append to existing ZIP files, potentially using the seek optimization extension. Function arguments are available for the options below, or the configuration options can be set to change the default behavior.
GDAL_NUM_THREADS
The GDAL_NUM_THREADS configuration option can be set to ALL_CPUS or an integer value to specify the number of threads to use for SOZip-compressed files. This option is similarly described above for compression in GeoTIFF. Note that this option also affects several other parts of GDAL.
CPL_SOZIP_ENABLED
Defaults to AUTO. Determines whether the SOZip optimization should be enabled. If AUTO, SOZip will be enabled for uncompressed files larger than CPL_SOZIP_MIN_FILE_SIZE.
# SOZip optimization defaults to AUTO
set_config_option("CPL_SOZIP_ENABLED", "YES")
CPL_SOZIP_MIN_FILE_SIZE
Defaults to 1M. Determines the minimum file size for SOZip to be automatically enabled. The value is specified in bytes, or a K, M or G suffix can be used to specify a value in kilobytes, megabytes or gigabytes, respectively.
# SOZip minimum file size
set_config_option("CPL_SOZIP_MIN_FILE_SIZE", "100K")
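As an alternative to the configuration options, the equivalent function arguments can be passed directly. The sketch below assumes the sozip_enabled and num_threads arguments of addFilesInZip() and the sample file storm_lake.lcp that ships with gdalraster. The version check assumes that the second element of gdal_version() holds the GDAL version number as a string.

```r
library(gdalraster)

# SOZip support requires GDAL >= 3.7, so guard the call with a version check.
if (as.integer(gdal_version()[2]) >= 3070000) {
  lcp_file <- system.file("extdata/storm_lake.lcp", package = "gdalraster")
  zip_file <- file.path(tempdir(), "storml_lcp.zip")

  # Force SOZip optimization regardless of file size, using one thread.
  addFilesInZip(zip_file, lcp_file, full_paths = FALSE,
                sozip_enabled = "YES", num_threads = 1)
}
```

Arguments passed this way apply only to the one call, so the session-wide configuration is left unchanged.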