Abstract
The Rwtss package is a front-end to the Web Time Series Service \ (WTSS) that offers time series of remote sensing data using a simple API. A WTSS server takes as input a n Earth observation data cube, that has a spatial and a temporal dimension and can have multiple bands attributes. The WTSS API has three commands, which are are list_coverages, that returns a list of coverages available in the server; (b) describe_coverage, that returns the metadata for a given coverage; (c) time_series, that returns a time series for a spatio-temporal location.Recently, the concept of data cubes has emerged as a relevant paradigm for organising Earth observation data for performing analysis. In an intuitive sense, data cubes are temporal sequences of images from the same geographical area. This sequence is organised consistently to allow algorithms to explore the data in both the spatial and temporal directions[Appel2019]. Data cubes are thus an efficient way to explore satellite image archives spanning years or even decades.
Data cubes rely on the fact that Earth observation satellites revisit the same place at regular intervals. Thus measures can be calibrated so that observations of the same place in different times are comparable. These calibrated observations can be organised in regular intervals, so that each measure from sensor is mapped into a three dimensional multivariate array in space-time. Let \(S = \{s_{1}, s_{2},\dotsc, s_{n}\}\) be a set of remote sensing images which shows the same region at consecutive times \(T = \{t_{1}, t_{2}, \dotsc, t_{n}\}\). Each location of a pixel in an image (latitude, longitude, time) maps to a position in a 3D array. Each array position will be associated to a set of attributes values \(A = \{a_{1}, a_{2}, \dotsc, a_{m}\}\) which are the sensor measurements at each location in space-time (see figure below}). For optical sensors, these observations are proportional to Earth’s reflexion of the incoming solar radiation at different wavelengths of the electromagnetic spectrum.
In what follows, we use the term “coverage” in the same sense as the definition of a “data cube,” proposed from Appel and Pebesma (Appel and Pebesma 2019): A regular, dense Earth observation (EO) data cube is a four-dimensional array with dimensions x (longitude or easting), y (latitude or northing), and time, with the following properties:
This definition of EO data cubes reflects their construction. Data cubes are built of 2D images collected on a specific date constrained by satellite orbits. These images are then grouped together in arbitrary intervals. In principle, a data cube can have an image on time 1, another image at time 5, another at time 8, and so on. Since interpolation in space is different from interpolation in time, a data set composed by these images would make a valid data cube. Space needs to be compact (dense) in a data cube, but time does not need to be. Thus, a sequence of 2D + time images need not be dense. 2D images in a data cube can correspond to different instants of acquisition.
Web services interfaces isolate users from concrete implementations, thus increasing portability There are many ways to implement Earth observation data cubes. One of the earlier approaches is to break large images in chunks and store them in an object-relational DBMS, as done in RASDAMAN (Baumann et al. 1998). Google Earth Engine(Gorelick et al. 2017) uses a massive parallel design; image data is partitioned in hundreds or thousands of CPUs, processed separately and later aggregated using map-reduce techniques. The Open Data Cube software, after making the required geometric and radiometric transformations in the images, stores them in files, which are indexed in a PostgreSQL database (Lewis et al. 2017). Similar file-based approaches are used in the R package “gdalcubes” (Appel and Pebesma 2019). Each of these designs is associated to a different API, thus making compatibility and interoperability between them hard to achieve.
Given the different designs of Earth observation data cubes, and their inherent limitations for achieving interoperability, one natural approach is to propose Web services that would present a unified interface for accessing them. Web services interfaces isolate users from concrete implementations, thus increasing portability(Ferris and Farrell 2003). In the geospatial domain, web service standards developed by the Open Geospatial Consortium have achieved considerable impact (Percivall 2010). Since services such as WMS (Web Map Service) and WCS (Web Coverage Service) are used worldwide with success, it is natural to ask: how to design web services for big Earth observation (EO) data? The current work addresses this issue and proposes WTSS (Web Time Series Service), a new services for big EO data.
Given a series of remote sensing snapshots, we can reorganise them into a set of time series. A satellite image time series is obtained by taking measurements in the same pixel location \((x,y)\) in consecutive times \(t_1,...,t_m\), as shown in the figure below.
Recent results in the literature show that analysis of satellite image times series enables extracting long-term trends of land change \citep(Pasquarella et al. 2016). Such information which would be harder to obtain by processing 2D images separately. These analysis are supported by algorithms such as TWDTW (Maus et al. 2016), TIMESTAT (Jonsson and Eklundh 2004) and BFAST (Verbesselt et al. 2010). These algorithms process individual time-series and combine the results for selected periods to generate classified maps.
For example, classification methods such as TWDTW (Maus et al. 2019) can then break an image time series into a set of intervals. As an example, the figure below shows four events extracted from a remote sensing time series, expressed in terms of the its intervals. From 2000 to 2001 the area was a forest that was deforested in 2002. From 2003 to 2005 the area it was used for pasture and from 2005 to 2008, it was transformed into a cropland. This kind of classification is done by algorithms such that split a time series into a set of events. Combining snapshots with time series, scientists can explore the full depth of big remote sensing data archives.
Motivated by the need to retrieve satellite image time series from large 3D arrays, we have designed and implemented the Web Time Series Service (WTSS). A WTSS server takes as input a 3D array. Each array has a spatial and a temporal dimension and can be multidimensional in terms of its attributes. The WTSS service is independent of the actual data architecture used for 3D array store. It can work with solutions such as flat files, MapReduce distributed datasets, array databases or object-relational databases.
The WTSS API has three commands: (a) list_coverages that returns a list of coverages available in the server; (b) describe_coverage that returns the metadata for a given coverage; (c) time_series that returns a time series for a spatio-temporal location.
The first step towards using the service is connecting to a server that supports the WTSS protocol. Currently, Brazil’s National Institute for Space Research (INPE) runs such a service. In the package, the connection is enabled by using the URL of the service. So, the first procedure is to find the URL of a WTSS service.
# Connect to the WTSS server at INPE Brazil
wtss_inpe <- "https://brazildatacube.dpi.inpe.br/wtss/"
This operation allows clients to retrieve the capabilities provided by any server that implements WTSS. It returns a list of coverage names available in a server instance.
# Connect to the WTSS server at INPE Brazil
Rwtss::list_coverages(wtss_inpe)
## [1] "MOD13Q1-6" "MYD13Q1-6"
This operation returns the metadata for a given coverage identified by its name. It includes its range in the spatial and temporal dimensions.
# Connect to the WTSS server at INPE Brazil
desc <- Rwtss::describe_coverage(wtss_inpe, name = "MOD13Q1-6")
## ---------------------------------------------------------------------
## WTSS server URL = https://brazildatacube.dpi.inpe.br/wtss
## Cube (coverage) = MOD13Q1-6
## Timeline - 508 time steps
## start_date: 2000-02-18 end_date: 2022-03-22
## ---------------------------------------------------------------------
The coverage description is also saved as a tibble in the wtss object, to be used whenever required.
This operation requests the time series of values of a coverage attribute at a given location. Its parameters are: (a) wtss.obj: either a WTSS object (created by the operation wtss::WTSS as shown above) or a valid WTSS server URL; (b) name: Cube (coverage) name; (c) attributes: vector of band names (optional). If omitted, all bands are retrieved; (d) longitude: longitude in WGS84 coordinate system; (e)latitude: Latitude in WGS84 coordinate system; (f)start_date (optional): Start date in the format yyyy-mm-dd or yyyy-mm depending on the coverage. If omitted, the first date on the timeline is used; (g) end_date(optional): End date in the format yyyy-mm-dd or yyyy-mm depending on the coverage. If omitted, the last date of the timeline is used; (h) To access the BDC time series it is necessary to provide a token, provide the token through the token
parameter. To create a new BDC token, please see this tutorial.
# Request a time series from the "MOD13Q1" coverage
ndvi_ts <- Rwtss::time_series(
URL = wtss_inpe,
name = "MOD13Q1",
attributes = c("NDVI","EVI"),
longitude = -45.00,
latitude = -12.00,
start_date = "2000-02-18",
end_date = "2016-12-18",
token = "YOUR-BDC-TOKEN"
)
ndvi_ts
The result of the operation is a tibble
, which is a generalization of a data.frame
, the usual way in R to organise data in tables. Tibbles are part of the tidyverse
, a collection of R packages designed to work together in data manipulation (Wickham and Grolemund 2017). The tibble contains data and metadata. The first six columns contain the metadata: satellite, sensor, spatial and temporal information, and the coverage from where the data has been extracted. The spatial location is given in longitude and latitude coordinates for the “WGS84” ellipsoid. The time_series
column contains the time series data for each spatiotemporal location. This data is also organized as a tibble, with a column with the dates and the other columns with the values for each spectral band.
# Showing the contents of a time series
ndvi_ts$time_series[[1]]