The dkanr package is an R client for the DKAN REST API. dkanr implements all the methods available via the DKAN REST API and the DKAN datastore API. Additionally, it provides a few wrapper functions that make it easier to interact with the DKAN API from R.
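In this brief introduction, we will see how to download data from a specific dataset. Along the way, we will set up a connection, list the datasets in a catalog, retrieve dataset and resource metadata, and download the data.
To set up a connection without authentication, you just need the site URL.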
If authentication is required, you will also need to provide a valid username and password.
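A minimal sketch of both cases, assuming the `dkanr_setup()` function exported by dkanr and its `url`, `username` and `password` arguments. The second site URL and the credentials are placeholders, not real values, so that call is left commented out.

```r
library(dkanr)

# Anonymous connection: only the site URL is needed
dkanr_setup(url = "https://data.louisvilleky.gov")

# Authenticated connection (placeholder site and credentials, replace with your own):
# dkanr_setup(
#   url      = "https://dkan.example.org",
#   username = "YOUR_USERNAME",
#   password = "YOUR_PASSWORD"
# )
```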
You can verify that you are successfully connected by printing your connection information.
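For instance, assuming the anonymous connection to the Louisville catalog used throughout this introduction, `dkanr_settings()` prints the base URL along with the cookie and token, which stay empty when no authentication is used:

```r
library(dkanr)

# Connect anonymously, then print the settings currently in use
dkanr_setup(url = "https://data.louisvilleky.gov")
dkanr_settings()
```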
## <dkanr settings>
## Base URL: https://data.louisvilleky.gov
## Cookie:
## Token:
list_nodes_all()
While exploring the offerings of a catalog, you can retrieve all the available datasets with a simple query.
library(dkanr)
library(dplyr)  # provides %>%, select(), arrange()

# Get a list of all datasets
resp <- list_nodes_all(filters = c(type = 'dataset'), as = 'df')
# Print the first 10 datasets
resp %>%
  select(nid, title, uri) %>%
  arrange(title) %>%
  head(n = 10)
## # A tibble: 10 x 3
##    nid   title                      uri
##    <chr> <chr>                      <chr>
##  1 8076  311 Service Requests       https://data.louisvilleky.gov/api/data~
##  2 4526  ABC License Data           https://data.louisvilleky.gov/api/data~
##  3 4926  ALL Checks                 https://data.louisvilleky.gov/api/data~
##  4 2686  Abandoned Urban Property   https://data.louisvilleky.gov/api/data~
##  5 5566  Absenteeism                https://data.louisvilleky.gov/api/data~
##  6 2781  Account Breakdown by Prog~ https://data.louisvilleky.gov/api/data~
##  7 2166  Active Contractors         https://data.louisvilleky.gov/api/data~
##  8 8216  Active Permits             https://data.louisvilleky.gov/api/data~
##  9 4496  Aerial Photogrids          https://data.louisvilleky.gov/api/data~
## 10 5296  Air Emission Sources       https://data.louisvilleky.gov/api/data~
Say you are interested in a specific dataset from the catalog, for instance, the “Active Permits” dataset. You can retrieve this dataset's metadata using the dataset node ID.
# Print only the "Active Permits" dataset information
resp %>%
  filter(title == 'Active Permits') %>%
  select(nid, title, uri, type)
## # A tibble: 1 x 4
##   nid   title         uri                                            type
##   <chr> <chr>         <chr>                                          <chr>
## 1 8216  Active Permi~ https://data.louisvilleky.gov/api/dataset/nod~ datas~
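With the node ID in hand (8216 in the output above), the metadata can be fetched as an R list. The sketch below assumes dkanr's `retrieve_node()` with its `nid` and `as` arguments; the output that follows (node summary, field names, then the title) is consistent with these three calls run in order.

```r
library(dkanr)
dkanr_setup(url = "https://data.louisvilleky.gov")

# Retrieve the dataset metadata by node ID, returned as an R list
metadata <- retrieve_node(nid = "8216", as = "list")

metadata         # node summary: type, title, UUID, timestamps
names(metadata)  # names of the available metadata fields
metadata$title   # access a single field
```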
## <DKAN Node> #8216
## Type: dataset
## Title: Active Permits
## UUID: 7e83b96e-3b53-4fc5-9a4c-32af30571787
## Created/Modified: 1467293251 / 1520511325
## [1] "vid" "uid"
## [3] "title" "log"
## [5] "status" "comment"
## [7] "promote" "sticky"
## [9] "vuuid" "nid"
## [11] "type" "language"
## [13] "created" "changed"
## [15] "tnid" "translate"
## [17] "uuid" "revision_timestamp"
## [19] "revision_uid" "body"
## [21] "field_additional_info" "field_author"
## [23] "field_contact_email" "field_contact_name"
## [25] "field_data_dictionary" "field_frequency"
## [27] "field_granularity" "field_license"
## [29] "field_public_access_level" "field_related_content"
## [1] "Active Permits"
Once you have identified a dataset of interest, you will probably want to download the actual data. Multiple data files and documents may be attached to a single dataset, so you’ll first need to list all the resources (data files and other types of documents) that are linked to the dataset you are interested in.
Here, a single resource is attached to the “Active Permits” dataset:
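A short sketch of that lookup, assuming dkanr's `get_resource_nids()` helper, which takes the dataset metadata list and returns the node IDs of the attached resources:

```r
library(dkanr)
dkanr_setup(url = "https://data.louisvilleky.gov")

# Dataset metadata for "Active Permits" (node ID 8216)
metadata <- retrieve_node(nid = "8216", as = "list")

# Node IDs of every resource attached to the dataset
get_resource_nids(metadata)
```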
## [1] "8221"
You can then use the resource node ID to retrieve its metadata.
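For example, reusing `retrieve_node()` with the resource node ID reported above (8221). The name `metadata_rs` is simply the variable used later in this introduction for the resource metadata.

```r
library(dkanr)
dkanr_setup(url = "https://data.louisvilleky.gov")

# Retrieve the resource metadata by node ID, returned as an R list
metadata_rs <- retrieve_node(nid = "8221", as = "list")
metadata_rs
```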
## <DKAN Node> #8221
## Type: resource
## Title: Active Permits
## UUID: 65c4458b-1804-4bf2-b647-b2744648f647
## Created/Modified: 1467293303 / 1520508729
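Data can then be downloaded either as a whole file, using the resource URL, or by querying the DKAN datastore when the file has been imported into it.
Retrieve the resource URL from the resource metadata: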
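A sketch of the whole-file route, assuming dkanr's `get_resource_url()` helper. The `clean_url()` function is a hypothetical convenience, not part of dkanr: it only collapses the doubled slashes visible in the URL printed below, and is optional. Reading the file with `read.csv()` assumes the resource is a CSV, as its file name suggests.

```r
library(dkanr)
dkanr_setup(url = "https://data.louisvilleky.gov")

# Hypothetical helper: collapse repeated slashes in the path, keeping "https://"
clean_url <- function(url) gsub("(?<!:)/{2,}", "/", url, perl = TRUE)

# Resource metadata (node ID 8221), then the URL of the attached file
metadata_rs <- retrieve_node(nid = "8221", as = "list")
file_url <- get_resource_url(metadata_rs)
file_url

# Download and read the whole file
permits <- read.csv(clean_url(file_url), stringsAsFactors = FALSE)
```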
## [1] "https://data.louisvilleky.gov/sites//default//files//ActivePermits_7.csv"
Some data files can be queried directly through the API, but only those that have been imported into the DKAN datastore.
First, you’ll need to check whether the data file you are interested in is available from the DKAN datastore:
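This check can be sketched with dkanr's `ds_is_available()`, assuming it takes the resource metadata list and returns `TRUE` when the file has been imported into the datastore:

```r
library(dkanr)
dkanr_setup(url = "https://data.louisvilleky.gov")

# Resource metadata for the "Active Permits" file (node ID 8221)
metadata_rs <- retrieve_node(nid = "8221", as = "list")

# TRUE if the resource can be queried through the datastore API
ds_is_available(metadata_rs)
```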
## [1] TRUE
If so, you can retrieve data directly from the datastore. To do so, use the resource UUID (another unique identifier, distinct from the node ID).
ds_search_all(resource_id = metadata_rs$uuid, as = 'df') %>%
  select(PERMITNUMBER, PERMITTYPE, STATUS, SQUAREFEET)
## # A tibble: 100 x 4
##    PERMITNUMBER PERMITTYPE      STATUS SQUAREFEET
##    <chr>        <chr>           <chr>  <chr>
##  1 54912        Building Permit Issued 842
##  2 106485       Building Permit Issued 1300
##  3 107817       Building Permit Issued 487
##  4 113132       Building Permit Issued 3256
##  5 110301       Building Permit Issued 400
##  6 115478       Building Permit Issued 2938
##  7 281965       Building Permit Issued 1050
##  8 281380       Building Permit Issued 360
##  9 283077       Building Permit Issued 1225
## 10 278714       Building Permit Issued 14710
## # ... with 90 more rows