Introduction to PxWebApiData

Øyvind Langsrud, Jan Bruusgaard, Solveig Bjørkholt and Susie Jentoft

2024-11-01

Preface

An introduction to the R package PxWebApiData is given below. Six calls to the main function, ApiData, are demonstrated. First, two calls for reading data sets are shown. The third call captures meta data. In practice, however, one may look at the meta data first. Then three more examples and some background are given.

Specification by variable indices and variable id’s

The dataset below has three variables: Region, ContentsCode and Tid. The variables can be used as input parameters. Here two of the parameters are specified by variable id’s and one parameter is specified by indices. Negative values count from the end. Thus, we here obtain the first two and the last two years in the data.

A list of two data frames is returned; the label version and the id version.

ApiData("http://data.ssb.no/api/v0/en/table/04861",
        Region = c("1103", "0301"), ContentsCode = "Bosatte", Tid = c(1, 2, -2, -1))
$`04861: Area and population of urban settlements, by region, contents and year`
             region            contents year  value
1 Oslo municipality Number of residents 2000 504348
2 Oslo municipality Number of residents 2002 508134
3 Oslo municipality Number of residents 2023 705945
4 Oslo municipality Number of residents 2024 714630
5         Stavanger Number of residents 2000 106804
6         Stavanger Number of residents 2002 108271
7         Stavanger Number of residents 2023 140012
8         Stavanger Number of residents 2024 142897

$dataset
  Region ContentsCode  Tid  value
1   0301      Bosatte 2000 504348
2   0301      Bosatte 2002 508134
3   0301      Bosatte 2023 705945
4   0301      Bosatte 2024 714630
5   1103      Bosatte 2000 106804
6   1103      Bosatte 2002 108271
7   1103      Bosatte 2023 140012
8   1103      Bosatte 2024 142897
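
The index rule described above can be mimicked in a few lines of base R. This is only a sketch of the rule, not the package's internal code: the helper resolve_index is hypothetical, and the vector tid is typed in by hand under the assumption that the table holds the years 2000, 2002 to 2009 and 2011 to 2024.

```r
# Hypothetical helper: turn ApiData-style indices into values.
# Positive indices count from the start, negative ones from the end.
resolve_index <- function(idx, values) {
  n <- length(values)
  pos <- ifelse(idx < 0, n + 1 + idx, idx)
  values[pos]
}

# Years assumed to be available in table 04861 (typed in by hand).
tid <- as.character(c(2000, 2002:2009, 2011:2024))

resolve_index(c(1, 2, -2, -1), tid)
```

With these assumed years the call picks 2000, 2002, 2023 and 2024, the same years as in the output above.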

To return a single dataset with only labels, use the function ApiData1. The function ApiData2 returns only id’s. To return a dataset with both labels and id’s in one data frame, use ApiData12.

ApiData12("http://data.ssb.no/api/v0/en/table/04861",
        Region = c("1103", "0301"), ContentsCode = "Bosatte", Tid = c(1, 2, -2, -1))
             region            contents year Region ContentsCode  Tid  value
1 Oslo municipality Number of residents 2000   0301      Bosatte 2000 504348
2 Oslo municipality Number of residents 2002   0301      Bosatte 2002 508134
3 Oslo municipality Number of residents 2023   0301      Bosatte 2023 705945
4 Oslo municipality Number of residents 2024   0301      Bosatte 2024 714630
5         Stavanger Number of residents 2000   1103      Bosatte 2000 106804
6         Stavanger Number of residents 2002   1103      Bosatte 2002 108271
7         Stavanger Number of residents 2023   1103      Bosatte 2023 140012
8         Stavanger Number of residents 2024   1103      Bosatte 2024 142897
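
The column layout of the combined output can be reproduced from the two separate versions. The sketch below uses hand-typed stand-ins for the first two rows of the label and id versions; it only illustrates the layout and is not necessarily how ApiData12 is implemented.

```r
# Stand-in for the label version (first two rows of the output above).
labels <- data.frame(
  region   = c("Oslo municipality", "Oslo municipality"),
  contents = c("Number of residents", "Number of residents"),
  year     = c("2000", "2002"),
  value    = c(504348, 508134)
)

# Stand-in for the id version of the same rows.
ids <- data.frame(
  Region       = c("0301", "0301"),
  ContentsCode = c("Bosatte", "Bosatte"),
  Tid          = c("2000", "2002"),
  value        = c(504348, 508134)
)

# Label columns first, then id columns, keeping a single value column.
both <- cbind(labels[names(labels) != "value"], ids)
names(both)
```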

Specification by TRUE, FALSE and imaginary values (e.g. 3i).

All possible values are obtained by TRUE, which corresponds to filter "all": "*" in the api query. Elimination of a variable is obtained by FALSE. An imaginary value corresponds to filter "top" in the api query.

x <- ApiData("http://data.ssb.no/api/v0/en/table/04861",
        Region = FALSE, ContentsCode = TRUE, Tid = 3i)

To show either label version or id version

x[[1]]
                         contents year      value
1 Area of urban settlements (km²) 2022    2250.94
2 Area of urban settlements (km²) 2023    2266.99
3 Area of urban settlements (km²) 2024    2279.97
4             Number of residents 2022 4485236.00
5             Number of residents 2023 4554562.00
6             Number of residents 2024 4619969.00
x[[2]]
  ContentsCode  Tid      value
1        Areal 2022    2250.94
2        Areal 2023    2266.99
3        Areal 2024    2279.97
4      Bosatte 2022 4485236.00
5      Bosatte 2023 4554562.00
6      Bosatte 2024 4619969.00
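
The correspondence between TRUE, FALSE, imaginary values and the api filters can be written out as a small lookup. The helper selection_for below is hypothetical and only mirrors the description in this section; it assumes that an eliminated variable is simply left out of the query, and it does not handle specification by indices.

```r
# Hypothetical helper: map one ApiData-style argument to a PxWebApi selection.
# Index-based specification (e.g. c(1, -1)) is deliberately left out.
selection_for <- function(x) {
  if (isFALSE(x)) return(NULL)  # variable eliminated (assumed: omitted from query)
  if (isTRUE(x)) return(list(filter = "all", values = "*"))
  if (is.complex(x)) return(list(filter = "top", values = as.character(Im(x))))
  list(filter = "item", values = as.character(x))
}

str(selection_for(TRUE))
str(selection_for(3i))
str(selection_for(c("1103", "0301")))
is.null(selection_for(FALSE))
```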

Show additional information

The function comment lists additional dataset information: title, latest update and source.

comment(x)
                                                                  label
"04861: Area and population of urban settlements, by contents and year"
                                                                 source
                                                    "Statistics Norway"
                                                                updated
                                                 "2024-10-01T06:00:00Z"
                                                                tableid
                                                                "04861"
                                                               contents
                     "04861: Area and population of urban settlements,"

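Since the comment is a named character vector, single fields can be picked by name. The sketch below uses a hand-made stand-in for x with values copied from the output above, and parses the update time under the assumption that it is always given in this UTC format.

```r
# Stand-in for the object returned by ApiData (here just an empty list).
x <- list()
comment(x) <- c(
  label   = "04861: Area and population of urban settlements, by contents and year",
  source  = "Statistics Norway",
  updated = "2024-10-01T06:00:00Z",
  tableid = "04861"
)

# Pick a single field by name.
comment(x)[["source"]]

# Parse the update time stamp (assumed UTC) into a date-time object.
updated <- as.POSIXct(comment(x)[["updated"]],
                      format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
format(updated, "%Y-%m-%d")
```
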
Obtaining meta data

Meta information about the data set can be obtained by returnMetaFrames = TRUE.

ApiData("http://data.ssb.no/api/v0/en/table/04861",  returnMetaFrames = TRUE)
$Region
   values      valueTexts
1    3101          Halden
2    3103            Moss
3    3105       Sarpsborg
4    3107     Fredrikstad
5    3110          Hvaler
6    3112            Råde
7    3114 Våler (Østfold)
8    3116        Skiptvet
9    3118   Indre Østfold
10   3120       Rakkestad
11   3122          Marker
12   3124         Aremark
13   3201           Bærum
14   3203           Asker
15   3205      Lillestrøm
16   3207    Nordre Follo
17   3209      Ullensaker
18   3212        Nesodden
19   3214           Frogn
20   3216          Vestby
21   3218              Ås
22   3220         Enebakk
 [ reached 'max' / getOption("max.print") -- omitted 914 rows ]

$ContentsCode
   values                      valueTexts
1   Areal Area of urban settlements (km²)
2 Bosatte             Number of residents

$Tid
   values valueTexts
1    2000       2000
2    2002       2002
3    2003       2003
4    2004       2004
5    2005       2005
6    2006       2006
7    2007       2007
8    2008       2008
9    2009       2009
10   2011       2011
11   2012       2012
12   2013       2013
13   2014       2014
14   2015       2015
15   2016       2016
16   2017       2017
17   2018       2018
18   2019       2019
19   2020       2020
20   2021       2021
21   2022       2022
22   2023       2023
 [ reached 'max' / getOption("max.print") -- omitted 1 rows ]

attr(,"text")
      Region ContentsCode          Tid 
    "region"   "contents"       "year" 
attr(,"elimination")
      Region ContentsCode          Tid 
        TRUE        FALSE        FALSE 
attr(,"time")
      Region ContentsCode          Tid 
       FALSE        FALSE         TRUE 
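
The meta frames are ordinary data frames, so the id needed for a query can be looked up from its label. The sketch below works on a hand-typed stand-in for the first four rows of the Region frame shown above rather than on a live call.

```r
# Stand-in for the $Region meta frame (first four rows of the output above).
region <- data.frame(
  values     = c("3101", "3103", "3105", "3107"),
  valueTexts = c("Halden", "Moss", "Sarpsborg", "Fredrikstad")
)

# Look up the id that belongs to a label ...
region$values[region$valueTexts == "Fredrikstad"]

# ... or search the labels with a pattern.
region[grepl("^Sar", region$valueTexts), ]
```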

Aggregations using filter agg:

PxWebApi offers two more filters for groupings, agg: and vs:. You can see these filters in the code “API Query for this table” when you have made a table in PxWeb.

agg: is used for readymade aggregation groupings.

This example shows the use of aggregation in age groups and aggregated time series for the new Norwegian municipality structure from 2020. Also note that /en is replaced by /no in the url. This returns labels in Norwegian instead of English.

ApiData("http://data.ssb.no/api/v0/no/table/07459",
        Region = list("agg:KommSummer", c("K-3101", "K-3103")),
        Tid = 4i,
        Alder = list("agg:TodeltGrupperingB", c("H17", "H18")),
        Kjonn = TRUE)
$`07459: Befolkning, etter region, kjønn, alder, statistikkvariabel og år`
  region   kjønn             alder statistikkvariabel   år value
1 Halden Kvinner           0-17 år           Personer 2021  2978
2 Halden Kvinner           0-17 år           Personer 2022  2937
3 Halden Kvinner           0-17 år           Personer 2023  2933
4 Halden Kvinner           0-17 år           Personer 2024  2944
5 Halden Kvinner 18 år eller eldre           Personer 2021 12587
6 Halden Kvinner 18 år eller eldre           Personer 2022 12626
7 Halden Kvinner 18 år eller eldre           Personer 2023 12787
 [ reached 'max' / getOption("max.print") -- omitted 25 rows ]

$dataset
  Region Kjonn Alder ContentsCode  Tid value
1 K-3101     2   H17    Personer1 2021  2978
2 K-3101     2   H17    Personer1 2022  2937
3 K-3101     2   H17    Personer1 2023  2933
4 K-3101     2   H17    Personer1 2024  2944
5 K-3101     2   H18    Personer1 2021 12587
6 K-3101     2   H18    Personer1 2022 12626
7 K-3101     2   H18    Personer1 2023 12787
 [ reached 'max' / getOption("max.print") -- omitted 25 rows ]
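
For comparison with the code “API Query for this table”, the Region specification above can be written as a JSON query element. The sketch below builds the text with base R only; the helper selection_json is hypothetical, and the assumption is that the first list element is passed on as the filter and the second as the values.

```r
# Hypothetical helper: write one query element as JSON text (base R only).
selection_json <- function(code, filter, values) {
  vals <- paste0('"', values, '"', collapse = ", ")
  sprintf('{"code": "%s", "selection": {"filter": "%s", "values": [%s]}}',
          code, filter, vals)
}

region_json <- selection_json("Region", "agg:KommSummer", c("K-3101", "K-3103"))
cat(region_json, "\n")
```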

There are two limitations in the PxWebApi using these filters.

  1. The name of the filter and the id’s are not shown in metadata, only in the code “API Query for this table”.
  2. The filters agg: and vs: can only take single elements as input. Filter "all":"*", i.e. TRUE, does not work with agg: and vs:.

The other filter, vs:, specifies the grouping value sets, which are part of the value pool. As it is only possible to give single elements as input, it is easier to query the value pool directly. This means that vs: is redundant.

In this example Region is the value pool and Fylker (counties) is the value set. As vs:Fylker is redundant, both will return the same:

  Region = list("vs:Fylker",c("01","02"))
  Region = list(c("01","02"))

Return the API query as JSON

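In PxWebApi the original query is formulated as JSON. Returning this query with the parameter returnApiQuery is useful for debugging. In the call below no variables are specified, and the resulting query selects the first and the last two values of each variable (ContentsCode has only two values).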

ApiData("http://data.ssb.no/api/v0/en/table/04861",  returnApiQuery = TRUE)
{
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "item",
        "values": ["3101", "2399", "9999"]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": ["Areal", "Bosatte"]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": ["2000", "2023", "2024"]
      }
    }
  ],
  "response": {
    "format": "json-stat2"
  }
} 

A simple web page, PxWebApiData call creator, is also available for converting an original JSON API query to a PxWebApiData query.

Readymade datasets by GetApiData

Statistics Norway also provides an API with readymade datasets, available by HTTP GET. The data is most easily retrieved with the GetApiData function, which is the same as using the parameter getDataByGET = TRUE in the ApiData function. The dataset below is from Statistics Norway’s Economic trends forecasts.

x <- GetApiData("https://data.ssb.no/api/v0/dataset/934516.json?lang=en")
x[[1]]
   year                                         contents value
1  2024                           Gross domestic product   1.0
2  2024                              GDP Mainland Norway   0.7
3  2024                                 Employed persons   0.5
4  2024                        Unemployment rate (level)   4.1
5  2024                      Wages per standard man-year   5.3
6  2024                       Consumer price index (CPI)   3.4
7  2024                                          CPI-ATE   3.9
8  2024                                   Housing prices   2.5
9  2024                        Money market rate (level)   4.7
10 2024 Import-weighted NOK exchange rate (44 countries)   1.0
11 2025                           Gross domestic product   1.3
12 2025                              GDP Mainland Norway   2.1
13 2025                                 Employed persons   0.7
14 2025                        Unemployment rate (level)   4.1
15 2025                      Wages per standard man-year   4.6
 [ reached 'max' / getOption("max.print") -- omitted 25 rows ]
comment(x)
                                                                          label
"12880: Main economic indicators. Accounts and forecasts, by year and contents"
                                                                         source
                                                            "Statistics Norway"
                                                                        updated
                                                         "2024-09-13T06:00:00Z"

Eurostat data

The Eurostat REST API offers JSON-stat version 2. It is possible to use this package to obtain data from Eurostat by using GetApiData or the similar functions with 1, 2 or 12 at the end.

This example shows HICP total index, latest two periods for EU and Norway. See Eurostat for more.

urlEurostat <- paste0(   # Here the long url is split into several lines using paste0 
  "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/prc_hicp_mv12r", 
  "?format=JSON&lang=EN&lastTimePeriod=2&coicop=CP00&geo=NO&geo=EU")
urlEurostat
[1] "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/prc_hicp_mv12r?format=JSON&lang=EN&lastTimePeriod=2&coicop=CP00&geo=NO&geo=EU"
GetApiData12(urlEurostat)
No encoding supplied: defaulting to UTF-8.
  Time frequency                         Unit of measure
1        Monthly Moving 12 months average rate of change
2        Monthly Moving 12 months average rate of change
3        Monthly Moving 12 months average rate of change
4        Monthly Moving 12 months average rate of change
  Classification of individual consumption by purpose (COICOP)
1                                               All-items HICP
2                                               All-items HICP
3                                               All-items HICP
4                                               All-items HICP
                                                                                   Geopolitical entity (reporting)
1 European Union (EU6-1958, EU9-1973, EU10-1981, EU12-1986, EU15-1995, EU25-2004, EU27-2007, EU28-2013, EU27-2020)
2 European Union (EU6-1958, EU9-1973, EU10-1981, EU12-1986, EU15-1995, EU25-2004, EU27-2007, EU28-2013, EU27-2020)
3                                                                                                           Norway
4                                                                                                           Norway
     Time freq         unit coicop geo    time value
1 2024-08    M RCH_MV12MAVR   CP00  EU 2024-08   3.1
2 2024-09    M RCH_MV12MAVR   CP00  EU 2024-09   2.8
3 2024-08    M RCH_MV12MAVR   CP00  NO 2024-08   3.4
4 2024-09    M RCH_MV12MAVR   CP00  NO 2024-09   3.4
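
Instead of typing the query string by hand, the url can be assembled from a named list, which makes the repeated geo parameter easier to edit. This is a minimal sketch in base R; the names base and params are chosen here for illustration, and it assumes that no value needs percent-encoding.

```r
base <- "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/prc_hicp_mv12r"

# One list element per parameter; a vector gives a repeated parameter.
params <- list(format = "JSON", lang = "EN", lastTimePeriod = 2,
               coicop = "CP00", geo = c("NO", "EU"))

# Repeat each name once per value, then join as key=value pairs.
keys <- rep(names(params), lengths(params))
vals <- unlist(lapply(params, as.character), use.names = FALSE)
urlBuilt <- paste0(base, "?", paste0(keys, "=", vals, collapse = "&"))
urlBuilt
```

The result is the same string as urlEurostat above and can be passed to GetApiData12 in the same way.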

Practical example

We would like to extract the number of female R&D personnel in the services sector of the Norwegian business sector for the years 2021 and 2022.

  1. Locate the relevant table at https://www.ssb.no that contains information on R&D personnel. Having found the relevant table, table 07964, we create the link https://data.ssb.no/api/v0/no/table/07964/

  2. Load the package.

library(PxWebApiData)
  3. Check which variables exist in the data.
variables <- ApiData("https://data.ssb.no/api/v0/no/table/07964/",
                     returnMetaFrames = TRUE)

names(variables)
[1] "NACE2007"     "ContentsCode" "Tid"         
  4. Check which values each variable contains.
values <- ApiData("https://data.ssb.no/api/v0/no/table/07964/",
                  returnMetaData = TRUE)

values[[1]]$values
 [1] "A-N"       "A03"       "B05-B09"   "B06_B09.1" "C"         "C10-C11"  
 [7] "C13"       "C14-C15"   "C16"       "C17"       "C18"       "C19-C20"  
[13] "C21"       "C22"       "C23"       "C24"       "C25"       "C26"      
[19] "C26.3"     "C26.5"     "C27"       "C28"       "C29"       "C30"      
[25] "C30.1"     "C31"       "C32"       "C32.5"     "C33"       "D35"      
[31] "E36-E39"   "F41-F43"   "G-N"       "G46"       "H49-H53"   "J58"      
[37] "J58.2"     "J59-J60"   "J61"       "J62"       "J63"       "K64-K66"  
[43] "M70"       "M71"       "M72"      
 [ reached getOption("max.print") -- omitted 2 entries ]
values[[2]]$values
[1] "EnhetTot"           "EnheterFoU"         "FoUpersonale"      
[4] "KvinneligFoUpers"   "FoUPersonaleUoHutd" "FoUPersonaleDoktor"
[7] "FoUArsverk"         "FoUArsverkPers"     "FoUArsverkUtd"     
values[[3]]$values
 [1] "2007" "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016"
[11] "2017" "2018" "2019" "2020" "2021" "2022"
  5. Define these variables in the query to sort out the values we want.
mydata <- ApiData("https://data.ssb.no/api/v0/en/table/07964/",
                Tid = c("2021", "2022"), # Define year to 2021 and 2022
                NACE2007 = "G-N", # Define the services sector
                ContentsCode = c("KvinneligFoUpers")) # Define female R&D personnel

mydata <- mydata[[1]] # Extract the first list element, which contains full variable names.

head(mydata)
  industry (SIC2007)             contents year value
1     Services total Female R&D personnel 2021  4904
2     Services total Female R&D personnel 2022  5449
  6. Show additional information.
comment(mydata)
                                                                                                  label
"07964: O07964: R&D personnel and R&D full-time equivalents (FTE) in Business Enterprise sector, by industry (SIC2007), contents and year"
                                                                                                 source
                                                                                    "Statistics Norway"
                                                                                                updated
                                                                                 "2024-02-16T07:00:00Z"
                                                                                                tableid
                                                                                                "07964"
                                                                                               contents
      "07964: O07964: R&D personnel and R&D full-time equivalents (FTE) in Business Enterprise sector,"

Background

PxWeb and its API, PxWebApi, are used as output database (Statbank) by many statistical agencies in the Nordic countries and several others, e.g. Statistics Norway, Statistics Finland and Statistics Sweden. See list of installations.

For hints on using PxWebApi in general see PxWebApi User Guide.