From the attributes originally included in the dimensions, we can
derive new attributes that simplify existing queries or make new
queries possible.
Enrich the who dimension
Suppose that we are interested in defining some broader age ranges
than the existing ones. This operation can be done by enriching the
corresponding dimension.
First, we export the attributes to consider in table form, in this
case only the age range.
tb_who <-
  enrich_dimension_export(st_mrs_age,
                          name = "who",
                          attributes = c("age_range"))
The result of the export operation is shown below. It is a table
with the selected attributes, from which any duplicate rows have been
removed (in this case there are none).
age_range
--------------
1: <1 year
2: 1-24 years
3: 25-44 years
4: 45-64 years
5: 65+ years
We add the columns we want to the table, in this case a single new
column that defines the broader age range.
v <-
  c("0-24 years", "0-24 years", "25+ years", "25+ years", "25+ years")
tb_who <-
  tibble::add_column(tb_who,
                     wide_age_range = v)
The new table can be seen below.
age_range      | wide_age_range
---------------+---------------
1: <1 year     | 0-24 years
2: 1-24 years  | 0-24 years
3: 25-44 years | 25+ years
4: 45-64 years | 25+ years
5: 65+ years   | 25+ years
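The positional vector used above only works while the rows stay in the exported order. As an alternative sketch, the same column can be derived from a named lookup vector, so each broader range is tied to its age range by value rather than by position. The name wide_lookup is hypothetical, and tb_who is rebuilt here as a plain data frame only to keep the example self-contained.

```r
# Age ranges as exported from the "who" dimension (values from the table above).
tb_who <- data.frame(
  age_range = c("1: <1 year", "2: 1-24 years", "3: 25-44 years",
                "4: 45-64 years", "5: 65+ years"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: each existing age range mapped to its broader range.
wide_lookup <- c(
  "1: <1 year"     = "0-24 years",
  "2: 1-24 years"  = "0-24 years",
  "3: 25-44 years" = "25+ years",
  "4: 45-64 years" = "25+ years",
  "5: 65+ years"   = "25+ years"
)

# Fail early if an exported value has no entry in the lookup.
stopifnot(all(tb_who$age_range %in% names(wide_lookup)))

# Look up the broader range by value; unname() drops the lookup names.
tb_who$wide_age_range <- unname(wide_lookup[tb_who$age_range])
```

With this approach a reordered or extended export either maps correctly or stops with an error, instead of silently assigning a range to the wrong row.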
We then import the table to enrich the dimension with the new data.
st_mrs_age <-
  st_mrs_age |>
  enrich_dimension_import(name = "who", tb_who)
The result is shown below: the dimension now includes the newly
defined attribute.
who_key | age_range      | wide_age_range
--------+----------------+---------------
1       | 1: <1 year     | 0-24 years
2       | 2: 1-24 years  | 0-24 years
3       | 3: 25-44 years | 25+ years
4       | 4: 45-64 years | 25+ years
5       | 5: 65+ years   | 25+ years
Enrich the where dimension
For the where dimension we can proceed in the same way as for the
who dimension: export the data, complete it manually and import it
again, as shown below.
tb_where <-
  enrich_dimension_export(st_mrs_age,
                          name = "where",
                          attributes = c("division"))
The new table for division data can be seen below.
We look for the names of the divisions and add the data of the
regions to which they belong.
tb_where <-
  tibble::add_column(
    tb_where,
    division_name = c(
      "New England",
      "Middle Atlantic",
      "East North Central",
      "West North Central",
      "South Atlantic",
      "East South Central",
      "West South Central",
      "Mountain",
      "Pacific"
    ),
    region = c('1', '1', '2', '2', '3', '3', '3', '4', '4'),
    region_name = c(
      "Northeast",
      "Northeast",
      "Midwest",
      "Midwest",
      "South",
      "South",
      "South",
      "West",
      "West"
    )
  )
st_mrs_age <-
  st_mrs_age |>
  enrich_dimension_import(name = "where", tb_where)
st_mrs_cause <-
  st_mrs_cause |>
  enrich_dimension_import(name = "where", tb_where)
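The vectors above are typed by hand and have to line up position by position. As a hedged alternative, the sketch below derives region_name from the region code through a small lookup vector, so the two columns cannot drift apart. The names divisions and region_names are hypothetical and not part of the package; the values are the ones used above.

```r
# Division names and region codes, in the same order as in the code above.
divisions <- data.frame(
  division_name = c("New England", "Middle Atlantic", "East North Central",
                    "West North Central", "South Atlantic",
                    "East South Central", "West South Central",
                    "Mountain", "Pacific"),
  region = c("1", "1", "2", "2", "3", "3", "3", "4", "4"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: one name per region code.
region_names <- c("1" = "Northeast", "2" = "Midwest",
                  "3" = "South", "4" = "West")

# Derive the region name from the code instead of typing it nine times.
divisions$region_name <- unname(region_names[divisions$region])

# Every region code must have a name; an NA would indicate a typo in a code.
stopifnot(!anyNA(divisions$region_name))
```

The resulting columns could then be passed to tibble::add_column() in place of the hand-written vectors.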
To add the name of each state and the county to which each city
belongs, we could proceed in the same way. However, it is easier to
locate these data and use them directly. They are available in the
ft_usa_states and ft_usa_city_county data sets, respectively.
If we import these data sets as they are, an error occurs. The reason
is that not all the values in the dimension have a match in the
imported table. We can determine which ones are missing using the
following function.
tb_missing <-
  st_mrs_age |>
  enrich_dimension_import_test(name = "where", ft_usa_states)
The result obtained is shown below.
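where_key | division | state   | city    | division_name      | region | region_name
----------+----------+---------+---------+--------------------+--------+------------
48        | 3        | Unknown | Unknown | East North Central | 2      | Midwest
78        | 6        | Unknown | Unknown | East South Central | 3      | South
91        | 7        | Unknown | Unknown | West South Central | 3      | South
111       | 9        | Unknown | Unknown | Pacific            | 4      | West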
In all cases, the problem occurs for the value “Unknown” in the
state attribute. We must add a row to the data before importing
it.
tb_where_state <- ft_usa_states |>
  tibble::add_row(state = "Unknown", state_name = "Unknown")
st_mrs_age <-
  st_mrs_age |>
  enrich_dimension_import(name = "where", tb_where_state)
st_mrs_cause <-
  st_mrs_cause |>
  enrich_dimension_import(name = "where", tb_where_state)
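The idea behind the check and the fix can be illustrated without the package. The sketch below uses base R and small made-up tables (dim_where and states are hypothetical stand-ins, not the real data sets) to find the dimension rows without a match, add the missing value, and confirm that a left join then fills every row. It is not how the package implements the import, only an illustration of the matching problem.

```r
# Toy stand-ins (hypothetical values, not the real data sets).
dim_where <- data.frame(
  state = c("CT", "MA", "Unknown"),
  city  = c("Hartford", "Boston", "Unknown"),
  stringsAsFactors = FALSE
)
states <- data.frame(
  state      = c("CT", "MA"),
  state_name = c("Connecticut", "Massachusetts"),
  stringsAsFactors = FALSE
)

# Rows of the dimension whose state has no counterpart in the lookup table.
unmatched <- dim_where[!dim_where$state %in% states$state, , drop = FALSE]

# Add the missing value to the lookup table, then check again.
states_fixed <- rbind(
  states,
  data.frame(state = "Unknown", state_name = "Unknown",
             stringsAsFactors = FALSE)
)
still_unmatched <-
  dim_where[!dim_where$state %in% states_fixed$state, , drop = FALSE]

# A left join now finds a state_name for every row of the dimension.
enriched <- merge(dim_where, states_fixed, by = "state", all.x = TRUE)
```

Here unmatched contains only the "Unknown" row, and still_unmatched is empty once the extra row has been added.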
The same problem occurs when adding the county data, and we apply
the same solution.
tb_where_county <- ft_usa_city_county |>
  tibble::add_row(city = "Unknown",
                  state = "Unknown",
                  county = "Unknown")
st_mrs_age <-
  st_mrs_age |>
  enrich_dimension_import(name = "where", tb_where_county)
st_mrs_cause <-
  st_mrs_cause |>
  enrich_dimension_import(name = "where", tb_where_county)
We can see the first rows of the final result below.
where_key | division | state | city        | division_name | region | region_name | state_name    | county
----------+----------+-------+-------------+---------------+--------+-------------+---------------+----------
1         | 1        | CT    | Bridgeport  | New England   | 1      | Northeast   | Connecticut   | Fairfield
2         | 1        | CT    | Hartford    | New England   | 1      | Northeast   | Connecticut   | Hartford
3         | 1        | CT    | New Haven   | New England   | 1      | Northeast   | Connecticut   | New Haven
4         | 1        | CT    | Waterbury   | New England   | 1      | Northeast   | Connecticut   | New Haven
5         | 1        | MA    | Boston      | New England   | 1      | Northeast   | Massachusetts | Suffolk
6         | 1        | MA    | Cambridge   | New England   | 1      | Northeast   | Massachusetts | Middlesex
7         | 1        | MA    | Fall River  | New England   | 1      | Northeast   | Massachusetts | Bristol
8         | 1        | MA    | Lowell      | New England   | 1      | Northeast   | Massachusetts | Middlesex
9         | 1        | MA    | Lynn        | New England   | 1      | Northeast   | Massachusetts | Essex
10        | 1        | MA    | New Bedford | New England   | 1      | Northeast   | Massachusetts | Bristol