Datasets for categorical data analysis
Michael Friendly
The vcdExtra
package contains 45 datasets, taken from
the literature on categorical data analysis, and selected to illustrate
various methods of analysis and data display. These are in addition to
the 33 datasets in the vcd package.
To make it easier to find those which illustrate a particular method,
the datasets in vcdExtra
have been classified using method
tags. This vignette creates an “inverse table”, listing the datasets
that apply to each method. It also illustrates a general method for
classifying datasets in R packages.
Processing tags
Starting from a listing of the datasets in the package, I created a
spreadsheet, vcdExtra-datasets.xlsx, and then added method tags.
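One way to get such a listing with base R alone is utils::data(package = ...), whose `results` component has one row per dataset. The sketch below uses the built-in datasets package so that it runs anywhere. It is an illustration under that assumption, not necessarily how the spreadsheet was produced.

```r
# data(package = ) returns an object whose `results` matrix has the
# columns Package, LibPath, Item and Title, one row per dataset.
listing <- as.data.frame(utils::data(package = "datasets")$results)

# Keep the dataset name and its title: a starting point for adding tags
listing <- listing[, c("Item", "Title")]
head(listing)
```

Writing `listing` to a spreadsheet or CSV file then gives a place to add a tags column by hand.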
dsets_tagged <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"),
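# drop the columns not needed here and use a clearer name for the dataset column
dsets_tagged <- dsets_tagged |>
  dplyr::select(-Title, -dim) |>
  dplyr::rename(dataset = Item)
head(dsets_tagged)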
## # A tibble: 6 × 3
## dataset class tags
## <chr> <chr> <chr>
## 1 Abortion table loglinear;logit;2x2
## 2 Accident data.frame loglinear; glm; logistic
## 3 AirCrash data.frame reorder; ca
## 4 Alligator data.frame loglinear;multinomial;zeros
## 5 Bartlett table 2x2;loglinear; homogeneity;oddsratio
## 6 Burt data.frame ca
To invert the table, we need to split the tags into separate observations (one row per dataset-tag pair), then collapse the rows that share the same tag.
# one row per (dataset, tag) pair, with stray whitespace removed from each tag
dset_split <- dsets_tagged |>
  tidyr::separate_longer_delim(tags, delim = ";") |>
  dplyr::mutate(tag = stringr::str_trim(tags))

# collapse the rows for the same tag
tag_dset <- dset_split |>
  dplyr::arrange(tag) |>
  dplyr::group_by(tag) |>
  dplyr::summarise(datasets = paste(dataset, collapse = "; ")) |>
  dplyr::ungroup()

# get a list of the unique tags
unique(tag_dset$tag)
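To see the split-then-collapse inversion in isolation, here is a minimal, self-contained sketch on three of the rows shown above. The object names `toy` and `inverted` are made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Three rows copied from the tagged table above
toy <- tibble::tibble(
  dataset = c("Abortion", "Bartlett", "Burt"),
  tags    = c("loglinear;logit;2x2",
              "2x2;loglinear; homogeneity;oddsratio",
              "ca")
)

inverted <- toy |>
  separate_longer_delim(tags, delim = ";") |>  # one row per dataset-tag pair
  mutate(tag = str_trim(tags)) |>              # " homogeneity" -> "homogeneity"
  arrange(tag) |>
  group_by(tag) |>
  summarise(datasets = paste(dataset, collapse = "; "))

inverted
```

Trimming matters here: without str_trim(), " homogeneity" and "homogeneity" would count as different tags.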
Make this into a nice table
Another sheet in the spreadsheet gives a more descriptive
topic corresponding to each tag.
tags <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"),
Now, join this with the tag_dset
created above.
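As a small illustration of this join, the sketch below uses made-up stand-ins: `tag_dset_demo` and `topics_demo` are hypothetical, and the topic descriptions are invented for the example rather than taken from the spreadsheet.

```r
library(dplyr)

# Stand-in for the tag-to-datasets table built above
tag_dset_demo <- tibble::tibble(
  tag      = c("2x2", "ca", "zeros"),
  datasets = c("Abortion; Bartlett", "AirCrash; Burt", "Alligator")
)

# Hypothetical topic descriptions (assumed for illustration only)
topics_demo <- tibble::tibble(
  tag   = c("2x2", "ca"),
  topic = c("2 by 2 tables", "correspondence analysis")
)

joined <- tag_dset_demo |>
  left_join(topics_demo, by = "tag") |>
  relocate(topic, .after = tag)

# Tags that have no description yet show up with topic = NA
missing_topic <- joined |>
  filter(is.na(topic)) |>
  pull(tag)
```

A left join keeps every tag even when no topic has been written for it, so gaps in the descriptions are easy to spot.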
tag_dset <- tag_dset |>
dplyr::left_join(tags, by = "tag") |>
dplyr::relocate(topic, .after = tag)
tag_dset |>
dplyr::select(-tag) |>
Add links to help()
We’re almost there. It would be nice if the dataset names could be
linked to their documentation. The function below is designed to work with
the pkgdown site. There are different ways this can be
done, but what seems to work is a relative link of the form
../reference/<dataset>.html. Unfortunately, this won’t work
in the actual vignette, since that relative path exists only on the pkgdown site.
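For concreteness, the relative-link form used by the "reference" style of add_links() can be sketched on its own. The vector `dset` is just example input.

```r
library(glue)

dset <- c("Abortion", "Burt")

# glue() is vectorized: one markdown link per dataset name
links <- glue("[{dset}](../reference/{dset}.html)")
links
```

On a pkgdown site the articles/ and reference/ directories are siblings, which is why a path starting with ../reference/ resolves from a rendered article.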
add_links <- function(dsets,
                      style = c("reference", "help", "rdrr.io"),
                      sep = "; ") {
  style <- match.arg(style)
  # split the collapsed string back into individual dataset names
  names <- stringr::str_split_1(dsets, sep)
  # wrap each name in a markdown link of the requested style
  names <- dplyr::case_when(
    style == "help"      ~ glue::glue("[{names}](help({names}))"),
    style == "reference" ~ glue::glue("[{names}](../reference/{names}.html)"),
    style == "rdrr.io"   ~ glue::glue("[{names}](https://rdrr.io/cran/vcdExtra/man/{names}.html)")
  )
  # re-join the links into a single string
  glue::glue_collapse(names, sep = sep)
}
Make the table
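Use purrr::map() to apply add_links() to
all the datasets for each tag.
(mutate(datasets = add_links(datasets)) by itself doesn’t work,
because add_links() is not vectorized: stringr::str_split_1() accepts only a single string.)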
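The need for an element-wise map can be seen in a small sketch. The `cells` vector is made-up input standing in for the datasets column.

```r
library(stringr)
library(purrr)

cells <- c("Abortion; Bartlett", "Burt")

# str_split_1() insists on a single string, so passing the whole
# column at once raises an error (caught here for demonstration).
whole_column <- tryCatch(
  str_split_1(cells, "; "),
  error = function(e) "error"
)

# Handling one cell at a time works and returns a list,
# with one character vector of names per cell.
per_cell <- map(cells, \(cell) str_split_1(cell, "; "))
```

Because map() returns a list, the resulting column is a list column. purrr::map_chr() would be an alternative when a plain character column is wanted.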
tag_dset |>
dplyr::select(-tag) |>
dplyr::mutate(datasets = purrr::map(datasets, add_links)) |>