
Datasets for categorical data analysis
Michael Friendly
2025-03-24
Source: vignettes/datasets.Rmd
The vcdExtra package contains 45 datasets, taken from
the literature on categorical data analysis, and selected to illustrate
various methods of analysis and data display. These are in addition to
the 33 datasets in the vcd package.
To make it easier to find those which illustrate a particular method,
the datasets in vcdExtra have been classified using method
tags. This vignette creates an “inverse table”, listing the datasets
that apply to each method. It also illustrates a general method for
classifying datasets in R packages.
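As a rough illustration of what an "inverse table" means, here is a minimal base-R sketch on toy data. The dataset names `A`, `B`, `C` and tags `x`, `y`, `z` are made up for illustration, and this is not the code used in the rest of the vignette, which relies on tidyr and dplyr.

```r
# Toy data (hypothetical values), laid out like a tagged dataset table
toy <- data.frame(
  dataset = c("A", "B", "C"),
  tags    = c("x; y", "y", "x;z")
)

# Split each tag string on ";" and trim surrounding whitespace
tag_list <- lapply(strsplit(toy$tags, ";", fixed = TRUE), trimws)

# One row per (dataset, tag) pair
long <- data.frame(
  dataset = rep(toy$dataset, lengths(tag_list)),
  tag     = unlist(tag_list)
)

# Collapse the datasets sharing a tag into one string per tag
inverse <- aggregate(dataset ~ tag, data = long, FUN = paste, collapse = "; ")
inverse
```

Each row of `inverse` holds one tag and the datasets that carry it, which is the shape the vignette builds for the real data.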
Processing tags
Using the result of
vcdExtra::datasets(package="vcdExtra"), I created a
spreadsheet, vcdExtra-datasets.xlsx, and then added method
tags by hand.
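For readers who want to start a similar spreadsheet for another package, the following is a minimal sketch under stated assumptions. It uses base R's `utils::data()` rather than `vcdExtra::datasets()`, so it yields only the item names and titles (no `class` or `dim` columns). The package name `"datasets"` and the output file name are placeholders chosen so that the sketch runs anywhere.

```r
# List the datasets in an installed package; "datasets" (base R) is only a placeholder
info <- utils::data(package = "datasets")$results

# Keep the item name and title, and add an empty column for hand-entered tags
dsets <- data.frame(
  Item  = info[, "Item"],
  Title = info[, "Title"],
  tags  = "",
  stringsAsFactors = FALSE
)

# Write a CSV that can be opened in a spreadsheet program (hypothetical file name)
out_file <- file.path(tempdir(), "datasets-tags.csv")
utils::write.csv(dsets, out_file, row.names = FALSE)
```

The tags are then typed into the `tags` column, separated by semicolons, and the file is read back in as shown next.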
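# read the tagged spreadsheet shipped in inst/extdata
dsets_tagged <- readxl::read_excel(
  here::here("inst", "extdata", "vcdExtra-datasets.xlsx"),
  sheet = "vcdExtra-datasets"
)
# keep only the dataset name, class and tags
dsets_tagged <- dsets_tagged |>
  dplyr::select(-Title, -dim) |>
  dplyr::rename(dataset = Item)
head(dsets_tagged)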
## # A tibble: 6 × 3
##   dataset   class      tags
##   <chr>     <chr>      <chr>
## 1 Abortion  table      loglinear;logit;2x2
## 2 Accident  data.frame loglinear; glm; logistic
## 3 AirCrash  data.frame reorder; ca
## 4 Alligator data.frame loglinear;multinomial;zeros
## 5 Bartlett  table      2x2;loglinear; homogeneity;oddsratio
## 6 Burt      data.frame ca
To invert the table, we need to split tags into separate observations, then collapse the rows for the same tag.
# one row per (dataset, tag) pair, with whitespace around each tag removed
dset_split <- dsets_tagged |>
  tidyr::separate_longer_delim(tags, delim = ";") |>
  dplyr::mutate(tag = stringr::str_trim(tags)) |>
  dplyr::select(-tags)
# collapse the rows for the same tag
tag_dset <- dset_split |>
  dplyr::arrange(tag) |>
  dplyr::group_by(tag) |>
  dplyr::summarise(datasets = paste(dataset, collapse = "; ")) |>
  dplyr::ungroup()
# get a list of the unique tags
unique(tag_dset$tag)
##  [1] "2x2"         "agree"       "binomial"    "ca"          "glm"
##  [6] "homogeneity" "lm"          "logistic"    "logit"       "loglinear"
## [11] "mobility"    "multinomial" "oddsratio"   "one-way"     "ordinal"
## [16] "poisson"     "reorder"     "square"      "zeros"
Make this into a nice table
Another sheet in the spreadsheet gives a more descriptive
topic corresponding to each tag.
# read the sheet that maps each tag to a descriptive topic
tags <- readxl::read_excel(
  here::here("inst", "extdata", "vcdExtra-datasets.xlsx"),
  sheet = "tags"
)
head(tags)
## # A tibble: 6 × 2
##   tag         topic
##   <chr>       <chr>
## 1 2x2         2 by 2 tables
## 2 agree       observer agreement
## 3 binomial    binomial distributions
## 4 ca          correspondence analysis
## 5 glm         generalized linear models
## 6 homogeneity homogeneity of association
Now, join this with the tag_dset created above.
tag_dset <- tag_dset |>
  dplyr::left_join(tags, by = "tag") |>
  dplyr::relocate(topic, .after = tag)
tag_dset |>
  dplyr::select(-tag) |>
  head()
## # A tibble: 6 × 2
##   topic                      datasets
##   <chr>                      <chr>
## 1 2 by 2 tables              Abortion; Bartlett; Heart
## 2 observer agreement         Mammograms
## 3 binomial distributions     Geissler
## 4 correspondence analysis    AirCrash; Burt; Draft1970table; Gilby; HospVisits;…
## 5 generalized linear models  Accident; Cormorants; DaytonSurvey; Donner; Draft1…
## 6 homogeneity of association Bartlett
Add links to help()
We’re almost there. It would be nice if the dataset names could be
linked to their documentation. The function below, add_links(), is
designed to work with the pkgdown site. There are different ways this
can be done, but what seems to work is a relative link of the form
../reference/{dataset}.html. Unfortunately, this won’t work
in the actual vignette, where that relative path does not exist.
add_links <- function(dsets,
                      style = c("reference", "help", "rdrr.io"),
                      sep = "; ") {
  style <- match.arg(style)
  # split one "; "-separated string into individual dataset names
  names <- stringr::str_split_1(dsets, sep)
  # wrap each name in a markdown link of the requested style
  names <- dplyr::case_when(
    style == "help"      ~ glue::glue("[{names}](help({names}))"),
    style == "reference" ~ glue::glue("[{names}](../reference/{names}.html)"),
    style == "rdrr.io"   ~ glue::glue("[{names}](https://rdrr.io/cran/vcdExtra/man/{names}.html)")
  )
  glue::glue_collapse(names, sep = sep)
}
Make the table
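Use purrr::map() to apply add_links() to
all the datasets for each tag.
(mutate(datasets = add_links(datasets)) by itself doesn’t
work, because add_links() is not vectorized: str_split_1()
accepts only a single string, not a whole column.)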
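To make the point about vectorization concrete, here is a small sketch. `link_one()` is a hypothetical, simplified stand-in for add_links() that handles only the "reference" style, and `purrr::map_chr()` is used instead of `purrr::map()` so that the result is a plain character vector rather than a list. The two input strings are taken from the table shown earlier.

```r
library(stringr)
library(purrr)

# Hypothetical simplified stand-in for add_links(): "reference" style only
link_one <- function(dsets, sep = "; ") {
  names <- str_split_1(dsets, sep)  # requires a single string
  paste0("[", names, "](../reference/", names, ".html)", collapse = sep)
}

datasets <- c("Abortion; Bartlett; Heart", "Mammograms")

# Passing the whole vector at once fails inside str_split_1()
res <- try(link_one(datasets), silent = TRUE)
inherits(res, "try-error")

# Applying the function one element at a time works
links <- map_chr(datasets, link_one)
links
```

Whether to prefer `map()` or `map_chr()` is a design choice: `map()` yields a list column, while `map_chr()` keeps the column as character.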
tag_dset |>
  dplyr::select(-tag) |>
  dplyr::mutate(datasets = purrr::map(datasets, add_links)) |>
  knitr::kable()
Voila!