
Books

Main texts

Friendly & Meyer (2016), Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data. This book provides the syllabus and main content for the course. Use the code ADC22 for a 30% discount from the publisher's web site.
Web site for the book: http://ddar.datavis.ca/
Agresti (2019), Introduction to Categorical Data Analysis, 3rd Ed. A somewhat parallel book, offering a different perspective on categorical data analysis.
An ebook version can be purchased from the York Bookstore. See How to purchase your course materials for PSYC 6136.

Supplementary readings

Agresti (2013), Categorical Data Analysis. A much more technical book that many consider the 'bible' for categorical data analysis methods.
Web site for the book: http://www.stat.ufl.edu/~aa/cda/cda.html
Solutions manual for R: https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf
A PDF copy of this book is available to students in this course.
Fox (2015), Applied Regression Analysis and Generalized Linear Models. An excellent text on linear models; Part IV on Generalized Linear Models provides a clear and comprehensive discussion.

Software

In lectures and lab sessions I will be using R software exclusively, together with the RStudio interface for R.

You are well advised to download and install both on your computer so you can follow along.

  • Instructions to install R and RStudio for Windows and Mac. There is also a learnr tutorial that guides you through the steps.

  • I recommend that you set up an RStudio project for your work in the course, where you can organize your notes and work on assignments, projects, etc. I created a template for this on GitHub: my6136. You can simply download this to your computer, and then open it in RStudio (double-click on the file my6136.Rproj).

    • Alternatively, if you use GitHub or are willing to create a GitHub account (highly recommended), you can fork this repo into your own GitHub account and clone it to your computer. In the process, you’ll learn something about version control and the

    • The my6136 project has the following folders set up for you, but feel free to change anything to suit your workflow.

my6136
  ├── assign
  ├── data
  ├── images
  ├── notes
  ├── R
  └── tutorials

Making plots with ggplot2

The majority of the graphs in DDAR and in my lectures use custom graphic methods implemented in the vcd, vcdExtra, ca and other packages specific to categorical data analysis. Yet it is also helpful to learn how to make and customize graphs using ggplot2, the modern lingua franca for specifying and producing graphs.

Here are a few links that I find useful:

Papers, talks, blogs and others

Weekly Resources

I’ll post here other things of interest for the topic of the week.

Week 1: Introduction to R

  • A (very) short introduction to R covers the basics of installing R and RStudio, the RStudio window layout, and an overview of R commands, data structures and functions. If you haven’t already installed R and RStudio, do so now, and work through some of the examples.

McNamara & Horton (2017), Wrangling categorical data in R, describes some aspects of data import and tidying specific to categorical data.

Week 2: Discrete Distributions

  • See the DDAR web site, Chapter 3: Fitting and Graphing Discrete Distributions for the R code for figures in this chapter.

  • Several other R packages offer tools for fitting distributions:

    • The function MASS::fitdistr() provides maximum likelihood fitting for a variety of univariate distributions, some continuous (“beta”, “cauchy”, “chi-squared”, “exponential”, “gamma”, “log-normal”, “logistic”, “normal”, “t” and “weibull”), others discrete (“geometric”, “negative binomial” and “Poisson”). It provides no graphical methods for assessing the fit.

    • The fitdistrplus package provides a more comprehensive framework, with the fitdist() function for fitting a wide range of distributions (both discrete and continuous), and offers various goodness-of-fit plots and statistics (AIC, BIC, etc.). See the package vignette.

    • The discretefit package implements fast Monte Carlo simulations for goodness-of-fit (GOF) tests for discrete distributions. See the package vignette.
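As a minimal sketch of maximum likelihood fitting with MASS::fitdistr(), the code below fits a Poisson distribution to the von Bortkiewicz horse-kick data (deaths per corps-year), entered by hand as frequencies 109, 65, 22, 3, 1 for 0 to 4 deaths; the same counts are available as vcd::HorseKicks. The object names are illustrative, and the observed/expected comparison is a simple hand-rolled check, not a feature of fitdistr().

```r
# Minimal sketch: ML fit of a Poisson distribution with MASS::fitdistr()
library(MASS)

# Horse-kick data entered by hand (same counts as vcd::HorseKicks)
deaths <- 0:4
freq   <- c(109, 65, 22, 3, 1)

# fitdistr() needs raw observations, so expand the frequency table
x <- rep(deaths, times = freq)   # 200 corps-years

# For the Poisson, the MLE of lambda is the sample mean
fit <- fitdistr(x, densfun = "Poisson")
lambda <- fit$estimate[["lambda"]]
fit                               # estimate with its standard error

# Hand-rolled comparison of observed and expected frequencies
expected <- length(x) * dpois(deaths, lambda)
round(cbind(deaths, freq, expected), 1)
```

With fitdistrplus, the analogous call would be fitdist(x, "pois"), after which plot() and gofstat() give diagnostic plots and fit statistics; see that package's vignette for details.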

Week 3: Two-way Contingency Tables

Week 4: Loglinear Models and Mosaic Displays

Foundations

R Tutorials and Examples

  • Newsom, J. Loglinear Models Tutorial. Portland State University. Clear tutorial with practical examples including political voting data analysis.
  • Ford, C. Introduction to Loglinear Models. UVA Library. Modern tutorial demonstrating the use of glm() for loglinear modeling, with examples using substance use data.

  • Loglinear Models vignette. vcdExtra package. Practical examples using loglm() in R with interpretations using independence notation.

  • Mosaic Plots vignette. vcdExtra package. Comprehensive guide to creating and customizing mosaic plots using the vcd package.
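The tutorials above fit loglinear models either as Poisson GLMs with glm() or with loglm(). As a minimal sketch, assuming the MASS package is installed, the code below fits the same model, conditional independence of admission and gender given department, to the UCBAdmissions table that ships with R, once each way. The choice of data, model and object names is illustrative.

```r
# Minimal sketch: one loglinear model fitted with glm() and with MASS::loglm()
library(MASS)   # for loglm()

# UCBAdmissions: 2 x 2 x 6 table (Admit x Gender x Dept) in base R
berk <- as.data.frame(UCBAdmissions)   # columns Admit, Gender, Dept, Freq

# [AD][GD]: Admit and Gender independent within each Dept, as a Poisson GLM
mod_glm <- glm(Freq ~ Admit * Dept + Gender * Dept,
               family = poisson, data = berk)

# The same model fitted with loglm(), directly on the table
mod_loglm <- loglm(~ Admit * Dept + Gender * Dept, data = UCBAdmissions)

# Both give the same likelihood-ratio G^2 on the same residual df
c(glm = deviance(mod_glm), loglm = mod_loglm$lrt)
c(glm = df.residual(mod_glm), loglm = mod_loglm$df)
```

The glm() form makes it easy to add covariates or use other GLM tools, while loglm() works directly with the table and reports both Pearson and likelihood-ratio statistics.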

Visualization Resources

Research Applications

Week 5: Correspondence Analysis

R Tutorials and Examples

Interpretation and Visualization Guides

Research Applications

Books and Comprehensive Guides

  • Greenacre, M. (2017). Correspondence Analysis in Practice, 3rd Edition. CRC Press. The definitive practical guide to CA with applications across social, environmental and health sciences. Includes R code in appendices.

Week 6: Logistic Regression

R Tutorials and Examples

  • Logit Regression Tutorial. UCLA Statistical Consulting. Comprehensive tutorial showing how to fit logistic regression using glm() with family = "binomial", including interpretation of coefficients and odds ratios.

  • Logistic Regression in R. DataCamp. Modern tutorial covering both base R and tidymodels approaches, with practical examples and interpretation guidance.

  • Logistic Regression Essentials in R. STHDA. Complete guide to fitting, evaluating and interpreting logistic regression models with working code examples.

  • Binary Logistic Regression in R. Stats and R. Step-by-step tutorial with a real dataset, covering model fitting, interpretation, and prediction.

  • Boehmke, B. & Greenwell, B. Logistic Regression. Hands-On Machine Learning with R. Advanced chapter covering variable importance, partial dependence plots, and model tuning.
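As a minimal sketch of the glm(..., family = binomial) approach these tutorials describe, the code below models admission in the UCBAdmissions data as a function of department and gender, treating each table cell as a group of identical cases weighted by its frequency, and converts the coefficients to odds ratios. The data choice, the Admitted variable and the object names are illustrative assumptions.

```r
# Minimal sketch: logistic regression on frequency-weighted data with glm()
berk <- as.data.frame(UCBAdmissions)   # columns Admit, Gender, Dept, Freq

# 0/1 response: 1 = admitted, 0 = rejected
berk$Admitted <- as.numeric(berk$Admit == "Admitted")

# Each row stands for Freq identical applicants
mod <- glm(Admitted ~ Dept + Gender, family = binomial,
           weights = Freq, data = berk)

# Coefficients are on the log-odds scale; exponentiate for odds ratios
or <- exp(coef(mod))
round(or, 3)

# Wald 95% confidence intervals for the odds ratios
round(exp(confint.default(mod)), 3)
```

With weights = Freq the coefficient estimates match those from an aggregated binomial model (cbind(admitted, rejected) as the response), although the residual deviance is computed on a different, individual-level basis.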

Effect Plots and Visualization

Model Diagnostics

Interpreting Odds Ratios

Research Applications

Week 7: Logistic Regression: Extensions

R Tutorials and Examples

Advanced Methods with VGAM

Testing Assumptions

Research Applications

Week 8: Extending Loglinear Models

The gnm Package: Generalized Nonlinear Models

RC Models and Association Models

  • Fitting Row-Column Association Models. logmult package documentation. Details on fitting Goodman’s RC(M) association models using the rc() function with automatic selection of starting values.

  • logmult Package: Log-Multiplicative Models. CRAN page for logmult package. Fits log-multiplicative models including RC(M) row-column association models with convenient printing, plots, and bootstrap standard errors.

  • Short Reference for logmult. logmult vignette. Tutorial on fitting association models with one or several dimensions, including layer effects for stratified tables.

Models for Square Tables

  • Mobility Tables. vcdExtra package vignette. Extensive examples of models for square tables including symmetry, quasi-symmetry, and diagonal reference models using social mobility data.

  • Diagonal Reference Models with Dref. gnm package documentation. Details on specifying diagonal reference terms as introduced by Sobel (1981, 1985) for square tables with the form µ_ij = w γ_i + (1 - w) γ_j.

  • Quasi-Symmetry Model. Penn State STAT 504. Explanation of quasi-symmetry models for square contingency tables, contrasted with symmetry and marginal homogeneity.

  • Agresti, A. (1983). A Simple Diagonals-Parameter Symmetry and Quasi-Symmetry Model. Statistics & Probability Letters. Classic paper on diagonal-parameter models for square tables.

Specialized Models and Applications

  • Tomizawa, S., et al. (2022). Advances in Quasi-Symmetry for Square Contingency Tables. Symmetry, 14(5), 1051. Review of recent developments in quasi-symmetry models including ordinal QS and association-based models.

  • Kateri, M., et al. (2022). Quasi Association Models for Square Contingency Tables with Ordinal Categories. Symmetry, 14(4), 805. Parsimonious QS-type models for ordinal classifications based on local odds ratios.

  • Goodman, L.A. (1979). Simple Models for the Analysis of Association in Cross-Classifications having Ordered Categories. Journal of the American Statistical Association, 74(367), 537-552. Foundational paper on association models for ordered categorical data.

Research Applications

  • Wong, R.S-K. (1995). Extensions in the Use of Log-Multiplicative Scaled Association Models in Multiway Contingency Tables. Sociological Methods & Research, 23(4), 507-538. Applications of association models to social science research with multiway tables.

  • Becker, M.P. & Clogg, C.C. (1989). Analysis of Sets of Two-Way Contingency Tables Using Association Models. Journal of the American Statistical Association, 84(405), 142-151. Methods for analyzing multiple contingency tables using association models.

Week 9: GLMs for Count Data

Poisson Regression Tutorials

Overdispersion and Negative Binomial Models

Zero-Inflated and Hurdle Models

Model Diagnostics

Research Applications

Week 10


Copyright © 2018 Michael Friendly. All rights reserved.

friendly AT yorku DOT ca

ORCID iD: orcid.org/0000-0002-3237-0941