
Books

Main texts

Friendly & Meyer (2016), Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data. This book provides the syllabus and main content for the course. Use the code ADC22 for a 30% discount from the publisher's web site.
Web site for the book: http://ddar.datavis.ca/
Alan Agresti (2019), Introduction to Categorical Data Analysis, 3rd Ed. A somewhat parallel book, offering a different perspective on categorical data analysis.
An ebook version can be purchased from the York Bookstore. See How to purchase your course materials for PSYC 6136

Supplementary readings

Agresti (2013), Categorical Data Analysis. A much more technical book that many consider the 'bible' of categorical data analysis methods.
Web site for the book: http://www.stat.ufl.edu/~aa/cda/cda.html
Solutions manual for R: https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf
A PDF copy of this book is available to students in this course.
Fox (2015), Applied Regression Analysis and Generalized Linear Models. An excellent text on linear models; Part IV on Generalized Linear Models provides a clear and comprehensive discussion.

Software

In lectures and lab sessions I will be using R software exclusively, together with the RStudio user interface for R.

You are well advised to download and install both on your computer so you can follow along.

  • Instructions to install R and RStudio for Windows and Mac. There is also a learnr tutorial that guides you through the steps.

  • I recommend that you set up an RStudio project for your work in the course, where you can organize your notes and work on assignments, projects, etc. I created a template for this on GitHub: my6136. You can simply download this to your computer, and then open it in RStudio (double-click on the file my6136.Rproj).

    • Alternatively, if you use GitHub or you’re willing to create a GitHub account (highly recommended), you can fork and clone this repo into your own GitHub account. In the process, you’ll learn something about version control and the

    • The my6136 project has the following folders set up for you to use, but change anything to suit your workflow.

my6136
  ├── assign
  ├── data
  ├── images
  ├── notes
  ├── R
  └── tutorials
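
If you would rather build this layout yourself than download the template, a minimal sketch in base R follows. The folder names are taken from the listing above; the location (a temporary directory) is only an assumption for illustration, so replace it with wherever you keep your course work.

```r
# Hypothetical location: a temporary directory, so nothing in your real
# workspace is touched. Replace with the path you use for course work.
project_dir <- file.path(tempdir(), "my6136")

# Folder names taken from the my6136 layout shown above
folders <- c("assign", "data", "images", "notes", "R", "tutorials")

# Create each folder (and the project root, via recursive = TRUE)
for (f in folders) {
  dir.create(file.path(project_dir, f), recursive = TRUE, showWarnings = FALSE)
}

# List the top-level folders that now exist in the project
created <- list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
print(created)
```

This only creates the folders; the my6136.Rproj file comes from the GitHub template or from RStudio's New Project dialog.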

Making plots with ggplot2

The majority of the graphs in DDAR and in my lectures use custom graphic methods implemented in the vcd, vcdExtra, ca and other packages specific to categorical data analysis. Yet it is also helpful to learn how to make and customize graphs using ggplot2, the modern lingua franca for specifying and producing graphs.
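
As a small illustration of the ggplot2 style for categorical data, here is a sketch of a grouped bar chart. It assumes ggplot2 is installed and uses the HairEyeColor table that ships with R; the choice of data and plot is mine for illustration, not an example taken from DDAR.

```r
library(ggplot2)

# HairEyeColor is a 3-way table (Hair x Eye x Sex) in base R;
# as.data.frame() converts it to frequency form with a Freq column.
hec <- as.data.frame(HairEyeColor)

p <- ggplot(hec, aes(x = Hair, y = Freq, fill = Eye)) +
  geom_col(position = "dodge") +   # side-by-side bars, one per eye color
  facet_wrap(~ Sex) +              # one panel per sex
  labs(title = "Hair and eye color by sex",
       x = "Hair color", y = "Frequency", fill = "Eye color") +
  theme_minimal()

print(p)
```

The key idea is that ggplot2 works from a data frame in frequency (or case) form rather than from a table object, so converting with as.data.frame() is usually the first step.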

Here are a few links that I find useful:

Papers, talks, blogs and others

Weekly Resources

I’ll post here other things of interest for the topic of the week.

Week 1: Introduction to R

  • A (very) short introduction to R covers the basics of installing R and RStudio, the RStudio window layout, and an overview of R commands, data structures and functions. If you haven’t already installed R and RStudio, do so now, and work through some of the examples.

  • McNamara & Horton (2017), Wrangling categorical data in R, describe some aspects of data import and tidying specific to categorical data.

Week 2: Discrete Distributions

  • See the DDAR web site, Chapter 3: Fitting and Graphing Discrete Distributions for the R code for figures in this chapter.

  • Several other R packages offer tools for fitting distributions:

    • The function MASS::fitdistr() provides maximum likelihood fitting for a variety of univariate distributions. Some are continuous (“beta”, “cauchy”, “chi-squared”, “exponential”, “gamma”, “log-normal”, “logistic”, “normal”, “t” and “weibull”); others are discrete (“geometric”, “negative binomial”, “Poisson”). It offers no graphical methods.

    • The fitdistrplus package provides a more comprehensive framework with the fitdist() function for fitting a wide range of distributions (both discrete and continuous) and offers various goodness-of-fit plots and statistics (AIC, BIC, etc.). See the package vignette.

    • The discretefit package implements fast Monte Carlo simulations for goodness-of-fit (GOF) tests for discrete distributions. See the package vignette.
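
To give a flavor of the first of these tools, here is a minimal sketch using MASS::fitdistr() to fit Poisson and negative binomial distributions to the same counts. The data are simulated purely for illustration (an assumption, not course data), and the object names are my own.

```r
library(MASS)  # a recommended package distributed with R

# Simulated counts for illustration only (not data from the course):
# overdispersed draws from a negative binomial
set.seed(6136)
x <- rnbinom(200, size = 2, mu = 3)

# Maximum likelihood fits of two candidate count distributions
fit_pois <- fitdistr(x, "Poisson")
fit_nb   <- fitdistr(x, "negative binomial")

fit_pois$estimate   # lambda; for the Poisson the MLE is the sample mean
fit_nb$estimate     # size and mu

# fitdistr objects have a logLik method, so AIC() can be used to
# compare the fits; smaller AIC indicates the better-supported model
aic <- c(poisson = AIC(fit_pois), negbin = AIC(fit_nb))
print(aic)
```

Because fitdistr() has no plotting methods, a natural next step is to compare observed and fitted frequencies graphically, for example with the methods described in DDAR Chapter 3.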

Week 3: Two-way Contingency Tables

Week 4: Loglinear Models and Mosaic Displays

Friendly, M. (1992). Mosaic Displays for Loglinear Models. ASA, Proceedings of the Statistical Graphics Section, 61–68.

Friendly, M. (1994). Mosaic Displays for Multi-way Contingency Tables. Journal of the American Statistical Association, 89, 190–200.

Friendly, M. (1999). Extending Mosaic Displays: Marginal, Conditional, and Partial Views of Categorical Data. Journal of Computational and Graphical Statistics, 8(3), 373–395.

R Tutorials and Examples

  • Newsom, J. Loglinear Models Tutorial. Portland State University. Clear tutorial with practical examples including political voting data analysis.
  • Clay Ford. Introduction to Loglinear Models. UVA Library. Modern tutorial demonstrating the use of glm() for loglinear modeling with substance use data examples.

  • Loglinear Models vignette. vcdExtra package. Practical examples using loglm() in R with interpretations using independence notation.

  • Mosaic Plots vignette. vcdExtra package. Comprehensive guide to creating and customizing mosaic plots using the vcd package.
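
The tutorials above use both loglm() and glm() for loglinear models. As a rough sketch of how the two relate, the following fits the independence model to a two-way table both ways, using the HairEyeColor data that ships with R (my choice of example, not taken from the tutorials). The mosaic plot uses base R's mosaicplot() in place of vcd::mosaic() so that the sketch runs without extra packages.

```r
library(MASS)  # provides loglm(); a recommended package distributed with R

# Two-way table of hair by eye color, collapsing HairEyeColor over sex
tab <- margin.table(HairEyeColor, c(1, 2))

# Independence model [Hair][Eye] fitted with loglm()
ll_fit <- loglm(~ Hair + Eye, data = tab)
print(ll_fit)   # likelihood-ratio and Pearson chi-square tests

# The same model as a Poisson GLM on the table in frequency form
tab_df  <- as.data.frame(tab)
glm_fit <- glm(Freq ~ Hair + Eye, family = poisson, data = tab_df)

# The residual deviance of the GLM is the likelihood-ratio statistic
print(c(loglm = ll_fit$lrt, glm = deviance(glm_fit)))

# Base-R mosaic plot, shaded by residuals from the independence model
mosaicplot(tab, shade = TRUE, main = "Hair by eye color")
```

loglm() works directly from a table and a formula naming the fitted margins, while glm() needs the data in frequency form but gives access to the full set of GLM tools (residuals, anova(), and so on).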

Visualization Resources

Research Applications

Week 5: Correspondence Analysis

R Tutorials and Examples

Interpretation and Visualization Guides

Research Applications

Books and Comprehensive Guides

  • Greenacre, M. (2017). Correspondence Analysis in Practice, 3rd Edition. CRC Press. The definitive practical guide to CA with applications across social, environmental and health sciences. Includes R code in appendices.

Week 6: Logistic Regression

R Tutorials and Examples

  • Logit Regression Tutorial. UCLA Statistical Consulting. Comprehensive tutorial showing how to fit logistic regression using glm() with family = "binomial", including interpretation of coefficients and odds ratios.

  • Logistic Regression in R. DataCamp. Modern tutorial covering both base R and tidymodels approaches, with practical examples and interpretation guidance.

  • Logistic Regression Essentials in R. STHDA. Complete guide to fitting, evaluating and interpreting logistic regression models with working code examples.

  • Binary Logistic Regression in R. Stats and R. Step-by-step tutorial with a real dataset, covering model fitting, interpretation, and prediction.

  • Boehmke, B. & Greenwell, B. Logistic Regression. Hands-On Machine Learning with R. Advanced chapter covering variable importance, partial dependence plots, and model tuning.
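
For a quick sense of what these tutorials cover, here is a minimal sketch of fitting a logistic regression with glm() and family = binomial, then converting coefficients to odds ratios. It uses the Titanic table that ships with R; the data set, model, and the passenger profile used for prediction are my own illustrative choices, not taken from any of the tutorials listed.

```r
# Titanic is a 4-way table in base R; convert it to frequency form
titanic <- as.data.frame(Titanic)

# Survived is a factor with levels No/Yes, so the model is for
# P(Survived = "Yes"). Each row is a table cell weighted by its count.
fit <- glm(Survived ~ Class + Sex + Age,
           data = titanic, family = binomial, weights = Freq)

print(summary(fit)$coefficients)   # estimates on the log-odds scale

# Exponentiate to get odds ratios, with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 2))

# Predicted survival probability for one hypothetical passenger profile
new_case <- data.frame(Class = "3rd", Sex = "Female", Age = "Adult")
p_hat <- predict(fit, newdata = new_case, type = "response")
print(p_hat)
```

Each odds ratio compares a category with the reference level of its factor (1st class, Male, Child), holding the other predictors fixed.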

Effect Plots and Visualization

Model Diagnostics

Interpreting Odds Ratios

Research Applications

Week 7: Logistic Regression: Extensions

R Tutorials and Examples

Advanced Methods with VGAM

Testing Assumptions

Research Applications

Week 8

Week 9

Week 10

Week 11

 

Copyright © 2018 Michael Friendly. All rights reserved.

friendly AT yorku DOT ca

ORCID iD: orcid.org/0000-0002-3237-0941