Friendly & Meyer (2016), Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data. This book provides the syllabus and main content for the course.
Use the code ADC22 for a 30% discount from the publisher's web site.
Web site for the book: http://ddar.datavis.ca/

Alan Agresti (2019), Introduction to Categorical Data Analysis, 3rd Ed. A somewhat parallel book, offering a different perspective on categorical data analysis.
An ebook version can be purchased from the York Bookstore. See How to purchase your course materials for PSYC 6136.

Agresti (2013), Categorical Data Analysis. A much more technical book that many consider the 'bible' for categorical data analysis methods.
Web site for the book: http://www.stat.ufl.edu/~aa/cda/cda.html
Solutions manual for R: https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf
A PDF copy of this book is available to students in this course.

Fox (2015), Applied Regression Analysis and Generalized Linear Models. An excellent text on linear models; Part IV on Generalized Linear Models provides a clear and comprehensive discussion.
In lectures and lab sessions I will be using R software exclusively, together with the R Studio user interface for R.
You are well-advised to download and install these to your computer so you can follow along.
Instructions to install R and R Studio for Windows and Mac. There is also a learnr tutorial that guides you through the steps.
I recommend that you set up an RStudio project for your work in
the course, where you can organize your notes and work on assignments,
projects, etc. I created a template for this on GitHub: my6136. You can simply
download this to your computer, and then open it in RStudio
(double-click on the file my6136.Rproj).
Alternatively, if you use GitHub or you’re willing to create a GitHub account (highly recommended), you can fork this repo into your own GitHub account and clone it to your computer. In the process, you’ll learn something about version control and the …
The my6136 project has the following folders set up
for you to use; feel free to change anything to suit your workflow.
my6136
├── assign
├── data
├── images
├── notes
├── R
└── tutorials
My install-vcd-pkgs.R script installs the most useful packages for this course. Download it and run it in your R or RStudio console.
R Studio cheatsheets: A handy collection of cheat sheets for R, R Studio and a
number of the most useful R packages. You can also get some of these in
R Studio from the menu Help -> Cheat sheets.
An Introduction to R Graphics: Notes from my SCS short course on R Graphics.
The vcdExtra
package contains a number of vignettes, each giving practical
methods for working with categorical data and models.
A short Making tables in R tutorial. Also: Udi’s Psy3136
Tables tutorial on making tables with the rempsyc and
apaTables packages.
The majority of the graphs in DDAR
and in my lectures use custom graphic methods implemented in the
vcd, vcdExtra, ca and other
packages specific to categorical data analysis. Yet it is also helpful
to learn how to make and customize graphs using ggplot2,
the modern lingua franca for specifying and producing
graphs.
Here are a few links that I find useful:
The ggplot2 Book is the bible of graphing with the Grammar of Graphics framework. It explains the logic of layers, geoms, scales, themes, etc. It has many, many small, easily understood examples.
The R Graphics Cookbook is a great resource of recipes for ggplot2, organized by type of graph (bar charts, line graphs, scatterplots, …) and things you might want to do with them.
My favorite book on general ideas about graphs: Claus Wilke, Fundamentals of Data Visualization. Well thought out, a wide range of topics, good practical advice, lots of examples, but no R code in the book.
My lecture slides from Psy 6135: Psychology of Data Visualization give a reasonable overview:
Friendly (2002). A brief history of mosaic displays, JCGS 11(1), 89-107. Traces the origin of visual and conceptual ideas leading to modern mosaic displays.
Slides from a talk at CARME 2011, Advances in Visualizing Categorical Data Using the vcd, gnm and vcdExtra Packages in R
Slides from a talk at CARME 2015, General Models and Graphs for Log Odds and Log Odds Ratios. This reconsiders some standard loglinear models and graphical methods (correspondence analysis, mosaic plots) from the perspective of models and visualization for log odds and log odds ratios.
I’ll post here other things of interest for the topic of the week.
McNamara & Horton (2017), Wrangling categorical data in R describe some aspects of data import and tidying specific to categorical data.
See the DDAR web site, Chapter 3: Fitting and Graphing Discrete Distributions for the R code for figures in this chapter.
Several other R packages offer tools for fitting distributions:
The function MASS::fitdistr() provides maximum
likelihood fitting for a variety of univariate distributions, some for
continuous (“beta”, “cauchy”, “chi-squared”, “exponential”, “gamma”,
“log-normal”, “logistic”, “normal”, “t” and “weibull”), others for
discrete distributions: “geometric”, “negative binomial”, “Poisson”. No
graphical methods are available there.
The fitdistrplus
package provides a more comprehensive framework with the
fitdist() function for fitting a wide range of
distributions (both discrete and continuous) and offers various
goodness-of-fit plots and statistics (AIC, BIC, etc.). See the package
vignette.
The discretefit
package implements fast Monte Carlo simulations for
goodness-of-fit (GOF) tests for discrete distributions. See the package
vignette.
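A minimal sketch of what MASS::fitdistr() does for a discrete distribution. It assumes only the MASS package, which ships with R. The counts are the classic von Bortkiewicz horse-kick frequencies (also available as vcd::HorseKicks), and the object names are illustrative. Because fitdistr() has no plotting methods, the observed and expected frequencies are compared in a table instead.

```r
library(MASS)  # provides fitdistr()

# Expand a frequency distribution into individual observations:
# 109 zeros, 65 ones, 22 twos, 3 threes, 1 four (200 corps-years)
deaths <- rep(0:4, times = c(109, 65, 22, 3, 1))

# Maximum likelihood fit of a Poisson distribution
fit <- fitdistr(deaths, densfun = "Poisson")
fit$estimate   # lambda-hat; for the Poisson this is simply mean(deaths)
logLik(fit)    # log-likelihood, useful for comparing candidate distributions

# Observed vs. expected frequencies under the fitted Poisson
expected <- length(deaths) * dpois(0:4, lambda = fit$estimate)
round(cbind(observed = table(deaths), expected = expected), 1)
```

In DDAR, Chapter 3, vcd::goodfit() and rootograms carry out this observed-vs-expected comparison graphically.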
See the DDAR web site, Chapter 4: Two-Way Contingency Tables for the R code for figures in this chapter.
Friendly, M. (1992). Mosaic Displays for Loglinear Models. ASA, Proceedings of the Statistical Graphics Section, 61–68.
Friendly, M. (1994). Mosaic Displays for Multi-way Contingency Tables. Journal of the American Statistical Association, 89, 190–200.
Friendly, M. (1999). Extending Mosaic Displays: Marginal, Conditional, and Partial Views of Categorical Data. Journal of Computational and Graphical Statistics, 8(3), 373–395.
Clay Ford. Introduction
to Loglinear Models. UVA Library. Modern tutorial demonstrating the
use of glm() for loglinear modeling with substance use data
examples.
Loglinear
Models vignette. vcdExtra package. Practical examples using
loglm() in R with interpretations using independence
notation.
Mosaic
Plots vignette. vcdExtra package. Comprehensive guide to creating
and customizing mosaic plots using the vcd
package.
Visualizing
Multivariate Categorical Data. STHDA. Practical guide to visualizing
categorical data with examples using ggplot2 and the
vcd package.
Comparative
Study of vcd::mosaic and geom_mosaic. Community contributions for
EDAV. Compares traditional vcd::mosaic() with the newer
ggmosaic approach, with examples using Titanic
data.
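The tutorials above fit loglinear models in two ways: glm() with a Poisson family on a frequency data frame, and MASS::loglm() on a table. The sketch below fits the independence model [Hair][Eye] both ways so you can check that they agree. It uses the built-in HairEyeColor table collapsed over Sex, and the object names are illustrative.

```r
library(MASS)  # loglm()

# Collapse the built-in 4 x 4 x 2 HairEyeColor table over Sex
haireye <- margin.table(HairEyeColor, c(1, 2))

# (1) Loglinear model as a Poisson GLM on the frequency data frame
hec_df <- as.data.frame(haireye)          # columns: Hair, Eye, Freq
indep_glm <- glm(Freq ~ Hair + Eye, family = poisson, data = hec_df)

# (2) The same independence model fitted to the table with loglm()
indep_loglm <- loglm(~ Hair + Eye, data = haireye)

# Both give the likelihood-ratio G^2 on (4 - 1) * (4 - 1) = 9 df
c(glm = deviance(indep_glm), loglm = indep_loglm$lrt)
c(glm = indep_glm$df.residual, loglm = indep_loglm$df)
```

A large G² relative to its df signals association. The mosaic-plot vignettes above (e.g., vcd::mosaic()) show where the residuals from this model occur.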
CA - Correspondence Analysis in R: Essentials. STHDA. Comprehensive
tutorial demonstrating how to compute and visualize CA using the
ca and factoextra packages, with examples
using contingency tables.
Correspondence
Analysis in R. Quick R (StatMethods). Concise tutorial showing how
to use the ca package with basic code examples for fitting
and plotting CA.
Sanchez, G. 5 Functions to do Correspondence Analysis in R. Compares five different R packages for conducting CA, helping you choose the best approach for your needs.
Correspondence Analysis for Historical Research with R. Programming Historian. Step-by-step tutorial showing how CA can be applied to historical data, with complete worked examples.
MCA - Multiple Correspondence Analysis in R: Essentials. STHDA. Complete
guide to computing and visualizing MCA using FactoMineR and
factoextra, extending CA to more than two categorical
variables.
Swiebold, T. Multiple Correspondence Analysis. Multivariate Statistical Analysis using R. Chapter on MCA with detailed R code examples and interpretation.
How to Interpret Correspondence Analysis Plots (It Probably Isn’t the Way You Think). Displayr. Essential reading on common misconceptions about interpreting CA plots and the correct way to read distances and angles.
How Correspondence Analysis Works (With Examples). Displayr. Clear explanation of the mathematics and logic behind CA, with worked examples showing how to interpret results.
Sourial, N., et al. (2010). Correspondence analysis is a useful tool to uncover the relationships among categorical variables. Journal of Clinical Epidemiology, 63(6), 638-646. Tutorial paper demonstrating CA with medical research examples.
Clausen, S-E. (1998). Social Research Update 7: Correspondence Analysis. University of Surrey. Overview of CA applications in sociological research, explaining its value for exploring categorical data relationships.
Souza, A.C., et al. (2014). Correspondence Analysis applied to psychological research. Tutorial on applying CA to psychology research with SPSS and R examples.
Understanding Correspondence Analysis: Exploring Relationships between Categorical Variables in Sociology. Overview of CA’s applications in studying segregation patterns, social status classification, and other sociological phenomena.
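The tutorials above use packages such as ca. To show the computation they describe, here is a base-R-only sketch of simple CA: a singular value decomposition (SVD) of the matrix of standardized residuals from independence. It is not the ca package's code. The singular values should match ca::ca(N)$sv, and the coordinates should match up to the (arbitrary) sign of each axis. The table and names are illustrative.

```r
# Simple correspondence analysis "by hand" via the SVD (base R only)
N  <- unclass(margin.table(HairEyeColor, c(1, 2)))  # 4 x 4 Hair x Eye counts
P  <- N / sum(N)                                    # correspondence matrix
r  <- rowSums(P)                                    # row masses
cm <- colSums(P)                                    # column masses

# Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
S   <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
dec <- svd(S)

inertia <- dec$d^2                        # principal inertias (the last is ~0)
round(100 * inertia / sum(inertia), 1)    # % of inertia per dimension

# Principal coordinates of rows (Hair) and columns (Eye) on two dimensions
row_coord <- diag(1 / sqrt(r))  %*% dec$u[, 1:2] %*% diag(dec$d[1:2])
col_coord <- diag(1 / sqrt(cm)) %*% dec$v[, 1:2] %*% diag(dec$d[1:2])
dimnames(row_coord) <- list(rownames(N), c("Dim1", "Dim2"))
dimnames(col_coord) <- list(colnames(N), c("Dim1", "Dim2"))

# Total inertia equals Pearson's chi-square divided by n
c(total_inertia = sum(inertia),
  chisq_over_n  = unname(chisq.test(N)$statistic) / sum(N))
```

The identity in the last line explains why CA is often described as a decomposition of the Pearson χ² statistic for the table.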
Logit
Regression Tutorial. UCLA Statistical Consulting. Comprehensive
tutorial showing how to fit logistic regression using glm()
with family = "binomial", including interpretation of
coefficients and odds ratios.
Logistic Regression in R. DataCamp. Modern tutorial covering both base R and tidymodels approaches, with practical examples and interpretation guidance.
Logistic Regression Essentials in R. STHDA. Complete guide to fitting, evaluating and interpreting logistic regression models with working code examples.
Binary Logistic Regression in R. Stats and R. Step-by-step tutorial with a real dataset, covering model fitting, interpretation, and prediction.
Boehmke, B. & Greenwell, B. Logistic Regression. Hands-On Machine Learning with R. Advanced chapter covering variable importance, partial dependence plots, and model tuning.
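A minimal sketch of the glm() workflow these tutorials cover. It fits the model, converts logit coefficients to odds ratios, and predicts probabilities. It uses the built-in infert case-control data and the model from R's ?infert help page. The two prediction profiles are hypothetical, and the intervals are Wald intervals (confint.default()), not profile-likelihood ones.

```r
# Logistic regression on the built-in infert data (case-control study)
fit <- glm(case ~ spontaneous + induced, family = binomial, data = infert)
summary(fit)

# Coefficients are on the log-odds (logit) scale; exponentiate for odds ratios
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))  # Wald 95% CIs
round(odds_ratios, 2)

# Predicted probabilities for two hypothetical covariate profiles
new   <- data.frame(spontaneous = c(0, 2), induced = c(0, 0))
probs <- predict(fit, newdata = new, type = "response")
probs
```

An odds ratio of, say, 1.5 for a predictor means each one-unit increase multiplies the odds of a case by 1.5, holding the other predictor fixed.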
Clay Ford. Visualizing
the Effects of Logistic Regression. UVA Library. Tutorial on using
the effects package to create effect displays that show how
predictors relate to probability of success.
Predictor Effects Graphics Gallery. Effects package vignette. Gallery of effect plots for various models including logistic regression, showing both link and response scales.
Visualizing
Regression Results in R. SSCC Wisconsin. Demonstrates the
margins and ggeffects packages for creating
marginal effects plots.
Plotting Estimates of Regression Models. sjPlot package vignette. Shows how to create forest plots and coefficient visualizations for logistic regression results.
Plotting Your Logistic Regression Models. University of Oregon R Club. Practical guide to visualizing logistic regression predictions and effects using base R and ggplot2.
Logistic Regression Assumptions and Diagnostics in R. STHDA. Complete guide to checking assumptions and identifying influential observations using standardized residuals, leverage, and Cook’s distance.
Model Diagnostics for Logistic Regression. Penn State STAT 504. Detailed coverage of residuals, leverage, and influence measures with interpretation guidelines.
Zhang, Z. (2016). Residuals and
regression diagnostics: focusing on logistic regression. Annals
of Translational Medicine, 4(10). Tutorial paper explaining
outliers, leverage, and influence with R examples using the
car package.
Newsom, J. Diagnostics for Logistic Regression. Portland State University. Handout covering residuals, influential observations, and diagnostic plots for logistic regression.
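The diagnostics discussed in these readings (studentized residuals, leverage, Cook's distance) are available for any glm() fit through base R's stats package. The sketch below reuses the infert model from ?infert. The cutoffs are common rules of thumb, and conventions differ across the sources above.

```r
fit <- glm(case ~ spontaneous + induced, family = binomial, data = infert)

# Standard influence measures for a GLM (all from the stats package)
diag_df <- data.frame(
  rstudent = rstudent(fit),        # studentized deletion residuals
  hat      = hatvalues(fit),       # leverage
  cooksd   = cooks.distance(fit)   # Cook's distance
)

# Flag observations using common (not universal) rules of thumb
p <- length(coef(fit))
n <- nrow(infert)
flagged <- with(diag_df, abs(rstudent) > 2 | hat > 2 * p / n | cooksd > 4 / n)
head(diag_df[flagged, ])
```

The average leverage is p/n, which is why cutoffs are stated as multiples of it. For a combined plot of these three measures, car::influencePlot(fit) is the car-package approach used in the Zhang (2016) tutorial.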
FAQ: How do I interpret odds ratios in logistic regression? UCLA Statistical Consulting. Clear explanation with worked examples showing how odds ratios relate to regression coefficients.
Nahhas, R.W. Interpretation of Logistic Regression Coefficients. Introduction to Regression Methods for Public Health Using R. Chapter on interpreting coefficients, odds ratios, and probability predictions.
Park, H.A. (2013). An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain. Journal of Korean Academy of Nursing, 43(2), 154-164. Tutorial emphasizing interpretation of odds ratios in health research.
Kayri, M. Predicting Social Trust with Binary Logistic Regression. Educational Sciences. Application of logistic regression to predict social trust using personality and demographic variables.
Liu, Y., et al. (2022). Multiple Logistic Regression Analysis of Smartphone Use in University Students. Frontiers in Psychology. Research using logistic regression to examine relationships between smartphone use, sleep quality, and health.
Ordinal
Logistic Regression. UCLA Statistical Consulting. Comprehensive
tutorial on using polr() from the MASS package for
proportional odds models, with interpretation of coefficients as
proportional odds ratios.
Nahhas, R.W. Ordinal
Logistic Regression. Introduction to Regression Methods for Public
Health Using R. Chapter covering ordered categorical responses with
examples using polr().
Keith McNulty. Proportional Odds Logistic Regression for Ordered Category Outcomes. Handbook of Regression Modeling in People Analytics. Emphasizes the proportional odds assumption and how to test it.
Barlaz, M. Ordinal Logistic Regression in R. Tutorial with examples using both MASS and VGAM packages for ordinal regression.
Clay Ford. Fitting
and Interpreting a Proportional Odds Model. UVA Library.
Step-by-step guide to fitting and interpreting proportional odds models
using polr().
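A minimal sketch of a proportional odds model with MASS::polr(). It uses the housing-satisfaction data from the MASS package, where Sat is an ordered factor (Low < Medium < High) and Freq holds cell counts used as case weights. It follows the example on the ?polr help page. polr() parameterizes the model as logit P(Y ≤ k) = ζ_k − η, so a positive coefficient means higher satisfaction categories become more likely.

```r
library(MASS)  # polr() and the housing data

# Proportional odds model for ordered satisfaction, with counts as weights
house_plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                  data = housing, Hess = TRUE)
summary(house_plr)

# One slope per predictor, shared by both cumulative logits;
# exp() gives proportional odds ratios
exp(coef(house_plr))

# Two cutpoints (zeta) for three ordered response categories
house_plr$zeta

# Fitted category probabilities for the first few covariate patterns
head(fitted(house_plr))
```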
Multinomial
Logistic Regression. UCLA Statistical Consulting. Tutorial on using
multinom() from the nnet package for unordered categorical
responses.
Keith McNulty. Multinomial Logistic Regression for Nominal Category Outcomes. Handbook of Regression Modeling in People Analytics. Covers when to use multinomial logit and how to interpret results.
Clay Ford. Getting
Started with Multinomial Logit Models. UVA Library. Introduction to
multinomial regression using the nnet package with
practical examples.
Multinomial Regression in R. R-statistics.co. Tutorial with contraceptive choice example showing model fitting and interpretation.
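A minimal sketch contrasting nnet::multinom(), which ignores the ordering of the response, with the proportional odds model from MASS::polr(). Both are fitted to the same MASS housing data, and the two models are compared by AIC computed from their deviances. The models are not nested, so AIC, rather than a likelihood-ratio test, is the comparison used here. Object names are illustrative.

```r
library(MASS)  # housing data, polr()
library(nnet)  # multinom()

# Baseline-category (multinomial) logit: each non-baseline category
# (Medium, High) gets its own coefficients relative to "Low"
house_mult <- multinom(Sat ~ Infl + Type + Cont, weights = Freq,
                       data = housing, Hess = TRUE, trace = FALSE)
summary(house_mult)
exp(coef(house_mult))   # odds ratios vs. the baseline category

# Proportional odds model for the same response, treated as ordered
house_plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# AIC = deviance + 2k, where k counts the estimated parameters
k_mult <- length(coef(house_mult))                        # 2 logits x 7 = 14
k_plr  <- length(coef(house_plr)) + length(house_plr$zeta)  # 6 + 2 = 8
c(multinom = house_mult$deviance + 2 * k_mult,
  polr     = house_plr$deviance + 2 * k_plr)               # lower is better
```

The multinomial model spends twice as many slope parameters. If its AIC is not clearly lower, the more parsimonious ordinal model is usually preferred.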
Agresti, A. Examples of Using R for Modeling Ordinal Data. Comprehensive examples using VGAM for various ordinal regression models including proportional odds and partial proportional odds.
VGAM Package for Ordinal
Regression. RPubs tutorial. Unified approach to ordinal and
polytomous regressions using vglm() with various link
functions.
Ordinal
Regression with VGAM. Practical tutorial on using
vglm() from VGAM package for cumulative logit
models.
The Proportional Odds Assumption. Penn State STAT 504. Explains the proportional odds assumption and how it relates to a latent continuous variable framework.
Understanding the Proportional Odds Assumption in Clinical Trials. Quanticate. Practical guide to testing and interpreting the proportional odds assumption.
Multinomial Logit Model Applications. ScienceDirect. Overview of applications in transportation, consumer behavior, occupational choice, and e-commerce.
Torres-Reyna, O. Logit, Probit, and Multinomial Logit Models in R. Princeton University. Tutorial covering binary, ordinal, and multinomial models with social science examples.
Copyright © 2018 Michael Friendly. All rights reserved.
friendly AT yorku DOT ca