Instructor

  • Photo mosaic of me (composed of images from the history of data vis)

Class meetings

  • Tuesday 2:30 pm - 5:30 pm, in the Hebb Computer lab, 159 BSB.

Course Description

This course is designed as a broad, applied introduction to the statistical analysis of categorical (or discrete) data: counts, proportions, nominal variables, ordinal variables, discrete variables with few values, continuous variables grouped into a small number of categories, and so on.

  • The course begins with methods designed for cross-classified tables of counts (i.e., contingency tables), using simple chi-square-based methods.

  • It progresses to generalized linear models, for which log-linear models provide a natural extension of simple chi-square-based methods.

  • This framework is then extended to include logit and logistic regression models for binary responses, and generalizations of these models for polytomous (multicategory) outcomes.

Throughout, there is a strong emphasis on associated graphical methods for visualizing categorical data, checking model assumptions, etc. Lab sessions will familiarize students with using R to carry out these analyses.
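As a rough illustration of this progression, here is a minimal sketch in base R. It is not taken from the course materials: the choice of the built-in UCBAdmissions table and all object names (admit_gender, counts, indep, logit) are assumptions made only for this example.

```r
# Built-in 2 x 2 x 6 table of counts: Admit x Gender x Dept
data("UCBAdmissions")

# 1. Contingency table: collapse over Dept, then test independence
admit_gender <- margin.table(UCBAdmissions, c(1, 2))
chi <- chisq.test(admit_gender, correct = FALSE)  # no continuity correction

# 2. Log-linear model: the same independence hypothesis as a Poisson GLM
counts <- as.data.frame(admit_gender)             # columns Admit, Gender, Freq
indep <- glm(Freq ~ Admit + Gender, family = poisson, data = counts)
pearson_x2 <- sum(residuals(indep, type = "pearson")^2)
# pearson_x2 agrees with chi$statistic up to numerical tolerance

# 3. Logistic regression: admission as a binary response, one row per cell,
#    weighted by the cell count
ucb <- as.data.frame(UCBAdmissions)
ucb$admitted <- as.numeric(ucb$Admit == "Admitted")
logit <- glm(admitted ~ Gender + Dept, family = binomial,
             weights = Freq, data = ucb)
summary(logit)

# A mosaic plot, shaded by residuals from the independence model
mosaicplot(admit_gender, shade = TRUE, main = "Admission by gender")
```

Steps 1 and 2 test the same hypothesis in two ways, which is the sense in which log-linear models extend the chi-square methods. Step 3 treats one of the table variables as a response rather than modelling all of the cell counts.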

Assignments

There will be occasional short assignments posted here and announced in class. These assignments are ungraded. Like the tutorials, they are meant to give practice in working with categorical data, but they also give you a chance for corrective feedback when you submit your work.

They are usually due two weeks after they are announced. A useful way of formatting R exercises is described in Assignment 1. For further details, see Render an R script to a report, which describes how to compile HTML, PDF, or MS Word notebooks from R scripts.

Please submit your assignments to me by email, as a PDF, Word, or HTML attachment (together with the associated R or Rmd file), with a Subject: line “PSY 6136: Assignment XX”. To help me keep them straight, it would be most convenient to name them in a consistent style, something like “YourName-AssignXX.{pdf,docx,html}”.

For 2023, you no longer need to submit Assignments 3 and 4.

Evaluation

There are three components to your evaluation in the course. Two take-home projects (each worth 30%) will involve analysis of one or more data sets, together with a research report describing the background, your analyses, results, and conclusions. For these, you can use any software you like, although R is strongly encouraged.

A final project (worth 40%) is meant for you to spread your wings and take up some topic in categorical data analysis beyond the course materials, perhaps related to your area of research.

Here is a template for an .Rmd file you can use if you are working in RStudio and want to write in R Markdown.

  1. Project 1: a selection of data sets for the material up to and including logistic regression. Due date: Feb. 24 (but earlier appreciated)

  2. Project 2: a selection of data sets for the material from logistic regression to the end of the course. Due date: Mar. 31

  3. The remaining 40% can be earned as a research paper / project extending the scope of CDA in some way. Some sample project topics are described in 6136 Individual or Team Projects.

Re-use policy

The lecture slides, tutorials and R scripts linked here are available under a Creative Commons Attribution-NonCommercial-ShareAlike license. They are available to everybody under the terms of this license and can be shared. This means that you can re-use these materials for non-commercial purposes as long as all uses include an appropriate credit to this source.

All other materials, notably course videos and support material files, should not be copied beyond your personal machines and hence are not available for redistribution.

 

Copyright © 2018 Michael Friendly. All rights reserved.

friendly AT yorku DOT ca

ORCID iD: orcid.org/0000-0002-3237-0941