Test your knowledge of the material on correspondence analysis in the following quiz to see how much you learned. This is entirely private to you---no records are kept of your performance.

Questions

1. Correspondence analysis (CA) is best described as:

CA is the analog of PCA for frequency data. Like PCA, it aims to account for the maximum amount of variation (χ² in this case) in a few dimensions and provides optimal scaling of row and column categories.

2. Correspondence analysis uses which mathematical technique to find optimal row and column scores?

CA uses singular value decomposition (SVD) of the matrix of residuals from independence. This decomposes the residuals as D = XΛYᵀ, where X and Y are the row and column scores, and Λ contains the singular values.
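To make this concrete, here is a minimal numpy sketch; the 3 × 4 table and its frequencies are invented purely for illustration. It forms the standardized residuals from independence (residuals divided by the square roots of the expected proportions, which is the matrix CA actually decomposes) and takes their SVD; in a full CA the singular vectors would then be rescaled by the row and column masses to give the category scores.

```python
import numpy as np

# Toy 3 x 4 contingency table (made-up frequencies, for illustration only)
N = np.array([[20, 30, 25,  5],
              [10, 40, 35, 15],
              [ 5, 10, 30, 25]], dtype=float)

n = N.sum()
P = N / n                      # correspondence matrix
r = P.sum(axis=1)              # row masses
c = P.sum(axis=0)              # column masses

# Standardized residuals from independence
D = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# SVD: D = X diag(lambda) Y^T
X, lam, Yt = np.linalg.svd(D, full_matrices=False)
print("singular values:", np.round(lam, 4))
```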

3. The singular values (λᵢ) in correspondence analysis represent:

The singular values λᵢ are the (canonical) correlations between the row and column category scores on each dimension. For dimension 1, the scores are chosen to have the maximum possible correlation λ₁, and so on for subsequent dimensions.
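This property is easy to check numerically. In the sketch below (same invented table as above), each observation in the table is assigned its row category score and its column category score on dimension 1; the ordinary Pearson correlation between the two score variables then equals λ₁.

```python
import numpy as np

# Invented contingency table, for illustration only
N = np.array([[20, 30, 25,  5],
              [10, 40, 35, 15],
              [ 5, 10, 30, 25]], dtype=float)
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)

D = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(D, full_matrices=False)

# Standard (optimally scaled) scores on dimension 1
x = U[:, 0] / np.sqrt(r)       # row category scores
y = Vt[0, :] / np.sqrt(c)      # column category scores

# Give each of the n observations its row score and column score
rows, cols = np.indices(N.shape)
counts = N.astype(int).ravel()
xs = np.repeat(x[rows.ravel()], counts)
ys = np.repeat(y[cols.ravel()], counts)

print("correlation of dimension-1 scores:", np.corrcoef(xs, ys)[0, 1])
print("first singular value lambda_1    :", lam[0])   # the same number, up to rounding
```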

4. For a two-way contingency table with r rows and c columns, the CA solution has at most how many dimensions?

The CA solution has at most min(r - 1, c - 1) dimensions. This is the maximum rank of the matrix of residuals from independence, similar to how PCA has at most min(n - 1, p) dimensions for n observations and p variables.
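A quick numpy check of this fact, using an arbitrary random 4 × 6 table (invented for illustration): the residual matrix has at most min(4 - 1, 6 - 1) = 3 nonzero singular values.

```python
import numpy as np

rng = np.random.default_rng(2024)
N = rng.integers(1, 50, size=(4, 6)).astype(float)   # arbitrary 4 x 6 table

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
D = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Rank of the residual matrix is at most min(4 - 1, 6 - 1) = 3
print("rank of residual matrix:", np.linalg.matrix_rank(D))
print("min(r - 1, c - 1)      :", min(N.shape[0] - 1, N.shape[1] - 1))
```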

5. How does the Pearson chi-square statistic (χ²) relate to the inertia in CA?

The Pearson χ² statistic equals n times the sum of squared singular values: χ² = n × Σλᵢ². The total inertia is Σλᵢ² = χ²/n, so the percentage of inertia explained by each dimension reflects the percentage of χ² accounted for.
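The identity can be verified directly. The sketch below (same invented table as before) computes the Pearson χ² from observed and expected counts and compares it with n × Σλᵢ².

```python
import numpy as np

# Invented contingency table, for illustration only
N = np.array([[20, 30, 25,  5],
              [10, 40, 35, 15],
              [ 5, 10, 30, 25]], dtype=float)
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)

# Pearson chi-square from observed and expected counts
E = n * np.outer(r, c)
chi2 = ((N - E) ** 2 / E).sum()

# Singular values of the standardized residual matrix
D = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
lam = np.linalg.svd(D, compute_uv=False)

print("Pearson chi-square :", chi2)
print("n * sum(lambda_i^2):", n * np.sum(lam ** 2))   # identical, up to rounding
```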

6. What does the term "inertia" mean in correspondence analysis?

Inertia refers to the weighted variation or dispersion of the profile points from their centroid (weighted average). The physical analogy is to mass × distance², hence the term 'inertia.' Higher inertia indicates greater association between rows and columns.
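The following sketch (invented frequencies again) computes the inertia directly as the mass-weighted sum of squared chi-square distances of the row profiles from their centroid, and checks that it equals χ²/n.

```python
import numpy as np

# Invented contingency table, for illustration only
N = np.array([[20, 30, 25,  5],
              [10, 40, 35, 15],
              [ 5, 10, 30, 25]], dtype=float)
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)

row_profiles = P / r[:, None]                        # each row rescaled to sum to 1
d2 = (((row_profiles - c) ** 2) / c).sum(axis=1)     # squared chi-square distances to centroid c
inertia = (r * d2).sum()                             # mass-weighted sum: mass x distance^2

E = n * np.outer(r, c)
chi2 = ((N - E) ** 2 / E).sum()
print("total inertia:", inertia)
print("chi2 / n     :", chi2 / n)                    # the same value
```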

7. In the default "symmetric" CA map, what does it mean when a row point and column point are plotted near each other?

In a symmetric CA map, when a row point and column point are near each other, it indicates a positive association between them—the residual from independence (dᵢⱼ) is positive. Proximity reflects similarity or attraction in the data.

8. Which of the following is TRUE about CA solutions?

CA solutions are nested, just as PCA solutions are: the first two dimensions of a 3D solution are identical to the 2D solution. Additionally, the centroid (weighted average) of the row and column profiles is at the origin, and inter-point distances reflect chi-square distances between profiles.
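The nesting follows directly from the SVD: a k-dimensional solution simply keeps the first k singular dimensions. A minimal numpy check, using an invented 4 × 4 table, compares the first two dimensions of a 3D solution with the 2D solution.

```python
import numpy as np

# Invented 4 x 4 contingency table, for illustration only
N = np.array([[20, 30, 25,  5],
              [10, 40, 35, 15],
              [ 5, 10, 30, 25],
              [15,  5, 10, 40]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
D = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(D, full_matrices=False)

def row_coords(k):
    """Row principal coordinates of a k-dimensional CA solution."""
    return (U[:, :k] / np.sqrt(r)[:, None]) * lam[:k]

# The first two dimensions of the 3D solution equal the 2D solution
print(np.allclose(row_coords(3)[:, :2], row_coords(2)))   # True
```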

9. Multiple correspondence analysis (MCA) differs from simple CA in that MCA:

MCA extends CA to n-way tables and analyzes all pairwise bivariate associations among the categorical variables. It can plot all factors in a single display and provides an optimal scaling of category scores across all variables simultaneously.

10. For MCA with Q categorical variables, the Burt matrix is:

The Burt matrix is B = ZᵀZ, the product of the transpose of the indicator matrix Z with Z itself. The diagonal blocks are diagonal matrices of the marginal frequencies for each variable, and the off-diagonal blocks contain the bivariate contingency tables for each pair of variables.
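A small numpy sketch of the construction, with invented data for Q = 3 integer-coded categorical variables: build the indicator matrix Z one block of dummy columns per variable, then form B = ZᵀZ.

```python
import numpy as np

# Made-up data: 6 cases on Q = 3 categorical variables (integer-coded levels)
data = np.array([[0, 1, 2],
                 [1, 0, 0],
                 [0, 2, 1],
                 [1, 1, 1],
                 [0, 0, 2],
                 [1, 2, 0]])

# Indicator (dummy) matrix Z: one block of 0/1 columns per variable
blocks = []
for q in range(data.shape[1]):
    levels = np.unique(data[:, q])
    blocks.append((data[:, q][:, None] == levels[None, :]).astype(int))
Z = np.hstack(blocks)

B = Z.T @ Z                    # Burt matrix
print(B)
# Diagonal blocks: diagonal matrices of marginal frequencies for each variable
# Off-diagonal blocks: two-way contingency tables for each pair of variables
```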