predict_discrim calculates predicted class membership values for a linear or quadratic discriminant analysis, returning a data.frame suitable for graphing or other analysis.

Usage

predict_discrim(
  object,
  newdata,
  prior = object$prior,
  dimen,
  scores = FALSE,
  posterior = "max",
  ...
)

Arguments

object

An object of class "lda" or "qda", such as that returned by MASS::lda() or MASS::qda().

newdata

A data frame of cases to be classified or, if object has a formula, a data frame with columns of the same names as the variables used. A vector will be interpreted as a row vector. If newdata is missing, an attempt will be made to retrieve the data used to fit the lda object.

prior

The prior probabilities of the classes. By default, these are the priors used in the call to MASS::lda() or MASS::qda(), as stored in object$prior.

dimen

The dimension of the space to be used. If this is less than the number of available dimensions, min(p, ng-1), only the first dimen discriminant components are used. (This argument is not yet implemented because MASS::qda() does not support this.)

scores

A logical. If TRUE, the discriminant scores of the cases in newdata are appended as additional columns in the result, with names LD1, LD2, ...

posterior

Either a logical or the character string "max". If TRUE, the posterior probabilities for all classes are included as columns named for the classes. If FALSE, these are omitted. If "max", the maximum of the posterior probabilities across the classes is included, with the variable name "maxp".

...

Arguments passed from or to other methods; not yet used here.

Value

A data.frame containing the predicted class of the observations, the values of the newdata variables and, by default (posterior = "max"), the maximum value of the posterior probabilities of the classes. rownames() in the result are inherited from those in newdata.

Details

The predict() methods provided for MASS::lda() and MASS::qda() are a mess, because they return their results as a list, with components class, posterior and x. This function is designed as a wrapper on those to return results in a more consistent and flexible way.
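To make the motivation concrete, here is a minimal sketch of flattening the list returned by predict() into a single data frame by hand. It uses only MASS and the built-in iris data, which is an assumption made for illustration; the names iris.lda, pred and flat are hypothetical, and this is not the actual implementation of predict_discrim().

```r
library(MASS)

# Fit an LDA on the built-in iris data (illustration only)
iris.lda <- lda(Species ~ ., data = iris)

# predict() returns a list with components class, posterior and x
pred <- predict(iris.lda)

# Assemble the pieces into one data frame by hand
flat <- data.frame(
  class = pred$class,                     # predicted class
  maxp  = apply(pred$posterior, 1, max),  # largest posterior per case
  pred$x                                  # discriminant scores LD1, LD2
)
head(flat)
```

A wrapper saves repeating this kind of assembly each time the predictions are needed for plotting or further analysis.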

For use in graphs, where you want to show the classification boundaries or regions, you should supply a newdata data frame consisting of two focal variables which are varied over their ranges, with the remaining variables used in the discriminant analysis held fixed at typical values.
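The grid described above might be built along the following lines. This is a sketch under stated assumptions: it uses the built-in iris data rather than the package's own example data, takes the mean as the "typical value", and calls the MASS predict() method directly. The same grid could be supplied as newdata to predict_discrim().

```r
library(MASS)

iris.lda <- lda(Species ~ ., data = iris)

# Two focal variables, each varied over its observed range
grid <- expand.grid(
  Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length),
                     length.out = 40),
  Petal.Width  = seq(min(iris$Petal.Width), max(iris$Petal.Width),
                     length.out = 40)
)

# Remaining predictors held fixed at their means (one choice of typical value)
grid$Sepal.Length <- mean(iris$Sepal.Length)
grid$Sepal.Width  <- mean(iris$Sepal.Width)

# Predicted class at every grid point, ready for plotting regions
grid$class <- predict(iris.lda, newdata = grid)$class
table(grid$class)
```

A finer grid gives smoother region boundaries at the cost of more predictions; 40 points per axis is an arbitrary choice here.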

Using the scores argument, the function also returns the scores on the discriminant functions. This is only available for linear discriminant analysis with MASS::lda().
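The reason for this restriction can be seen from the underlying predict() methods: the result for an "lda" object includes a component x holding the discriminant scores, while the result for a "qda" object does not. A short check, again using the built-in iris data as an illustrative assumption:

```r
library(MASS)

iris.lda <- lda(Species ~ ., data = iris)
iris.qda <- qda(Species ~ ., data = iris)

# Component names of each prediction list
lda_parts <- names(predict(iris.lda))
qda_parts <- names(predict(iris.qda))

# Is the scores component x present?
"x" %in% lda_parts
"x" %in% qda_parts
```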

Examples

data(peng, package="heplots")
library(MASS)
peng.lda <- lda(species ~ bill_length + bill_depth + flipper_length + body_mass, 
                data = peng)
peng_pred <- predict(peng.lda)
str(peng_pred)
#> List of 3
#>  $ class    : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ posterior: num [1:333, 1:3] 1 1 0.984 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:333] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:3] "Adelie" "Chinstrap" "Gentoo"
#>  $ x        : num [1:333, 1:2] 4.32 2.44 2.98 4.54 5.66 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:333] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:2] "LD1" "LD2"