predict_discrim calculates predicted class membership values for a linear or quadratic discriminant analysis,
returning a data.frame suitable for graphing or other analysis.
Usage
predict_discrim(
object,
newdata,
prior = object$prior,
dimen,
scores = FALSE,
posterior = "max",
...
)
Arguments
- object
An object of class "lda" or "qda", such as results from MASS::lda() or MASS::qda().
- newdata
A data frame of cases to be classified or, if object has a formula, a data frame with columns of the same names as the variables used. A vector will be interpreted as a row vector. If newdata is missing, an attempt will be made to retrieve the data used to fit the lda object.
- prior
The prior probabilities of the classes. By default, taken from object$prior, i.e. the priors used in the call to MASS::lda() or MASS::qda() (the class proportions of the training data unless set otherwise).
- dimen
The dimension of the space to be used. If this is less than the number of available dimensions, min(p, ng-1), only the first dimen discriminant components are used. (This argument is not yet implemented because MASS::qda() does not support this.)
- scores
A logical. If TRUE, the discriminant scores of the cases in newdata are appended as additional columns in the result, with names LD1, LD2, ...
- posterior
Either a logical or the character string "max". If TRUE, the posterior probabilities for all classes are included as columns named for the classes. If FALSE, these are omitted. If "max", the maximum value of the probabilities across the classes is included, with the variable name "maxp".
- ...
Arguments passed from or to other methods; not yet used here.
Value
A data.frame containing the predicted class of the observations, the values of the newdata variables and, by default, the maximum value of the posterior probabilities of the classes. rownames() in the result are inherited from those in newdata.
Details
The predict() methods provided for MASS::lda() and MASS::qda() are awkward to use for graphing or further analysis, because they return their results as
a list, with components class, posterior and x. This function is designed as a wrapper on those to return
the results as a single data.frame, in a more consistent and flexible way.
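As a rough illustration of the idea (not the package's actual implementation), the list returned by predict() can be flattened into a data frame by hand. This sketch assumes the built-in iris data rather than peng, so that it depends only on MASS; the object names iris.lda, pred and flat are hypothetical, and the column names class and maxp simply follow the argument descriptions above.

```r
library(MASS)

# Fit an LDA on the built-in iris data (assumed here only for illustration)
iris.lda <- lda(Species ~ ., data = iris)

# predict() returns a list with components class, posterior and x
pred <- predict(iris.lda)

# Flatten into one data frame: predicted class, maximum posterior
# probability across classes, and the predictor values
flat <- data.frame(
  class = pred$class,
  maxp  = apply(pred$posterior, 1, max),
  iris[, 1:4]
)
head(flat)
```

A data frame of this shape can be passed directly to plotting or summary functions, which is the convenience the wrapper is meant to provide.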
For use in graphs, where you want to show the classification boundaries or regions, you should supply a newdata data frame consisting
of two focal variables which are varied over their ranges, with the remaining variables used in the discriminant analysis
held fixed at typical values.
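One way such a newdata grid might be built is sketched below. It again assumes the built-in iris data and plain predict() from MASS so that it runs without other packages; the choice of Petal.Length and Petal.Width as focal variables, the grid size of 40, and the use of means as "typical values" are all illustrative assumptions.

```r
library(MASS)

# Fit on the built-in iris data (assumed here only for illustration)
iris.lda <- lda(Species ~ ., data = iris)

# Two focal variables, each varied over its observed range
grid <- expand.grid(
  Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 40),
  Petal.Width  = seq(min(iris$Petal.Width),  max(iris$Petal.Width),  length.out = 40)
)

# Remaining predictors held fixed at their means
grid$Sepal.Length <- mean(iris$Sepal.Length)
grid$Sepal.Width  <- mean(iris$Sepal.Width)

# Predicted class at each grid point; plotting these points colored by
# class shows the classification regions in the focal-variable plane
grid$class <- predict(iris.lda, newdata = grid)$class
table(grid$class)
```

A grid built this way is the kind of data frame intended for the newdata argument.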
Using the scores argument, the function also returns the scores on the discriminant functions. This is only available for
linear discriminant analysis with MASS::lda().
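The restriction to MASS::lda() follows from what the underlying predict() methods return, which can be checked directly. The sketch below assumes the built-in iris data; the object names are hypothetical.

```r
library(MASS)

# Fit both models on the built-in iris data (assumed only for illustration)
iris.lda <- lda(Species ~ ., data = iris)
iris.qda <- qda(Species ~ ., data = iris)

# Components returned by each predict() method
lda_parts <- names(predict(iris.lda))
qda_parts <- names(predict(iris.qda))

# Only the lda prediction carries discriminant scores in component x
"x" %in% lda_parts
"x" %in% qda_parts

# Scores appended as columns next to the predicted class
pred <- predict(iris.lda)
scores <- data.frame(class = pred$class, pred$x)
head(scores)
```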
Examples
data(peng, package="heplots")
library(MASS)
peng.lda <- lda(species ~ bill_length + bill_depth + flipper_length + body_mass,
                data = peng)
peng_pred <- predict(peng.lda)
str(peng_pred)
#> List of 3
#> $ class : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ posterior: num [1:333, 1:3] 1 1 0.984 1 1 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:333] "1" "2" "3" "4" ...
#> .. ..$ : chr [1:3] "Adelie" "Chinstrap" "Gentoo"
#> $ x : num [1:333, 1:2] 4.32 2.44 2.98 4.54 5.66 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:333] "1" "2" "3" "4" ...
#> .. ..$ : chr [1:2] "LD1" "LD2"
