predict_discrim calculates predicted class membership values for a linear or quadratic discriminant analysis,
returning a data.frame suitable for graphing or other analysis.
Usage
predict_discrim(
  object,
  newdata,
  prior = object$prior,
  dimen,
  scores = FALSE,
  posterior = FALSE,
  ...
)
Arguments
- object
An object of class "lda" or "qda", such as the result of MASS::lda() or MASS::qda().
- newdata
A data frame of cases to be classified or, if object has a formula, a data frame with columns of the same names as the variables used. A vector will be interpreted as a row vector. If newdata is missing, an attempt will be made to retrieve the data used to fit the lda object.
- prior
The prior probabilities of the classes. By default, object$prior, the priors stored from the call to MASS::lda() or MASS::qda().
- dimen
The dimension of the space to be used. If this is less than the number of available dimensions, \(\min(p, ng-1)\), only the first dimen discriminant components are used. (This argument is not yet implemented because MASS::qda() does not support it.)
- scores
A logical. If TRUE, the discriminant scores of the cases in newdata are appended as additional columns in the result, with names LD1, LD2, ...
- posterior
Either a logical or the character string "max". If TRUE, the posterior probabilities for all classes are included as columns named for the classes. If FALSE, these are omitted. If "max", the maximum of the probabilities across the classes is included, with the variable name "maxp".
- ...
Arguments passed from or to other methods; not yet used here.
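The settings of posterior can be related to the posterior matrix returned by the underlying MASS predict() method. The sketch below uses only MASS and base R. Treating "max" as the row-wise maximum of that matrix is an assumption based on the description above, and maxp_by_hand is an illustrative name, not something the function creates.

```r
library(MASS)

iris.lda <- lda(Species ~ ., data = iris)

# Posterior probabilities from the MASS method: one row per case,
# one column per class
post <- predict(iris.lda, newdata = iris)$posterior
colnames(post)

# Assumed equivalent of posterior = "max": the largest posterior
# probability in each row
maxp_by_hand <- apply(post, 1, max)
summary(maxp_by_hand)
```

Cases with a maximum posterior close to 1 are classified with high confidence, while values near 1/3 (for three classes) indicate cases close to a class boundary.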
Value
A data.frame containing the predicted class of the observations (in a column named for the class variable in the model) and
the values of the newdata variables. Other variables included are determined by the scores and posterior arguments.
The rownames() of the result are inherited from those of newdata.
Details
The predict() methods provided for MASS::lda() and MASS::qda() are a mess, because they return their results as
a list with components class, posterior and x. This function is designed as a wrapper around those methods to return
results as a single data frame, in a more consistent and flexible way.
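To see what the wrapper has to unpack, the following sketch calls the MASS predict() methods directly on the iris data and assembles a comparable data frame by hand. It uses only MASS and base R. The names p_lda, p_qda and by_hand are illustrative, and the hand-built frame shows the idea only; it is not the actual implementation of predict_discrim().

```r
library(MASS)

iris.lda <- lda(Species ~ ., data = iris)
iris.qda <- qda(Species ~ ., data = iris)

# The MASS methods return lists, not data frames
p_lda <- predict(iris.lda, newdata = iris)
p_qda <- predict(iris.qda, newdata = iris)

names(p_lda)   # class, posterior, x
names(p_qda)   # class, posterior (no discriminant scores)

# Assemble one data frame by hand: predicted class, predictors, scores
by_hand <- data.frame(
  Species = p_lda$class,
  iris[, 1:4],
  p_lda$x        # matrix with columns LD1, LD2
)
head(by_hand)
```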
For use in graphs, where you want to show the classification boundaries or regions, you should supply a newdata data frame consisting
of two focal variables which are varied over their ranges, with the remaining variables used in the discriminant analysis
held fixed at typical values.
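A grid of this kind might be built as follows. This is a minimal sketch using base R and MASS. The choice of Petal.Length and Petal.Width as focal variables, the 40-point resolution, and the use of means as the "typical" values are all assumptions made for illustration.

```r
library(MASS)

iris.lda <- lda(Species ~ ., data = iris)

# Two focal variables varied over their observed ranges
grid <- expand.grid(
  Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length),
                     length.out = 40),
  Petal.Width  = seq(min(iris$Petal.Width), max(iris$Petal.Width),
                     length.out = 40)
)

# Remaining predictors held fixed at their means
grid$Sepal.Length <- mean(iris$Sepal.Length)
grid$Sepal.Width  <- mean(iris$Sepal.Width)

# Predicted class at every grid point, using the MASS method directly
grid$Species <- predict(iris.lda, newdata = grid)$class

# With this function, the same grid would instead be passed as
#   predict_discrim(iris.lda, newdata = grid)
table(grid$Species)
```

Plotting the two focal variables with the predicted class mapped to fill colour then shows the classification regions, and the edges between colours trace the boundaries.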
Using the scores argument, the function also returns the scores on the discriminant functions. This is only available for
linear discriminant analysis with MASS::lda().
Examples
library(candisc)
library(MASS) # for lda()
iris.lda <- lda(Species ~ ., iris)
pred_iris <- predict_discrim(iris.lda)
names(pred_iris)
#> [1] "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
# include scores, exclude posterior
pred_iris <- predict_discrim(iris.lda, scores = TRUE, posterior = FALSE)
names(pred_iris)
#> [1] "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#> [6] "LD1" "LD2"
data(peng, package="heplots")
peng.lda <- lda(species ~ bill_length + bill_depth + flipper_length + body_mass,
                data = peng)
peng_pred <- predict_discrim(peng.lda, scores = TRUE)
str(peng_pred)
#> 'data.frame': 333 obs. of 7 variables:
#> $ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ bill_length : num 39.1 39.5 40.3 36.7 39.3 38.9 39.2 41.1 38.6 34.6 ...
#> $ bill_depth : num 18.7 17.4 18 19.3 20.6 17.8 19.6 17.6 21.2 21.1 ...
#> $ flipper_length: num 181 186 195 193 190 181 195 182 191 198 ...
#> $ body_mass : num 3750 3800 3250 3450 3650 ...
#> $ LD1 : num 4.32 2.44 2.98 4.54 5.66 ...
#> $ LD2 : num 0.967 0.97 -0.168 1.626 0.824 ...
