Pollen Data Challenge
A synthetic dataset generated by David Coleman at RCA Laboratories in Princeton, N.J., and used at the 1986 Joint Statistical Meetings (JSM) of the American Statistical Association as a data challenge for the Statistical Graphics Section.
A data frame with 3848 observations on the following 5 variables, representing fictitious measurements of grains of pollen.
along x, a numeric vector
along y, a numeric vector
along z, a numeric vector
weight of pollen grain, a numeric vector
density of pollen grain, a numeric vector
The first three variables are the lengths of geometric features observed on the sampled pollen grains in the x, y, and z dimensions: a "ridge" along x, a "nub" in the y direction, and a "crack" along the z dimension. The fourth variable is pollen grain weight, and the fifth is density.
The description for the data challenge states: "the data analyst is advised that there is more than one 'feature' to these data. Each feature can be observed through various graphical techniques, but analytic methods, as well, can help 'crack' the dataset."
There were several features embedded in this dataset: clusters of points, 5D ellipsoidal voids with no points, and finally, a collection of points which spelled out "EUREKA".
Papers by Becker et al. (1986) and Slomka (1986) describe their work on this problem.
Yihui Xie used this data as an illustration of the animation package, using rgl to zoom in on the magic word. See the video at https://vimeo.com/1982725.
The canonical source for this data is the StatLib -- Datasets Archive, http://lib.stat.cmu.edu/datasets/pollen.data, where it is distributed in the inconvenient form of a .sh archive.
It is also available as data(pollen, package="animation")
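As a rough sketch of how the hidden structure might be explored, the following loads the copy of the data shipped with the animation package and compares a full scatterplot of the first two variables with a zoomed view of the central region. It assumes the animation package is installed. Columns are addressed by position rather than by name, and the window size (5% of each variable's range around the median) is an arbitrary illustrative choice that may need adjusting before any pattern becomes legible.

```r
# A minimal sketch, not the method used in the original challenge papers.
# Assumption: the 'animation' package is installed and provides 'pollen'.
if (requireNamespace("animation", quietly = TRUE)) {
  data(pollen, package = "animation")
  str(pollen)            # the help text describes 3848 rows, 5 numeric columns

  xy <- pollen[, 1:2]    # first two variables (x and y directions)

  # Full view: heavy overplotting hides any fine structure.
  plot(xy, pch = 20, col = rgb(0, 0, 0, 0.1), main = "All points")

  # Zoomed view: keep only points close to the centre of the cloud.
  # The half-width of 5% of the range is a hypothetical starting value.
  ctr  <- sapply(xy, median)
  half <- 0.05 * sapply(xy, function(v) diff(range(v)))
  near <- abs(xy[[1]] - ctr[1]) < half[1] &
          abs(xy[[2]] - ctr[2]) < half[2]
  plot(xy[near, ], pch = 20, main = "Central region")
}
```

Shrinking the window step by step mimics, in two dimensions, the zooming that the rgl animation mentioned above performs in three.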