The Toronto Word Pool (TWP) consists of 1080 words in various grammatical classes, together with a variety of normative variables.

The TWP contains high-frequency nouns, adjectives, and verbs originally taken from the Thorndike-Lorge (1944) norms. This word pool has been used in hundreds of studies at Toronto and elsewhere.

Usage

data(TWP)

Format

A data frame with 1093 observations on the following 12 variables.

itmno

item number

word

the word

imagery

imagery rating

concreteness

concreteness rating

letters

number of letters

frequency

word frequency, from the Kucera and Francis (1967) norms

foa

a measure of first-order approximation to English. In a first-order approximation, the probability of generating any string of letters is based on the frequencies of occurrence of individual letters in the language (see the sketch following this variable list).

soa

a measure of second-order approximation to English, based on bigram frequencies.

onr

Orthographic neighbor ratio, taken from Landauer and Streeter (1973): the frequency of the word in the Kucera and Francis (1967) count divided by the sum of the frequencies of all its orthographic neighbors.

dictcode

dictionary code, a factor indicating the collection of grammatical classes, 1-5, recorded for a given word form. In the code, a "1" in any position means the item had a dictionary definition as a noun; similarly, a "2" means a verb, "3" an adjective, "4" an adverb, and "5" covers all other grammatical categories (in practice, chiefly prepositions). Thus an entry of "2130" indicates an item defined as a verb, a noun, and an adjective, in that order of historical precedence (see the decoding sketch following this variable list).

noun

percent noun usage. Words considered unambiguous based on dictcode are listed as 0 or 100; other items were rated in a judgment task.

canadian

a factor (0/1) indicating whether the entry is an alternative Canadian spelling of a word appearing earlier in the pool
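
Several of these variables are easy to illustrate in code. The sketch below is a rough illustration only: decode_dictcode(), foa_sketch(), and is_neighbor() are ad hoc helpers defined here, not functions in this package, and the letter probabilities are estimated from the pool itself rather than from the counts behind the published foa values, so the score shows the idea without reproducing the column.

data(TWP)

# Decode a dictcode value into grammatical class labels.
decode_dictcode <- function(code) {
  classes <- c("1" = "noun", "2" = "verb", "3" = "adjective",
               "4" = "adverb", "5" = "other")
  digits <- strsplit(as.character(code), "")[[1]]
  digits <- digits[digits != "0"]        # trailing zeros assumed to be padding
  unname(classes[digits])
}
decode_dictcode("2130")                  # verb, noun, adjective

# Rough first-order approximation score: sum of log letter probabilities,
# with the probabilities estimated here from the pool's own words.
# A second-order version would use bigram frequencies in the same way.
ltrs <- unlist(strsplit(tolower(TWP$word), ""))
p    <- table(ltrs) / length(ltrs)
foa_sketch <- function(w) sum(log10(p[strsplit(tolower(w), "")[[1]]]))
foa_sketch("abide")

# Orthographic neighbors in the Landauer-Streeter sense: words of the same
# length differing by a single letter.
is_neighbor <- function(a, b) nchar(a) == nchar(b) &&
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]]) == 1
is_neighbor("flatter", "flutter")        # TRUE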

Details

The last 13 words in the list are alternative Canadian spellings of words listed earlier, and have duplicate itmno values.
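
These rows can be picked out directly, either from the canadian indicator or by looking for repeated item numbers; a minimal check:

data(TWP)
subset(TWP, canadian == 1)[, c("itmno", "word")]    # the 13 alternative spellings
TWP[duplicated(TWP$itmno) | duplicated(TWP$itmno, fromLast = TRUE), c("itmno", "word")]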

Source

Friendly, M., Franklin, P., Hoffman, D., & Rubin, D. (1982). The Toronto Word Pool. Behavior Research Methods and Instrumentation, 14(4), 375-399. http://datavis.ca/papers/twp.pdf

References

Kucera, H., & Francis, W. N. (1967). Computational Analysis of Present-Day American English. Providence: Brown University Press.

Landauer, T. K., & Streeter, L. A. (1973). Structural differences between common and rare words: Failure of equivalence assumptions for theories of word recognition. Journal of Verbal Learning and Verbal Behavior, 11, 119-131.

Examples

data(TWP)
str(TWP)
#> 'data.frame':	1093 obs. of  12 variables:
#>  $ itmno       : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ word        : chr  "abide" "able" "aboard" "about" ...
#>  $ imagery     : num  2 2.7 3.6 1.4 4 3 3.6 4.6 4.5 3.3 ...
#>  $ concreteness: num  2.8 3 4.2 1.9 4 3.7 4.4 3.6 3.4 3.4 ...
#>  $ letters     : int  5 4 6 5 5 7 6 6 5 6 ...
#>  $ frequency   : int  7 216 25 1815 296 53 28 13 18 72 ...
#>  $ foa         : num  -27.6 -23.4 -32.4 -28.5 -29.4 -35.3 -30.5 -34 -28.2 -32.3 ...
#>  $ soa         : num  -24.5 -18.4 -30.8 -23 -22.2 -32.3 -28.9 -38.2 -22.1 -32.5 ...
#>  $ onr         : num  0.09 0.96 0.93 1 0.99 1 0.88 1 0.86 0.89 ...
#>  $ dictcode    : Factor w/ 49 levels "1","2","3","4",..: 2 3 3 4 5 1 3 2 9 2 ...
#>  $ noun        : int  0 0 0 0 0 100 0 0 32 0 ...
#>  $ canadian    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
summary(TWP)
#>      itmno            word              imagery       concreteness  
#>  Min.   :   1.0   Length:1093        Min.   :1.200   Min.   :1.600  
#>  1st Qu.: 271.0   Class :character   1st Qu.:3.000   1st Qu.:3.200  
#>  Median : 540.0   Mode  :character   Median :4.100   Median :4.000  
#>  Mean   : 540.3                      Mean   :4.197   Mean   :4.342  
#>  3rd Qu.: 809.0                      3rd Qu.:5.400   3rd Qu.:5.500  
#>  Max.   :1080.0                      Max.   :6.800   Max.   :7.000  
#>                                                                     
#>     letters       frequency            foa              soa        
#>  Min.   :3.00   Min.   :   0.00   Min.   :-48.60   Min.   :-46.50  
#>  1st Qu.:6.00   1st Qu.:  12.50   1st Qu.:-36.80   1st Qu.:-32.20  
#>  Median :6.00   Median :  30.00   Median :-33.20   Median :-29.00  
#>  Mean   :6.27   Mean   :  75.45   Mean   :-33.47   Mean   :-29.18  
#>  3rd Qu.:7.00   3rd Qu.:  72.00   3rd Qu.:-30.20   3rd Qu.:-25.90  
#>  Max.   :8.00   Max.   :1815.00   Max.   :-18.90   Max.   :-17.00  
#>                 NA's   :6                                          
#>       onr            dictcode        noun        canadian
#>  Min.   :0.0000   1      :253   Min.   :  0.00   0:1080  
#>  1st Qu.:0.6500   12     :211   1st Qu.:  0.00   1:  13  
#>  Median :0.9900   2      :155   Median : 39.00           
#>  Mean   :0.7987   3      :101   Mean   : 47.11           
#>  3rd Qu.:1.0000   21     :100   3rd Qu.:100.00           
#>  Max.   :1.0000   4      : 61   Max.   :100.00           
#>  NA's   :7        (Other):212   NA's   :118              
# quick view of distributions
boxplot(scale(TWP[, 3:9]))


plotDensity(TWP, "imagery")

plotDensity(TWP, "concreteness")

plotDensity(TWP, "frequency")


# select low imagery, concreteness and frequency words
R <- list(imagery=c(1,5), concreteness=c(1,4), frequency=c(0,30))
pickList(TWP, R)
#>     list itmno     word imagery concreteness letters frequency   foa   soa  onr
#> 320    1   320  dispose     3.3          3.6       7         5 -35.6 -32.3 1.00
#> 411    1   411  flatter     3.9          3.8       7         1 -34.5 -33.4 0.13
#> 233    1   233  consist     1.9          3.1       7        17 -34.5 -28.0 1.00
#> 851    1   851  retreat     4.0          3.8       7        14 -31.5 -31.5 1.00
#> 306    1   306   differ     2.5          3.1       6        18 -33.0 -29.9 0.95
#> 217    1   217 conceive     2.9          2.6       8        14 -41.1 -32.8 1.00
#> 489    1   489   horror     5.0          3.3       6        17 -30.4 -28.3 1.00
#> 353    1   353  endless     4.6          2.3       7        20 -33.9 -32.9 1.00
#> 907    1   907  sincere     2.9          3.0       7        15 -33.4 -26.1 1.00
#> 493    1   493   humble     4.0          2.6       6        18 -35.1 -30.2 0.55
#>     dictcode noun canadian
#> 320       21    3        0
#> 411        2    0        0
#> 233       21   NA        0
#> 851       12   23        0
#> 306        2    0        0
#> 217        2    0        0
#> 489        1  100        0
#> 353        3    0        0
#> 907        3    0        0
#> 493       32    0        0

# dplyr now makes this much more flexible
if (require(dplyr)) {
  # select items within given ranges
  selected <- TWP |>  
    filter( canadian == 0) |>              # remove Canadian spellings
    filter( imagery <= 5, concreteness <= 4, frequency <= 30) |>
    select(word, imagery:frequency )
    
  str(selected)
  
  # get random samples of selected items
  nitems <- 5
  nlists <- 2
  lists <- selected |>
    sample_n( nitems*nlists, replace=FALSE) |>
    mutate(list = rep(1:nlists, each=nitems))
  
  str(lists)
  lists
}
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 'data.frame':	269 obs. of  5 variables:
#>  $ word        : chr  "abide" "absorb" "abuse" "accord" ...
#>  $ imagery     : num  2 4.6 4.5 1.8 4.2 2.1 3.7 3.2 3.3 1.6 ...
#>  $ concreteness: num  2.8 3.6 3.4 2.6 3.5 3.6 3 3.9 3.5 2 ...
#>  $ letters     : int  5 6 5 6 6 8 6 5 6 4 ...
#>  $ frequency   : int  7 13 18 9 10 3 10 13 10 10 ...
#> 'data.frame':	10 obs. of  6 variables:
#>  $ word        : chr  "proudly" "defend" "mighty" "disgust" ...
#>  $ imagery     : num  3.8 4.3 4.6 4.3 3.4 3.1 4.7 1.9 3 3.5
#>  $ concreteness: num  3.1 3.5 3.3 3.3 4 2.1 3.5 2.7 3.1 3.9
#>  $ letters     : int  7 6 6 7 7 6 7 5 6 6
#>  $ frequency   : int  9 21 29 1 29 30 14 9 8 10
#>  $ list        : int  1 1 1 1 1 2 2 2 2 2
#>       word imagery concreteness letters frequency list
#> 1  proudly     3.8          3.1       7         9    1
#> 2   defend     4.3          3.5       6        21    1
#> 3   mighty     4.6          3.3       6        29    1
#> 4  disgust     4.3          3.3       7         1    1
#> 5  perform     3.4          4.0       7        29    1
#> 6   virtue     3.1          2.1       6        30    2
#> 7  immense     4.7          3.5       7        14    2
#> 8    await     1.9          2.7       5         9    2
#> 9   endure     3.0          3.1       6         8    2
#> 10  export     3.5          3.9       6        10    2