The course gives a brief overview of R data structures and then covers the following topics:
- Loops in R 
- Vectorized functions (apply family of functions) 
- How R functions work 
- Function writing 
First up, recap and streamline repeated sections of code!
Five major types
| Dimension | Homogeneous | Heterogeneous |
|---|---|---|
| 1d | Atomic Vector | List |
| 2d | Matrix | Data Frame |
| nd | Array | |
Atomic Vector (1D group of elements with an ordering)
- Elements must be the same 'type'
- Return elements using square brackets []
- Can 'feed' a vector of indices to []
letters
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
letters[1:4]
## [1] "a" "b" "c" "d"
x <- c(1, 2, 5); letters[x]
## [1] "a" "b" "e"
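Because every element of an atomic vector must share one type, R silently coerces mixed input to the most flexible type present. The base-R sketch below illustrates that, along with indexing by a vector of positions. The object names (`mixed`, `nums`, `idx`, `picked`, `dropped`) are made up for illustration.

```r
# Mixing numbers, strings, and logicals: everything is coerced to character
mixed <- c(1, "a", TRUE)
class(mixed)                    # "character"

# Mixing logical and numeric: TRUE is coerced to 1
nums <- c(TRUE, 2, 3.5)
class(nums)                     # "numeric"

# Index with a vector of positions; negative positions drop elements
idx <- c(1, 2, 5)
picked  <- letters[idx]         # "a" "b" "e"
dropped <- letters[1:5][-idx]   # "c" "d"
```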
Consider the built-in iris data set
Can see info about an object with str()
myIris <- as_tibble(iris)
str(myIris)
## tibble [150 x 5] (S3: tbl_df/tbl/data.frame)
##  $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
myIris[1:4, 2:4]
## # A tibble: 4 x 3
##   Sepal.Width Petal.Length Petal.Width
##         <dbl>        <dbl>       <dbl>
## 1         3.5          1.4         0.2
## 2         3            1.4         0.2
## 3         3.2          1.3         0.2
## 4         3.1          1.5         0.2
myIris[1, ]
## # A tibble: 1 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1          5.1         3.5          1.4         0.2 setosa
myIris$Sepal.Length
dplyr::pull(myIris, Sepal.Length)
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 ## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 ## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5 ## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 ## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 ## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3 ## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2 ## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 ## [145] 6.7 6.7 6.3 6.5 6.2 5.9
dplyr::select(myIris, starts_with("Sepal"))
## # A tibble: 150 x 2
##   Sepal.Length Sepal.Width
##          <dbl>       <dbl>
## 1          5.1         3.5
## 2          4.9         3
## 3          4.7         3.2
## 4          4.6         3.1
## 5          5           3.6
## # ... with 145 more rows
tidyverse
The tidyverse is a collection of R packages that share common philosophies and are designed to work together!
If not installed (downloaded) on computer
install.packages("tidyverse")
library() or require() to load
library(tidyverse)
dplyr package made for most standard data manipulation tasks
tidyr handles most of the rest
%>% operator allows coding from left to right
x %>% f(y) turns into f(x, y)
x %>% f(y) %>% g(z) turns into g(f(x, y), z)
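The rewriting rule above can be checked on a small vector. To keep the sketch free of package dependencies it uses base R's native pipe `|>` (available from R 4.1) instead of magrittr's `%>%`; that swap is an assumption made for convenience, and for simple calls like these the two behave the same way. The names `piped`, `nested`, and `rounded` are illustrative.

```r
x <- c(1, 4, 9)

# Piped form: read left to right
piped <- x |> sqrt() |> sum()

# Equivalent nested form: read inside out
nested <- sum(sqrt(x))

identical(piped, nested)         # TRUE

# Extra arguments follow the piped value: x |> f(y) is f(x, y)
rounded <- 3.14159 |> round(2)   # same as round(3.14159, 2)
```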
library(Lahman) # install the package first if needed
Batting %>% 
  as_tibble() %>% 
  select(starts_with("X"), ends_with("ID"), G) %>% 
  rename("Doubles" = X2B, "Triples" = X3B)
## # A tibble: 105,861 x 7
##   Doubles Triples playerID  yearID teamID lgID      G
##     <int>   <int> <chr>      <int> <fct>  <fct> <int>
## 1       0       0 abercda01   1871 TRO    NA        1
## 2       6       0 addybo01    1871 RC1    NA       25
## 3       4       5 allisar01   1871 CL1    NA       29
## 4      10       2 allisdo01   1871 WS3    NA       27
## 5      11       3 ansonca01   1871 RC1    NA       25
## # ... with 105,856 more rows
On to the main attraction! Improving R code!
Often a repetitive task must be done
Task requires a small change each time it is done
Example:
Consider wine data from UCI machine learning repository
wineData <- read_csv("../datasets/winequality-full.csv")
wineData
## # A tibble: 6,497 x 13 ## `fixed acidity` `volatile acidity` `citric acid` `residual sugar` chlorides ## <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 7.4 0.7 0 1.9 0.076 ## 2 7.8 0.88 0 2.6 0.098 ## 3 7.8 0.76 0.04 2.3 0.092 ## 4 11.2 0.28 0.56 1.9 0.075 ## 5 7.4 0.7 0 1.9 0.076 ## # ... with 6,492 more rows, and 8 more variables: free sulfur dioxide <dbl>, ## # total sulfur dioxide <dbl>, density <dbl>, pH <dbl>, sulphates <dbl>, ## # alcohol <dbl>, quality <dbl>, type <chr>
summary(wineData)
## fixed acidity volatile acidity citric acid residual sugar ## Min. : 3.800 Min. :0.0800 Min. :0.0000 Min. : 0.600 ## 1st Qu.: 6.400 1st Qu.:0.2300 1st Qu.:0.2500 1st Qu.: 1.800 ## Median : 7.000 Median :0.2900 Median :0.3100 Median : 3.000 ## Mean : 7.215 Mean :0.3397 Mean :0.3186 Mean : 5.443 ## 3rd Qu.: 7.700 3rd Qu.:0.4000 3rd Qu.:0.3900 3rd Qu.: 8.100 ## Max. :15.900 Max. :1.5800 Max. :1.6600 Max. :65.800 ## chlorides free sulfur dioxide total sulfur dioxide density ## Min. :0.00900 Min. : 1.00 Min. : 6.0 Min. :0.9871 ## 1st Qu.:0.03800 1st Qu.: 17.00 1st Qu.: 77.0 1st Qu.:0.9923 ## Median :0.04700 Median : 29.00 Median :118.0 Median :0.9949 ## Mean :0.05603 Mean : 30.53 Mean :115.7 Mean :0.9947 ## 3rd Qu.:0.06500 3rd Qu.: 41.00 3rd Qu.:156.0 3rd Qu.:0.9970 ## Max. :0.61100 Max. :289.00 Max. :440.0 Max. :1.0390 ## pH sulphates alcohol quality ## Min. :2.720 Min. :0.2200 Min. : 8.00 Min. :3.000 ## 1st Qu.:3.110 1st Qu.:0.4300 1st Qu.: 9.50 1st Qu.:5.000 ## Median :3.210 Median :0.5100 Median :10.30 Median :6.000 ## Mean :3.219 Mean :0.5313 Mean :10.49 Mean :5.818 ## 3rd Qu.:3.320 3rd Qu.:0.6000 3rd Qu.:11.30 3rd Qu.:6.000 ## Max. :4.010 Max. :2.0000 Max. :14.90 Max. :9.000 ## type ## Length:6497 ## Class :character ## Mode :character ## ## ##
#fixed acidity
c(Mean = mean(wineData$`fixed acidity`),
  Median = median(wineData$`fixed acidity`),
  TrimmedMean = mean(wineData$`fixed acidity`, trim = 0.05))
##        Mean      Median TrimmedMean
##    7.215307    7.000000    7.104796
#volatile acidity
c(Mean = mean(wineData$`volatile acidity`),
  Median = median(wineData$`volatile acidity`),
  TrimmedMean = mean(wineData$`volatile acidity`, trim = 0.05))
##        Mean      Median TrimmedMean
##   0.3396660   0.2900000   0.3255864
#...
Instead use a Loop!
for loops or while loops commonly used in R
for loop syntax
for(index in values){
  # code to be run
}
for (index in 1:10){
  print(index)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
for (i in c("cat", "dog", "wolf")){
  print(i)
}
## [1] "cat"
## [1] "dog"
## [1] "wolf"
values <- 1:10
for (index in values){
  print(index)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
for (i in seq_along(iris)){
  print(names(iris)[i])
}
## [1] "Sepal.Length"
## [1] "Sepal.Width"
## [1] "Petal.Length"
## [1] "Petal.Width"
## [1] "Species"
for(i in 1:12){ #first 12 columns are numeric
  colData <- pull(wineData, i)
  print(names(wineData)[i])
  print(c(Mean = mean(colData),
          Median = median(colData),
          TrimmedMean = mean(colData, 0.05))
  )
}
## [1] "fixed acidity" ## Mean Median TrimmedMean ## 7.215307 7.000000 7.104796 ## [1] "volatile acidity" ## Mean Median TrimmedMean ## 0.3396660 0.2900000 0.3255864 ## [1] "citric acid" ## Mean Median TrimmedMean ## 0.3186332 0.3100000 0.3160780 ## [1] "residual sugar" ## Mean Median TrimmedMean ## 5.443235 3.000000 5.027039 ## [1] "chlorides" ## Mean Median TrimmedMean ## 0.05603386 0.04700000 0.05197538 ## [1] "free sulfur dioxide" ## Mean Median TrimmedMean ## 30.52532 29.00000 29.64473 ## [1] "total sulfur dioxide" ## Mean Median TrimmedMean ## 115.7446 118.0000 115.2693 ## [1] "density" ## Mean Median TrimmedMean ## 0.9946966 0.9948900 0.9946826 ## [1] "pH" ## Mean Median TrimmedMean ## 3.218501 3.210000 3.214850 ## [1] "sulphates" ## Mean Median TrimmedMean ## 0.5312683 0.5100000 0.5208189 ## [1] "alcohol" ## Mean Median TrimmedMean ## 10.4918 10.3000 10.4434 ## [1] "quality" ## Mean Median TrimmedMean ## 5.818378 6.000000 5.810737
ncols <- ncol(wineData)
sumDF <- data.frame(varName = names(wineData)[-ncols], 
                    mean = numeric(ncols-1), 
                    median = numeric(ncols-1), 
                    trimmedMean = numeric(ncols-1)
                    )
sumDF
## varName mean median trimmedMean ## 1 fixed acidity 0 0 0 ## 2 volatile acidity 0 0 0 ## 3 citric acid 0 0 0 ## 4 residual sugar 0 0 0 ## 5 chlorides 0 0 0 ## 6 free sulfur dioxide 0 0 0 ## 7 total sulfur dioxide 0 0 0 ## 8 density 0 0 0 ## 9 pH 0 0 0 ## 10 sulphates 0 0 0 ## 11 alcohol 0 0 0 ## 12 quality 0 0 0
for(i in seq_along(wineData)[-ncols]){
  colData <- pull(wineData, i)
  sumDF[i, 2:4] <- c(mean(colData), median(colData), mean(colData, 0.05))
}
sumDF
## varName mean median trimmedMean ## 1 fixed acidity 7.21530706 7.00000 7.10479569 ## 2 volatile acidity 0.33966600 0.29000 0.32558643 ## 3 citric acid 0.31863322 0.31000 0.31607796 ## 4 residual sugar 5.44323534 3.00000 5.02703881 ## 5 chlorides 0.05603386 0.04700 0.05197538 ## 6 free sulfur dioxide 30.52531938 29.00000 29.64472559 ## 7 total sulfur dioxide 115.74457442 118.00000 115.26927680 ## 8 density 0.99469663 0.99489 0.99468259 ## 9 pH 3.21850085 3.21000 3.21485040 ## 10 sulphates 0.53126828 0.51000 0.52081894 ## 11 alcohol 10.49180083 10.30000 10.44339944 ## 12 quality 5.81837771 6.00000 5.81073688
while Loops
while loops are similar to for loops
while(condition) {
  # expression to evaluate
  # modify something so that condition eventually becomes FALSE
}
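The while template above is not runnable as written, so here is a small concrete sketch: it counts how many times 100 can be halved before the value drops below 1. The variable names (`value`, `steps`) and the halving task are invented for illustration.

```r
value <- 100
steps <- 0

# Keep looping while the condition holds
while (value >= 1) {
  value <- value / 2   # this update eventually makes the condition FALSE
  steps <- steps + 1
}

steps                  # 7
```

Without the update to `value` inside the body, the condition would never become FALSE and the loop would run forever.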
break Out of a Loop
break exits a loop
for (i in 1:5){
  if (i == 4){
    break
  }
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
next to Skip
next jumps to the next iteration of the loop
for (i in 1:5){
  if (i == 3){
    next
  }
  print(i)
}
## [1] 1
## [1] 2
## [1] 4
## [1] 5
For loops inefficient in R
R is an interpreted language
It must work out how to evaluate the code at each iteration of the loop
That overhead slows the loop down
Vectorized functions do the looping in compiled code and are usually much faster!
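As a minimal sketch of the idea, the two versions below compute the same column means for the numeric columns of the built-in iris data: one with an explicit loop, one with a single vectorized call. The object names (`num_iris`, `loop_means`, `vec_means`) are illustrative, and no timings are claimed here.

```r
num_iris <- iris[, 1:4]   # the four numeric columns of iris

# Loop version: preallocate the result, then fill one column at a time
loop_means <- numeric(ncol(num_iris))
names(loop_means) <- names(num_iris)
for (i in seq_along(num_iris)) {
  loop_means[i] <- mean(num_iris[[i]])
}

# Vectorized version: one call does all columns
vec_means <- colMeans(num_iris)

all.equal(loop_means, vec_means)   # TRUE
```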
Some ‘built-in’ vectorized functions
colMeans(), rowMeans()
colSums(), rowSums()
colSds(), colVars(), colMedians() (matrixStats package)
ifelse(), dplyr::if_else()
apply() family
Vectorize()
colMeans - Find Column Means
colMeans() just requires a numeric data frame (array)
wineData %>% select(-type) %>% colMeans()
##        fixed acidity     volatile acidity          citric acid
##           7.21530706           0.33966600           0.31863322
##       residual sugar            chlorides  free sulfur dioxide
##           5.44323534           0.05603386          30.52531938
## total sulfur dioxide              density                   pH
##         115.74457442           0.99469663           3.21850085
##            sulphates              alcohol              quality
##           0.53126828          10.49180083           5.81837771
microbenchmark package allows for easy recording of computing time
install.packages("microbenchmark")
library(microbenchmark)
wineData2 <- wineData %>% select(-type)
microbenchmark(colMeans(wineData2), unit = "ms")
## Unit: milliseconds
##                 expr    min      lq     mean median      uq    max neval
##  colMeans(wineData2) 0.2975 0.30905 0.416344 0.3237 0.34855 5.8771   100
microbenchmark(for(i in 1:12){mean(wineData[[i]])}, unit = "ms")
## Unit: milliseconds
##                                         expr    min      lq     mean median
##  for (i in 1:12) {     mean(wineData[[i]]) } 1.5886 1.67655 1.961633 1.7918
##      uq    max neval
##  2.0541 4.5707   100
colMedians - column medians
matrixStats::colMedians() requires a numeric matrix, so convert the data frame with as.matrix() first
library(matrixStats)
wineData %>% select(-type) %>% as.matrix() %>% colMedians()
##  [1]   7.00000   0.29000   0.31000   3.00000   0.04700  29.00000 118.00000
##  [8]   0.99489   3.21000   0.51000  10.30000   6.00000
Want to code a new categorical quality variable
if then else
#initialize vector to save results
qualityCat <- character()
for (i in 1:(dim(wineData)[1])){
  if(wineData$quality[i] <= 3){
    qualityCat[i] <- "Poor"
  } else if(wineData$quality[i] <= 5){
    qualityCat[i] <- "Ok"
  } else if(wineData$quality[i] <= 7){
    qualityCat[i] <- "Good"
  } else if(wineData$quality[i] <= 10){
    qualityCat[i] <- "Great"
  } else {
    qualityCat[i] <- "Error"
  }
}
wineData$qualityCat <- qualityCat
wineData %>% select(qualityCat, quality, everything())
## # A tibble: 6,497 x 14
##   qualityCat quality `fixed acidity` `volatile acidity` `citric acid`
##   <chr>        <dbl>           <dbl>              <dbl>         <dbl>
## 1 Ok               5             7.4               0.7           0
## 2 Ok               5             7.8               0.88          0
## 3 Ok               5             7.8               0.76          0.04
## 4 Good             6            11.2               0.28          0.56
## 5 Ok               5             7.4               0.7           0
## # ... with 6,492 more rows, and 9 more variables: residual sugar <dbl>,
## #   chlorides <dbl>, free sulfur dioxide <dbl>, total sulfur dioxide <dbl>,
## #   density <dbl>, pH <dbl>, sulphates <dbl>, alcohol <dbl>, type <chr>
We already know for loops are not great in R
if_else() (or ifelse()) is the vectorized version of if then else
Syntax
if_else(vector_condition, if_true_do_this, if_false_do_this)
qualityCat <- if_else(wineData$quality <= 3, "Poor",
                  if_else(wineData$quality <= 5, "Ok",
                        if_else(wineData$quality <= 7, "Good",
                            if_else(wineData$quality <= 10, "Great", "Error"))))
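Nested if_else() calls get hard to read as the number of categories grows. One alternative, swapped in here as a sketch and not part of the original slides, is base R's `cut()`, which bins a numeric vector in a single vectorized call. The `quality` scores and the name `qualityCut` below are hypothetical.

```r
quality <- c(3, 5, 6, 7, 9)   # hypothetical scores for illustration

# With the default right = TRUE the bins are (-Inf, 3], (3, 5], (5, 7], (7, 10]
qualityCut <- cut(quality,
                  breaks = c(-Inf, 3, 5, 7, 10),
                  labels = c("Poor", "Ok", "Good", "Great"))

as.character(qualityCut)      # "Poor" "Ok" "Good" "Good" "Great"
```

Two differences from the nested version are worth noting: `cut()` returns a factor, not a character vector, and values above 10 become NA where the if_else() chain would give "Error".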
loopTime<-microbenchmark(
  for (i in 1:(dim(wineData)[1])){
    if(wineData$quality[i] <= 3){
      qualityCat[i] <- "Poor"
    } else if(wineData$quality[i] <= 5){
      qualityCat[i] <- "Ok"
    } else if(wineData$quality[i] <= 7){
      qualityCat[i] <- "Good"
    } else if(wineData$quality[i] <= 10){
      qualityCat[i] <- "Great"
    } else {
      qualityCat[i] <- "Error"
    }
  }
, unit = "us")
vectorTime <- microbenchmark(
    if_else(wineData$quality <= 3, "Poor",
        if_else(wineData$quality <= 5, "Ok",
            if_else(wineData$quality <= 7, "Good",
                if_else(wineData$quality <= 10, "Great", "Error"))))
, unit = "us")
loopTime
## Unit: microseconds
##                                                                                                                                                                                                                                                                                                                                                                                                          expr
##  for (i in 1:(dim(wineData)[1])) {     if (wineData$quality[i] <= 3) {         qualityCat[i] <- "Poor"     }     else if (wineData$quality[i] <= 5) {         qualityCat[i] <- "Ok"     }     else if (wineData$quality[i] <= 7) {         qualityCat[i] <- "Good"     }     else if (wineData$quality[i] <= 10) {         qualityCat[i] <- "Great"     }     else {         qualityCat[i] <- "Error"     } }
##      min      lq    mean  median    uq     max neval
##  31811.9 35022.7 36533.4 36424.6 38195 42296.4   100
vectorTime
## Unit: microseconds
##  expr
##  if_else(wineData$quality <= 3, "Poor", if_else(wineData$quality <= 5, "Ok", if_else(wineData$quality <= 7, "Good", if_else(wineData$quality <= 10, "Great", "Error"))))
##    min     lq     mean median     uq    max neval
##  762.3 802.15 2110.173 841.05 951.15 110538   100
dplyr to Summarize Data
group_by() and summarize() great for quick summaries
Find mean alcohol for each quality category and wine type
wineData %>%
  group_by(qualityCat, type) %>%
  summarize(meanAlcohol = mean(alcohol))
## `summarise()` has grouped output by 'qualityCat'. You can override using the `.groups` argument.
## # A tibble: 8 x 3
## # Groups:   qualityCat [4]
##   qualityCat type  meanAlcohol
##   <chr>      <chr>       <dbl>
## 1 Good       Red         10.8
## 2 Good       White       10.8
## 3 Great      Red         12.1
## 4 Great      White       11.7
## 5 Ok         Red          9.93
## 6 Ok         White        9.84
## 7 Poor       Red          9.96
## 8 Poor       White       10.3
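For comparison, a grouped mean can also be computed without dplyr. The sketch below uses base R's `aggregate()` on the built-in iris data, since the wine file may not be at hand; it is a stand-in for the same split, summarize, combine idea and not the slides' own method. The name `speciesMeans` is illustrative.

```r
# Mean sepal length for each species in the built-in iris data
speciesMeans <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
speciesMeans
```

The result is a data frame with one row per group, much like the output of group_by() followed by summarize().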
group_by() and mutate() provide a nice way to add to a data frame
wineData %>%
  group_by(qualityCat, type) %>%
  mutate(meanAlcoholCat = mean(alcohol)) %>%
  select(meanAlcoholCat, qualityCat, type, alcohol, everything())
## # A tibble: 6,497 x 15 ## # Groups: qualityCat, type [8] ## meanAlcoholCat qualityCat type alcohol `fixed acidity` `volatile acidity` ## <dbl> <chr> <chr> <dbl> <dbl> <dbl> ## 1 9.93 Ok Red 9.4 7.4 0.7 ## 2 9.93 Ok Red 9.8 7.8 0.88 ## 3 9.93 Ok Red 9.8 7.8 0.76 ## 4 10.8 Good Red 9.8 11.2 0.28 ## 5 9.93 Ok Red 9.4 7.4 0.7 ## # ... with 6,492 more rows, and 9 more variables: citric acid <dbl>, ## # residual sugar <dbl>, chlorides <dbl>, free sulfur dioxide <dbl>, ## # total sulfur dioxide <dbl>, density <dbl>, pH <dbl>, sulphates <dbl>, ## # quality <dbl>
Some ‘built-in’ vectorized functions
colMeans(), rowMeans()
colSums(), rowSums()
colSds(), colVars(), colMedians() (matrixStats package)
ifelse(), dplyr::if_else()
apply() family
Create your own with Vectorize()
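A minimal sketch of Vectorize(): the scalar function below uses if(), which only accepts a single TRUE/FALSE, and Vectorize() wraps it so it can take a whole vector. The function names (`gradeOne`, `gradeAll`) and the cutoff of 5 are made up for illustration.

```r
# A scalar-only function: if() needs a single TRUE/FALSE
gradeOne <- function(score) {
  if (score <= 5) "Ok" else "Good"
}

# Vectorize() returns a wrapper that applies gradeOne to each element
gradeAll <- Vectorize(gradeOne)

gradeAll(c(4, 6, 5))   # "Ok" "Good" "Ok"
```

Vectorize() is a convenience wrapper around mapply(), so it still loops element by element under the hood; it tidies up the calling code but should not be expected to match the speed of ifelse() or colMeans().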
apply() family
apply() family of functions pretty fast
apply(), lapply(), sapply(), and replicate()
apply() to find summary for columns of wine data
apply(X = wineData %>% select(-type, -qualityCat),
      MARGIN = 2, 
      FUN = summary, 
      na.rm = TRUE)
## fixed acidity volatile acidity citric acid residual sugar chlorides ## Min. 3.800000 0.080000 0.0000000 0.600000 0.00900000 ## 1st Qu. 6.400000 0.230000 0.2500000 1.800000 0.03800000 ## Median 7.000000 0.290000 0.3100000 3.000000 0.04700000 ## Mean 7.215307 0.339666 0.3186332 5.443235 0.05603386 ## 3rd Qu. 7.700000 0.400000 0.3900000 8.100000 0.06500000 ## Max. 15.900000 1.580000 1.6600000 65.800000 0.61100000 ## free sulfur dioxide total sulfur dioxide density pH sulphates ## Min. 1.00000 6.0000 0.9871100 2.720000 0.2200000 ## 1st Qu. 17.00000 77.0000 0.9923400 3.110000 0.4300000 ## Median 29.00000 118.0000 0.9948900 3.210000 0.5100000 ## Mean 30.52532 115.7446 0.9946966 3.218501 0.5312683 ## 3rd Qu. 41.00000 156.0000 0.9969900 3.320000 0.6000000 ## Max. 289.00000 440.0000 1.0389800 4.010000 2.0000000 ## alcohol quality ## Min. 8.0000 3.000000 ## 1st Qu. 9.5000 5.000000 ## Median 10.3000 6.000000 ## Mean 10.4918 5.818378 ## 3rd Qu. 11.3000 6.000000 ## Max. 14.9000 9.000000
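The MARGIN argument decides which dimension the function is applied over: 1 for rows, 2 for columns. A tiny matrix makes the difference easy to check by hand; the names `m`, `rowTotals`, and `colTotals` are illustrative.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

rowTotals <- apply(m, MARGIN = 1, FUN = sum)   # 9 12
colTotals <- apply(m, MARGIN = 2, FUN = sum)   # 3 7 11
```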
lapply
lapply() to apply a function to a list
Create a list object
myList <- list(
  norm = rnorm(100),
  unif = runif(25),
  gamma = rgamma(500, rate = 1, shape = 1)
)
Apply mean() to each list element
lapply(X = myList, FUN = mean)
## $norm
## [1] 0.007024213
##
## $unif
## [1] 0.5315503
##
## $gamma
## [1] 1.026491
sapply
sapply() similar but returns a vector if possible
sapply(X = myList, FUN = mean)
##        norm        unif       gamma
## 0.007024213 0.531550253 1.026491492
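Because sapply() only simplifies "if possible", the shape of its result depends on the data. Base R's `vapply()` is a stricter relative that takes a template for each result and raises an error if a result does not match. The sketch below assumes a small list named `exampleList` and an arbitrary seed, both invented here.

```r
set.seed(42)   # arbitrary seed so the sketch is reproducible
exampleList <- list(norm = rnorm(100), unif = runif(25))

# FUN.VALUE = numeric(1) says: each call must return one number
means <- vapply(exampleList, mean, FUN.VALUE = numeric(1))
means          # a named numeric vector with elements norm and unif
```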
replicate
replicate() function great for repeatedly running code
Estimate a probability using repeated simulations
Suppose you select five letters at random, with replacement. What is the probability none are repeated?
sample(size = 5, letters, replace = TRUE)
## [1] "s" "x" "o" "e" "z"
sample(size = 5, letters, replace = TRUE)
## [1] "p" "e" "k" "c" "w"
set.seed(1)
sample(size = 5, letters, replace = TRUE) %>% unique()
## [1] "y" "d" "g" "a" "b"
set.seed(1)
sample(size = 5, letters, replace = TRUE) %>% unique() %>% length()
## [1] 5
set.seed(1)
sample(size = 5, letters, replace = TRUE) %>% unique() %>% length() == 5
## [1] TRUE
replicate(5, 
  sample(size = 5, letters, replace = TRUE) %>% 
    unique() %>% 
    length() == 5
)
## [1] TRUE FALSE TRUE TRUE FALSE
replicate(50000, 
  sample(size = 5, letters, replace = TRUE) %>% 
    unique() %>% 
    length() == 5
) %>% 
  mean()
## [1] 0.66264
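The simulated estimate can be checked against the exact answer. With 26 letters drawn with replacement, there are 26^5 equally likely ordered draws, and 26 * 25 * 24 * 23 * 22 of them contain no repeats. The name `exactProb` is illustrative.

```r
# Ordered draws with no repeats: 26 * 25 * 24 * 23 * 22
# All ordered draws with replacement: 26^5
exactProb <- prod(26:22) / 26^5

round(exactProb, 4)   # 0.6644
```

The simulated value of 0.66264 is within about 0.002 of this, which is the kind of agreement expected from 50,000 replications.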
Vectorized functions fast!
‘Built-in’ vectorized functions
colMeans(), rowMeans()
colSums(), rowSums()
colSds(), colVars(), colMedians() (matrixStats package)
ifelse()
apply() family