class: center, middle, inverse, title-slide

.title[
# Apply Family of Functions
]
.author[
### Justin Post
]

---

layout: true

<div class="my-footer"><img src="data:image/png;base64,#img/logo.png" alt = "logo" style="height: 60px;"/></div>

---

# Efficient Code

For loops vs Vectorized Functions

---

# `apply()` Family

- The `apply()` family of functions is *pretty* fast

- Check `help(apply)`!
    + We'll look at `apply()`, `sapply()`, `lapply()`

---

# `apply()` Family

- The `apply()` family of functions is *pretty* fast

- Check `help(apply)`!
    + We'll look at `apply()`, `sapply()`, `lapply()`

- Consider our Batting data set

``` r
library(Lahman)
library(dplyr) # provides as_tibble(), select(), and where()
my_batting <- Batting[, c("playerID", "teamID", "G", "AB", "R", "H", "X2B", "X3B", "HR")] |>
  as_tibble()
my_batting
```

```
## # A tibble: 128,598 × 9
##   playerID  teamID     G    AB     R     H   X2B   X3B    HR
##   <chr>     <fct>  <int> <int> <int> <int> <int> <int> <int>
## 1 aardsda01 SFN       11     0     0     0     0     0     0
## 2 aardsda01 CHN       45     2     0     0     0     0     0
## 3 aardsda01 CHA       25     0     0     0     0     0     0
## 4 aardsda01 BOS       47     1     0     0     0     0     0
## 5 aardsda01 SEA       73     0     0     0     0     0     0
## # ℹ 128,593 more rows
```

---

# `apply()` Family

- Use `apply()` to summarize each column (`MARGIN = 2`) of the batting data

``` r
apply(X = my_batting,
      MARGIN = 2,
      FUN = summary,
      na.rm = TRUE)
```

```
##           playerID teamID      G     AB      R      H    X2B    X3B     HR
## Length      128598 128598 128598 128598 128598 128598 128598 128598 128598
## N.unique     24011    256    165    699    167    250     64     32     69
## N.blank          0      0      0      0      0      0      0      0      0
## Min.nchar        5      2      3      3      3      3      2      2      2
## Max.nchar        9      3      3      3      3      3      2      2      2
```

---

# `apply()` Family

- `apply()` coerces `X` to a matrix, so with mixed column types every column was summarized as character

- Let's try it with just numeric data!

``` r
batting_summary <- apply(X = my_batting |>
                           select(where(is.numeric)),
                         MARGIN = 2,
                         FUN = summary,
                         na.rm = TRUE)
batting_summary
```

```
##                G       AB         R         H       X2B       X3B      HR
## Min.      1.0000   0.0000   0.00000   0.00000  0.000000  0.000000  0.0000
## 1st Qu.  11.0000   3.0000   0.00000   0.00000  0.000000  0.000000  0.0000
## Median   31.0000  40.0000   3.00000   7.00000  1.000000  0.000000  0.0000
## Mean     47.2958 129.3894  17.29095  33.76627  5.768674  1.154497  2.6883
## 3rd Qu.  71.0000 195.0000  24.00000  48.00000  8.000000  1.000000  2.0000
## Max.    165.0000 716.0000 198.00000 262.00000 67.000000 36.000000 73.0000
```

---

# Anonymous Functions

- We often use our own custom functions with the `apply()` family
    + When defined inline without a name, these are called anonymous functions or lambda functions

---

# Anonymous Functions

- We often use our own custom functions with the `apply()` family
    + When defined inline without a name, these are called anonymous functions or lambda functions

``` r
custom_batting_summary <- apply(X = my_batting |>
                                  select(where(is.numeric)),
                                MARGIN = 2,
                                FUN = function(x){
                                  # return a named vector for each column
                                  temp <- c(mean(x), sd(x))
                                  names(temp) <- c("mean", "sd")
                                  temp
                                }
                          )
custom_batting_summary
```

```
##             G       AB        R        H      X2B      X3B       HR
## mean 47.29580 129.3894 17.29095 33.76627 5.768674 1.154497 2.688300
## sd   45.83771 177.0342 26.95612 50.05455 9.272108 2.461973 6.197295
```

---

# Anonymous Functions

- Anonymous functions can take other arguments
    + Pass them to `apply()` after `FUN`

``` r
custom_batting_summary <- apply(X = my_batting |>
                                  select(where(is.numeric)),
                                MARGIN = 2,
                                FUN = function(x, trim){
                                  # trim is forwarded from the apply() call below
                                  temp <- c(mean(x, trim = trim), sd(x))
                                  names(temp) <- c("mean", "sd")
                                  temp
                                },
                                trim = 0.1
                          )
custom_batting_summary
```

```
##             G        AB        R        H      X2B      X3B       HR
## mean 40.69582  94.06721 11.15623 22.94881 3.630074 0.532465 1.045636
## sd   45.83771 177.03418 26.95612 50.05455 9.272108 2.461973 6.197295
```

---

# `lapply()`

- Use `lapply()` to apply a function to each element of a list

- Obtain a list object

``` r
set.seed(10)
my_list <- list(rnorm(100), runif(10), rgamma(40, shape = 1, rate = 1))
```

---

# `lapply()`

- Apply the `mean()` function to each list element

``` r
lapply(X = my_list, FUN = mean)
```

```
## [[1]]
## [1] -0.1365489
##
## [[2]]
## [1] 0.5997619
##
## [[3]]
## [1] 1.108209
```

---

# `lapply()`

- To give additional arguments to `FUN`, we add them after `FUN` in the call

``` r
lapply(X = my_list, FUN = mean, trim = 0.1, na.rm = TRUE)
```

```
## [[1]]
## [1] -0.1359629
##
## [[2]]
## [1] 0.6062252
##
## [[3]]
## [1] 0.9563087
```

---

# `sapply()`

- `sapply()` is similar to `lapply()`, but it attempts to simplify the result (here, to a vector) when possible

``` r
sapply(X = my_list, FUN = mean, trim = 0.1, na.rm = TRUE)
```

```
## [1] -0.1359629  0.6062252  0.9563087
```

---

# Recap!

- Vectorized functions are fast!

- The `apply()` family is sort of vectorized

- Use `lapply()` and `sapply()` to apply a function to each element of a list

- `aggregate()`, `replicate()`, `tapply()`, `vapply()`, and `mapply()` also exist!
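
---

# Other Family Members

- A minimal sketch of three of the functions named in the recap, using base R only
- The `toy` data frame and the object names (`trimmed_means`, `team_means`, `varied_trim`) are made up for illustration; they are not part of the Batting example

```r
# Same kind of list as in the lapply() slides
set.seed(10)
my_list <- list(rnorm(100), runif(10), rgamma(40, shape = 1, rate = 1))

# vapply(): like sapply(), but we state the type and length of each result
trimmed_means <- vapply(X = my_list,
                        FUN = mean,
                        FUN.VALUE = numeric(1),
                        trim = 0.1)

# tapply(): apply a function to a vector, split by a grouping variable
# (hypothetical toy data)
toy <- data.frame(team = c("A", "A", "B", "B", "B"),
                  hits = c(10, 20, 5, 15, 40))
team_means <- tapply(X = toy$hits, INDEX = toy$team, FUN = mean)

# mapply(): step through several inputs in parallel,
# here a different trim value for each list element
varied_trim <- mapply(FUN = function(x, trim) mean(x, trim = trim),
                      x = my_list,
                      trim = c(0, 0.1, 0.2))
```

- `vapply()` throws an error if a result does not match `FUN.VALUE`, which can make it a safer choice than `sapply()` inside your own functions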