purrr Exercises: 13 Functional Programming Practice Problems

Thirteen hands-on purrr problems covering the typed map_*() family, map2(), pmap(), map_dfr(), nest(), safely(), and the \(x) lambda shortcut, each with an expected result so you can verify and a runnable solution behind a reveal. Difficulty progresses from beginner to advanced.

The 13 problems are grouped into three sections. Section 1 covers the typed map_*() family one variant at a time. Section 2 mixes predicates, row-binding, and parallel iteration with map2() and pmap(). Section 3 stitches list-columns, error handling, and lambdas into the kind of pipelines you write on the job. Every problem ships with an expected result, two progressive hints, and a hidden solution with an explanation.

Run this once before any exercise:
library(purrr)
library(dplyr)
library(tidyr)
library(broom)
library(tibble)

All code runs in one shared R session, so the setup block above loads the packages once and the exercises do not repeat it. Use ex_ prefixed names (already scaffolded) so you do not overwrite anything by accident.

Section 1. The typed map family

These three problems use one map_*() variant each. The typed map_*() family is the workhorse of purrr: you hand it a list (or a data frame, which is a list of columns) and a function, and it applies the function to every element, returning a guaranteed-type atomic vector instead of a list.

Exercise 1.1: Mean of every column with map_dbl

Task: Compute the mean of every column in airquality. The dataset has missing values, so pass na.rm = TRUE through map_dbl() using its ... slot. Save the result to ex_1_1.

Expected result:

#>     Ozone   Solar.R      Wind      Temp     Month       Day
#>  42.12931 185.93151   9.95752  77.88235   6.99346  15.80392

Difficulty: Beginner

Your turn:
ex_1_1 <- map_dbl(airquality, mean, ___)
ex_1_1

Solution:
ex_1_1 <- map_dbl(airquality, mean, na.rm = TRUE)
ex_1_1
#>     Ozone   Solar.R      Wind      Temp     Month       Day
#>  42.12931 185.93151   9.95752  77.88235   6.99346  15.80392

Explanation: Any argument you put after the function in map_dbl() is forwarded to every call. Here na.rm = TRUE tells mean() to ignore NAs in Ozone and Solar.R. Without it, those two columns would come back as NA and poison any downstream arithmetic. map_dbl() guarantees the output is a named numeric vector, so downstream code that does round(..., 2) or sort() just works.
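As a quick check of the forwarding behaviour described above, here is a minimal sketch that compares the call without na.rm, the call that forwards na.rm through the ... slot, and the equivalent lambda. The variable names (plain_means, via_dots, via_lambda) are illustrative only.

```r
library(purrr)

# Without na.rm, every column that contains an NA yields an NA mean.
plain_means <- map_dbl(airquality, mean)
names(plain_means)[is.na(plain_means)]
#> [1] "Ozone"   "Solar.R"

# Forwarding na.rm through ... and wrapping the call in a lambda
# should produce the same named numeric vector.
via_dots   <- map_dbl(airquality, mean, na.rm = TRUE)
via_lambda <- map_dbl(airquality, \(x) mean(x, na.rm = TRUE))
identical(via_dots, via_lambda)
#> [1] TRUE
```

The lambda form is a little longer, but it makes it explicit which function receives the extra argument.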

Exercise 1.2: Column classes with map_chr

Task: Return the class of every column in iris as a character vector. The right variant here is map_chr() because each call returns one string. Save the result to ex_1_2.

Expected result:

#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#>    "numeric"    "numeric"    "numeric"    "numeric"     "factor"

Difficulty: Beginner

RYour turn
ex_1_2 <- ___(iris, class) ex_1_2

  
Click to reveal solution
RSolution
ex_1_2 <- map_chr(iris, class) ex_1_2 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> "numeric" "numeric" "numeric" "numeric" "factor"

  

Explanation: map_chr() enforces that every call returns exactly one character value. class() fits that shape for every column in iris. If you ran map_chr() against elements whose class() returns several strings (a POSIXct date-time or an ordered factor, for example), you would get a clear error, which is better than a quietly malformed result.
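To see the one-string contract in action, the sketch below builds a small hypothetical list (mixed) whose elements have multi-string classes, shows that map_chr() refuses it, and then collapses each class vector into a single string as one possible workaround. The list and the "/" separator are assumptions made for illustration.

```r
library(purrr)

# Hypothetical inputs: two of the three elements have a two-string class.
mixed <- list(
  when = Sys.time(),                  # class: "POSIXct" "POSIXt"
  size = factor("a", ordered = TRUE), # class: "ordered" "factor"
  n    = 1L                           # class: "integer"
)

# map_chr() stops with an error because some results have length 2.
failed <- tryCatch(
  map_chr(mixed, class),
  error = function(e) TRUE
)
isTRUE(failed)

# One workaround: collapse each class vector to a single string first.
collapsed <- map_chr(mixed, \(x) paste(class(x), collapse = "/"))
collapsed
```

If you need the full class vectors rather than a collapsed label, plain map() keeps them intact in a list.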

Exercise 1.3: Distinct value counts with map_int

Task: Count the number of unique values in every column of iris. Use map_int() with a lambda that calls length(unique(x)). Save the result to ex_1_3.

Expected result:

#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#>           35           23           43           22            3

Difficulty: Beginner

RYour turn
ex_1_3 <- map_int(iris, ___)
ex_1_3

Solution:
ex_1_3 <- map_int(iris, \(x) length(unique(x)))
ex_1_3
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#>           35           23           43           22            3

Explanation: The \(x) lambda is R 4.1+ shorthand for function(x). length(unique(x)) is one common way to count distinct values; map_int() then coerces the result to a named integer vector. dplyr::n_distinct() does the same job and is slightly faster on large vectors, but length(unique()) needs no extra package.
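A short sketch comparing the two counting approaches mentioned above. It also shows one place where they can differ: length(unique()) counts NA as a value, while n_distinct() can be told to skip it. The vector x is a made-up example.

```r
library(purrr)
library(dplyr)

# Both approaches give the same counts on iris.
base_counts  <- map_int(iris, \(x) length(unique(x)))
dplyr_counts <- map_int(iris, n_distinct)
all(base_counts == dplyr_counts)
#> [1] TRUE

# With missing values the two can be made to differ.
x <- c(1, 2, NA, 2)
length(unique(x))           # NA counts as a distinct value
#> [1] 3
n_distinct(x, na.rm = TRUE) # NA is dropped before counting
#> [1] 2
```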

Section 2. Predicates, row-binding, and parallel inputs

These four problems combine two or more ideas. map_lgl() answers TRUE/FALSE questions about every element, map_dfr() stacks data frames returned by each iteration into one tidy frame, map2() walks two inputs in parallel, and pmap() generalises to three or more inputs by iterating over the rows of a data frame.

Exercise 2.1: Centered non-negative check with map_lgl

Task: airquality has values that become negative after centering. Center every numeric column (subtract the mean) then find which columns still contain only non-negative values. Handle NAs with na.rm = TRUE. Save the result to ex_2_1.

Expected result:

#>   Ozone Solar.R    Wind    Temp   Month     Day
#>   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE

Difficulty: Intermediate

RYour turn
centered <- map(airquality, \(x) x - mean(x, na.rm = TRUE))
ex_2_1 <- ___(centered, \(x) all(x >= 0, na.rm = TRUE))
ex_2_1

Solution:
centered <- map(airquality, \(x) x - mean(x, na.rm = TRUE))
ex_2_1 <- map_lgl(centered, \(x) all(x >= 0, na.rm = TRUE))
ex_2_1
#>   Ozone Solar.R    Wind    Temp   Month     Day
#>   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE

Explanation: Centering by the mean makes each column sum to zero (ignoring NAs), so unless a column is constant, at least one of its values must fall below zero. Every entry in ex_2_1 is therefore FALSE, exactly as you would expect from the math. The sequence map() followed by map_lgl() is very common: first transform, then test. map_lgl() plus a predicate gives you a keep/drop mask in one line, no loop required.
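The keep/drop mask idea can be sketched with a different predicate. The example below (an illustration, not part of the exercise) flags the airquality columns that contain missing values and uses the mask to drop them; discard() is purrr's shortcut for the same operation.

```r
library(purrr)

# Logical mask: TRUE for every column that contains at least one NA.
has_na <- map_lgl(airquality, anyNA)

# Use the mask to keep only the complete columns.
complete_cols <- airquality[!has_na]
names(complete_cols)
#> [1] "Wind"  "Temp"  "Month" "Day"

# discard() applies the predicate and drops the matches in one step.
names(discard(airquality, anyNA))
#> [1] "Wind"  "Temp"  "Month" "Day"
```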

Exercise 2.2: Per-group tidy regression with map_dfr

Task: For each cylinder group in mtcars, fit lm(mpg ~ wt) and row-bind the tidy coefficient tables into one data frame. Use split() to get a list of three data frames, then map_dfr() to stack the tidy results with a .id column naming the group. Save the result to ex_2_2.

Expected result:

#> # A tibble: 6 x 6
#>   cyl   term        estimate std.error statistic  p.value
#>   <chr> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 4     (Intercept)    39.6       4.35      9.10 7.77e- 6
#> 2 4     wt             -5.65     1.85      -3.05 1.37e- 2
#> 3 6     (Intercept)    28.4       4.18      6.79 1.05e- 3
#> 4 6     wt             -2.78     1.33      -2.08 9.18e- 2
#> 5 8     (Intercept)    23.9       3.01      7.94 4.05e- 6
#> 6 8     wt             -2.19     0.739    -2.97 1.18e- 2

Difficulty: Intermediate

RYour turn
groups <- split(mtcars, mtcars$cyl)
ex_2_2 <- ___(groups, \(df) tidy(lm(mpg ~ wt, data = df)), .id = "cyl")
ex_2_2

Solution:
groups <- split(mtcars, mtcars$cyl)
ex_2_2 <- map_dfr(groups, \(df) tidy(lm(mpg ~ wt, data = df)), .id = "cyl")
ex_2_2
#> # A tibble: 6 x 6
#>   cyl   term        estimate std.error statistic  p.value
#>   <chr> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 4     (Intercept)    39.6       4.35      9.10 7.77e- 6
#> 2 4     wt             -5.65     1.85      -3.05 1.37e- 2
#> 3 6     (Intercept)    28.4       4.18      6.79 1.05e- 3
#> 4 6     wt             -2.78     1.33      -2.08 9.18e- 2
#> 5 8     (Intercept)    23.9       3.01      7.94 4.05e- 6
#> 6 8     wt             -2.19     0.739    -2.97 1.18e- 2

Explanation: split() returns a named list keyed by cyl. map_dfr() walks that list, calls tidy(lm(...)) on each data frame, and row-binds the three resulting mini-tables (two rows each) into one six-row frame. The .id = "cyl" argument copies the list name into a new column so you know which row came from which group. map_dfr() is the purrr answer to do.call(rbind, lapply(...)), with better readability and the free .id column.
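Since purrr 1.0.0, map_dfr() is marked as superseded in favour of map() followed by list_rbind(). It still works, but the newer spelling may be worth knowing. A minimal sketch of the same pipeline, assuming purrr 1.0.0 or later is installed (ex_2_2_alt is an illustrative name):

```r
library(purrr)
library(broom)

groups <- split(mtcars, mtcars$cyl)

# map() returns a named list of tidy tables; list_rbind() stacks them
# and names_to plays the role of map_dfr()'s .id argument.
ex_2_2_alt <- groups |>
  map(\(df) tidy(lm(mpg ~ wt, data = df))) |>
  list_rbind(names_to = "cyl")

dim(ex_2_2_alt)
#> [1] 6 6
```

Splitting the work into map() and list_rbind() keeps the iteration and the combining step separate, which can make it easier to inspect the intermediate list when something goes wrong.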

Exercise 2.3: Pointwise maximum with map2

Task: Given the two numeric vectors of equal length in the starter block, compute the pointwise maximum for each index. Use map2_dbl() so the output is a plain numeric vector. Save the result to ex_2_3.

Expected result:

#> [1]  5  8  7  6 10

Difficulty: Intermediate

RYour turn
a <- c(3, 8, 1, 6, 10)
b <- c(5, 4, 7, 2, 9)
ex_2_3 <- ___(a, b, max)
ex_2_3

Solution:
a <- c(3, 8, 1, 6, 10)
b <- c(5, 4, 7, 2, 9)
ex_2_3 <- map2_dbl(a, b, max)
ex_2_3
#> [1]  5  8  7  6 10

Explanation: map2_dbl() walks a and b together, calling max(a[i], b[i]) for every index and returning a numeric vector. This is the same as pmax(a, b) in base R; map2() shines when the per-element function is more complex than max(), say a custom calculation that base R cannot vectorise.
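The sketch below checks the pmax() equivalence and then shows a case where the per-pair result is not a single number, so a vectorised base function no longer fits. The second example (an integer sequence between each pair) is a made-up illustration.

```r
library(purrr)

a <- c(3, 8, 1, 6, 10)
b <- c(5, 4, 7, 2, 9)

# For a simple pointwise maximum, base R's pmax() gives the same answer.
identical(map2_dbl(a, b, max), pmax(a, b))
#> [1] TRUE

# When each pair produces a result of varying length, map2() keeps
# the results in a list: here, the integers between the two values.
ranges <- map2(a, b, \(lo, hi) seq(min(lo, hi), max(lo, hi)))
lengths(ranges)
#> [1] 3 5 7 5 2
```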

Exercise 2.4: Compound interest with pmap

Task: Using the loans data frame in the starter block (columns principal, rate, and years), compute the compound-interest final value for each row with pmap_dbl() and the formula principal * (1 + rate)^years. Save the result to ex_2_4.

Expected result:

#> [1] 1628.895 6083.265 3062.608

Difficulty: Intermediate

Your turn:
loans <- data.frame(
  principal = c(1000, 5000, 2500),
  rate = c(0.05, 0.04, 0.07),
  years = c(10, 5, 3)
)
ex_2_4 <- ___(loans, \(principal, rate, years) principal * (1 + rate)^years)
ex_2_4

Solution:
loans <- data.frame(
  principal = c(1000, 5000, 2500),
  rate = c(0.05, 0.04, 0.07),
  years = c(10, 5, 3)
)
ex_2_4 <- pmap_dbl(loans, \(principal, rate, years) principal * (1 + rate)^years)
ex_2_4
#> [1] 1628.895 6083.265 3062.608

Explanation: pmap() treats each column of loans as a parallel input. For row i it calls your lambda with principal = loans$principal[i], rate = loans$rate[i], years = loans$years[i]. Name your lambda arguments to match the column names and the binding is automatic. pmap_dbl() then coerces the three results into a numeric vector.
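To make the name-based binding concrete, this sketch runs the row-wise pmap_dbl() call and compares it with the plain vectorised formula, which is possible here because the arithmetic operators are already vectorised. The names by_row and vectorised are illustrative.

```r
library(purrr)

loans <- data.frame(
  principal = c(1000, 5000, 2500),
  rate      = c(0.05, 0.04, 0.07),
  years     = c(10, 5, 3)
)

# Row-wise: lambda arguments are matched to columns by name.
by_row <- pmap_dbl(loans, \(principal, rate, years) principal * (1 + rate)^years)

# Vectorised: the same formula applied to whole columns at once.
vectorised <- loans$principal * (1 + loans$rate)^loans$years

all.equal(by_row, vectorised)
#> [1] TRUE
```

For a formula this simple the vectorised form is the natural choice; pmap_dbl() earns its place when the per-row function cannot be vectorised, for example when it fits a model or calls an external service.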

Section 3. List-columns, error handling, and real pipelines

These six problems stitch three or more concepts into the kind of pipelines you write on the job: grouped data kept in one tidy frame via nest() plus map(), iterations that might fail handled with safely() and quietly(), and short one-off \(x) lambdas.

Exercise 3.1: Per-group nested correlation

Task: For each cylinder group in mtcars, compute the Pearson correlation between mpg and wt. Nest the data frame with nest(), map a correlation function over the data list-column, and keep only cyl and corr. Save the result to ex_3_1.

Expected result:

#> # A tibble: 3 x 2
#> # Groups:   cyl [3]
#>     cyl   corr
#>   <dbl>  <dbl>
#> 1     6 -0.682
#> 2     4 -0.713
#> 3     8 -0.650

Difficulty: Advanced

RYour turn
nested <- mtcars |>
  group_by(cyl) |>
  nest()
ex_3_1 <- nested |>
  mutate(corr = ___(data, \(df) cor(df$mpg, df$wt))) |>
  select(cyl, corr)
ex_3_1

Solution:
nested <- mtcars |>
  group_by(cyl) |>
  nest()
ex_3_1 <- nested |>
  mutate(corr = map_dbl(data, \(df) cor(df$mpg, df$wt))) |>
  select(cyl, corr)
ex_3_1
#> # A tibble: 3 x 2
#> # Groups:   cyl [3]
#>     cyl   corr
#>   <dbl>  <dbl>
#> 1     6 -0.682
#> 2     4 -0.713
#> 3     8 -0.650

Explanation: group_by() |> nest() creates a list-column called data where each row holds a mini data frame for one cylinder group. map_dbl() walks that list-column, computes one correlation per group, and returns a numeric vector that mutate() stores alongside the grouping key. This is the foundation of the tidyverse split-apply-combine pattern.
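The same list-column pattern extends from scalars to whole objects. As a sketch (the column names fit and slope are illustrative choices), the pipeline below stores one fitted lm per group with map() and then pulls a single coefficient out of each model with map_dbl().

```r
library(purrr)
library(dplyr)
library(tidyr)

models <- mtcars |>
  group_by(cyl) |>
  nest() |>
  mutate(
    # map() keeps each model object in a list-column.
    fit   = map(data, \(df) lm(mpg ~ wt, data = df)),
    # map_dbl() extracts one number per model: the wt slope.
    slope = map_dbl(fit, \(m) coef(m)[["wt"]])
  ) |>
  ungroup()

select(models, cyl, slope)
```

Keeping the models in the frame means later steps (residuals, predictions, diagnostics) can reuse them without refitting.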

Exercise 3.2: Error-tolerant log with safely

Task: Wrap log() with safely() so a problematic input returns the error slot instead of stopping the pipeline. Map the safe version over c(4, -2, 9) and inspect the shape of the results. Save the list of results to ex_3_2 and print its first element.

Expected result:

#> $result
#> [1] 1.386294
#>
#> $error
#> NULL

Difficulty: Advanced

RYour turn
safe_log <- ___(log)
ex_3_2 <- map(c(4, -2, 9), safe_log)
ex_3_2[[1]]

Solution:
safe_log <- safely(log)
ex_3_2 <- map(c(4, -2, 9), safe_log)
ex_3_2[[1]]
#> $result
#> [1] 1.386294
#>
#> $error
#> NULL

Explanation: safely() takes a function and returns a new function that wraps every call in a try/catch. On success, $result holds the value and $error is NULL; on failure, $result is NULL (or the otherwise value you supplied) and $error holds the error object. For log(-2) R returns NaN with a warning (not an error), so $error is still NULL, a reminder that "warning" and "error" are different in R. Mapping a safely()-wrapped function returns a list of lists, so use map(ex_3_2, "result") or transpose() to separate successes from failures.
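Because log(-2) only warns, the exercise never fills the $error slot. The sketch below feeds safe_log an input that does raise an error (a character string, chosen here for illustration) and then separates successes from failures. It also shows possibly() as the shorter route when a default value is all you need.

```r
library(purrr)

safe_log <- safely(log)

# The middle element is a string, so log() raises a real error there.
res <- map(list(4, "a", 9), safe_log)

# Pull each slot out by name, then build a success mask.
results <- map(res, "result")
errors  <- map(res, "error")
ok      <- map_lgl(errors, is.null)
ok
#> [1]  TRUE FALSE  TRUE

# possibly() swaps the failure for a default instead of recording it.
with_default <- map_dbl(list(4, "a", 9), possibly(log, otherwise = NA_real_))
is.na(with_default)
#> [1] FALSE  TRUE FALSE
```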

Exercise 3.3: Z-score numeric columns with a lambda

Task: Z-score every numeric column of iris (subtract the mean, divide by the standard deviation) using a \(x) lambda inside map(). The Species column is not numeric, so drop it first with keep(is.numeric). Save the result to ex_3_3 and print the first six values of the scaled Sepal.Length column.

Expected result:

#> [1] -0.8976739 -1.1392005 -1.3807271 -1.5014904 -1.0184372 -0.5353840

Difficulty: Advanced

RYour turn
ex_3_3 <- iris |>
  keep(is.numeric) |>
  map(___)
head(ex_3_3$Sepal.Length, 6)

Solution:
ex_3_3 <- iris |>
  keep(is.numeric) |>
  map(\(x) (x - mean(x)) / sd(x))
head(ex_3_3$Sepal.Length, 6)
#> [1] -0.8976739 -1.1392005 -1.3807271 -1.5014904 -1.0184372 -0.5353840

Explanation: keep(is.numeric) drops Species before the iteration starts. Then map() applies the z-score lambda to every remaining column. The \(x) shortcut lets you write the formula inline without a function(x) { ... } wrapper, perfect for a one-off transformation you do not plan to reuse.
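keep() followed by map() returns a plain list and discards Species. If you would rather keep the data frame shape and leave non-numeric columns untouched, purrr's modify_if() is one option. A minimal sketch (iris_z is an illustrative name):

```r
library(purrr)

# modify_if() transforms only the columns where the predicate is TRUE
# and returns an object of the same type as its input (a data frame).
iris_z <- modify_if(iris, is.numeric, \(x) (x - mean(x)) / sd(x))

is.data.frame(iris_z)
#> [1] TRUE
names(iris_z)
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

# Each scaled column should now have standard deviation 1.
map_dbl(keep(iris_z, is.numeric), sd)
```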

Exercise 3.4: Grouped summary table in one pipeline

Task: For mtcars, nest by cyl and compute three per-group aggregates in one mutate() plus map_*() sweep: mean mpg, max hp, and row count. Return a single tidy frame with columns cyl, mean_mpg, max_hp, and n_rows. Save the result to ex_3_4.

Expected result:

#> # A tibble: 3 x 4
#> # Groups:   cyl [3]
#>     cyl mean_mpg max_hp n_rows
#>   <dbl>    <dbl>  <dbl>  <int>
#> 1     6     19.7    175      7
#> 2     4     26.7    113     11
#> 3     8     15.1    335     14

Difficulty: Advanced

RYour turn
ex_3_4 <- mtcars |>
  group_by(cyl) |>
  nest() |>
  # your code here
ex_3_4

Solution:
ex_3_4 <- mtcars |>
  group_by(cyl) |>
  nest() |>
  mutate(
    mean_mpg = map_dbl(data, \(df) mean(df$mpg)),
    max_hp = map_dbl(data, \(df) max(df$hp)),
    n_rows = map_int(data, nrow)
  ) |>
  select(cyl, mean_mpg, max_hp, n_rows)
ex_3_4
#> # A tibble: 3 x 4
#> # Groups:   cyl [3]
#>     cyl mean_mpg max_hp n_rows
#>   <dbl>    <dbl>  <dbl>  <int>
#> 1     6     19.7    175      7
#> 2     4     26.7    113     11
#> 3     8     15.1    335     14

Explanation: Three map_*() calls inside one mutate() let you build the summary in a single pipe. Each call picks the typed variant that matches its answer: map_dbl() for the two numeric summaries, map_int() for the row count. Using the right typed variant keeps the downstream columns from becoming list-columns of numerics.
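For plain scalar aggregates like these, dplyr's summarise() reaches the same numbers without nesting. The sketch below is offered as a cross-check, not as the exercise's intended solution; note that summarise() sorts the groups by cyl, so the row order differs from the nested version.

```r
library(dplyr)

direct <- mtcars |>
  group_by(cyl) |>
  summarise(
    mean_mpg = mean(mpg),
    max_hp   = max(hp),
    n_rows   = n()
  )

direct$cyl
#> [1] 4 6 8
direct$n_rows
#> [1] 11  7 14
```

The nest() plus map_*() route pays off once a per-group step needs the whole mini data frame, such as fitting a model, which summarise() handles less naturally.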

Exercise 3.5: Safely parse many character vectors with quietly

Task: Using the list of three character vectors in the starter block (two cleanly numeric, one with a non-numeric token), wrap as.numeric in quietly(), map it over the list, and build a tidy tibble with columns input_id (which vector) and status ("ok" if the conversion produced no warnings, "failed" otherwise). Save the result to ex_3_5.

Expected result:

#> # A tibble: 3 x 2
#>   input_id status
#>   <chr>    <chr>
#> 1 a        ok
#> 2 b        failed
#> 3 c        ok

Difficulty: Advanced

RYour turn
raw <- list(
  a = c("1", "2", "3"),
  b = c("4.5", "oops", "6"),
  c = c("7", "8", "9")
)
quiet_numeric <- ___(as.numeric)
parsed <- map(raw, quiet_numeric)
ex_3_5 <- tibble(
  input_id = names(parsed),
  status = map_chr(parsed, \(p) if (length(p$warnings) == 0) "ok" else "failed")
)
ex_3_5

Solution:
raw <- list(
  a = c("1", "2", "3"),
  b = c("4.5", "oops", "6"),
  c = c("7", "8", "9")
)
quiet_numeric <- quietly(as.numeric)
parsed <- map(raw, quiet_numeric)
ex_3_5 <- tibble(
  input_id = names(parsed),
  status = map_chr(parsed, \(p) if (length(p$warnings) == 0) "ok" else "failed")
)
ex_3_5
#> # A tibble: 3 x 2
#>   input_id status
#>   <chr>    <chr>
#> 1 a        ok
#> 2 b        failed
#> 3 c        ok

Explanation: quietly() is the sibling of safely() for side effects: instead of trapping errors, it captures printed output, messages, and warnings (errors still propagate). as.numeric() issues a warning (not an error) when a token fails to parse, so quietly() is the right wrapper here. The solution uses map() to run the quiet version, then map_chr() to inspect each result's $warnings slot. Pattern: pick safely() for errors, quietly() for warnings, possibly() when you just want a default value on failure.
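It can help to look at a single quietly() result before mapping over many. The sketch below calls the wrapped function once on the problematic vector and inspects the slots; the exact warning text depends on your R locale, so it is not shown.

```r
library(purrr)

quiet_numeric <- quietly(as.numeric)
out <- quiet_numeric(c("4.5", "oops", "6"))

# A quietly() result always carries these four slots.
names(out)
#> [1] "result"   "output"   "warnings" "messages"

# The conversion still ran: the bad token became NA.
out$result
#> [1] 4.5  NA 6.0

# One warning was captured instead of being printed.
length(out$warnings)
#> [1] 1
```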

Exercise 3.6: All pairwise correlations with pmap_dbl

Task: Using the tibble df with three numeric columns x, y, w in the starter block, compute the three pairwise Pearson correlations, cor(x,y), cor(x,w), cor(y,w), in a single pmap_dbl() call over a helper tibble that lists the column pairs. Save the result to ex_3_6 as a named numeric vector with entries x_y, x_w, y_w.

Expected result:

#>       x_y       x_w       y_w
#> 0.1204519 0.1596373 0.1893028

Difficulty: Advanced

RYour turn
set.seed(17)
df <- tibble(
  x = rnorm(50),
  y = rnorm(50) + 0.4 * rnorm(50),
  w = rnorm(50)
)
pairs <- tibble(
  a = c("x", "x", "y"),
  b = c("y", "w", "w")
)
ex_3_6 <- ___(pairs, \(a, b) cor(df[[a]], df[[b]]))
names(ex_3_6) <- paste(pairs$a, pairs$b, sep = "_")
ex_3_6

Solution:
set.seed(17)
df <- tibble(
  x = rnorm(50),
  y = rnorm(50) + 0.4 * rnorm(50),
  w = rnorm(50)
)
pairs <- tibble(
  a = c("x", "x", "y"),
  b = c("y", "w", "w")
)
ex_3_6 <- pmap_dbl(pairs, \(a, b) cor(df[[a]], df[[b]]))
names(ex_3_6) <- paste(pairs$a, pairs$b, sep = "_")
ex_3_6
#>       x_y       x_w       y_w
#> 0.1204519 0.1596373 0.1893028

Explanation: The trick is building a small pairs tibble with the column names you want to correlate, then pmap_dbl() walks its rows, pulling out df[[a]] and df[[b]] at each step. This pattern scales: if you had 20 columns and wanted all pairs, you would generate them with utils::combn(names(df), 2), reshape the resulting two-row matrix into the a and b columns, and reuse the same pmap_dbl() call unchanged.
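A sketch of the scaled-up version: combn() returns a character matrix with one pair per column, so its two rows become the a and b columns of the helper tibble. The data here is freshly simulated for illustration, so the correlations will not match the exercise's expected result.

```r
library(purrr)
library(tibble)

set.seed(17)
df <- tibble(x = rnorm(50), y = rnorm(50), w = rnorm(50))

# combn() gives a 2 x choose(n, 2) matrix: row 1 and row 2 hold the
# first and second member of each pair.
combos <- utils::combn(names(df), 2)
pairs  <- tibble(a = combos[1, ], b = combos[2, ])

# Same pmap_dbl() call as in the exercise, now driven by generated pairs.
cors <- pmap_dbl(pairs, \(a, b) cor(df[[a]], df[[b]]))
names(cors) <- paste(pairs$a, pairs$b, sep = "_")
names(cors)
#> [1] "x_y" "x_w" "y_w"
```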

Summary

The 13 problems together exercise the purrr vocabulary you will reach for in most day-to-day analysis code.

Function / helper         Exercises that use it
map_dbl()                 1.1, 3.1, 3.4
map_chr()                 1.2, 3.5
map_int()                 1.3, 3.4
map_lgl()                 2.1
map()                     2.1, 3.2, 3.3, 3.5
map_dfr()                 2.2
map2_dbl()                2.3
pmap_dbl()                2.4, 3.6
nest() + list-columns     3.1, 3.4
safely() / quietly()      3.2, 3.5
keep() + \(x) lambda      3.3

If you solved Sections 1 and 2 without peeking, you are comfortable with everyday purrr. If you solved Section 3 too, you are ready for list-columns, error-tolerant iteration, and real analytical pipelines.

FAQ

Q: When should I use map() versus a typed variant like map_dbl()? Use map() when each iteration returns something irregular (a model object, a multi-row table) that has to stay in a list. Use the typed variant whenever the per-element answer is a single value of a known type: map_dbl() for numbers, map_chr() for strings, map_int() for counts, map_lgl() for predicates. The typed variant is a contract: it fails loudly when the function misbehaves instead of handing you a broken list three pipes downstream.

Q: How do I forward extra arguments to the mapped function? Put them after the function in the map_*() call. map_dbl(airquality, mean, na.rm = TRUE) forwards na.rm = TRUE to every mean() call. For anything more complex, wrap the call in a \(x) lambda.

Q: What is the difference between safely(), quietly(), and possibly()? All three wrap a function so that problems are recorded instead of disrupting the pipeline. safely() traps errors and returns a $result / $error list. quietly() captures warnings, messages, and printed output, but does not trap errors. possibly() just substitutes a default value when the call fails. Pick safely() for errors, quietly() for warnings, possibly() for a clean fallback.

Q: What does the \(x) syntax mean? \(x) is the lambda shorthand added in R 4.1, equivalent to function(x). It lets you write a one-off function inline, for example map(xs, \(x) (x - mean(x)) / sd(x)), without naming it. For functions you reuse, define them normally.

Q: How is map_dfr() different from do.call(rbind, lapply(...))? Both produce one stacked data frame. map_dfr() reads more clearly and adds an optional .id column recording which iteration each row came from, which you would otherwise have to track by hand.
