R Functions Exercises: 18 Default Arg, Closure and Dots Problems
Eighteen runnable practice problems for writing R functions: positional and named arguments, defaults, match.arg(), the dots argument ..., lazy evaluation, closures, on.exit(), recursion, anonymous lambdas with \(x), and the higher-order helpers Filter and Reduce.
Functions are R's main unit of reuse. Most people stop at function(args) body and never touch closures, dots, or lazy evaluation, then wonder why packages they read look impenetrable. These problems isolate each idiom on a small enough surface that the mechanism is visible, not the data wrangling around it.
Section 1. Arguments, defaults and validation (4 problems)
Exercise 1.1: Write a body mass index function with two numeric arguments
Task: A clinic dashboard wants a quick BMI calculator written as a reusable function. Define bmi(weight_kg, height_m) that returns body mass index, weight divided by height squared. Call it with 70 kg and 1.75 m and save the result to ex_1_1.
Expected result:
#> [1] 22.85714
Difficulty: Beginner
The last expression evaluated in a function body becomes its return value, so no explicit return statement is needed.
Divide weight_kg by height_m raised to the power 2 using the ^ operator, which binds tighter than /.
Explanation: A function is created with function(args) body and the last expression in the body is its return value, so an explicit return() is unnecessary for the common case. The ^ operator has higher precedence than /, so weight_kg / height_m^2 parses as the intended weight_kg / (height_m^2). Functions are first-class objects in R: you can pass them as arguments, store them in lists, and return them from other functions.
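One possible solution, as a minimal sketch (the page's hidden solution may differ in detail). The final sapply() line is an added illustration of functions being first-class: bmi is passed as an argument, with weight_kg fixed by name.

```r
# BMI = weight (kg) / height (m)^2; ^ binds tighter than /, so no parentheses needed.
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2   # last expression is the return value
}

ex_1_1 <- bmi(70, 1.75)
ex_1_1
#> [1] 22.85714

# Functions are values: pass bmi to sapply(); each height fills height_m.
sapply(c(1.60, 1.75), bmi, weight_kg = 70)
#> [1] 27.34375 22.85714
```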
Exercise 1.2: Convert Celsius to Fahrenheit with a default offset argument
Task: A meteorologist often converts temperatures from Celsius to Fahrenheit. Write c_to_f(c, offset = 32) returning c * 9/5 + offset, where the offset has a default because Fahrenheit's zero point is shifted. Call c_to_f(-5) using the default, save it to ex_1_2, and verify with c_to_f(100).
Expected result:
#> [1] 23
#> [1] 212
Difficulty: Beginner
A default value lets the caller skip the most common argument while still allowing an override.
Return c * 9 / 5 + offset, leaving offset to fall back on its declared default of 32.
Explanation: Default arguments are evaluated lazily, in the function's own environment, so a default can depend on other arguments of the same call, for example function(x, y = x * 2). Defaults keep the most common call short while leaving overrides available. To make an argument effectively required, give it the default offset = stop("supply offset"); the error fires only if the body actually uses offset.
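A minimal sketch of one possible solution. scale_pair() is a hypothetical helper added only to show a default that refers to another argument.

```r
# offset defaults to 32, so the common Celsius-to-Fahrenheit call needs one argument.
c_to_f <- function(c, offset = 32) {
  c * 9 / 5 + offset
}

ex_1_2 <- c_to_f(-5)   # default offset used
ex_1_2
#> [1] 23
c_to_f(100)
#> [1] 212

# Hypothetical helper: y's default is evaluated lazily, after x is known.
scale_pair <- function(x, y = x * 2) c(x, y)
scale_pair(3)
#> [1] 3 6
```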
Exercise 1.3: Restrict an argument to a fixed set of strings with match.arg
Task: A weather dashboard exposes forecast_label(level) and must reject typos. Use match.arg() so level is one of "sunny", "cloudy", "rain", or "snow", and any other value raises a clear error. Call forecast_label("rain") and save the returned string to ex_1_3.
Expected result:
#> [1] "rain"
Difficulty: Intermediate
Validating a string against a fixed allowed set is cleaner than a chain of equality checks.
Assign level <- match.arg(level) so it resolves against the default vector, then return level.
Explanation: match.arg() is the idiomatic way to validate a string option against an allowed set. When the argument's default is the full vector of allowed values and you call match.arg(level) without a separate choices argument, the choices are read from that default, and a caller who omits level gets the first element ("sunny"). It also supports partial matching, so forecast_label("rai") resolves to "rain". An unknown value raises an error along the lines of Error in match.arg(level) : 'arg' should be one of "sunny", "cloudy", "rain", "snow", which is more helpful than a hand-written stop().
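One way to write it, sketched under the assumption that the allowed values live in the default of level:

```r
# The default lists every allowed value; match.arg() validates against it.
forecast_label <- function(level = c("sunny", "cloudy", "rain", "snow")) {
  level <- match.arg(level)
  level
}

ex_1_3 <- forecast_label("rain")
ex_1_3
#> [1] "rain"
forecast_label()        # omitted: first allowed value
#> [1] "sunny"
forecast_label("rai")   # unique partial match
#> [1] "rain"
try(forecast_label("hail"))  # not in the set: match.arg() errors
```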
Exercise 1.4: Make an argument truly optional with missing()
Task: A retailer wants apply_discount(price, pct) to return price unchanged when pct is not supplied, and price * (1 - pct/100) otherwise. Use missing(pct) for the test, since it is true only when the caller did not pass pct. Save apply_discount(100) to ex_1_4.
Expected result:
#> [1] 100
Difficulty: Intermediate
You need to tell apart an argument the caller never passed from one passed with an explicit value.
Branch on missing(pct): return price when it is true, else price * (1 - pct / 100).
Explanation: missing() is TRUE when the caller did not supply the argument, even if that argument has a default value. It is more precise than checking is.null(pct), because NULL may be a legitimate value the caller wanted to pass. The common alternative is to declare pct = NULL as a default and test is.null(pct), which is fine when NULL is not a meaningful input. Reach for missing() when you need to distinguish "not passed" from "passed as NULL".
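A minimal sketch of one possible solution. has_default() is a hypothetical one-liner added to show that missing() still reports TRUE for an unsupplied argument that has a default.

```r
# Return price untouched unless the caller actually passed pct.
apply_discount <- function(price, pct) {
  if (missing(pct)) {
    return(price)
  }
  price * (1 - pct / 100)
}

ex_1_4 <- apply_discount(100)
ex_1_4
#> [1] 100
apply_discount(100, 15)
#> [1] 85

# Hypothetical check: a default does not stop missing() from being TRUE.
has_default <- function(x = 1) missing(x)
has_default()
#> [1] TRUE
```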
Section 2. The dots argument and forwarding (3 problems)
Exercise 2.1: Average any number of numeric vectors using the dots argument
Task: Write mean_safely(...) that accepts any number of numeric vectors via ..., flattens them with c(...), drops NA values, and returns the overall mean. Save mean_safely(c(1, NA, 3), c(5, 7)) to ex_2_1 and verify with mean_safely(1:10).
Expected result:
#> [1] 4
#> [1] 5.5
Difficulty: Intermediate
Extra arguments collected into one place can be merged into a single vector before summarising.
Flatten the inputs with c(...), then call mean() passing na.rm = TRUE.
Explanation: Inside a function, ... is a special symbol that collects every extra argument the caller passed that did not match a named formal. You consume it with c(...) to flatten the pieces into one vector, or with list(...) to keep them separate. Each element of the dots is a promise, so it is only evaluated when you actually touch it. ..1, ..2 and so on pick out a single element by position, and ...length() reports how many extra arguments were passed.
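One possible solution, sketched minimally. dots_info() is a hypothetical helper that exercises ...length() and ..2 from the explanation.

```r
# c(...) flattens every vector passed through the dots into one vector.
mean_safely <- function(...) {
  mean(c(...), na.rm = TRUE)
}

ex_2_1 <- mean_safely(c(1, NA, 3), c(5, 7))
ex_2_1
#> [1] 4
mean_safely(1:10)
#> [1] 5.5

# Hypothetical helper: count the dots and pick the second one by position.
dots_info <- function(...) list(n = ...length(), second = ..2)
str(dots_info("a", "b", "c"))
```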
Exercise 2.2: Element-wise sum of arbitrary equal-length vectors with dots and Reduce
Task: Build sum_columns(...) that takes any number of equal-length numeric vectors through ... and returns their element-wise sum (the same length as each input). Wrap the dots into a list and reduce with +. Save sum_columns(1:3, 10:12, 100:102) to ex_2_2.
Expected result:
#> [1] 111 114 117
Difficulty: Intermediate
To combine many vectors pairwise you need each kept as a separate element, not merged into one.
Capture the inputs with list(...) and fold them together with Reduce("+", ...).
Explanation: list(...) keeps each input vector as a separate list element, which is what an element-wise reduction needs. Reduce("+", vecs) applies + left to right across the list and relies on R's vectorised arithmetic: 1:3 + 10:12 is c(11, 13, 15), and adding 100:102 then gives c(111, 114, 117). do.call("+", vecs) would fail here because + accepts at most two operands, whereas Reduce scales to any number of inputs. With mismatched lengths this approach recycles the shorter vectors (warning only when the longer length is not a multiple of the shorter one), so a leading stopifnot(length(unique(lengths(vecs))) == 1) is a useful guardrail in production.
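A minimal sketch of one possible solution, with the length guardrail from the explanation built in (the page's own solution may omit it):

```r
# list(...) keeps each vector separate; Reduce folds them pairwise with +.
sum_columns <- function(...) {
  vecs <- list(...)
  # Guardrail: refuse unequal lengths instead of silently recycling.
  stopifnot(length(unique(lengths(vecs))) == 1)
  Reduce(`+`, vecs)
}

ex_2_2 <- sum_columns(1:3, 10:12, 100:102)
ex_2_2
#> [1] 111 114 117
```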
Exercise 2.3: Pass extra arguments to mean through a safe wrapper
Task: Write safe_mean(x, ...) that returns the mean of a numeric vector after removing NAs by passing na.rm = TRUE, and forwards any extra arguments (such as trim) to base::mean() via the dots. Save safe_mean(c(1, 2, NA, 4, 5), trim = 0.2) to ex_2_3, then verify on c(NA, NA, 5).
Expected result:
#> [1] 3
#> [1] 5
Difficulty: Advanced
A wrapper can hard-code one option while letting every other option flow straight through.
Call mean(x, na.rm = TRUE, ...) so the dots forward extra arguments such as trim.
Explanation: Forwarding ... straight to another function is the most common dots pattern in base R and the tidyverse: every extra argument the caller passes flows through unchanged. The catch is that safe_mean hard-codes na.rm = TRUE, so the caller cannot override it; passing na.rm = FALSE makes mean() receive na.rm twice, and it stops with the error formal argument "na.rm" matched by multiple actual arguments. To let callers override it, either drop the hard-coded value and rely entirely on ..., or inject the default only when it is absent, for example with if (!"na.rm" %in% names(list(...))).
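One possible solution as a minimal sketch, followed by the duplicate-argument error and a hypothetical safe_mean2() variant that injects na.rm = TRUE only when the caller did not pass it:

```r
# na.rm = TRUE is fixed; trim and any other extras flow through the dots.
safe_mean <- function(x, ...) {
  mean(x, na.rm = TRUE, ...)
}

ex_2_3 <- safe_mean(c(1, 2, NA, 4, 5), trim = 0.2)
ex_2_3
#> [1] 3
safe_mean(c(NA, NA, 5))
#> [1] 5

# Passing na.rm again collides with the hard-coded value and errors.
try(safe_mean(c(1, NA), na.rm = FALSE))

# Hypothetical variant: respect the caller's na.rm if given, else default to TRUE.
safe_mean2 <- function(x, ...) {
  if ("na.rm" %in% names(list(...))) mean(x, ...) else mean(x, na.rm = TRUE, ...)
}
safe_mean2(c(1, NA), na.rm = FALSE)
#> [1] NA
```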
Section 3. Lazy evaluation, closures and on.exit (4 problems)
Exercise 3.1: Show that R does not evaluate unused arguments with lazy evaluation
Task: Write pick_first(x, y = stop("y was evaluated")) that returns x and never touches y. Calling pick_first(42) should succeed even though the default expression for y is an error, proving R defers evaluation until a value is needed. Save the returned value to ex_3_1 and confirm the trick on a string input too.
Expected result:
#> [1] 42
#> [1] "hello"
Difficulty: Intermediate
An argument's expression is only evaluated the moment the body actually reads that argument.
Make the body simply x and never reference y, so its stop() default is never forced.
Explanation: R does not evaluate arguments when the function is called, only when the body first references them. This is called lazy evaluation and is implemented with promises: each argument is bundled with its unevaluated expression and the environment to evaluate it in (the caller's environment for supplied arguments, the function's own environment for defaults), then forced on first use. The pattern shows up in real code as defaults like function(x, n = length(x)), where n depends on x. It also has a footgun: if a default has a side effect, there is no guarantee it ever runs, so never put log writes or counters in default expressions.
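A minimal sketch of one possible solution. pick_forced() is a hypothetical contrast case that calls force(y), which makes the stop() default fire.

```r
# y's default is an error, but it stays an unforced promise.
pick_first <- function(x, y = stop("y was evaluated")) {
  x
}

ex_3_1 <- pick_first(42)
ex_3_1
#> [1] 42
pick_first("hello")
#> [1] "hello"

# Hypothetical contrast: forcing the promise triggers the default's error.
pick_forced <- function(x, y = stop("y was evaluated")) {
  force(y)
  x
}
try(pick_forced(42))
```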
Exercise 3.2: Build a counter factory that returns a stateful closure
Task: Write a factory make_counter(start = 0) that returns a closure incrementing and returning a private counter on each call. Two counters built from the same factory must hold independent state. Call one closure three times, save the third returned value to ex_3_2, then call a second fresh counter once.
Expected result:
#> [1] 3
#> [1] 1
Difficulty: Intermediate
A function returned from another function keeps access to the variables of the call that built it.
Hold count in the factory, return an inner function that does count <<- count + 1 then returns count.
Explanation: A closure is a function plus the environment in which it was defined. When make_counter returns the inner function, that environment, containing count, survives as long as the returned closure is reachable. The <<- operator walks up parent environments to find count and assigns to it in place, which is what gives the counter its memory. Each call to make_counter() builds a fresh enclosing environment, so two counters never share state. This is the core mechanism behind R6 classes, memoisation, and most stateful gadgets in R.
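One possible factory, sketched minimally. The names counter_a and counter_b are illustrative.

```r
# Each make_counter() call creates a fresh environment that holds count.
make_counter <- function(start = 0) {
  count <- start
  function() {
    count <<- count + 1   # update count in the factory's environment
    count
  }
}

counter_a <- make_counter()
counter_a()
counter_a()
ex_3_2 <- counter_a()
ex_3_2
#> [1] 3

counter_b <- make_counter()   # independent state
counter_b()
#> [1] 1
```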
Exercise 3.3: Speed up a slow function with a memoising closure
Task: Write memoise(f) that takes a function and returns a closure storing prior (argument, result) pairs in a private list. The closure must call f once for each unique argument and reuse the cached result thereafter. Store the wrapper returned by memoise(sqrt), save its value at 16 to ex_3_3, then call the same wrapper on 25 to confirm the cache handles a new key.
Expected result:
#> [1] 4
#> [1] 5
Difficulty: Advanced
Storing past results in a private spot lets repeated calls skip the real computation.
Keep a cache list in the enclosing scope; on a miss assign cache[[key]] <<- f(x) keyed by as.character(x).
Explanation: Memoisation trades memory for speed by caching results, and closures are a natural home for the cache because the enclosing environment is private and persists across calls. A list keyed by as.character(x) is the simplest implementation; for multi-argument functions and richer keys, the memoise package on CRAN hashes all of the arguments to build its cache key. The trade-offs are that this cache grows without bound unless you cap it (LRU eviction is a typical fix), and that the string key collapses numerically distinct values that share a string representation, such as 0.1 + 0.2 and 0.3.
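A hand-rolled sketch of one possible solution, unrelated to the CRAN memoise package despite the shared name. msqrt, slow_square, and calls are hypothetical names; the call counter shows that f runs once per distinct argument.

```r
memoise <- function(f) {
  cache <- list()                    # private to this memoise() call
  function(x) {
    key <- as.character(x)
    if (!key %in% names(cache)) {
      cache[[key]] <<- f(x)          # miss: compute once and store
    }
    cache[[key]]                     # hit, or the value just stored
  }
}

msqrt <- memoise(sqrt)
ex_3_3 <- msqrt(16)
ex_3_3
#> [1] 4
msqrt(25)
#> [1] 5

# Hypothetical counter: the wrapped function runs only on cache misses.
calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1
  x^2
}
msquare <- memoise(slow_square)
msquare(3)
msquare(3)
calls
#> [1] 1
```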
Exercise 3.4: Guarantee cleanup of a temporary file using on.exit
Task: Write with_tmpfile(action) that calls tempfile() to get a path, runs file.create(path) so the file actually exists, applies action(path), and is guaranteed to delete the file via on.exit(unlink(path)) even if action errors. Use file.exists as the action and save the resulting logical to ex_3_4.
Expected result:
#> [1] TRUE
Difficulty: Advanced
Cleanup code should run whether the work finishes normally or aborts partway with an error.
After tempfile() and file.create(), register on.exit(unlink(path)), then call action(path).
Explanation: on.exit() registers an expression to be run when the enclosing function exits, no matter whether it returned normally or threw an error. This is the R idiom for "finally" blocks and is essential for closing database connections, restoring options(), releasing file handles, or removing scratch files. Place on.exit() immediately after creating the resource so a later error cannot bypass cleanup. By default a second on.exit() overwrites the first; use on.exit(expr, add = TRUE) to stack handlers.
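A minimal sketch of one possible solution. The second half uses a hypothetical failing action and a variable seen to check that cleanup also runs when action() errors. add = TRUE is included as a habit, not a requirement here.

```r
# Register cleanup right after the file exists, then run the action.
with_tmpfile <- function(action) {
  path <- tempfile()
  file.create(path)
  on.exit(unlink(path), add = TRUE)
  action(path)
}

ex_3_4 <- with_tmpfile(file.exists)
ex_3_4
#> [1] TRUE

# Hypothetical failing action: record the path, then error on purpose.
seen <- NULL
try(with_tmpfile(function(p) { seen <<- p; stop("boom") }))
file.exists(seen)
#> [1] FALSE
```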
Section 4. Anonymous lambdas and higher-order functions (4 problems)
Exercise 4.1: Compute column means of mtcars with an anonymous lambda
Task: Use sapply() together with an anonymous lambda written in the new \(x) ... shorthand to compute the mean of every column of the built-in mtcars data (all 11 columns are numeric). Save the resulting named numeric vector to ex_4_1.
Expected result:
#>        mpg        cyl       disp         hp       drat         wt       qsec
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750
#>         vs         am       gear       carb
#>   0.437500   0.406250   3.687500   2.812500
Difficulty: Beginner
A data frame is a list of columns, so one apply call can walk every column in turn.
Use sapply(mtcars, \(x) mean(x)) with the \(x) lambda shorthand.
Explanation: Since R 4.1, \(x) expr is a backslash shorthand for function(x) expr, much like Haskell's \x -> expr or Python's lambda x:. It is exactly equivalent to function(x) mean(x) and saves seven characters per call site, which adds up when threading anonymous functions through sapply, Map, Reduce, or pipe sequences. A data frame is internally a list of columns, so sapply() walks each column. Of course, sapply(mtcars, mean) works too: mean is already a function, so no lambda is required.
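One possible solution, assuming R 4.1 or later for the \(x) syntax, with a check that the lambda-free form gives the same result:

```r
# \(x) mean(x) is shorthand for function(x) mean(x).
ex_4_1 <- sapply(mtcars, \(x) mean(x))
ex_4_1

# The lambda is optional: passing mean directly is identical.
identical(ex_4_1, sapply(mtcars, mean))
#> [1] TRUE
```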
Exercise 4.2: Keep only numeric columns of iris with a predicate and Filter
Task: Use base R's higher-order function Filter() with the predicate is.numeric to drop the non-numeric Species column from the built-in iris data, keeping the four measurement columns intact. Save the resulting data.frame to ex_4_2.
Expected result:
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> # 147 more rows hidden
Difficulty: Intermediate
Selecting columns by a property of each column is a filtering job over a list of columns.
Call Filter(is.numeric, iris) to keep only the columns where the predicate is true.
Explanation: Filter(predicate, x) keeps elements of x for which the predicate is TRUE. Because data frames are lists of columns, Filter(is.numeric, iris) walks each column and keeps the numeric ones. A predicate is any function that returns a logical scalar: is.character, is.factor, \(c) length(unique(c)) > 1. The base sibling Find() returns the first matching element, and Position() returns its index. These are the same building blocks that show up across functional languages, just spelled with title case in base R.
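A minimal sketch of one possible solution, plus an added Position() call that uses base R's Negate() to locate the first non-numeric column:

```r
# Filter() keeps the columns (list elements) where the predicate is TRUE.
ex_4_2 <- Filter(is.numeric, iris)
head(ex_4_2, 3)

# Sibling helper: index of the first column that is NOT numeric (Species).
Position(Negate(is.numeric), iris)
#> [1] 5
```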
Exercise 4.3: Compose three numeric transforms into one function using Reduce
Task: Given the three transforms \(x) x + 1, sqrt, and \(x) round(x, 2), compose them into a single function, pipeline(), that applies them left to right using Reduce(). Save pipeline(15) to ex_4_3 and verify with pipeline(99).
Expected result:
#> [1] 4
#> [1] 10
Difficulty: Intermediate
Threading a value through a sequence of steps, each fed the previous result, is a fold.
Use Reduce(\(acc, f) f(acc), fns, init = x) to apply the function list left to right.
Explanation: Reduce(f, x, init) walks x left to right, threading an accumulator through calls of the form f(acc, elem) and returning the final accumulator. Setting init = x and passing a list of functions turns the reduction into function composition: start with the input value and apply each function in turn. The trace here is 15 to 16 (plus 1), 16 to 4 (square root), 4 to 4 (round). The same idea generalises to folds in any functional language. For right-to-left composition, pass right = TRUE and swap the lambda to \(f, acc) f(acc), because a right fold calls f(elem, acc) with the accumulator second.
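One possible solution as a minimal sketch. pipeline_rtl() is a hypothetical right-to-left variant showing the swapped lambda arguments that right = TRUE requires.

```r
# The transforms, in the order they should run.
fns <- list(\(x) x + 1, sqrt, \(x) round(x, 2))

# Left fold: start from x and feed each result into the next function.
pipeline <- function(x) Reduce(\(acc, f) f(acc), fns, init = x)

ex_4_3 <- pipeline(15)
ex_4_3
#> [1] 4
pipeline(99)
#> [1] 10

# Hypothetical right fold: Reduce calls f(elem, acc), so the lambda is swapped.
# Order becomes round, then sqrt, then + 1.
pipeline_rtl <- function(x) Reduce(\(f, acc) f(acc), fns, init = x, right = TRUE)
pipeline_rtl(15)
#> [1] 4.872983
```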
Exercise 4.4: Curry a three-argument function into a chain of unary closures
Task: Write curry3(f) that takes a function f of three arguments and returns function(a) function(b) function(c) f(a, b, c). Curry the three-argument paste and save the result of curry3(paste)("R")("is")("functional") to ex_4_4.
Expected result:
#> [1] "R is functional"
Difficulty: Advanced
Each nested function can remember the argument it was handed and wait for the next one.
Return function(a) function(b) function(c) f(a, b, c) as three layered closures.
Explanation: Currying, named after the logician Haskell Curry, converts an n-argument function into a chain of n one-argument functions. Each nested closure captures the argument it received and waits for the next one. In R this is rarely written by hand because partial application via purrr::partial() or a simple \(x) f(a, b, x) is usually clearer, but the exercise pins down what closures actually carry: the chain works because each returned function remembers the values of a and b in its parent environments long after those parent calls have returned.
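A minimal sketch of one possible solution. r_is is a hypothetical partially applied function that keeps a and b alive in its enclosing environments.

```r
# Three layered closures, each capturing one argument.
curry3 <- function(f) {
  function(a) function(b) function(c) f(a, b, c)
}

ex_4_4 <- curry3(paste)("R")("is")("functional")
ex_4_4
#> [1] "R is functional"

# Hypothetical partial application: a and b are fixed, c is still open.
r_is <- curry3(paste)("R")("is")
r_is("fun")
#> [1] "R is fun"
```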
Section 5. Return values and recursion (3 problems)
Exercise 5.1: Return mean, median and SD as a single named list
Task: Write describe(x) that takes a numeric vector and returns a named list with three elements: mean, median, and sd. Returning a list is the R idiom for packing several values out of one call when a single scalar would lose information. Save describe(mtcars$mpg) to ex_5_1.
Expected result:
#> $mean
#> [1] 20.09062
#>
#> $median
#> [1] 19.2
#>
#> $sd
#> [1] 6.026948
Difficulty: Intermediate
When one call must surface several values, pack them into a single named container.
Return list(mean = mean(x), median = median(x), sd = sd(x)).
Explanation: R functions return exactly one object, so when you need to surface multiple values the conventional answer is a named list, which then unpacks at the call site with result$mean. A named numeric vector via c(mean = mean(x), ...) is shorter when all values are scalars of the same type, but loses generality once you want to mix types (a vector and a data frame, say). For richer return types, an S3 class like structure(list(...), class = "describe_result") gives you a hook for a custom print method.
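One possible solution, sketched minimally. The S3 part uses the describe_result class name from the explanation with a hypothetical print method, to show the hook that a class provides.

```r
# A named list packs several results into a single return value.
describe <- function(x) {
  list(mean = mean(x), median = median(x), sd = sd(x))
}

ex_5_1 <- describe(mtcars$mpg)
ex_5_1$median
#> [1] 19.2

# Hypothetical S3 print method for a classed result.
print.describe_result <- function(x, ...) {
  cat(sprintf("mean %.2f | median %.2f | sd %.2f\n", x$mean, x$median, x$sd))
  invisible(x)
}
res <- structure(describe(mtcars$mpg), class = "describe_result")
res
#> mean 20.09 | median 19.20 | sd 6.03
```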
Exercise 5.2: Compute factorial recursively with a clear base case
Task: Write fact(n) that returns n! (factorial of n) using direct recursion: if n is 0 or 1 return 1, otherwise return n * fact(n - 1). The base case is what stops the recursion from running forever. Save fact(7) to ex_5_2.
Expected result:
#> [1] 5040
Difficulty: Intermediate
A self-calling function needs a stopping condition or it never terminates.
Return 1 when n <= 1, otherwise return n * fact(n - 1).
Explanation: Recursion in R works just like in any other language: a function calls itself with a smaller input until a base case returns directly. The base case here is n <= 1, returning 1 for both 0! and 1!. R does not optimise tail calls, so deep recursion eventually fails with an "evaluation nested too deeply" error once it exceeds getOption("expressions") (5000 by default), or earlier if the C stack runs out. For factorial specifically the iterative prod(seq_len(n)) is faster, simpler, and immune to that limit; recursion shines when the structure of the problem is itself recursive, such as walking a tree.
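A minimal sketch of one possible solution, with the iterative alternative from the explanation for comparison:

```r
# Direct recursion; the base case n <= 1 stops the self-calls.
fact <- function(n) {
  if (n <= 1) return(1)
  n * fact(n - 1)
}

ex_5_2 <- fact(7)
ex_5_2
#> [1] 5040

# Iterative alternative with no recursion-depth limit.
prod(seq_len(7))
#> [1] 5040
```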
Exercise 5.3: Memoise a recursive Fibonacci with a closure-bound cache
Task: Build make_fib() that returns a closure computing the n-th Fibonacci number recursively, but caches every result in an enclosing environment so repeated calls run in near-constant time. Use fib(0) = 0 and fib(1) = 1. Save the value at n = 25 to ex_5_3.
Expected result:
#> [1] 75025
Difficulty: Advanced
A recursive computation that revisits the same inputs benefits from remembering earlier answers.
Keep a cache list in the factory; the inner fib returns the cached value or stores fib(n - 1) + fib(n - 2) with <<-.
Explanation: The naive recursive Fibonacci is exponential: computing fib(25) calls the function more than 200 thousand times. Caching prior results in the enclosing environment with <<- makes the recurrence linear in n, because each fib(k) is computed once and every later request returns it from the cache. The pattern works because the inner function is bound to the name fib inside the factory, so it can call itself, while the cache lives in the factory's environment, which is exactly where <<- writes. An alternative is the memoise package on CRAN, which gives a similar speedup with one wrapper call and more configurable caches.
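A minimal sketch of one possible solution. It uses a numeric vector as the cache instead of the list the hint mentions, storing fib(n) at index n + 1 because R indexing starts at 1.

```r
make_fib <- function() {
  cache <- c(0, 1)                     # cache[n + 1] holds fib(n); seeds fib(0), fib(1)
  fib <- function(n) {
    if (n + 1 <= length(cache) && !is.na(cache[n + 1])) {
      return(cache[n + 1])             # hit: reuse an earlier answer
    }
    value <- fib(n - 1) + fib(n - 2)   # recursion through the factory's fib binding
    cache[n + 1] <<- value             # miss: store in the factory's environment
    value
  }
  fib
}

fib <- make_fib()
ex_5_3 <- fib(25)
ex_5_3
#> [1] 75025
```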
What to do next
- Revisit the parent tutorial R Functions for the syntax reference.
- Drill the case_when and switch() patterns in the R Control Flow Exercises.
- Practice the higher-order helpers further in the Apply Family Exercises in R.
- Build on dots-forwarding inside a tidy pipeline with the dplyr Exercises in R.