R Control Flow Exercises: 18 if/else, Loop and switch Problems
Eighteen runnable practice problems covering R's control flow primitives: scalar if, vectorised ifelse() and case_when(), for and while loops, break and next, switch() dispatch, and the short-circuit operators && and ||. Every exercise hides its solution behind an expandable block so you can attempt it first.
Control flow in R is small in surface area and big in pitfalls. if insists on a length-one logical, for loops are usually the wrong tool when a vectorised function exists, and && is not just a faster &. These problems isolate each idiom on realistic data so the difference is concrete.
Section 1. if, else, and else-if chains (3 problems)
Exercise 1.1: Return a pass or fail label from a numeric exam score
Task: Given a single numeric variable score, write an if/else expression that returns the string "pass" when the score is at least 60 and "fail" otherwise. Assign the returned value (not a printed side effect) for score <- 72 to ex_1_1.
Expected result:
#> [1] "pass"
Difficulty: Beginner
A single threshold splits the score into two possible outcomes, and the comparison itself decides which one you get.
Use if (score >= 60) ... else ... as an expression and assign its value straight into ex_1_1.
Explanation: In R, if is an expression that returns the value of its matched branch, so you can assign its result directly without an intermediate print(). This is unusual compared with Python or Java where if is a statement. The expression form keeps grading logic tight; for vectorised pass/fail across many scores you'd reach for ifelse() instead, since scalar if errors when fed a length>1 logical.
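Here is one possible solution. The page's own hidden solution may be written differently. The vectorised ifelse() counterpart is included for comparison:

```r
score <- 72

# `if` is an expression: the value of whichever branch runs is returned
ex_1_1 <- if (score >= 60) "pass" else "fail"
ex_1_1
#> [1] "pass"

# For many scores at once, scalar `if` would error; ifelse() works element-wise
many_scores <- c(45, 72, 60)
many_labels <- ifelse(many_scores >= 60, "pass", "fail")
many_labels
#> [1] "fail" "pass" "pass"
```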
Exercise 1.2: Convert numeric exam scores into letter grades with an else-if chain
Task: The registrar needs a function grade(score) that maps a numeric exam score to a letter using these cutoffs: 90+ is "A", 80 to 89 is "B", 70 to 79 is "C", 60 to 69 is "D", and below 60 is "F". Build it with a chained if/else if/else and save grade(73) to ex_1_2.
Expected result:
#> [1] "C"
Difficulty: Intermediate
Test the strictest cutoff first so each later branch can assume the higher ones already failed.
Chain if (score >= 90) "A" else if (score >= 80) "B" ... and finish with a bare else "F".
Explanation: The branches are tested top to bottom and the first TRUE condition wins, so you must order them from the highest cutoff to the lowest. If the score >= 60 test came first, a 95 would satisfy it and be graded "D". The trailing else is the catch-all for anything under 60, including zero and negative scores. A chain like this stays readable for a handful of buckets. Past five branches you should prefer dplyr::case_when() or cut() with labelled breaks, because both express the buckets declaratively.
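Below is a sketch of grade() as an else-if chain, followed by the cut() alternative. In the cut() version, the break points and right = FALSE (which makes each interval include its lower cutoff) are choices made for this sketch:

```r
grade <- function(score) {
  # Highest cutoff first, so each later branch may assume the earlier ones failed
  if (score >= 90) {
    "A"
  } else if (score >= 80) {
    "B"
  } else if (score >= 70) {
    "C"
  } else if (score >= 60) {
    "D"
  } else {
    "F"  # catch-all: anything under 60
  }
}
ex_1_2 <- grade(73)
ex_1_2
#> [1] "C"

# Declarative alternative: half-open intervals [lower, upper) with labels
scores <- c(95, 73, 41)
grades_cut <- cut(scores,
                  breaks = c(-Inf, 60, 70, 80, 90, Inf),
                  labels = c("F", "D", "C", "B", "A"),
                  right = FALSE)
as.character(grades_cut)
#> [1] "A" "C" "F"
```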
Exercise 1.3: Build an audit flag combining amount and country with AND and OR
Task: A retail audit team flags a transaction for review when the amount exceeds 5000 OR the country code is outside the allowed set c("US","CA","GB","DE"). Write a function flag_txn(amount, country) returning TRUE or FALSE using || and !(... %in% ...), then save flag_txn(7500, "US") to ex_1_3.
Expected result:
#> [1] TRUE
Difficulty: Intermediate
The transaction is risky if either condition alone holds, so a single combined test is enough.
Return amount > 5000 || !(country %in% allowed) with the allowed set defined inside the function.
Explanation: || short-circuits. As soon as the first operand is TRUE, R skips the second. That matters when the second operand is expensive (a database call, say) or could itself error. Use || for scalar guards like this one and reserve | for element-wise OR on vectors. A common bug is feeding || a length>1 vector. That raises an error in R 4.3+, while earlier versions used only the first element.
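This is a minimal sketch of flag_txn(). The two extra calls with 1200 are illustrative inputs, not part of the exercise:

```r
flag_txn <- function(amount, country) {
  allowed <- c("US", "CA", "GB", "DE")
  # If the amount test is TRUE, || returns TRUE without evaluating the %in% check
  amount > 5000 || !(country %in% allowed)
}
ex_1_3 <- flag_txn(7500, "US")
ex_1_3
#> [1] TRUE
flag_txn(1200, "FR")  # small amount, but the country is outside the allowed set
#> [1] TRUE
flag_txn(1200, "CA")
#> [1] FALSE
```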
Section 2. Vectorised conditions: ifelse and case_when (3 problems)
Exercise 2.1: Label every mtcars row as efficient or thirsty in one call
Task: Use ifelse() on the mtcars$mpg column to produce a character vector of the same length where rows with mpg >= 20 are "efficient" and the rest are "thirsty". Save the resulting character vector to ex_2_1. Do not use a loop.
Expected result:
#> [1] "efficient" "efficient" "efficient" "efficient" "thirsty" "thirsty"
#> [7] "thirsty" "efficient" "efficient" "thirsty" "thirsty" "thirsty"
#> [13] "thirsty" "thirsty" "thirsty" "thirsty" "thirsty" "efficient"
#> ... 14 more values
Difficulty: Beginner
One call should map the whole numeric column to labels without visiting rows one at a time.
Pass the logical test mtcars$mpg >= 20 as the first argument to ifelse(), with "efficient" and "thirsty" as the yes/no values.
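Explanation: ifelse() is the vectorised twin of if/else. It accepts a logical vector and returns a vector of the same length, picking from the yes or no argument element by element. It does not short-circuit. The yes and no expressions are each computed for the whole vector whenever the test contains at least one TRUE and one FALSE. As a result, a branch such as sqrt(x) still warns about negative x even where the test would discard those elements. dplyr::if_else() and case_when() also evaluate every branch in full, so the real fix is to sanitise the inputs before the call. Still prefer dplyr::if_else() for its stricter type checking, and switch to case_when() once you need more than two buckets.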
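The solution itself is a single call. The second half uses a small made-up vector x to show the both-branches behaviour described above and the sanitise-first fix:

```r
# One vectorised call labels all 32 rows; no loop needed
ex_2_1 <- ifelse(mtcars$mpg >= 20, "efficient", "thirsty")
head(ex_2_1)

# Both branches are computed for the whole vector: sqrt(-1) triggers a
# "NaNs produced" warning even though the test discards that element
x <- c(4, -1, 9)
unsafe <- ifelse(x > 0, sqrt(x), NA)
# Sanitising the input first gives the same result without the warning
safe <- ifelse(x > 0, sqrt(pmax(x, 0)), NA)
safe
#> [1]  2 NA  3
```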
Exercise 2.2: Bin diamonds into three price tiers with case_when
Task: A jeweller preparing a quarterly sale wants to bucket the diamonds inventory into three tiers by price: "budget" for under 1000, "mid" for 1000 to 4999, and "premium" for 5000 and above. Add a tier column to diamonds using dplyr::case_when() and save the augmented tibble to ex_2_2.
Expected result (tier counts, e.g. from dplyr::count(ex_2_2, tier)):
#> # A tibble: 3 x 2
#> tier n
#> <chr> <int>
#> 1 budget 14524
#> 2 mid 28966
#> 3 premium 10450
Difficulty: Intermediate
Order the bands from cheapest upward so each later clause inherits the lower bound for free.
Inside mutate(), call case_when() with price < 1000 ~ "budget", price < 5000 ~ "mid", and a TRUE ~ "premium" catch-all.
Explanation: case_when() walks its formulas top to bottom and the first match wins, so the second clause covers 1000 to 4999 without an explicit lower bound. The trailing TRUE ~ ... is the catch-all default. Without it, rows priced at 5000 and above would become NA and you would ship a bug. For two-way splits dplyr::if_else() is cleaner. Reach for case_when() once you have three or more buckets, or conditions spanning multiple columns.
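This sketch assumes dplyr and ggplot2 (which ships the diamonds data) are installed. The count() call at the end produces a table in the shape of the expected result:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

ex_2_2 <- diamonds |>
  mutate(tier = case_when(
    price < 1000 ~ "budget",  # first matching clause wins
    price < 5000 ~ "mid",     # implicitly price >= 1000
    TRUE ~ "premium"          # catch-all: 5000 and above
  ))

count(ex_2_2, tier)
```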
Exercise 2.3: Classify Ozone air-quality readings with explicit NA handling
Task: A climate analyst is categorising the airquality dataset's Ozone column into "good" (under 50), "moderate" (50 to 99), and "unhealthy" (100 and above). Missing readings must stay NA rather than being coerced into a category. Build the labels with case_when(), convert them to a factor, and save the factor to ex_2_3.
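Expected result:
A factor of length 153. Tabulating it with table(ex_2_3, useNA = "ifany") gives counts for good, moderate and unhealthy that together cover the 116 non-missing readings, plus 37 <NA> entries (one per missing Ozone value).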
Difficulty: Advanced
Decide what a missing reading should become before any numeric band is considered.
Make is.na(Ozone) ~ NA_character_ the first clause of case_when(), then add the < 50 and < 100 bands.
Explanation: Putting is.na(Ozone) first and returning NA_character_ is the idiomatic way to keep missingness out of your buckets. Without it, a missing reading fails every comparison: NA < 50 evaluates to NA, which case_when() treats as no match. The reading then falls through to a TRUE ~ "unhealthy" catch-all and is silently mislabelled. With no catch-all it comes out NA, but only by accident. Being explicit documents the policy and stops a future maintainer from wondering whether the gap was deliberate.
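This sketch assumes dplyr is installed. The second case_when() call (named naive here for illustration) drops the NA clause to show the mislabelling described above:

```r
library(dplyr)

labels <- with(airquality, case_when(
  is.na(Ozone) ~ NA_character_,  # missingness policy stated up front
  Ozone < 50   ~ "good",
  Ozone < 100  ~ "moderate",
  TRUE         ~ "unhealthy"
))
ex_2_3 <- factor(labels, levels = c("good", "moderate", "unhealthy"))
table(ex_2_3, useNA = "ifany")

# Without the NA clause, missing readings fall through to the catch-all
naive <- with(airquality, case_when(
  Ozone < 50  ~ "good",
  Ozone < 100 ~ "moderate",
  TRUE        ~ "unhealthy"
))
sum(is.na(naive))  # 0: every missing reading was labelled "unhealthy"
```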
Section 3. for loops on real data (3 problems)
Exercise 3.1: Collect squares of 1 to 5 into a preallocated numeric vector
Task: Use a classic for loop over 1:5 to compute the square of each integer and store the results in a preallocated numeric vector of length 5 (do not grow with c()). Save the final vector to ex_3_1 so you can compare it with the vectorised one-liner (1:5)^2.
Expected result:
#> [1] 1 4 9 16 25
Difficulty: Beginner
Write each result into a slot that already exists rather than extending the container as you go.
Inside the loop, assign ex_3_1[i] <- i^2 using the loop index i.
Explanation: Preallocating with numeric(n) and writing into known positions is the right way to write a for loop in R because each c()/append() reallocates the entire vector, giving you O(n^2) behaviour. Of course (1:5)^2 is shorter, faster, and the idiomatic R way: reach for explicit for only when the next iteration genuinely depends on the previous one or when the body does side effects you cannot vectorise away.
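This is one way to write the preallocated loop, with the vectorised one-liner as a cross-check:

```r
n <- 5
ex_3_1 <- numeric(n)   # preallocate: five zeros, written into below
for (i in seq_len(n)) {
  ex_3_1[i] <- i^2     # fill an existing slot; nothing is reallocated
}
ex_3_1
#> [1]  1  4  9 16 25
identical(ex_3_1, (1:5)^2)
#> [1] TRUE
```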
Exercise 3.2: Mean of every mtcars column with a loop, then compare with sapply
Task: Loop over every column of mtcars using seq_along(mtcars), compute the column mean, and accumulate the results into a named numeric vector with the column names attached. Save the named vector to ex_3_2 and confirm it equals sapply(mtcars, mean) element by element.
Expected result:
#> mpg cyl disp hp drat wt qsec vs am
#> 20.09063 6.18750 230.72188 146.68750 3.59656 3.21725 17.84875 0.43750 0.40625
#> gear carb
#> 3.68750 2.81250
Difficulty: Intermediate
Each pass handles one column and drops its average into the matching named slot.
Use mtcars[[j]] to pull column j as a vector, then assign mean(...) of it to ex_3_2[j].
Explanation: seq_along(mtcars) is safer than 1:ncol(mtcars) because it returns an empty integer for a zero-column data frame instead of 1:0 (which is c(1, 0) and would error). Using mtcars[[j]] extracts the column as a vector; mtcars[j] would give you a one-column data frame and mean() would warn. In production code prefer sapply(), vapply(), or colMeans() for numeric matrices: the loop here is a teaching tool, not the idiomatic choice.
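Below is a sketch of the column loop and its comparison with sapply(). all.equal() is used rather than == so that the names are compared too:

```r
ex_3_2 <- numeric(ncol(mtcars))   # preallocate one slot per column
names(ex_3_2) <- names(mtcars)
for (j in seq_along(mtcars)) {
  ex_3_2[j] <- mean(mtcars[[j]])  # [[ ]] extracts column j as a plain vector
}
all.equal(ex_3_2, sapply(mtcars, mean))
#> [1] TRUE
```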
Exercise 3.3: Compute cumulative drug exposure per subject with a grouped loop
Task: A pharmacology team running a dose-response study has a tibble of dosing events with subject_id and dose_mg. For each unique subject, compute the cumulative dose across their events (preserving original row order) and append it as a new column cum_dose. Use a for loop over subjects and save the augmented tibble to ex_3_3.
Expected result:
#> # A tibble: 6 x 3
#> subject_id dose_mg cum_dose
#> <chr> <dbl> <dbl>
#> 1 S01 50 50
#> 2 S01 100 150
#> 3 S02 25 25
#> 4 S02 75 100
#> 5 S02 100 200
#> 6 S01 50 200
Difficulty: Advanced
Process one subject at a time: find that subject's rows and accumulate within them only, leaving every row in its original position.
Use which(ex_3_3$subject_id == s) to get the row indices, then write cumsum() of their doses back to those positions.
Explanation: The trick is using which() to capture the row positions per subject and writing back to those exact indices, which preserves the original event order without sorting. The same result in tidy idiom is one line: dosing |> group_by(subject_id) |> mutate(cum_dose = cumsum(dose_mg)). Knowing both versions matters because longitudinal clinical pipelines often interleave loops (when the next-event logic depends on prior state) with group_by (when it does not).
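The dosing table in this sketch is reconstructed from the expected result above and assumes the tibble package is available. The final line cross-checks the loop against base R's ave(), which applies cumsum within each group and keeps the original row order:

```r
library(tibble)

# Reconstructed from the expected output; the exercise's own input may differ
dosing <- tibble(
  subject_id = c("S01", "S01", "S02", "S02", "S02", "S01"),
  dose_mg    = c(50, 100, 25, 75, 100, 50)
)

ex_3_3 <- dosing
ex_3_3$cum_dose <- NA_real_
for (s in unique(ex_3_3$subject_id)) {
  rows <- which(ex_3_3$subject_id == s)                     # this subject's positions
  ex_3_3$cum_dose[rows] <- cumsum(ex_3_3$dose_mg[rows])  # write back in place
}
ex_3_3

identical(ex_3_3$cum_dose, ave(dosing$dose_mg, dosing$subject_id, FUN = cumsum))
#> [1] TRUE
```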
Section 4. while loops with break and next (3 problems)
Exercise 4.1: Count how many doublings of 1 it takes to exceed 1000
Task: Starting with x <- 1, use a while loop that doubles x until it strictly exceeds 1000, counting each doubling. Save the final iteration count (not the final value of x) to ex_4_1. Verify mentally that the answer is log2(1000) rounded up.
Expected result:
ex_4_1
#> [1] 10
ceiling(log2(1000))
#> [1] 10
Difficulty: Intermediate
Each pass both changes the value being tested and records that a pass happened.
In the loop body do x <- x * 2 and ex_4_1 <- ex_4_1 + 1.
Explanation: The condition is tested at the top of each pass, so the loop exits as soon as x crosses 1000 with the count already incremented. ceiling(log2(1000)) is 10, which matches: a while is the right tool whenever the number of iterations is implicit in a stopping condition rather than a known range. If the condition can never become false (a typo, perhaps) you have an infinite loop, so when in doubt add a hard iteration cap as a safety net.
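Here is a sketch with the hard iteration cap suggested above built into the condition. The cap value of 100 is an arbitrary choice for illustration:

```r
x <- 1
ex_4_1 <- 0
max_iter <- 100                             # safety cap; arbitrary for this sketch
while (x <= 1000 && ex_4_1 < max_iter) {   # stop once x strictly exceeds 1000
  x <- x * 2                                # change the value being tested...
  ex_4_1 <- ex_4_1 + 1                      # ...and record that a pass happened
}
ex_4_1
#> [1] 10
x
#> [1] 1024
```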
Exercise 4.2: Find the first day a return path crosses a stop-loss threshold
Task: A trading desk simulating a 100-day P&L path needs to know on which day the cumulative log-return first falls below -0.10. Given the path's vector of daily log-returns, rets, iterate with a while loop and break out the moment the cumulative sum drops below -0.10. Save the breaching day index (or NA_integer_ if it never breaches) to ex_4_2.
Expected result:
ex_4_2
#> [1] 47
# the cumulative return on that day is just below -0.10
Difficulty: Intermediate
Advance the running total one day at a time and bail out the instant it sinks past the threshold.
Increment day, add rets[day] to cum_ret, and when cum_ret < -0.10 set ex_4_2 <- day then break.
Explanation: break exits the loop immediately and skips the rest of the current iteration. That is exactly what you want for a first-crossing test, because there is no point continuing once the answer is known. The equivalent vectorised form which(cumsum(rets) < -0.10)[1] is shorter but always processes the whole series, even when day 1 already breaches. Whether the early exit pays off depends on path length. cumsum() runs in compiled code, so for short paths like this one it is usually faster than an interpreted loop. For very long paths that tend to breach early, the loop can win, so benchmark both before committing to either in a backtest over millions of paths.
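The exercise's own returns vector is not reproduced here. This sketch therefore wraps the loop in a hypothetical helper, first_breach(), and feeds it a simulated path from rnorm(). The seed and parameters are arbitrary, so the day it reports will not match the 47 above:

```r
first_breach <- function(rets, threshold = -0.10) {
  cum_ret <- 0
  day <- 0L
  breach <- NA_integer_          # returned unchanged if the path never breaches
  while (day < length(rets)) {
    day <- day + 1L
    cum_ret <- cum_ret + rets[day]
    if (cum_ret < threshold) {
      breach <- day
      break                      # answer known: stop scanning the path
    }
  }
  breach
}

set.seed(1)                      # hypothetical path, not the exercise's data
rets <- rnorm(100, mean = -0.002, sd = 0.02)
ex_4_2 <- first_breach(rets)

# Vectorised counterpart for comparison
which(cumsum(rets) < -0.10)[1]
```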
Exercise 4.3: Sum 1 to 50, skip multiples of 3, stop once running total tops 200
Task: Iterate over 1:50. Use next to skip any value divisible by 3 without adding it to the running total. Use break to exit the moment the running total strictly exceeds 200. Save a list with two elements, total (final running total) and index (the value of i when the loop stopped), to ex_4_3.
Expected result:
#> $total
#> [1] 217
#>
#> $index
#> [1] 25
Difficulty: Advanced
Some values are skipped before they ever touch the total, and the loop ends as soon as the total is high enough.
Use if (i %% 3 == 0) next to skip, accumulate into total, and break once total > 200 after storing the result list.
Explanation: next jumps to the start of the next iteration without running the rest of the loop body, which is cleaner than wrapping the rest of the body in an if (!divisible) { ... } block. break and next always apply to the innermost loop, so in nested loops you must structure with sentinel flags or refactor into a function with an early return() when you need to bail out of the outer loop. The pair shows up constantly in batch jobs that must filter and stop, such as web scrapers honouring rate limits.
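This is a sketch of the next/break pair. By i = 23 the running total is 192, and adding 25 (24 is skipped) pushes it to 217:

```r
total <- 0
for (i in 1:50) {
  if (i %% 3 == 0) next           # skip multiples of 3 before they touch the total
  total <- total + i
  if (total > 200) {
    ex_4_3 <- list(total = total, index = i)
    break                         # stop at the first total above 200
  }
}
ex_4_3$total
#> [1] 217
ex_4_3$index
#> [1] 25
```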
Section 5. switch and dispatch (3 problems)
Exercise 5.1: Translate severity codes to integer levels with switch
Task: A junior analyst onboarding to an ops dashboard needs to convert string severity codes from log lines ("INFO", "WARN", "ERROR", "FATAL") to integer levels (1, 2, 3, 4). Write a function severity_level(code) using switch() that returns the integer, and save the result of severity_level("ERROR") to ex_5_1.
Expected result:
severity_level("ERROR")
#> [1] 3
severity_level("FATAL")
#> [1] 4
Difficulty: Beginner
A fixed set of codes maps cleanly onto a lookup table, with a fallback for anything unrecognised.
Call switch(code, INFO = 1L, WARN = 2L, ERROR = 3L, FATAL = 4L, NA_integer_).
Explanation: switch() on a string matches the argument against the names of the branches. The trailing unnamed expression (NA_integer_ here) is the default returned when nothing matches, which is how you defend against typos like "WANR". Compared with a chained if/else, it reads as a lookup table and does the matching in a single call. The downside is that switch() with numeric input is positional, not value-based: switch(2, ...) returns the second alternative. Always coerce to character first if your codes happen to be numeric.
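Below is a sketch of severity_level(). The final line, with the illustrative name by_position, shows the positional behaviour of numeric input:

```r
severity_level <- function(code) {
  switch(code,
         INFO  = 1L,
         WARN  = 2L,
         ERROR = 3L,
         FATAL = 4L,
         NA_integer_)   # unnamed final argument: the default
}
ex_5_1 <- severity_level("ERROR")
ex_5_1
#> [1] 3
severity_level("WANR")  # typo falls through to the default
#> [1] NA

# Numeric EXPR selects by position, not by value
by_position <- switch(2, "INFO", "WARN", "ERROR")
by_position
#> [1] "WARN"
```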
Exercise 5.2: Build a summarise_by helper that dispatches on a function name
Task: A reporting analyst writes a generic helper summarise_by(x, fn_name) where fn_name is one of "mean", "median", "max", or "sd". The function uses switch() to call the corresponding base function on the numeric vector x. Save the result of summarise_by(mtcars$mpg, "median") to ex_5_2.
Expected result:
#> [1] 19.2
Difficulty: Intermediate
The name picks which summary to compute, and an unknown name should fail loudly rather than return nothing.
Use switch(fn_name, mean = mean(x), median = median(x), ...) with stop(...) as the default branch.
Explanation: Using stop() as the default branch turns an unknown name into a loud failure instead of silently returning NULL, which is what switch() does when there is no default. A more dynamic alternative is do.call(fn_name, list(x)) or match.fun(fn_name)(x), but switch() keeps the supported menu explicit at the call site, which is easier to read and to audit in a regulated reporting workflow.
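This sketch of summarise_by() uses stop() as the default branch. The error message wording is the sketch's own:

```r
summarise_by <- function(x, fn_name) {
  switch(fn_name,
         mean   = mean(x),
         median = median(x),
         max    = max(x),
         sd     = sd(x),
         stop("Unsupported fn_name: ", fn_name, call. = FALSE))  # fail loudly
}
ex_5_2 <- summarise_by(mtcars$mpg, "median")
ex_5_2
#> [1] 19.2

# The dynamic alternative resolves any function name, so the menu is no longer explicit
match.fun("median")(mtcars$mpg)
#> [1] 19.2
```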
Exercise 5.3: Convert between kg, lb, Celsius and Fahrenheit using a switch table
Task: An ops engineer building a small unit-converter receives a value and a conversion code from a config file: "kg_to_lb", "lb_to_kg", "c_to_f", or "f_to_c". Write convert(value, code) that uses switch() and the standard formulas (1 kg = 2.20462 lb; F = C*9/5 + 32). Save the result of convert(100, "c_to_f") to ex_5_3.
Expected result:
#> [1] 212
Difficulty: Advanced
The conversion code picks one formula from a closed menu of known transformations.
Use switch(code, c_to_f = value * 9 / 5 + 32, ...) with stop(...) for an unrecognised code.
Explanation: switch() shines when the dispatch keys are a closed enumeration like a unit catalogue. Adding a new conversion is one line, and the structure documents what is supported. For a fully extensible converter you would register conversion functions in a named list and look them up with conversions[[code]](value), which is the idiomatic R version of the Strategy pattern. As a rough rule of thumb, once the menu grows past five or six entries, or needs to be extended at runtime, the function table scales better than switch().
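Here is a sketch of convert() using the formulas from the task, plus the function-table variant. The conversions list is a hypothetical two-entry example:

```r
convert <- function(value, code) {
  switch(code,
         kg_to_lb = value * 2.20462,
         lb_to_kg = value / 2.20462,
         c_to_f   = value * 9 / 5 + 32,
         f_to_c   = (value - 32) * 5 / 9,
         stop("Unknown conversion code: ", code, call. = FALSE))
}
ex_5_3 <- convert(100, "c_to_f")
ex_5_3
#> [1] 212

# Function-table variant: entries can be added at runtime with conversions$new_code <- ...
conversions <- list(
  kg_to_lb = function(v) v * 2.20462,
  c_to_f   = function(v) v * 9 / 5 + 32
)
conversions[["c_to_f"]](100)
#> [1] 212
```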
Section 6. Short-circuit operators and defensive guards (3 problems)
Exercise 6.1: Write a scalar predicate that uses && short-circuit checks
Task: Write a predicate is_positive_number(x) that returns TRUE only when x is exactly length 1, numeric, not NA, and strictly greater than zero. Use && rather than & to short-circuit the cheap checks before the expensive ones. Save the result of is_positive_number(3) to ex_6_1.
Expected result:
#> [1] TRUE
Difficulty: Intermediate
Chain the checks cheapest-and-safest first so a bad input is rejected before any risky comparison runs.
Combine length(x) == 1, is.numeric(x), !is.na(x), and x > 0 with &&.
Explanation: Each && only evaluates its right operand if the left is TRUE. That ordering is intentional: testing length(x) == 1 before x > 0 prevents the value comparison from accidentally returning a length-2 logical that would crash an if later. & would not short-circuit and would also return a vector for vector inputs, which is wrong for a scalar guard. In R 4.3+ feeding && a non-length-1 vector raises an error, which is why ordering the length check first matters.
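This sketch of is_positive_number() chains the checks cheapest and safest first. The extra calls show where each bad input is rejected:

```r
is_positive_number <- function(x) {
  length(x) == 1 &&   # must be a scalar before anything else
    is.numeric(x) &&  # rejects character, logical and NULL input
    !is.na(x) &&      # NA would make the comparison NA
    x > 0
}
ex_6_1 <- is_positive_number(3)
ex_6_1
#> [1] TRUE
is_positive_number(c(1, 2))   # stops at the length check
#> [1] FALSE
is_positive_number("5")       # stops at is.numeric()
#> [1] FALSE
is_positive_number(NA_real_)  # stops at !is.na()
#> [1] FALSE
```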
Exercise 6.2: Combine AND and OR in a fraud-risk flagging function
Task: A fraud team's policy: flag a transaction when (amount > 10000 AND country is in the high-risk list c("XX","YY")) OR (amount > 50000 regardless of country). Write is_suspicious(amount, country) using && and || so the short-circuit avoids the %in% lookup when amount is small. Save the result of is_suspicious(60000, "US") to ex_6_2.
Expected result:
#> [1] TRUE
Difficulty: Intermediate
Two separate routes lead to a flag, and the cheap amount test should gate the costlier list lookup.
Return (amount > 10000 && country %in% high_risk) || amount > 50000.
Explanation: The parentheses do not change the result here. && binds tighter than ||, so R would parse the expression the same way without them, but the explicit grouping makes the policy auditable for a compliance reviewer. The short-circuit means that for every transaction at or under 10000 (typically the large majority) the %in% lookup never runs, which matters at high volume. In production you would vectorise this for batch scoring with & and |. For a per-row guard inside an apply loop or a service handler, the scalar version is the right choice.
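Below is a sketch of the scalar rule and its vectorised batch counterpart. The three sample transactions are made up for illustration:

```r
is_suspicious <- function(amount, country) {
  high_risk <- c("XX", "YY")
  # The %in% lookup only runs when amount > 10000 is TRUE
  (amount > 10000 && country %in% high_risk) || amount > 50000
}
ex_6_2 <- is_suspicious(60000, "US")
ex_6_2
#> [1] TRUE

# Batch scoring: element-wise & and | over whole vectors (no short-circuit)
amounts   <- c(60000, 20000, 500)
countries <- c("US", "XX", "XX")
batch_flags <- (amounts > 10000 & countries %in% c("XX", "YY")) | amounts > 50000
batch_flags
#> [1]  TRUE  TRUE FALSE
```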
Exercise 6.3: Guard a vector against both NULL and NA before computing a summary
Task: A code reviewer flagged a bug where safe_mean(x) crashed on NULL input because is.na(NULL) returns logical(0), not a usable scalar. Rewrite safe_mean(x) using is.null(x) || all(is.na(x)) to short-circuit the is.null check before touching x, returning NA_real_ for those bad inputs. Save safe_mean(NULL) to ex_6_3.
Expected result:
safe_mean(NULL)
#> [1] NA
safe_mean(c(1, 2, NA))
#> [1] 1.5
Difficulty: Advanced
Reject the unusable inputs up front, before any averaging is attempted.
Guard with if (is.null(x) || all(is.na(x))) return(NA_real_), then fall through to mean(x, na.rm = TRUE).
Explanation: || evaluates its first operand, and if is.null(x) is TRUE it never touches the second. That means all(is.na(NULL)) (which would be TRUE anyway, vacuously) is never reached. Wrapping is.na() in all() also matters. is.na(x) returns one value per element, but || needs a single TRUE or FALSE and errors on longer input in R 4.3+. all(is.na(x)) collapses the check to a scalar, so || stays well-defined for any input length. This pattern of ordering cheap, safe checks first is the foundation of robust R input validation.
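This is a sketch of the repaired safe_mean(). buggy_mean is a hypothetical reconstruction of the kind of code the reviewer flagged: if() on logical(0) fails with "argument is of length zero":

```r
safe_mean <- function(x) {
  # is.null() first: all(is.na(x)) is never evaluated for NULL input
  if (is.null(x) || all(is.na(x))) return(NA_real_)
  mean(x, na.rm = TRUE)
}
ex_6_3 <- safe_mean(NULL)
ex_6_3
#> [1] NA
safe_mean(c(1, 2, NA))
#> [1] 1.5
safe_mean(c(NA, NA))
#> [1] NA

# Hypothetical original bug: is.na(NULL) is logical(0), and if() rejects it
buggy_mean <- function(x) if (is.na(x)) NA_real_ else mean(x)
inherits(try(buggy_mean(NULL), silent = TRUE), "try-error")
#> [1] TRUE
```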
What to do next
- Revisit the parent tutorial R Control Flow for the syntax reference.
- Practice the loop alternative with the Apply Family Exercises in R.
- Build on the vectorised conditions in the dplyr Exercises in R.
- Tighten function-writing fundamentals in the R Functions Exercises.
r-statistics.co