R Subsetting Exercises: 20 Practice Problems for Vectors, Lists, and Data Frames
Twenty scenario-based subsetting exercises grouped into five themed sections covering vectors, lists, data frames, matrices, and advanced patterns. Every problem ships with an expected result so you can verify your answer, and solutions stay hidden behind reveal toggles so you actually try first.
Section 1. Subsetting vectors (4 problems)
Exercise 1.1: Extract elements by position and by name
Task: A teacher has stored mid-term scores in a named numeric vector called scores. Use [] to first extract the 2nd and 4th scores by position, then extract the same two scores by name. Save the name-based extraction to ex_1_1.
Expected result:
#> science history
#> 92 95
Difficulty: Beginner
There are two ways to point at the same element: by the slot it occupies, or by the label attached to it.
Pass a vector of names like c("science", "history") inside [] to pick by name.
Click to reveal solution
Explanation: Position indexing with c(2, 4) and name indexing with c("science", "history") return the same named numeric vector here. [ ] carries the names along with the values, which is what makes the second form safer: if anyone reorders scores later, the position call grabs the wrong subjects while the name call still finds science and history.
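A minimal sketch of both forms, assuming a four-subject scores vector. Only science = 92, history = 95, and math = 88 come from the expected results; the third subject (english = 79) is invented for illustration.

```r
# Assumed data: english = 79 is a hypothetical filler for position 3.
scores <- c(math = 88, science = 92, english = 79, history = 95)

by_position <- scores[c(2, 4)]                  # pick by slot
ex_1_1      <- scores[c("science", "history")]  # pick by label
identical(by_position, ex_1_1)

# After a reorder, positions point at different subjects; names do not
shuffled <- scores[c("history", "math", "science", "english")]
names(shuffled[c(2, 4)])
names(shuffled[c("science", "history")])
```

The last two lines show why the name-based call is the more robust of the two once the vector's order can change.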
Exercise 1.2: Logical and negative indexing on a vector
Task: From the scores vector created above, extract every score strictly greater than 85 using a logical condition. Then build a second result that excludes the 3rd element using negative indexing. Save the above-85 result to ex_1_2.
Expected result:
#> math science history
#> 88 92 95
Difficulty: Beginner
A comparison produces a TRUE/FALSE mask the same length as the vector, and the bracket keeps only the TRUE spots.
Put the condition scores > 85 inside [], and use a leading minus like scores[-3] to drop a position.
Click to reveal solution
Explanation: scores > 85 produces a logical vector aligned with scores, and [ ] keeps only positions where it evaluates TRUE. Negative indexing with -3 drops the third element, and -c(1, 3) drops several at once. Mixing positive and negative indices inside one call is an error in R.
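The mask and the negative index can be sketched as follows, assuming the same scores vector (english = 79 is an invented value, chosen only so that it falls below 85).

```r
# Assumed data: english = 79 is hypothetical.
scores <- c(math = 88, science = 92, english = 79, history = 95)

mask   <- scores > 85          # one TRUE/FALSE per element
ex_1_2 <- scores[mask]

without_third <- scores[-3]        # drop one position
without_two   <- scores[-c(1, 3)]  # drop several positions

# Mixing signs in one call is rejected; capture the error instead of stopping
mixed <- tryCatch(scores[c(-1, 2)], error = function(e) conditionMessage(e))
is.character(mixed)
```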
Exercise 1.3: Replace selected elements via subsetting assignment
Task: A quality-control engineer just discovered that any reading below zero in the sensor vector readings is a calibration glitch and should be replaced with NA. Use logical indexing on the left side of <- to overwrite those bad values in place. Save the cleaned vector to ex_1_3.
Expected result:
#> [1] 12.4 NA 8.7 NA 0.5 15.6 22.1 NA
Difficulty: Intermediate
You can place a selection on the left side of the arrow, not just the right, to overwrite chosen elements in place.
Index with a condition on the left: readings[readings < 0] <- NA.
Click to reveal solution
Explanation: The left-hand side of <- accepts the same indexing syntax as the right-hand side, so R treats the call as "replace the selected positions with NA". From your point of view the vector is updated under its existing name, although R may copy it internally when another name refers to the same data. This idiom scales to data-cleaning steps where you want to mark bad values without dropping rows.
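A short sketch of the replacement, assuming three invented negative readings; the non-negative values are taken from the expected result.

```r
# Assumed data: the negative values (-3.1, -0.9, -7.2) are hypothetical.
readings <- c(12.4, -3.1, 8.7, -0.9, 0.5, 15.6, 22.1, -7.2)

bad <- readings < 0      # logical mask, same length as readings
readings[bad] <- NA      # overwrite only the flagged positions
ex_1_3 <- readings

which(is.na(ex_1_3))     # positions that were replaced
```

Storing the mask under a name first makes it easy to report how many values were flagged, for example with sum(bad).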
Exercise 1.4: Find positions of matching elements with which
Task: An audit team needs to know the row positions where customer IDs in ids equal the suspicious values "C103" or "C107". Use which() combined with %in% to return the integer positions inside ids. Save the positions to ex_1_4.
Expected result:
#> [1] 4 8
Difficulty: Intermediate
First mark which elements match, then convert those marks into the numbers that say where they sit.
Wrap the membership test ids %in% c("C103", "C107") inside which().
Click to reveal solution
Explanation: ids %in% c("C103", "C107") returns a logical vector marking matches. Wrapping it in which() converts those TRUE positions to integer indices, which you can report to the audit team or reuse on parallel vectors (transaction amounts, timestamps) that sit alongside ids. The logical mask alone would subset those vectors just as well, but it does not tell you where the matches sit.
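A sketch of the lookup, assuming an eight-element ids vector in which the suspicious values sit at positions 4 and 8. The other IDs and the parallel amounts vector are hypothetical.

```r
# Assumed data: only the matches at positions 4 and 8 are implied by the exercise.
ids     <- c("C100", "C101", "C102", "C103", "C104", "C105", "C106", "C107")
amounts <- c(20, 35, 18, 940, 22, 41, 30, 1250)   # hypothetical parallel vector

mask   <- ids %in% c("C103", "C107")   # logical mask
ex_1_4 <- which(mask)                  # integer positions

# Both the positions and the mask subset a parallel vector the same way
identical(amounts[ex_1_4], amounts[mask])
```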
Section 2. Subsetting lists with [], [[]], and $ (4 problems)
Exercise 2.1: One bracket keeps the list, two brackets extract the element
Task: Given the list prefs storing a user's display preferences, use single [] to return a one-element sub-list containing just theme, and use [[]] to extract the value of theme as a bare character vector. Save the [[]] extraction to ex_2_1.
Expected result:
#> [1] "dark"
Difficulty: Beginner
One level of brackets hands back a smaller container of the same kind; doubling them reaches inside and pulls out the bare contents.
Use prefs[["theme"]] for the bare value and prefs["theme"] for the one-element sub-list.
Click to reveal solution
Explanation: prefs["theme"] keeps the list wrapper, so the result is still a list of length 1. prefs[["theme"]] unwraps and returns the underlying character vector. The difference bites as soon as a function needs an atomic vector: prefs["font_size"] + 1 stops with a non-numeric argument error, while prefs[["font_size"]] + 1 works cleanly. This is the single rule that catches the most beginners with R lists.
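The two bracket forms can be compared side by side. This sketch assumes a small prefs list; theme = "dark" and font_size = 14 come from the exercises, while notifications is an invented field.

```r
# Assumed data: the notifications field is hypothetical.
prefs <- list(theme = "dark", font_size = 14, notifications = TRUE)

sub_list <- prefs["theme"]     # still a list of length 1
ex_2_1   <- prefs[["theme"]]   # bare character vector

class(sub_list)
class(ex_2_1)

# Arithmetic works on the unwrapped value but not on the sub-list
unwrapped <- prefs[["font_size"]] + 1
wrapped   <- tryCatch(prefs["font_size"] + 1, error = function(e) "error")
```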
Exercise 2.2: Dollar sign as syntactic sugar for [[]]
Task: Continuing with the prefs list, use the $ operator to extract font_size directly. Then use [[]] to do the same and confirm with identical() that both calls return the same value. Save the $ extraction to ex_2_2.
Expected result:
#> [1] 14
#> [1] TRUE
Difficulty: Beginner
There is a shorthand operator that grabs a single named field directly, without quotes or brackets.
Write prefs$font_size, then check it against prefs[["font_size"]] with identical().
Click to reveal solution
Explanation: $name is shorthand for [["name"]] with one extra feature: it does partial matching on names (prefs$font would still work, which is dangerous). Unlike [[]], you cannot pass a variable holding the name to $; you must literally type the field. For programmatic code use [[]]; for quick interactive work, $ is fine.
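A brief sketch of the three behaviours described above, assuming a two-field prefs list. The variable name field is hypothetical.

```r
# Assumed data: a minimal prefs list with the two fields named in the exercises.
prefs <- list(theme = "dark", font_size = 14)

ex_2_2 <- prefs$font_size
identical(ex_2_2, prefs[["font_size"]])

field <- "font_size"              # name held in a variable
from_variable <- prefs[[field]]   # [[ ]] evaluates the variable
from_dollar   <- prefs$field      # $ looks for an element literally called "field"

partial <- prefs$font             # partial match on the unique prefix "font"
```

Here from_dollar is NULL because no element name starts with "field", which is why [[ ]] is the safer choice in functions.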
Exercise 2.3: Recursive indexing with [[c(i, j, k)]]
Task: A list report contains a nested sub-list metrics, which itself holds a numeric vector monthly. Use the recursive form [[c(...)]] to reach two levels down and pull out the third element of monthly in one expression. Save it to ex_2_3.
Expected result:
#> [1] 162
Difficulty: Advanced
A single selector can describe a whole path downward, taking one step per element you list.
Pass the name path c("metrics", "monthly") to [[ ]], then take the third element of the result with [3].
Click to reveal solution
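Explanation: When you pass a length-n vector to [[]], R drills one level per element, so report[[c("metrics", "monthly")]] is equivalent to report[["metrics"]][["monthly"]]. Keep the path to names only (or positions only): c("metrics", "monthly", 3) is coerced to a character vector, so R would look for an element named "3" inside monthly and stop with a subscript-out-of-bounds error. Index the returned vector with [3] instead. Recursive indexing is a compact way to fetch deeply nested data, but it errors hard if any intermediate name is missing. Reach for purrr::pluck(), which accepts mixed names and positions, when paths might fail and you want a safe default.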
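A sketch of the drill-down with an assumed report list (only the value 162 in third place comes from the expected result). It keeps the character path to names and takes the position afterwards, because mixing names and a number inside c() coerces the number to the string "3".

```r
# Assumed data: title and the monthly values other than 162 are hypothetical.
report <- list(
  title   = "Q1 review",
  metrics = list(monthly = c(120, 145, 162, 171))
)

ex_2_3  <- report[[c("metrics", "monthly")]][3]      # recursive path, then position
stepwise <- report[["metrics"]][["monthly"]][3]      # same drill, one level per call

# The mixed path becomes c("metrics", "monthly", "3") and fails on the last step
mixed <- tryCatch(
  report[[c("metrics", "monthly", 3)]],
  error = function(e) "error"
)
```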
Exercise 2.4: Partial matching pitfall with $
Task: A code reviewer is auditing a function where the previous author wrote config$item against a list whose actual key is items. Show that config$item still returns the value because of partial matching, then show that config[["item"]] returns NULL instead. Save the safe [[]] result to ex_2_4.
Expected result:
#> [1] "a" "b" "c"
#> NULL
Difficulty: Intermediate
One operator quietly guesses at near-miss names, while the other demands the spelling match exactly.
Use the exact-match form config[["item"]], which returns NULL when no key matches.
Click to reveal solution
Explanation: $ quietly partial-matches item against items and returns the value, which can mask bugs where a typo in the key would otherwise be caught. [[]] requires exact match by default and returns NULL for a missing key, surfacing the mistake immediately. Set options(warnPartialMatchDollar = TRUE) to get a warning, and prefer [[]] inside package code.
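The contrast can be reproduced with a small config list. This is only a sketch: the retries field is invented, and the exact = FALSE call is shown purely to illustrate that [[ ]] can opt in to the same matching that $ applies by default.

```r
# Assumed data: the retries field is hypothetical.
config <- list(items = c("a", "b", "c"), retries = 3)

via_dollar <- config$item          # partial match against "items"
ex_2_4     <- config[["item"]]     # exact match only, so NULL

# [[ ]] can opt in to partial matching explicitly
opt_in <- config[["item", exact = FALSE]]
```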
Section 3. Subsetting data frames (4 problems)
Exercise 3.1: Extract a data-frame column three different ways
Task: An analyst pulling totals from the built-in mtcars data frame wants to extract the mpg column three ways: with mtcars[, "mpg"], with mtcars[["mpg"]], and with mtcars$mpg. Confirm all three are identical with identical() and save the $ version to ex_3_1.
Expected result:
#> [1] 21.0 21.0 22.8 21.4 18.7 18.1
#> [1] TRUE
#> [1] TRUE
Difficulty: Intermediate
A data frame is built from a set of columns, so several different selectors all reach the same one.
Save mtcars$mpg, then compare it with identical() against mtcars[["mpg"]] and mtcars[, "mpg"].
Click to reveal solution
Explanation: A data frame is internally a list of equal-length columns, so all three forms reach the same vector. Use $ for interactive work, [["mpg"]] when the column name lives in a variable, and [, "mpg"] when you want consistent matrix-like notation. For tibbles (not base data frames), [, "mpg"] keeps the tibble wrapper, so the three forms diverge.
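A short sketch of the three selectors on the built-in mtcars data frame. The variable col_name is a hypothetical stand-in for a column name that arrives at run time.

```r
ex_3_1 <- mtcars$mpg

identical(ex_3_1, mtcars[["mpg"]])
identical(ex_3_1, mtcars[, "mpg"])
head(ex_3_1)

# When the column name lives in a variable, only [[ ]] (or [ , ]) can use it
col_name <- "mpg"
from_variable <- mtcars[[col_name]]
from_dollar   <- mtcars$col_name   # NULL: there is no column called col_name
```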
Exercise 3.2: Filter rows and select columns in one call
Task: A used-car appraiser inspecting mtcars needs every six-cylinder model along with just the cyl, mpg, and hp columns. Use [rows, cols] notation in a single call: a logical condition for rows and a character vector for columns. Save the result to ex_3_2.
Expected result:
#> cyl mpg hp
#> Mazda RX4 6 21.0 110
#> Mazda RX4 Wag 6 21.0 110
#> Hornet 4 Drive 6 21.4 110
#> Valiant 6 18.1 105
#> Merc 280 6 19.2 123
#> Merc 280C 6 17.8 123
#> Ferrari Dino 6 19.7 175
Difficulty: Intermediate
The space inside the brackets has two slots: the first picks rows, the second picks columns.
Use a condition for rows and a name vector for columns: mtcars[mtcars$cyl == 6, c("cyl", "mpg", "hp")].
Click to reveal solution
Explanation: The first argument inside [ , ] selects rows; the second selects columns. A logical vector for rows should have length nrow(df), because a shorter one is recycled silently, which is rarely what you want. A character vector for columns is order-preserving, so output columns appear in the order you listed them, not the original column order. Rownames carry through into the subset by default.
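The row and column slots can be sketched on mtcars as follows. The last line is an extra illustration, not part of the exercise: it shows what happens when the row mask is shorter than the data frame.

```r
six_cyl <- mtcars$cyl == 6                        # one TRUE/FALSE per row
ex_3_2  <- mtcars[six_cyl, c("cyl", "mpg", "hp")]
ex_3_2

# Columns come back in the order requested, not the original order
names(mtcars[six_cyl, c("hp", "cyl")])

# A length-2 mask is recycled across all 32 rows and keeps every other row
nrow(mtcars[c(TRUE, FALSE), ])
```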
Exercise 3.3: df[, 1] vs df[[1]] and the drop=TRUE trap
Task: Working on iris, show that iris[, 1] and iris[[1]] both return the same numeric vector, but iris[, 1, drop = FALSE] returns a single-column data frame instead. Save the data-frame-shaped result to ex_3_3.
Expected result:
#> [1] TRUE
#> Sepal.Length
#> 1 5.1
#> 2 4.9
#> 3 4.7
#> [1] "data.frame"
Difficulty: Advanced
Asking for a single column can quietly flatten the result unless you tell R to keep its rectangular shape.
Add the drop = FALSE argument: iris[, 1, drop = FALSE].
Click to reveal solution
Explanation: Base data frames have drop = TRUE as the default when you ask for a single column, which collapses the result to a bare vector and surprises pipelines that expect a data frame. Setting drop = FALSE preserves the column structure. Tibbles invert this default (a single-column tibble slice stays a tibble), which is one of the main reasons people migrate to tibbles.
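A minimal sketch of the drop behaviour on the built-in iris data frame.

```r
as_vector <- iris[, 1]                 # drop = TRUE by default: bare numeric vector
identical(as_vector, iris[[1]])

ex_3_3 <- iris[, 1, drop = FALSE]      # stays a one-column data frame
head(ex_3_3, 3)
class(ex_3_3)
dim(ex_3_3)
```

Checking dim() is a quick way to confirm which shape a pipeline step actually received: it is NULL for the vector and 150 by 1 for the data frame.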
Exercise 3.4: Subset with a combined logical mask
Task: A retailer auditing mtcars wants rows where cars are heavy (wt > 5) AND fuel-thirsty (mpg < 15). Build a logical mask combining the two conditions with & and use it inside [rows, ]. Save the filtered data frame to ex_3_4.
Expected result:
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
Difficulty: Intermediate
Two separate conditions can be fused into one mask that is TRUE only where both hold at once.
Combine them with & like mtcars$wt > 5 & mtcars$mpg < 15, then pass that mask as the row index.
Click to reveal solution
Explanation: & is the vectorised AND operator that returns a logical vector the same length as its inputs; && expects length-one operands and returns a single value, so it is the wrong tool inside [ , ] (from R 4.3.0 it raises an error when given longer vectors; earlier versions used only the first element, at most with a warning). Save the mask to a name when the expression is long: it makes the subset call short and easy to read, and lets you reuse the same mask for parallel slicing of other objects.
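A sketch of the named-mask approach on mtcars. The intermediate names heavy, thirsty, and mask are illustrative choices.

```r
heavy   <- mtcars$wt > 5
thirsty <- mtcars$mpg < 15
mask    <- heavy & thirsty          # element-wise AND, one value per row

ex_3_4 <- mtcars[mask, ]
rownames(ex_3_4)

# The same mask slices any object that is parallel to the rows
car_names <- rownames(mtcars)[mask]
```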
Section 4. Subsetting matrices (4 problems)
Exercise 4.1: Two-dimensional indexing with [row, col]
Task: Given a 4 by 3 matrix m of quarterly regional sales, extract the value at row 2, column 3 using m[2, 3]. Then return the entire 2nd row by leaving the column index empty. Save the single value at row 2, column 3 to ex_4_1.
Expected result:
#> [1] 132
#> North South East
#> 145 155 132
Difficulty: Intermediate
A matrix needs two coordinates, and leaving one of them blank means "all of that dimension".
Use m[2, 3] for the single cell and m[2, ] for the whole second row.
Click to reveal solution
Explanation: A matrix is indexed with two arguments inside [ , ]: row first, column second. Leaving an argument empty returns the full dimension. Both numeric and character (when dimnames exist) indexing work, so m["Q2", "East"] returns the same scalar. A single-cell extraction drops the matrix attribute and returns a bare numeric.
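A sketch of two-coordinate indexing, assuming a sales matrix rebuilt from the expected results in this section. One cell is never shown in any expected output (East in Q1), so its value of 110 is invented.

```r
# Assumed data: East/Q1 = 110 is hypothetical; all other cells follow the
# expected results of Exercises 4.1 to 4.4.
m <- matrix(
  c(120, 145, 162, 178,    # North
    130, 155, 170, 184,    # South
    110, 132, 150, 165),   # East
  nrow = 4,
  dimnames = list(c("Q1", "Q2", "Q3", "Q4"), c("North", "South", "East"))
)

ex_4_1     <- m[2, 3]          # single cell
second_row <- m[2, ]           # empty column slot means all columns
by_name    <- m["Q2", "East"]  # same cell via dimnames
```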
Exercise 4.2: Keep matrix shape with drop = FALSE
Task: An ML engineer building a pipeline expects every step to receive a matrix. Show that m[, 2] collapses to a named vector and breaks the contract, then re-do the same selection with drop = FALSE to keep it as a 4 by 1 matrix. Save the matrix-shaped result to ex_4_2.
Expected result:
#> South
#> Q1 130
#> Q2 155
#> Q3 170
#> Q4 184
Difficulty: Intermediate
Selecting a single column shrinks that dimension away unless you explicitly ask R not to.
Add the drop = FALSE argument: m[, 2, drop = FALSE].
Click to reveal solution
Explanation: The default drop = TRUE collapses any dimension of size 1, so a single-column slice becomes a vector and a single-row slice becomes a vector. Setting drop = FALSE keeps the dimension attribute intact. This matters most inside machine-learning pipelines or apply() callbacks where downstream code does dim(x)[1], which quietly returns NULL on a bare vector, or calls a matrix-only function such as solve(x).
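The collapse and the fix can be sketched on the same assumed sales matrix (East/Q1 = 110 is an invented value).

```r
# Assumed data: East/Q1 = 110 is hypothetical.
m <- matrix(
  c(120, 145, 162, 178, 130, 155, 170, 184, 110, 132, 150, 165),
  nrow = 4,
  dimnames = list(c("Q1", "Q2", "Q3", "Q4"), c("North", "South", "East"))
)

collapsed <- m[, 2]                  # named vector, no dim attribute
dim(collapsed)

ex_4_2 <- m[, 2, drop = FALSE]       # still a matrix
dim(ex_4_2)
is.matrix(ex_4_2)
```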
Exercise 4.3: Filter matrix rows with a logical condition
Task: A regional sales analyst wants the rows of m where the North region sold more than 150 units. Build a logical mask from m[, "North"] > 150 and use it as the row index inside m[rows, ]. Save the filtered matrix to ex_4_3.
Expected result:
#> North South East
#> Q3 162 170 150
#> Q4 178 184 165
Difficulty: Intermediate
Compare one column against a threshold to get a TRUE/FALSE mask aligned with the rows, then index with it.
Build m[, "North"] > 150 and pass it as the row index in m[mask, ].
Click to reveal solution
Explanation: m[, "North"] returns the North column as a vector; comparing it to 150 produces a logical vector aligned with the rows. Passing that vector as the row index keeps only matching rows. The number of TRUE entries determines nrow() of the result. The same pattern flips for columns: m[, m["Q1", ] > 100] filters columns by the Q1 row.
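A sketch of row filtering on the assumed sales matrix. The column filter at the end uses a threshold of 115 rather than 100 only because the invented East/Q1 value of 110 then makes the effect visible.

```r
# Assumed data: East/Q1 = 110 is hypothetical.
m <- matrix(
  c(120, 145, 162, 178, 130, 155, 170, 184, 110, 132, 150, 165),
  nrow = 4,
  dimnames = list(c("Q1", "Q2", "Q3", "Q4"), c("North", "South", "East"))
)

mask   <- m[, "North"] > 150     # one TRUE/FALSE per row
ex_4_3 <- m[mask, ]
rownames(ex_4_3)

# Same idea for columns: keep regions whose Q1 sales exceed 115
strong_q1 <- m[, m["Q1", ] > 115]
colnames(strong_q1)
```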
Exercise 4.4: Pull scattered cells with a 2-column index matrix
Task: A statistician needs the main diagonal of m's first 3 rows and 3 columns: cells (1, 1), (2, 2), and (3, 3). Build a 2-column matrix holding those row/column pairs and pass it directly to m[ ] with no comma to pull out just those cells. Save the result to ex_4_4.
Expected result:
#> [1] 120 155 150
Difficulty: Advanced
A table of coordinate pairs can name a scattered set of cells all at once.
Build a matrix with ncol = 2 holding the (row, col) pairs, then pass it to m[idx] with no comma inside the brackets.
Click to reveal solution
Explanation: When you pass an n by 2 integer matrix to m[ ] (no comma inside the brackets), R reads each row as an (i, j) pair and returns the value at that cell. This is the canonical way to pull a scattered set of cells in a single call, far cleaner than a loop. The same trick generalises: an n by k index matrix works on a k-dimensional array.
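A sketch of index-matrix subsetting on the assumed sales matrix (East/Q1 = 110 is invented). The column labels on idx are optional and only there for readability.

```r
# Assumed data: East/Q1 = 110 is hypothetical.
m <- matrix(
  c(120, 145, 162, 178, 130, 155, 170, 184, 110, 132, 150, 165),
  nrow = 4,
  dimnames = list(c("Q1", "Q2", "Q3", "Q4"), c("North", "South", "East"))
)

idx <- cbind(row = 1:3, col = 1:3)   # each row is one (i, j) pair
ex_4_4 <- m[idx]                     # no comma: idx is read pair by pair

# Any scattered set of cells works the same way
corners <- m[cbind(c(1, 4), c(1, 3))]   # cells (1, 1) and (4, 3)
```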
Section 5. Advanced subsetting patterns (4 problems)
Exercise 5.1: Extract from a deeply nested config list
Task: A site reliability engineer is reading a configuration list that holds a database sub-list containing a replicas numeric vector of port numbers. Use $ chaining first, then the recursive [[c(...)]] form, to pull out the second replica's port. Save the recursive-form result to ex_5_1.
Expected result:
#> [1] 5434
#> [1] 5434
Difficulty: Advanced
A path through nested lists can be written either one step at a time or packed into a single selector.
Pass the name path c("database", "replicas") to [[ ]], then take the second element of the result with [2].
Click to reveal solution
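Explanation: $ chaining reads top-down and is fine when every key exists. The recursive [[c(...)]] form does the same drill in a single call and is easier to parameterise: store the names in a character vector and you can extract any field without touching code. Keep the final position out of that vector, because c("database", "replicas", 2) is coerced to character and R would then look for an element named "2". The two forms also fail differently: a misspelled key makes the $ chain return NULL silently, while the recursive form throws a subscript-out-of-bounds error, so wrap it in tryCatch() if paths might fail.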
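A sketch of both access styles with an assumed config list. Only the port 5434 in second place comes from the expected result; the host and the other ports are invented, and the misspelled key databse is deliberate.

```r
# Assumed data: host and every port except 5434 are hypothetical.
config <- list(
  database = list(host = "db.internal", replicas = c(5433, 5434, 5435))
)

via_dollar <- config$database$replicas[2]              # $ chaining
ex_5_1     <- config[[c("database", "replicas")]][2]   # name path, then position

# A misspelled key behaves differently in the two forms
typo_dollar <- config$databse$replicas[2]              # NULL, no error raised
typo_path   <- tryCatch(
  config[[c("databse", "replicas")]],
  error = function(e) "error"
)
```

Because the path is an ordinary character vector, it can be passed into a helper function or read from a settings file without changing the extraction code.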
Exercise 5.2: Bulk-redact list fields with a name vector
Task: A data engineer needs to mask three sensitive fields (ssn, email, phone) inside a customer record list by replacing each value with the string "REDACTED". Use record[c("ssn", "email", "phone")] <- "REDACTED" to assign all three positions in a single call. Save the redacted list to ex_5_2.
Expected result:
#> $name
#> [1] "Alex Kim"
#>
#> $ssn
#> [1] "REDACTED"
#>
#> $email
#> [1] "REDACTED"
#>
#> $phone
#> [1] "REDACTED"
#>
#> $role
#> [1] "admin"
Difficulty: Intermediate
A single assignment can target several named slots at once and reuse one value across all of them.
Select with a name vector on the left: record[c("ssn", "email", "phone")] <- "REDACTED".
Click to reveal solution
Explanation: Subset assignment with a name vector recycles the right-hand side across all selected positions. Pass a list on the right to assign different values per slot, like record[...] <- list("X", "Y", "Z"). Use [<- (not [[<-) when writing to multiple slots in one shot. This pattern is the cleanest way to bulk-update a config or redact a record.
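A sketch of bulk assignment with an assumed record list. The name and role come from the expected result; the original ssn, email, and phone values are invented placeholders.

```r
# Assumed data: the three sensitive values are hypothetical placeholders.
record <- list(
  name  = "Alex Kim",
  ssn   = "000-00-0000",
  email = "alex@example.com",
  phone = "555-0100",
  role  = "admin"
)

sensitive <- c("ssn", "email", "phone")

ex_5_2 <- record
ex_5_2[sensitive] <- "REDACTED"          # one value recycled across three slots

# A list on the right assigns a different value to each slot
labelled <- record
labelled[sensitive] <- list("X", "Y", "Z")
```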
Exercise 5.3: Drop a list element by assigning NULL
Task: Given the same record list (the original, before redaction), drop the ssn field entirely by assigning NULL to it via [[<-. The name should disappear from names(record) and the list should shrink to four entries. Save the shrunk list to ex_5_3.
Expected result:
#> [1] "name" "email" "phone" "role"
Difficulty: Advanced
Writing "nothing" into a slot removes that slot entirely rather than leaving it empty.
Assign with the double-bracket form: record[["ssn"]] <- NULL.
Click to reveal solution
Explanation: Assigning NULL via [[<- is the documented way to drop a single list element. The list shrinks by one; the surviving elements keep their names and shift up one position. To keep a slot that literally holds NULL (instead of removing it), use single brackets with list(NULL) on the right-hand side: record["ssn"] <- list(NULL). Assigning NULL through single brackets, as in record["ssn"] <- NULL, also deletes, and it is the form to use when dropping several elements at once.
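A sketch of deletion versus storing a literal NULL, assuming the same record list (the sensitive values are invented placeholders).

```r
# Assumed data: ssn, email, and phone values are hypothetical placeholders.
record <- list(
  name = "Alex Kim", ssn = "000-00-0000", email = "alex@example.com",
  phone = "555-0100", role = "admin"
)

ex_5_3 <- record
ex_5_3[["ssn"]] <- NULL                 # drop one element
names(ex_5_3)

slim <- record
slim[c("ssn", "phone")] <- NULL         # single brackets drop several at once

kept <- record
kept["ssn"] <- list(NULL)               # slot survives and holds NULL
length(kept)
```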
Exercise 5.4: Pull a matrix diagonal with a row/col equality mask
Task: Build a 5 by 5 matrix mm filled with the integers 1 through 25, then use the logical mask row(mm) == col(mm) inside mm[ ] to extract the main diagonal in a single call. Save the diagonal vector to ex_5_4.
Expected result:
#> [1] 1 7 13 19 25
#> [1] 1 7 13 19 25
Difficulty: Advanced
The main diagonal is exactly the set of cells whose row number equals their column number.
Index the matrix with the logical mask row(mm) == col(mm).
Click to reveal solution
Explanation: row(mm) and col(mm) return matrices the same shape as mm filled with row and column indices respectively. The comparison row(mm) == col(mm) is a logical matrix that is TRUE only on the main diagonal. Indexing mm with that mask pulls out those cells. diag() is the shortcut; the mask form generalises to off-diagonals (e.g., row(mm) - col(mm) == 1) or any geometric pattern.
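A minimal sketch of the mask on a 5 by 5 matrix, with the sub-diagonal variant mentioned above added for comparison.

```r
mm <- matrix(1:25, nrow = 5)            # filled column by column

on_diagonal <- row(mm) == col(mm)       # logical matrix, TRUE where i == j
ex_5_4 <- mm[on_diagonal]
ex_5_4
diag(mm)                                # built-in shortcut, same values

# Cells one step below the main diagonal
below <- mm[row(mm) - col(mm) == 1]
below
```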
What to do next
You have practised every subsetting operator across vectors, lists, data frames, and matrices. Next, move on to one of these neighbouring hubs:
- dplyr Exercises in R for tidyverse-style row and column selection with filter(), select(), and slice().
- Apply Family Exercises in R for using subsetting alongside sapply(), lapply(), and vapply().
- R Vectors for a deeper review of how indexing interacts with names, coercion, and recycling.
- R Subsetting for the one unifying rule across [, [[, $, and @.