tidyr Nest Unnest Exercises in R: 18 List-Column Drills
Eighteen scenario-based exercises on tidyr's nest/unnest family and the many-models workflow that depends on it. Each problem ships with an expected result so you can verify, and full solutions plus explanations are hidden behind reveal toggles so you actually try first.
Section 1. Building list-columns with nest (3 problems)
Exercise 1.1: Nest mtcars rows by cylinder count
Task: A bench engineer wants to keep each cylinder group's full data as a single bundle so it can be passed around as one object per group. Use nest() on mtcars (after as_tibble()) to collapse all non-cyl columns into a list-column named data, grouped by cyl. Save the result to ex_1_1.
Expected result:
#> # A tibble: 3 x 2
#> cyl data
#> <dbl> <list>
#> 1 6 <tibble [7 x 10]>
#> 2 4 <tibble [11 x 10]>
#> 3 8 <tibble [14 x 10]>
Difficulty: Beginner
Think of each cylinder group's rows as one bundle you can keep in a single cell of the table.
Use nest() with data = -cyl so every column except cyl is sent into the list-column.
Click to reveal solution
Explanation: Modern nest() uses tidy-select inside data = ... to decide which columns get bundled. data = -cyl means "everything except cyl goes into the list-column called data," which is the cleanest grouping idiom. The older form group_by(cyl) |> nest() also works but leaves a grouped tibble that can surprise downstream verbs.
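One possible solution, sketched under the assumption that tidyr and tibble are installed and R is at least 4.1 (for the native pipe):

```r
library(tidyr)
library(tibble)

# Bundle every column except cyl into a list-column named data.
# Groups appear in order of first appearance in mtcars: 6, 4, 8.
ex_1_1 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl)

ex_1_1
```

Because nest() was called on an ungrouped tibble, the result carries no grouping that later verbs would have to undo.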
Exercise 1.2: Nest mpg by manufacturer and class
Task: The fleet-pricing team wants one row per manufacturer-class combination in ggplot2::mpg, with all remaining columns rolled up so they can attach a metadata blob later. Nest mpg with data = -c(manufacturer, class) and add a column n containing each nest's row count. Save the result to ex_1_2.
Expected result:
#> # A tibble: 32 x 4
#> manufacturer class data n
#> <chr> <chr> <list> <int>
#> 1 audi compact <tibble [15 x 9]> 15
#> 2 audi midsize <tibble [3 x 9]> 3
#> 3 chevrolet 2seater <tibble [5 x 9]> 5
#> 4 chevrolet midsize <tibble [5 x 9]> 5
#> 5 chevrolet suv <tibble [9 x 9]> 9
#> ...
#> # 27 more rows hidden
Difficulty: Intermediate
Bundle the leftover columns per group first, then count how many rows landed inside each bundle.
Pair nest(data = -c(manufacturer, class)) with a mutate() that calls map_int(data, nrow).
Click to reveal solution
Explanation: map_int(data, nrow) is type-stable: it forces an integer return per element, so you get a plain <int> column instead of a list. Computing row counts after nesting is a common debugging step, since a stray nesting key can leave you with one-row nests that hint at a join problem upstream. count(manufacturer, class) would give the same n but without the bundled data.
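A minimal sketch of the nest-then-count step, assuming ggplot2 is installed for its mpg dataset. The final arrange() is an assumption added here: nest() keeps groups in order of first appearance, while the expected result appears to be sorted alphabetically.

```r
library(tidyr)
library(dplyr)
library(purrr)

ex_1_2 <- ggplot2::mpg |>
  nest(data = -c(manufacturer, class)) |>   # one row per manufacturer-class pair
  mutate(n = map_int(data, nrow)) |>        # integer row count of each nest
  arrange(manufacturer, class)              # assumption: alphabetical order wanted

ex_1_2
```

The n column should sum to the number of rows in mpg, which is a quick check that no rows were lost while nesting.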
Exercise 1.3: Nest ChickWeight twice for a hierarchical structure
Task: A growth-curve study tracks individual chicks within diets. Build a two-level nest from ChickWeight: first nest by Diet to get one row per diet with a list-column of all chicks, then add a second list-column called by_chick where each Diet's data is further nested by Chick. Save the result to ex_1_3.
Expected result:
#> # A tibble: 4 x 3
#> Diet data by_chick
#> <fct> <list> <list>
#> 1 1 <tibble [220 x 3]> <tibble [20 x 2]>
#> 2 2 <tibble [120 x 3]> <tibble [10 x 2]>
#> 3 3 <tibble [120 x 3]> <tibble [10 x 2]>
#> 4 4 <tibble [118 x 3]> <tibble [10 x 2]>
Difficulty: Intermediate
Build the outer grouping first, then split each outer bundle again by the inner key.
After nesting by Diet, add by_chick with map(data, ~ nest(.x, chick_data = -Chick)).
Click to reveal solution
Explanation: Nesting inside a map() lets you build hierarchical structures cheaply: outer row per Diet, inner tibble per Chick. The lambda ~ nest(.x, chick_data = -Chick) runs on each Diet's data subset. This pattern shows up whenever a downstream model wants the second-level groups available but not yet unrolled. Avoid nest_by() here because chaining a second nest_by() inside a rowwise frame is awkward.
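The two-level nest described above might look like the following sketch. Converting ChickWeight with as_tibble() first is a precaution taken here, since the dataset carries extra grouped-data classes.

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

ex_1_3 <- ChickWeight |>
  as_tibble() |>
  nest(data = -Diet) |>                                   # outer level: one row per Diet
  mutate(
    # inner level: re-nest each Diet's rows by Chick
    by_chick = map(data, ~ nest(.x, chick_data = -Chick))
  )

ex_1_3
```

Each element of by_chick is itself a tibble with a Chick column and a chick_data list-column, so ex_1_3$by_chick[[1]] can be inspected like any other nested frame.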
Section 2. Unnesting list-columns (3 problems)
Exercise 2.1: Reverse a nested tibble back to flat
Task: A teammate handed you the nested mtcars tibble from Exercise 1.1, but the next step of the pipeline needs a flat frame. Take ex_1_1 and call unnest() on the data column so every original row is restored, with cyl preserved as a leading column. Save the result to ex_2_1.
Expected result:
#> # A tibble: 32 x 11
#> cyl mpg disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 6 21 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 6 21 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 6 21.4 258 110 3.08 3.22 19.4 1 0 3 1
#> ...
#> # 29 more rows hidden
Difficulty: Beginner
You want to undo the bundling so every original row sits flat in the table again.
Call unnest() on the data column.
Click to reveal solution
Explanation: unnest() is the inverse of nest() when applied to a list-column of tibbles or data frames: rows of the inner frames are stacked, and outer columns (here cyl) are recycled along the way. If the inner frames have inconsistent columns, unnest() fills the gaps with NA. Note that round-tripping nest() then unnest() does not restore the original row order unless each group's rows were already contiguous in the input: here all seven cyl = 6 rows come first, then the cyl = 4 rows, then the cyl = 8 rows.
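A short sketch of the round trip, rebuilding ex_1_1 inline so the block runs on its own:

```r
library(tidyr)
library(tibble)

# Recreate the nested frame from Exercise 1.1
ex_1_1 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl)

# Stack the inner tibbles back into a flat frame; cyl is recycled per row
ex_2_1 <- ex_1_1 |>
  unnest(data)

ex_2_1
```

The flat result holds the same 32 rows as mtcars, but ordered group by group rather than in the original row order.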
Exercise 2.2: Spread a list-column of vectors into rows
Task: Suppose the survey backend dumped each respondent's tag answers as an R list of character vectors. Given the tibble below with one list-column tags, use unnest_longer() to produce one row per tag, preserving the respondent_id. Save the result to ex_2_2.
Expected result:
#> # A tibble: 7 x 2
#> respondent_id tags
#> <int> <chr>
#> 1 1 r
#> 2 1 sql
#> 3 2 python
#> 4 2 r
#> 5 2 julia
#> 6 3 sql
#> 7 3 r
Difficulty: Intermediate
Each respondent's vector of tags should be spread so one tag occupies one row.
Use unnest_longer() on the tags column.
Click to reveal solution
Explanation: unnest_longer() puts each element of a list-column on its own row, recycling the outer columns. It is the right tool when the list elements are unnamed atomic vectors (here, tag strings). If you wanted the original positions, you would add indices_include = TRUE to materialize a tags_id integer column.
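The input tibble is not reproduced in this text, so the sketch below uses a hypothetical `survey` tibble reconstructed from the expected result. The name and the exact construction are assumptions.

```r
library(tidyr)
library(tibble)

# Hypothetical input, rebuilt from the expected output above
survey <- tibble(
  respondent_id = 1:3,
  tags = list(
    c("r", "sql"),
    c("python", "r", "julia"),
    c("sql", "r")
  )
)

# One row per tag; respondent_id is repeated for each element
ex_2_2 <- survey |>
  unnest_longer(tags)

ex_2_2
```

Adding indices_include = TRUE to the same call would also record each tag's position within its original vector.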
Exercise 2.3: Flatten named lists into columns with unnest_wider
Task: An API response gives you one row per user, with each user's profile stored as a named R list (fields: name, age, city). Use unnest_wider() to lift those names into top-level columns alongside user_id. Save the result to ex_2_3.
Expected result:
#> # A tibble: 3 x 4
#> user_id name age city
#> <int> <chr> <int> <chr>
#> 1 1 Anna 34 Berlin
#> 2 2 Bilal 29 Lahore
#> 3 3 Chen 41 Taipei
Difficulty: Intermediate
The named fields inside each profile list should become their own columns.
Use unnest_wider() on the profile column.
Click to reveal solution
Explanation: unnest_wider() lifts the names of each list element into column names, producing one row per outer row and one column per unique inner name. It is the standard first move when ingesting JSON via jsonlite::fromJSON(..., simplifyVector = FALSE). If some users were missing a field, unnest_wider() fills NA for that column and does not error.
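As an illustration, here is a sketch with a hypothetical `users` tibble whose values are taken from the expected result. The real API payload is not shown in the exercise, so the input shape is an assumption.

```r
library(tidyr)
library(tibble)

# Hypothetical input: one named list per user in the profile column
users <- tibble(
  user_id = 1:3,
  profile = list(
    list(name = "Anna",  age = 34L, city = "Berlin"),
    list(name = "Bilal", age = 29L, city = "Lahore"),
    list(name = "Chen",  age = 41L, city = "Taipei")
  )
)

# Lift each named field into its own top-level column
ex_2_3 <- users |>
  unnest_wider(profile)

ex_2_3
```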
Section 3. Mapping over list-columns (4 problems)
Exercise 3.1: Compute mean mpg per cyl group on a nested frame
Task: Working on the nested mtcars from ex_1_1, compute the mean mpg for each cyl group by mapping mean() over the data list-column inside a mutate(). Return a new column mean_mpg of type double. Save the result to ex_3_1.
Expected result:
#> # A tibble: 3 x 3
#> cyl data mean_mpg
#> <dbl> <list> <dbl>
#> 1 6 <tibble [7 x 10]> 19.7
#> 2 4 <tibble [11 x 10]> 26.7
#> 3 8 <tibble [14 x 10]> 15.1
Difficulty: Intermediate
Reach into each group's bundled table and reduce its mileage values to a single number.
Inside mutate(), use map_dbl(data, ~ mean(.x$mpg)) so the column stays a plain double.
Click to reveal solution
Explanation: Inside map_dbl(), .x is the inner tibble for each cyl group, so .x$mpg is a numeric vector you can hand to mean(). Use map_dbl() rather than plain map() because you know each call returns a single double: the type-stable variant throws an error if any call returns something else (NULL, a character, or a vector of length two) instead of quietly handing you a list-column. It will not flag an inner tibble with no rows, because mean() of an empty vector is NaN, which is still a valid double. A non-list scalar return keeps the column easy to filter and arrange.
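A minimal sketch of the per-group mean, rebuilding ex_1_1 inline so it runs standalone:

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

ex_1_1 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl)

# map_dbl() guarantees a plain double column, one value per nested tibble
ex_3_1 <- ex_1_1 |>
  mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg)))

ex_3_1
```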
Exercise 3.2: Fit a linear model per cyl group
Task: Continue from ex_1_1 and fit a simple linear regression of mpg ~ wt separately for each cyl group. Store each fitted lm object in a new list-column called model. Save the result to ex_3_2.
Expected result:
#> # A tibble: 3 x 3
#> cyl data model
#> <dbl> <list> <list>
#> 1 6 <tibble [7 x 10]> <lm>
#> 2 4 <tibble [11 x 10]> <lm>
#> 3 8 <tibble [14 x 10]> <lm>
Difficulty: Intermediate
Each group's bundled table can be fed to a regression, and the fitted object rides along in a new cell.
In mutate(), use map(data, ~ lm(mpg ~ wt, data = .x)) to store one fit per row.
Click to reveal solution
Explanation: This is the kernel of the many-models pattern: one row per group, with a fitted model carried alongside its training data. Plain map() is the right choice because each call returns an lm object, not a scalar. Storing models in a list-column means you can extract coefficients, predict, or compute diagnostics later without re-fitting, and the table acts as a single auditable artifact instead of a scattered set of named objects.
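The pattern can be sketched as follows, again rebuilding ex_1_1 so the block is self-contained:

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

ex_1_1 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl)

# One lm fit per row, stored next to the data it was trained on
ex_3_2 <- ex_1_1 |>
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)))

# Any single fit can be pulled out and inspected like a normal lm object
coef(ex_3_2$model[[1]])
```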
Exercise 3.3: Pull glance metrics from each model
Task: Use broom::glance() to extract a one-row summary (r.squared, adj.r.squared, sigma, etc.) for each model in ex_3_2, store as a list-column glance, then unnest those summaries to produce a flat per-cyl metrics table. Drop the data and model columns from the final output. Save the result to ex_3_3.
Expected result:
#> # A tibble: 3 x 13
#> cyl r.squared adj.r.squared sigma statistic p.value df logLik AIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 6 0.465 0.357 1.17 4.34 0.0918 1 -9.83 25.7
#> 2 4 0.509 0.454 3.33 9.32 0.0137 1 -27.7 61.5
#> 3 8 0.423 0.375 2.02 8.80 0.0118 1 -28.7 63.4
#> # 4 more columns hidden
Difficulty: Advanced
Turn each fitted model into its one-row scorecard, then flatten those scorecards into a plain table.
Map broom::glance() over the model column, then select(cyl, glance) and unnest() it.
Click to reveal solution
Explanation: broom::glance() always returns a one-row tibble of model-level metrics, which makes it perfect to stash in a list-column and then unnest. Because every inner tibble has the same columns, the unnest produces a clean rectangular result. Compare this with broom::tidy(), which returns one row per coefficient (variable length), so unnesting multiplies the number of rows instead of only adding columns.
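A sketch of the glance-and-unnest step, assuming broom is installed. The exact set of metric columns depends on the broom version.

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

# Rebuild the per-cyl models from Exercise 3.2
ex_3_2 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl) |>
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)))

ex_3_3 <- ex_3_2 |>
  mutate(glance = map(model, broom::glance)) |>  # one-row summary per model
  select(cyl, glance) |>                         # drop data and model
  unnest(glance)                                 # flatten to one row per cyl

ex_3_3
```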
Exercise 3.4: Tidy coefficients into a long table
Task: Now use broom::tidy() on each lm in ex_3_2 to recover its coefficient table (term, estimate, std.error, statistic, p.value), unnest it, and keep cyl alongside the coefficient rows so each row identifies its group. Save the result to ex_3_4.
Expected result:
#> # A tibble: 6 x 6
#> cyl term estimate std.error statistic p.value
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 6 (Intercept) 28.4 4.18 6.79 0.00105
#> 2 6 wt -2.78 1.33 -2.08 0.0918
#> 3 4 (Intercept) 39.6 4.35 9.10 0.0000777
#> 4 4 wt -5.65 1.85 -3.05 0.0137
#> 5 8 (Intercept) 23.9 3.01 7.94 0.00000405
#> 6 8 wt -2.19 0.739 -2.97 0.0118
Difficulty: Advanced
Each model carries a small table of one row per coefficient that you want stacked under its group.
Map broom::tidy() over the model column, keep cyl, and unnest() the result.
Click to reveal solution
Explanation: broom::tidy() produces a coefficient-per-row tibble, so unnesting multiplies the row count by the number of terms in each model. Once you have this long table, downstream verbs are trivial: filter(term == "wt") isolates the slope, and a join against a labels table can attach human-readable names. Keep the result long; only pivot to wide for presentation.
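The same skeleton with broom::tidy() might look like this sketch (broom assumed installed):

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

ex_3_2 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl) |>
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)))

ex_3_4 <- ex_3_2 |>
  mutate(coefs = map(model, broom::tidy)) |>  # one row per coefficient
  select(cyl, coefs) |>
  unnest(coefs)                               # two terms x three groups

# Isolate the slope for each group
filter(ex_3_4, term == "wt")
```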
Section 4. The many-models workflow (4 problems)
Exercise 4.1: Compare three specifications per group
Task: For each cyl group in the nested mtcars, fit three competing specs side by side: mpg ~ wt, mpg ~ wt + hp, and mpg ~ wt * hp. Lay them out as three list-columns named m1, m2, m3 and compute the AIC of each into matching columns aic1, aic2, aic3. Save the result to ex_4_1.
Expected result:
#> # A tibble: 3 x 8
#> cyl data m1 m2 m3 aic1 aic2 aic3
#> <dbl> <list> <list> <list> <list> <dbl> <dbl> <dbl>
#> 1 6 <tibble [7 x 10]> <lm> <lm> <lm> 25.7 27.4 28.1
#> 2 4 <tibble [11 x 10]> <lm> <lm> <lm> 61.5 61.7 63.5
#> 3 8 <tibble [14 x 10]> <lm> <lm> <lm> 63.4 56.4 56.9
Difficulty: Advanced
Fit each competing formula into its own cell, then score each fit with one number alongside it.
In one mutate(), build m1/m2/m3 with map(data, ~ lm(...)) and aic1/aic2/aic3 with map_dbl(m1, AIC).
Click to reveal solution
Explanation: This is the model-bake-off skeleton: keep the candidate models adjacent so you can compare diagnostics row-wise. For more than three specs, pivot to a longer layout with one row per (group, spec) and a single model list-column. Notice that the 8-cylinder group clearly prefers m2 or m3, while the small cyl = 6 sample (7 rows) separates the three specs by only about two AIC units: differences that small claim more precision than the data can support.
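One way to lay out the bake-off, sketched with ex_1_1 rebuilt inline:

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

ex_1_1 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl)

ex_4_1 <- ex_1_1 |>
  mutate(
    # three competing specifications per group
    m1 = map(data, ~ lm(mpg ~ wt, data = .x)),
    m2 = map(data, ~ lm(mpg ~ wt + hp, data = .x)),
    m3 = map(data, ~ lm(mpg ~ wt * hp, data = .x)),
    # one AIC score per fitted model
    aic1 = map_dbl(m1, AIC),
    aic2 = map_dbl(m2, AIC),
    aic3 = map_dbl(m3, AIC)
  )

ex_4_1
```

Within a single mutate(), later columns can refer to earlier ones, which is why aic1 can use m1 in the same call.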
Exercise 4.2: Extract residuals with augment, then unnest
Task: For diagnostic plotting you need a flat tibble of per-row residuals tagged by their cyl group. Apply broom::augment() to each model in ex_3_2, unnest the result, and keep cyl, mpg, wt, .fitted, and .resid. Save the result to ex_4_2.
Expected result:
#> # A tibble: 32 x 5
#> cyl mpg wt .fitted .resid
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 6 21 2.62 21.1 -0.137
#> 2 6 21 2.88 20.4 0.633
#> 3 6 21.4 3.21 19.5 1.93
#> 4 4 22.8 2.32 26.5 -3.74
#> 5 4 24.4 3.19 21.6 2.84
#> ...
#> # 27 more rows hidden
Difficulty: Advanced
Ask each model for its per-observation table of fitted values and residuals, then flatten it.
Map broom::augment() over model, unnest() it, and select() the columns you need.
Click to reveal solution
Explanation: augment() is the third leg of the broom trio: while tidy() returns coefficients and glance() returns model-level metrics, augment() returns one row per observation with fitted and residual columns added. Once unnested, the result is a clean long frame you can pass straight to ggplot(aes(x = wt, y = .resid)) faceted by cyl. Using select() to drop data and model keeps the final tibble compact.
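A sketch of the augment step, assuming broom is installed. The intermediate column name `aug` is an arbitrary choice made here.

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

ex_3_2 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl) |>
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)))

ex_4_2 <- ex_3_2 |>
  mutate(aug = map(model, broom::augment)) |>  # per-observation diagnostics
  select(cyl, aug) |>
  unnest(aug) |>
  select(cyl, mpg, wt, .fitted, .resid)        # keep only what the plot needs

ex_4_2
```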
Exercise 4.3: Predict at new data points per group
Task: A planner wants predicted mpg at three reference weights (wt = 2.5, 3.0, 3.5) for each cyl group. For every model in ex_3_2, predict at those new weights and unnest the result so each row holds cyl, wt, and a .pred column. Save the result to ex_4_3.
Expected result:
#> # A tibble: 9 x 3
#> cyl wt .pred
#> <dbl> <dbl> <dbl>
#> 1 6 2.5 21.5
#> 2 6 3 20.1
#> 3 6 3.5 18.7
#> 4 4 2.5 25.5
#> 5 4 3 22.7
#> 6 4 3.5 19.9
#> 7 8 2.5 18.4
#> 8 8 3 17.3
#> 9 8 3.5 16.2
Difficulty: Advanced
For each model, build a tiny table that pairs the reference weights with their predictions so the two stay aligned through flattening.
Map over model returning tibble(wt = ref_grid$wt, .pred = predict(.x, newdata = ref_grid)), then unnest().
Click to reveal solution
Explanation: Wrapping the prediction call in tibble(wt = ..., .pred = predict(...)) is the trick that keeps the new x-values aligned with their predictions when you unnest. If you returned just the prediction vector, you would have to re-attach ref_grid$wt by position after unnesting, and any reordering, filtering, or change to the grid in between would silently misalign rows. Always carry your reference grid through the same map step that produces the prediction.
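A minimal sketch of the aligned-prediction pattern. The intermediate column name `pred` and the unname() call are choices made here, not part of the exercise.

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

ex_3_2 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl) |>
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)))

# Reference weights at which every group's model is evaluated
ref_grid <- tibble(wt = c(2.5, 3.0, 3.5))

ex_4_3 <- ex_3_2 |>
  mutate(pred = map(model, ~ tibble(
    wt    = ref_grid$wt,
    .pred = unname(predict(.x, newdata = ref_grid))  # drop predict()'s names
  ))) |>
  select(cyl, pred) |>
  unnest(pred)

ex_4_3
```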
Exercise 4.4: Build an AIC leaderboard
Task: Reshape ex_4_1 into a long leaderboard with one row per (cyl, spec) combination, columns cyl, spec (values "m1", "m2", "m3"), and aic. Sort within each cyl by aic ascending. Save the result to ex_4_4.
Expected result:
#> # A tibble: 9 x 3
#> cyl spec aic
#> <dbl> <chr> <dbl>
#> 1 6 m1 25.7
#> 2 6 m2 27.4
#> 3 6 m3 28.1
#> 4 4 m1 61.5
#> 5 4 m2 61.7
#> 6 4 m3 63.5
#> 7 8 m2 56.4
#> 8 8 m3 56.9
#> 9 8 m1 63.4
Difficulty: Advanced
The three score columns should stack into one value column labelled by which spec produced them, sorted best-first within each group.
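Select cyl and the aic columns, then use pivot_longer(starts_with("aic"), names_to = "spec", values_to = "aic", names_prefix = "aic") and sort by aic within each cyl. arrange(aic, .by_group = TRUE) only sorts within groups on a grouped tibble, so call group_by(cyl) first.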
Click to reveal solution
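Explanation: Even when models live in list-columns, summary metrics are scalars that benefit from being pivoted long. names_prefix = "aic" strips the prefix so the surviving values are 1, 2, 3, which you then re-prefix with "m" for readability. arrange(aic, .by_group = TRUE) sorts within each cyl group only if the tibble has been grouped with group_by(cyl); on an ungrouped tibble the flag is ignored and the rows are sorted by aic alone. The grouped sort also orders the groups themselves by cyl (4, 6, 8); to keep the 6, 4, 8 order shown in the expected result, sort on match(cyl, unique(cyl)) before aic.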
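A sketch of the leaderboard reshape. Sorting on match(cyl, unique(cyl)) is an assumption made here to keep the groups in the 6, 4, 8 order of the expected result; group_by(cyl) followed by arrange(aic, .by_group = TRUE) would sort within groups too, but would order the groups 4, 6, 8.

```r
library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

# Rebuild the three-spec table from Exercise 4.1
ex_4_1 <- mtcars |>
  as_tibble() |>
  nest(data = -cyl) |>
  mutate(
    m1 = map(data, ~ lm(mpg ~ wt, data = .x)),
    m2 = map(data, ~ lm(mpg ~ wt + hp, data = .x)),
    m3 = map(data, ~ lm(mpg ~ wt * hp, data = .x)),
    aic1 = map_dbl(m1, AIC),
    aic2 = map_dbl(m2, AIC),
    aic3 = map_dbl(m3, AIC)
  )

ex_4_4 <- ex_4_1 |>
  select(cyl, starts_with("aic")) |>
  pivot_longer(starts_with("aic"),
               names_to = "spec", values_to = "aic",
               names_prefix = "aic") |>          # spec becomes "1", "2", "3"
  mutate(spec = paste0("m", spec)) |>            # re-prefix for readability
  arrange(match(cyl, unique(cyl)), aic)          # best-first within each cyl

ex_4_4
```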
Section 5. JSON-shaped data and deep records (2 problems)
Exercise 5.1: Lift API records into a flat tibble
Task: You receive three sales records from an API as a list of named lists, with fields order_id, customer, amount, and currency. Wrap them in a tibble column record, then use unnest_wider() to lift the fields out. Save the result to ex_5_1.
Expected result:
#> # A tibble: 3 x 4
#> order_id customer amount currency
#> <int> <chr> <dbl> <chr>
#> 1 1001 Anna 42.5 EUR
#> 2 1002 Bilal 180. PKR
#> 3 1003 Chen 95 TWD
Difficulty: Intermediate
Every field stored inside a record should be lifted up to become a column of its own.
Use unnest_wider() on the record column.
Click to reveal solution
Explanation: This is the canonical first step after parsing JSON: one outer row per record, all fields lifted to named columns. unnest_wider() inspects the names inside each list to decide the output columns. If some records had extra fields the others did not, those columns would appear with NA for the missing rows. For deeply nested fields, chain a second unnest_wider() or use hoist() (next exercise).
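The raw API records are not shown in this text, so the sketch below builds a hypothetical `records` list whose field names come from the task and whose values approximate the expected result. The amounts in particular are placeholders.

```r
library(tidyr)
library(tibble)

# Hypothetical parsed-JSON payload: a list of named lists
records <- list(
  list(order_id = 1001L, customer = "Anna",  amount = 42.5, currency = "EUR"),
  list(order_id = 1002L, customer = "Bilal", amount = 180,  currency = "PKR"),
  list(order_id = 1003L, customer = "Chen",  amount = 95,   currency = "TWD")
)

# One outer row per record, then lift every field into its own column
ex_5_1 <- tibble(record = records) |>
  unnest_wider(record)

ex_5_1
```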
Exercise 5.2: Hoist deep fields out of a nested record
Task: The same API now returns a meta sub-list containing region and vat_rate inside each record. Use hoist() to extract just those two fields directly into top-level columns, leaving the rest of record intact in a residual list-column. Save the result to ex_5_2.
Expected result:
#> # A tibble: 3 x 3
#> region vat_rate record
#> <chr> <dbl> <list>
#> 1 EU 0.19 <named list [3]>
#> 2 SA 0.17 <named list [3]>
#> 3 APAC 0.05 <named list [3]>
Difficulty: Advanced
You want only two buried fields pulled up to the top while the rest of the record stays packed away.
Use hoist() on record with index paths like c("meta", "region") for each target column.
Click to reveal solution
Explanation: hoist() uses purrr-style index paths (here c("meta", "region")) to dive into a nested list and pluck specific fields into named columns. Unlike unnest_wider(), it does not flatten everything: untargeted fields remain inside the original list-column, which is ideal when records have dozens of fields and you only want two or three. The plucked elements are removed from the residual list, so subsequent hoists never see them twice.
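A sketch of hoist() on a hypothetical `records` list. The meta values follow the expected result, while the other fields and the overall record shape are assumptions, so the length of the residual list may differ from the one printed above.

```r
library(tidyr)
library(tibble)

# Hypothetical payload with a nested meta sub-list per record
records <- list(
  list(order_id = 1001L, customer = "Anna",  amount = 42.5,
       meta = list(region = "EU",   vat_rate = 0.19)),
  list(order_id = 1002L, customer = "Bilal", amount = 180,
       meta = list(region = "SA",   vat_rate = 0.17)),
  list(order_id = 1003L, customer = "Chen",  amount = 95,
       meta = list(region = "APAC", vat_rate = 0.05))
)

ex_5_2 <- tibble(record = records) |>
  hoist(record,
        region   = c("meta", "region"),    # path: record -> meta -> region
        vat_rate = c("meta", "vat_rate"))  # path: record -> meta -> vat_rate

ex_5_2
```

Everything that was not targeted stays packed inside the record list-column, ready for a later hoist() or unnest_wider().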
Section 6. Reshaping rows in nests (2 problems)
Exercise 6.1: Collapse rows into vector list-columns with chop
Task: A reporting tibble lists every order line individually, but the audit dashboard wants one row per customer_id with order_amount and order_date collapsed into list-columns of vectors (not nested tibbles). Use chop() to do that, preserving order. Save the result to ex_6_1.
Expected result:
#> # A tibble: 3 x 3
#> customer_id order_amount order_date
#> <chr> <list> <list>
#> 1 A <dbl [3]> <chr [3]>
#> 2 B <dbl [2]> <chr [2]>
#> 3 C <dbl [1]> <chr [1]>
Difficulty: Intermediate
Collapse the repeating columns per customer into cells holding plain vectors rather than whole tables.
Use chop() on c(order_amount, order_date).
Click to reveal solution
Explanation: chop() is the lightweight sibling of nest(): it produces list-columns of atomic vectors instead of list-columns of tibbles, which is cheaper in memory and faster to round-trip. unchop() reverses it. Prefer chop() when each "nest" only needs to hold a handful of parallel vectors and you do not need a tibble per group; prefer nest() when you want the inner blob to behave like a self-contained data frame (for example, passed to lm()).
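The order-line tibble is not included in this text, so the sketch below invents a hypothetical `orders` tibble with three, two, and one lines for customers A, B, and C. All amounts and dates are made up for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical order lines (values are illustrative only)
orders <- tibble(
  customer_id  = c("A", "A", "A", "B", "B", "C"),
  order_amount = c(120, 35.5, 80, 42, 19.9, 250),
  order_date   = c("2024-01-05", "2024-02-11", "2024-03-02",
                   "2024-01-20", "2024-02-28", "2024-03-15")
)

# Collapse the two columns into list-columns of vectors, one row per customer
ex_6_1 <- orders |>
  chop(c(order_amount, order_date))

# unchop() expands the vectors back into one row per order line
roundtrip <- ex_6_1 |>
  unchop(c(order_amount, order_date))

ex_6_1
```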
Exercise 6.2: Use nest_by to skip group_by plumbing
Task: Repeat the cyl-grouped lm fit from Exercise 3.2 using nest_by() instead of nest() plus mutate(map(...)). The result should be a rowwise tibble with columns cyl, data (nested tibble), and model (an lm fit). Save the result to ex_6_2.
Expected result:
#> # A tibble: 3 x 3
#> # Rowwise: cyl
#> cyl data model
#> <dbl> <list> <list>
#> 1 4 <tibble [11 x 10]> <lm>
#> 2 6 <tibble [7 x 10]> <lm>
#> 3 8 <tibble [14 x 10]> <lm>
Difficulty: Intermediate
Group and bundle in a single step that also makes every later step run one row at a time.
Use nest_by(cyl), then mutate(model = list(lm(mpg ~ wt, data = data))).
Click to reveal solution
Explanation: nest_by() comes from dplyr rather than tidyr. It returns a rowwise tibble in which the nested column is named data by default, the rows are sorted by the key (hence 4, 6, 8 here rather than 6, 4, 8), and each subsequent mutate() is evaluated one row at a time. That removes the explicit map() in the mutate (you just wrap the result in list(...) because each row produces one lm). It is a cleaner fit when every step you want to do downstream is per-row anyway. For more flexible workflows that mix scalar and rowwise steps, stick with nest() plus map().
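A short sketch of the rowwise variant, assuming dplyr 1.0 or later (where nest_by() is available):

```r
library(dplyr)
library(tibble)

ex_6_2 <- mtcars |>
  as_tibble() |>
  nest_by(cyl) |>                                  # rowwise tibble, keys sorted
  mutate(model = list(lm(mpg ~ wt, data = data)))  # data is this row's tibble

ex_6_2
```

Calling ungroup() on the result drops the rowwise behaviour if later steps should operate on whole columns again.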
What to do next
Now that nest/unnest and the many-models pattern are second nature, deepen your tidyr toolkit and apply it to wider modeling workflows:
- tidyr Pivot Exercises in R for pivot_longer and pivot_wider drills.
- dplyr Exercises in R for mutate, summarise, and join practice that powers list-column workflows.
- Apply Family Exercises in R for the base-R analogue of map across list-columns.
- Linear Regression in R for the modeling background behind the many-models examples.