tidymodels Exercises in R: 25 Real-World Practice Problems
Twenty-five graded practice problems on the tidymodels stack, framed as the work a practising data scientist actually does: splitting data the way a risk team would, building recipes for marketing scoring, tuning regularization for credit fraud, evaluating a churn classifier with the metrics a stakeholder would ask for. Each problem comes with an expected result, two hints, and an explanation of the solution. Try first, then peek.
Section 1. Splitting and resampling with rsample (4 problems)
Exercise 1.1: Hold out a stratified test set from the diamonds inventory
Task: A jeweller wants to predict the price of stones in the diamonds dataset and asks for a clean 75/25 train/test split that preserves the distribution of cut quality in both halves. Use initial_split() with strata = cut, set the seed to 42, and save the resulting split object to ex_1_1.
Expected result:
#> <Training/Testing/Total>
#> <40455/13485/53940>
Difficulty: Beginner
A single random partition can leave rare quality grades absent from one half; you want each grade represented in both halves in roughly its original proportion.
Reach for initial_split() with prop = 0.75 and a strata argument set to the cut column.
Explanation: initial_split() returns an rsplit object that simply records row indices, so memory stays cheap. Because cut is a factor, the strata argument samples within each of its levels (a numeric strata column would be binned into quantiles first), so rare levels (Fair, Good) appear in both train and test in roughly their original proportion. Without stratification a random split, especially on a small dataset, can leave the test set short of or entirely missing a rare level, which skews the evaluation and can break predict() downstream.
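A minimal sketch of one way to solve this, assuming the diamonds data that ships with ggplot2. The names train and test are illustrative, and the table at the end is only a sanity check on the stratification.

```r
library(rsample)
data(diamonds, package = "ggplot2")

set.seed(42)
ex_1_1 <- initial_split(diamonds, prop = 0.75, strata = cut)

train <- training(ex_1_1)
test  <- testing(ex_1_1)

# Share of each cut grade in the two halves; the rows should be nearly identical
round(rbind(train = prop.table(table(train$cut)),
            test  = prop.table(table(test$cut))), 3)
```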
Exercise 1.2: Build a 10-fold cross-validation plan for mtcars
Task: A junior analyst is benchmarking a regression model on mtcars and needs a reproducible 10-fold cross-validation plan to feed into fit_resamples(). Use vfold_cv() with v = 10, set the seed to 7, and save the resamples object to ex_1_2.
Expected result:
#> # 10-fold cross-validation
#> # A tibble: 10 x 2
#> splits id
#> <list> <chr>
#> 1 <split [28/4]> Fold01
#> 2 <split [29/3]> Fold02
#> 3 <split [29/3]> Fold03
#> ...
#> 10 <split [29/3]> Fold10
Difficulty: Beginner
You need a reusable resampling plan in which every row serves as a held-out assessment point exactly once.
Use vfold_cv() on mtcars with the v argument set to 10.
Explanation: With only 32 rows the analysis set per fold is tiny, which is exactly when k-fold CV beats a single hold-out: each row is used as an assessment point exactly once, smoothing the variance of the performance estimate. Use repeats = 5 if you need even tighter variance bounds. Always set the seed before calling vfold_cv() because the row shuffle is random.
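One possible solution, sketched with rsample alone. The assess_n helper vector is an illustrative addition that checks the "every row is held out exactly once" property.

```r
library(rsample)

set.seed(7)
ex_1_2 <- vfold_cv(mtcars, v = 10)

# Size of the assessment set in each fold; together they cover all 32 rows
assess_n <- vapply(ex_1_2$splits, function(s) nrow(assessment(s)), integer(1))
sum(assess_n)
```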
Exercise 1.3: Create a bootstrap resampling scheme stratified by Species
Task: A botanist is fitting a multinomial classifier on iris and wants 25 bootstrap resamples stratified by Species so the three classes stay balanced inside each analysis set. Use bootstraps() with times = 25 and strata = Species (the expected result also includes the apparent sample, which requires apparent = TRUE), set the seed to 99, and save the result to ex_1_3.
Expected result:
#> # Bootstrap sampling using stratification with apparent sample
#> # A tibble: 26 x 2
#> splits id
#> <list> <chr>
#> 1 <split [150/55]> Bootstrap01
#> 2 <split [150/57]> Bootstrap02
#> ...
Difficulty: Intermediate
You want repeated draws taken with replacement while keeping the three classes balanced inside each analysis set.
Call bootstraps() with times = 25 and a strata argument pointing at Species; add apparent = TRUE to get the extra apparent-sample row shown in the expected result.
Explanation: Bootstraps draw rows with replacement; the assessment set is the out-of-bag rows for that draw, so its size varies from sample to sample (hence the differing right-hand counts). Stratification ensures each species contributes roughly 50 sampled rows. Bootstrap performance estimates have low variance but tend to be pessimistic compared to k-fold CV, because each analysis set contains only about 63 percent of the distinct rows; use them for variance estimation or when sample size is small.
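A short sketch of one possible solution. The apparent = TRUE argument is an assumption inferred from the 26-row expected result; oob_n is an illustrative helper that looks at the out-of-bag sizes of the 25 real resamples.

```r
library(rsample)

set.seed(99)
ex_1_3 <- bootstraps(iris, times = 25, strata = Species, apparent = TRUE)

# Out-of-bag (assessment) size for the 25 bootstrap draws, excluding the apparent row
oob_n <- vapply(ex_1_3$splits[1:25], function(s) nrow(assessment(s)), integer(1))
summary(oob_n)

# Every analysis set has as many rows as the original data
analysis_n <- vapply(ex_1_3$splits, function(s) nrow(analysis(s)), integer(1))
```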
Exercise 1.4: Build a rolling-origin time series CV split for an airline series
Task: A demand planner is evaluating a forecasting model for monthly passengers and wants a rolling-origin resampling scheme over a synthetic monthly series of 144 observations. Use rolling_origin() with initial = 120, assess = 12, and cumulative = FALSE to produce a moving-window scheme, and save the resulting object to ex_1_4.
Expected result:
#> # Rolling origin forecast resampling
#> # A tibble: 13 x 2
#> splits id
#> <list> <chr>
#> 1 <split [120/12]> Slice01
#> 2 <split [120/12]> Slice02
#> ...
#> 13 <split [120/12]> Slice13
Difficulty: Advanced
Time ordering must be respected so that no future observation leaks backward into a training window.
Use rolling_origin() with initial = 120, assess = 12, and cumulative = FALSE for a fixed-width moving window.
Explanation: Rolling-origin respects time ordering, which random CV would violate by leaking future observations into training folds. With cumulative = FALSE the analysis window is fixed-width (always 120 rows) and slides forward by one step per slice, mimicking how a production forecaster retrains each month on the most recent ten years of history. With 144 rows, a 120-row window, and a 12-row horizon, that gives 144 - 120 - 12 + 1 = 13 slices. Set cumulative = TRUE for an expanding window when older history is also informative.
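A minimal sketch of one possible solution. The passengers data frame and its columns are hypothetical stand-ins for the synthetic monthly series the task describes.

```r
library(rsample)

# Hypothetical synthetic monthly series of 144 observations, already in time order
set.seed(1)
passengers <- data.frame(
  month = seq(as.Date("2010-01-01"), by = "month", length.out = 144),
  pax   = 100 + seq_len(144) + rnorm(144, sd = 5)
)

ex_1_4 <- rolling_origin(passengers, initial = 120, assess = 12, cumulative = FALSE)

# In every slice the assessment months come strictly after the analysis months
first <- ex_1_4$splits[[1]]
range(analysis(first)$month)
range(assessment(first)$month)
```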
Section 2. Preprocessing with recipes (5 problems)
Exercise 2.1: Define a baseline recipe that centres and scales numeric predictors
Task: A pricing analyst wants a clean preprocessing pipeline for predicting price on the diamonds data, with all numeric predictors centered and scaled so that downstream regularized models behave consistently. Build a recipe with step_center() and step_scale() on all_numeric_predictors() and save it to ex_2_1.
Expected result:
#> -- Recipe ----------------------------------------------------
#> Inputs:
#> role #variables
#> outcome 1
#> predictor 9
#> Operations:
#> Centering for all_numeric_predictors()
#> Scaling for all_numeric_predictors()
Difficulty: Beginner
Start by declaring which column is the outcome and which are predictors, then chain the two standardisation operations onto that plan.
Begin with recipe(price ~ ., data = diamonds) and add step_center() and step_scale() over all_numeric_predictors().
Explanation: A recipe is a deferred plan: it remembers what to do but only computes means and standard deviations when prep() or a workflow fit() is called on training data. Using selectors like all_numeric_predictors() keeps the spec robust to schema changes; you do not have to rewrite the recipe if a new numeric column appears. Centering and scaling matter for penalized regression and distance-based learners.
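One way to write this, sketched below. The prep() and bake() calls go beyond the task; they are included only to show that the statistics are estimated at prep time.

```r
library(recipes)
data(diamonds, package = "ggplot2")

ex_2_1 <- recipe(price ~ ., data = diamonds) |>
  step_center(all_numeric_predictors()) |>
  step_scale(all_numeric_predictors())

# Nothing is computed until prep(); bake(new_data = NULL) returns the processed training data
baked <- ex_2_1 |> prep() |> bake(new_data = NULL)
round(c(mean = mean(baked$carat), sd = sd(baked$carat)), 6)
```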
Exercise 2.2: Dummy-encode the cut, color, clarity factors with step_dummy
Task: The same pricing analyst now needs the three ordered factors (cut, color, clarity) turned into a dummy variable matrix for a glmnet pipeline that cannot consume factors directly. Extend the centred and scaled recipe by adding step_dummy(all_nominal_predictors(), one_hot = FALSE) and save the result to ex_2_2.
Expected result:
#> -- Recipe ----------------------------------------------------
#> Operations:
#> Centering for all_numeric_predictors()
#> Scaling for all_numeric_predictors()
#> Dummy variables from all_nominal_predictors()
Difficulty: Intermediate
Take the existing centred-and-scaled plan and append one more operation that turns the ordered factors into numeric indicator columns.
Add step_dummy() on all_nominal_predictors() with one_hot = FALSE after the centring and scaling steps.
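Explanation: With one_hot = FALSE recipes creates one fewer column than the factor has levels, the standard treatment for unregularized linear models so the design matrix stays full-rank. For unordered factors that means dropping a reference level; for ordered factors such as cut, color, and clarity the default contrasts are polynomial, so the columns are trend terms rather than plain level indicators. Use one_hot = TRUE for tree-based models that do not need a reference level and for L1-penalised models when you want the penalty to choose which level to drop. Step order matters: step_dummy() must come AFTER step_other() if you want to lump rare levels first.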
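A sketch of one possible solution, rebuilding the centred and scaled recipe so the block is self-contained. The baked tibble is illustrative; it shows that the three factors become 4 + 6 + 7 numeric columns.

```r
library(recipes)
data(diamonds, package = "ggplot2")

ex_2_2 <- recipe(price ~ ., data = diamonds) |>
  step_center(all_numeric_predictors()) |>
  step_scale(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors(), one_hot = FALSE)

baked <- ex_2_2 |> prep() |> bake(new_data = NULL)

# 6 numeric predictors + 17 dummy columns + the outcome
ncol(baked)
```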
Exercise 2.3: Impute missing predictors with KNN before standardising
Task: A retail analyst has a sales tibble where revenue and units have scattered NA values that would otherwise drop rows in a linear model. Build a recipe that imputes all numeric predictors via step_impute_knn() with 5 neighbours and then standardises with step_normalize(), and save it to ex_2_3.
Expected result:
#> -- Recipe ----------------------------------------------------
#> Operations:
#> K-nearest neighbor imputation for all_numeric_predictors()
#> Centering and scaling for all_numeric_predictors()
Difficulty: Intermediate
Fill the missing cells before standardising, otherwise the scaler computes a centre on a column that still has holes.
Use step_impute_knn() with neighbors = 5, then step_normalize(), both over all_numeric_predictors().
Explanation: step_impute_knn() fills missing cells using the average of the nearest non-missing neighbours in predictor space, which preserves the multivariate structure better than mean imputation. Always impute BEFORE normalising; otherwise the scaler computes a mean on a column with NAs and the imputed cells are scaled against an incorrect centre. step_normalize() is a one-step shortcut for step_center() plus step_scale().
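A minimal sketch, assuming a hypothetical sales data frame. The profit outcome and the footfall predictor are invented here so the recipe has something to predict and a complete column to measure distances on.

```r
library(recipes)

# Hypothetical sales data with scattered NAs in revenue and units
set.seed(123)
sales <- data.frame(
  profit   = rnorm(200, 50, 10),
  revenue  = rnorm(200, 500, 80),
  units    = rnorm(200, 40, 6),
  footfall = rnorm(200, 1000, 150)
)
sales$revenue[sample(200, 15)] <- NA
sales$units[sample(200, 15)]   <- NA

ex_2_3 <- recipe(profit ~ ., data = sales) |>
  step_impute_knn(all_numeric_predictors(), neighbors = 5) |>
  step_normalize(all_numeric_predictors())

baked <- bake(prep(ex_2_3), new_data = NULL)
colSums(is.na(baked))
```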
Exercise 2.4: Pool rare factor levels with step_other then dummy encode
Task: A marketing analyst is modelling click-through on a campaign factor with 20 small levels, most of which appear in fewer than 1 percent of rows. Build a recipe that lumps any level below 5 percent frequency into a single "other" bucket via step_other(threshold = 0.05), then dummy-encodes the result, and save it to ex_2_4.
Expected result:
#> -- Recipe ----------------------------------------------------
#> Operations:
#> Collapsing factor levels for campaign
#> Dummy variables from campaign
Difficulty: Advanced
Collapse the thinly populated factor levels into a single bucket before expanding the factor into indicator columns.
Apply step_other() with threshold = 0.05 to campaign, then step_dummy() on the same column.
Explanation: Rare levels carry almost no signal but inflate the design matrix, and a level that shows up in the test set without ever being observed in training can break or degrade prediction. step_other() addresses both: anything below the threshold becomes the literal level "other". The threshold can be a proportion (between 0 and 1) or a raw count. Put it strictly before step_dummy(), since dummy expansion happens on the already-lumped factor.
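A sketch under stated assumptions: the clicks data frame, its ctr outcome, and the level names are all hypothetical, built so that three campaigns are large and seventeen sit near 0.5 percent each.

```r
library(recipes)

# Hypothetical campaign data: 3 large levels plus 17 tiny ones
set.seed(2024)
lvls  <- c("email", "search", "social", paste0("niche_", 1:17))
probs <- c(0.40, 0.30, 0.215, rep(0.005, 17))
clicks <- data.frame(
  ctr      = runif(5000),
  campaign = factor(sample(lvls, 5000, replace = TRUE, prob = probs), levels = lvls)
)

ex_2_4 <- recipe(ctr ~ campaign, data = clicks) |>
  step_other(campaign, threshold = 0.05) |>
  step_dummy(campaign)

baked <- bake(prep(ex_2_4), new_data = NULL)

# Four levels remain after lumping, so three dummy columns plus the outcome
names(baked)
```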
Exercise 2.5: Build a recipe that creates interaction terms then PCA-compresses
Task: A risk analyst suspects pairwise interactions among disp, hp, and wt carry signal for predicting mpg on mtcars but does not want the inflated feature count to hurt the linear model. Build a recipe that creates the three pairwise interaction terms with step_interact() and then collapses everything via step_pca(num_comp = 3), and save it to ex_2_5.
Expected result:
#> -- Recipe ----------------------------------------------------
#> Operations:
#> Centering and scaling for all_numeric_predictors()
#> Interactions with disp:hp + disp:wt + hp:wt
#> PCA extraction with all_numeric_predictors()
Difficulty: Advanced
Standardise first so no single column dominates, then build the pairwise products, then compress the widened matrix into a few components.
Chain step_normalize(), step_interact(terms = ~ disp:hp + disp:wt + hp:wt), and step_pca(num_comp = 3).
Explanation: Order matters: normalise first so the PCA loadings are not dominated by the column with the largest variance, then create interactions, then PCA the union. step_interact() takes a formula with the colon syntax, so disp:hp means "the product of disp and hp". PCA on the expanded matrix yields uncorrelated components that absorb most of the joint variance, which a downstream OLS or glmnet can consume without multicollinearity warnings.
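One possible solution, sketched with only disp, hp, and wt as predictors. Restricting the formula to those three columns is an assumption; the task does not say whether the other mtcars columns stay in.

```r
library(recipes)

ex_2_5 <- recipe(mpg ~ disp + hp + wt, data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  step_interact(terms = ~ disp:hp + disp:wt + hp:wt) |>
  step_pca(all_numeric_predictors(), num_comp = 3)

baked <- bake(prep(ex_2_5), new_data = NULL)

# Three originals plus three products are replaced by three components
names(baked)
```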
Section 3. Model specification with parsnip (4 problems)
Exercise 3.1: Specify a linear regression with the lm engine
Task: A finance team wants a simple ordinary least squares regression as a baseline for predicting the mpg outcome on mtcars. Use parsnip to specify a linear_reg() model with engine lm, then save the spec to ex_3_1 so it can be slotted into a workflow later.
Expected result:
#> Linear Regression Model Specification (regression)
#>
#> Computational engine: lm
Difficulty: Beginner
Declare the kind of model you want as a standalone spec that holds no data, separately from the routine that will fit it.
Use linear_reg() piped into set_engine("lm").
Explanation: parsnip separates WHAT model you want (a linear regression) from WHICH engine fits it (lm, glmnet, stan, keras, etc.). The model specification is itself a small object that holds no data: it gets fit() later inside a workflow or directly with fit(spec, formula, data). This decoupling is the whole point of parsnip: you can swap engines without rewriting the rest of the pipeline.
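A minimal sketch of the spec. The fit on mpg ~ wt + hp is an illustrative extra showing that the lm engine produces the same coefficients as a direct lm() call.

```r
library(parsnip)

ex_3_1 <- linear_reg() |>
  set_engine("lm")

# The spec holds no data; it is only fitted when given a formula and a data frame
fit_lm <- fit(ex_3_1, mpg ~ wt + hp, data = mtcars)
coef(extract_fit_engine(fit_lm))
```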
Exercise 3.2: Specify an elastic net glmnet regression with tunable penalty and mixture
Task: A credit risk modeller wants an elastic net regression on customer features and intends to tune both the penalty strength and the L1/L2 mixture later via grid search. Use linear_reg() with penalty = tune() and mixture = tune(), set the engine to glmnet, and save the spec to ex_3_2.
Expected result:
#> Linear Regression Model Specification (regression)
#>
#> Main Arguments:
#> penalty = tune()
#> mixture = tune()
#>
#> Computational engine: glmnet
Difficulty: Intermediate
Leave the regularization strength and the L1/L2 balance unspecified so a later search can fill them in.
Call linear_reg(penalty = tune(), mixture = tune()) and set_engine("glmnet").
Explanation: Marking an argument with tune() is a placeholder: it tells parsnip that the value will be provided later by tune_grid() or tune_bayes(). With mixture = 0 glmnet behaves as pure ridge (L2), with mixture = 1 it is pure lasso (L1), and anything in between is elastic net. Tuning both at once explores the regularization geometry rather than committing to a flavour up front, which usually wins on noisy financial features.
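A short sketch of one possible solution. The extract_parameter_set_dials() call is an addition that lists which arguments are still open for tuning.

```r
library(tidymodels)

ex_3_2 <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")

# Both placeholders show up as tunable parameters
params <- extract_parameter_set_dials(ex_3_2)
params$name
```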
Exercise 3.3: Specify a logistic classification spec with the glm engine
Task: A fraud team is building a binary classifier for whether a transaction will be charged back and wants a no-frills logistic regression as the baseline. Use parsnip to specify logistic_reg() with engine glm and mode classification, and save it to ex_3_3.
Expected result:
#> Logistic Regression Model Specification (classification)
#>
#> Computational engine: glm
Difficulty: Intermediate
You want a baseline binary classifier defined as a spec, with both its task and its fitting routine stated explicitly.
Pipe logistic_reg() into set_engine("glm") and set_mode("classification").
Explanation: logistic_reg() is already a classification-only spec so set_mode("classification") is technically redundant, but writing it explicitly makes the intent obvious in code reviews. The glm engine fits maximum likelihood; switch to glmnet if you need regularization or to stan_glm (via rstanarm) for Bayesian posterior intervals. parsnip will silently route predict(type = "prob") to the correct underlying call.
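A sketch of the spec plus an illustrative fit. The chargeback data is not available here, so the example stands in mtcars with am recoded as a two-level factor; cars_bin and the am ~ wt formula are assumptions made for the demonstration.

```r
library(parsnip)

ex_3_3 <- logistic_reg() |>
  set_engine("glm") |>
  set_mode("classification")

# Stand-in binary outcome: transmission type as a factor
cars_bin <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit_glm  <- fit(ex_3_3, am ~ wt, data = cars_bin)

# One probability column per class level
probs <- predict(fit_glm, new_data = cars_bin, type = "prob")
head(probs, 3)
```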
Exercise 3.4: Specify a random forest classifier with ranger and tunable mtry plus min_n
Task: A churn analyst is choosing a random forest as the strong baseline for predicting customer attrition and wants to tune both mtry (predictors sampled per split) and min_n (node-size floor) via cross-validation. Specify rand_forest() with mtry = tune(), min_n = tune(), trees = 500, engine ranger, mode classification, and save the spec to ex_3_4.
Expected result:
#> Random Forest Model Specification (classification)
#>
#> Main Arguments:
#> mtry = tune()
#> trees = 500
#> min_n = tune()
#>
#> Computational engine: ranger
Difficulty: Advanced
Fix the ensemble size but leave the per-split sampling and the node-size floor open for a cross-validated search.
Use rand_forest(mtry = tune(), min_n = tune(), trees = 500) with set_engine("ranger") and set_mode("classification").
Explanation: ranger is the production-grade engine: it is multi-threaded, handles factors directly, and is roughly an order of magnitude faster than the original randomForest package on wide data. Keep trees fixed (500 is usually plenty for accuracy stability) and tune mtry and min_n, which control bias/variance trade-off and tree depth. Setting importance = "impurity" on the engine call would let you read feature importance after fitting.
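One way to write the spec, as a minimal sketch. Building it does not fit anything, so the ranger package is only needed later at fit time.

```r
library(tidymodels)

ex_3_4 <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) |>
  set_engine("ranger") |>
  set_mode("classification")

# mtry and min_n are open for tuning; trees is fixed at 500
extract_parameter_set_dials(ex_3_4)$name
```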
Section 4. Combining recipe and model into a workflow (4 problems)
Exercise 4.1: Bind a recipe and a linear model into a single workflow
Task: A reporting analyst wants one object that carries both the preprocessing recipe and the OLS model spec so the data prep and the fit cannot drift apart in code review. Combine the recipe ex_2_1 (normalisation only) and the spec ex_3_1 (lm) into a workflow object using workflow(), add_recipe(), and add_model(), and save it to ex_4_1.
Expected result:
#> == Workflow ==================================================
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> -- Preprocessor ----------------------------------------------
#> 2 Recipe Steps
#> - step_center()
#> - step_scale()
#> -- Model -----------------------------------------------------
#> Linear Regression Model Specification (regression)
#> Computational engine: lm
Difficulty: Intermediate
Bundle the preprocessing plan and the model spec into one container so they cannot drift apart in review.
Start with workflow(), then add_recipe(ex_2_1) and add_model(ex_3_1).
Explanation: A workflow is the single object you pass to fit(), predict(), last_fit(), and tune_grid(). Bundling recipe and spec prevents the classic mistake of preprocessing train and test differently: when you call fit(workflow, train), the recipe is prep()d on train ONLY, and predict(workflow, new_data = test) reuses those frozen statistics. A workflow without a preprocessor is allowed; add a formula instead with add_formula().
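A self-contained sketch that rebuilds ex_2_1 and ex_3_1 before bundling them, assuming the earlier exercises were solved as described in their tasks.

```r
library(tidymodels)
data(diamonds, package = "ggplot2")

ex_2_1 <- recipe(price ~ ., data = diamonds) |>
  step_center(all_numeric_predictors()) |>
  step_scale(all_numeric_predictors())

ex_3_1 <- linear_reg() |>
  set_engine("lm")

# One object now carries both the preprocessing plan and the model spec
ex_4_1 <- workflow() |>
  add_recipe(ex_2_1) |>
  add_model(ex_3_1)

ex_4_1
```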
Exercise 4.2: Fit a workflow on training data and pull out the model parameters
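Task: The same reporting analyst now wants to fit a workflow of the same shape as ex_4_1 on the training portion of an 80/20 split of mtcars and inspect the fitted coefficients with extract_fit_parsnip() followed by tidy(). Because the recipe in ex_2_1 was declared on diamonds, rebuild it with mpg as the outcome before fitting. Save the tidied coefficient tibble to ex_4_2.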
Expected result:
#> # A tibble: 11 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 19.7 0.452 43.6 3.20e-19
#> 2 cyl -0.523 1.07 -0.49 6.31e- 1
#> 3 disp 1.51 1.85 0.81 4.26e- 1
#> ...
Difficulty: Intermediate
Fit on the training rows only, then drill into the bundle to reach the inner fitted model and turn its coefficients into a tidy table.
Call fit() on training(mt_split), then pipe the result through extract_fit_parsnip() and tidy().
Explanation: extract_fit_parsnip() digs through the workflow wrapper and returns the inner parsnip fit object; tidy() then converts the engine-specific coefficient table into a uniform tibble. Use extract_fit_engine() instead if you need the raw lm object for diagnostics. Always tidy the result before plotting or exporting: broom's column naming (term, estimate, std.error) stays consistent across engines, although not every engine reports every column (glmnet, for example, has no std.error).
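A minimal sketch of one possible solution. The seed, mt_rec, and wf_lm are illustrative names and choices; the recipe is declared on mtcars because a recipe written for diamonds cannot be fitted to this data.

```r
library(tidymodels)

set.seed(1)
mt_split <- initial_split(mtcars, prop = 0.8)

mt_rec <- recipe(mpg ~ ., data = training(mt_split)) |>
  step_center(all_numeric_predictors()) |>
  step_scale(all_numeric_predictors())

wf_lm <- workflow() |>
  add_recipe(mt_rec) |>
  add_model(linear_reg() |> set_engine("lm"))

# Fit on the training rows only, then tidy the inner parsnip fit
ex_4_2 <- wf_lm |>
  fit(data = training(mt_split)) |>
  extract_fit_parsnip() |>
  tidy()

ex_4_2
```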
Exercise 4.3: Compare two workflows with a workflow_set
Task: A model-comparison reviewer is benchmarking two classifiers (logistic regression and a 500-tree random forest) on a binarised version of mtcars where am is the target. Build a workflow_set() that pairs the same simple normalised recipe with both model specs, and save the workflowset to ex_4_3.
Expected result:
#> # A workflow set/tibble: 2 x 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 norm_logistic <tibble [1 x 4]> <opts[0]> <list [0]>
#> 2 norm_random_forest <tibble [1 x 4]> <opts[0]> <list [0]>
Difficulty: Intermediate
You want every pairing of one preprocessor with several models laid out as rows so they can be compared in one pass.
Use workflow_set() with a preproc list holding the recipe and a models list holding both specs.
Explanation: workflow_set() takes lists of preprocessors and models and produces every combination as rows of a tibble. You then call workflow_map("fit_resamples", resamples = folds) on the whole set to evaluate them in one call, and rank_results() to sort the leaderboard. This is the right tool when you want to compare three preprocessing variants against four model families: twelve workflows in three lines of code.
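A sketch of one possible solution. The list names norm, logistic, and random_forest are chosen to reproduce the wflow_id values in the expected result; cars_bin and the factor labels are illustrative.

```r
library(tidymodels)

# Binarised target: am as a two-level factor
cars_bin <- mtcars |>
  mutate(am = factor(am, labels = c("automatic", "manual")))

norm_rec <- recipe(am ~ ., data = cars_bin) |>
  step_normalize(all_numeric_predictors())

ex_4_3 <- workflow_set(
  preproc = list(norm = norm_rec),
  models  = list(
    logistic      = logistic_reg() |> set_engine("glm"),
    random_forest = rand_forest(trees = 500) |>
      set_engine("ranger") |>
      set_mode("classification")
  )
)

ex_4_3$wflow_id
```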
Exercise 4.4: Last-fit a workflow on the full training split and evaluate on test
Task: A pricing modeller wants the standard tidymodels closing move on a workflow: fit on the training portion of a 75/25 split of diamonds (a small sample of 2000 rows for speed, so 1500 training and 500 test rows) and immediately score on the held-out test rows. Use last_fit() on a normalised lm workflow against ex_1_1_small and save the resulting tibble to ex_4_4.
Expected result:
#> # Resampling results
#> # Manual resampling
#> # A tibble: 1 x 6
#> splits id .metrics .notes .predictions .workflow
#> <list> <chr> <list> <list> <list> <list>
#> 1 <split [1500/500]> train/test <tibble [2 x 4]> <tibble [0 x 3]> <tibble [500 x 4]> <workflow>
Difficulty: Advanced
The closing move refits on the training half and scores the held-out half in a single call against the original split object.
Call last_fit(), passing the workflow and split = ex_1_1_small.
Explanation: last_fit() is the bridge between training and the final test-set score. It refits the workflow on the training set, generates predictions on the test set, and returns one tibble row holding the metrics, the predictions, and the fitted workflow. Pull pieces with collect_metrics(), collect_predictions(), or extract_workflow(). Use it once and only once per project: the test set is sacred and not for iteration.
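A minimal sketch, assuming ex_1_1_small is a 75/25 stratified split of a 2000-row sample, which matches the 1500/500 counts in the expected result. diamonds_small and wf_lm are illustrative names.

```r
library(tidymodels)
data(diamonds, package = "ggplot2")

set.seed(42)
diamonds_small <- slice_sample(diamonds, n = 2000)
ex_1_1_small   <- initial_split(diamonds_small, prop = 0.75, strata = cut)

wf_lm <- workflow() |>
  add_recipe(
    recipe(price ~ ., data = training(ex_1_1_small)) |>
      step_normalize(all_numeric_predictors())
  ) |>
  add_model(linear_reg() |> set_engine("lm"))

# Refit on the training rows and score the held-out rows in one call
ex_4_4 <- last_fit(wf_lm, split = ex_1_1_small)
collect_metrics(ex_4_4)
```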
Section 5. Hyperparameter tuning with tune and dials (4 problems)
Exercise 5.1: Build a regular grid of penalty and mixture for an elastic net
Task: A credit risk analyst is preparing to tune the elastic net from exercise 3.2 and wants a five-by-five grid over penalty (log-spaced) and mixture (linearly spaced from 0 to 1). Use grid_regular() with penalty() and mixture() and levels = 5, and save the grid tibble to ex_5_1.
Expected result:
#> # A tibble: 25 x 2
#> penalty mixture
#> <dbl> <dbl>
#> 1 0.0000000001 0
#> 2 0.0000000316 0
#> 3 0.0000100000 0
#> ...
#> 25 1 1
Difficulty: Intermediate
You need an evenly spaced Cartesian set of candidate values across the two regularization controls.
Use grid_regular() with penalty() and mixture() and levels = 5.
Explanation: grid_regular() builds a Cartesian product across the named parameters. The dials package defines sensible default ranges per parameter: penalty() defaults to 1e-10 to 1 on a log10 scale, and mixture() to 0 to 1 linearly. Override with penalty(range = c(-4, 0)) if you want a tighter window. For higher-dimensional spaces use grid_latin_hypercube() or grid_space_filling(), which scale far better than dense grids.
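One way to build the grid, sketched with dials alone. The range check at the end reflects the default penalty range described above.

```r
library(dials)

ex_5_1 <- grid_regular(penalty(), mixture(), levels = 5)

# 5 penalty values crossed with 5 mixture values
nrow(ex_5_1)
range(ex_5_1$penalty)
```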
Exercise 5.2: Tune the elastic net via cross-validation and collect the metric tibble
Task: The same analyst now runs the tuning loop: combine the recipe ex_2_2, the spec ex_3_2, a 10-fold cross-validation plan built as in ex_1_2 but on the diamonds data the recipe expects, and the grid ex_5_1. Call tune_grid() on the workflow with metrics RMSE and MAE, then collect_metrics() and save the averaged metrics tibble to ex_5_2.
Expected result:
#> # A tibble: 50 x 8
#> penalty mixture .metric .estimator mean n std_err .config
#> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 1e-10 0 mae standard 900 10 35.2 Preprocessor1_Model01
#> 2 1e-10 0 rmse standard 1400 10 62.7 Preprocessor1_Model01
#> ...
Difficulty: Intermediate
Fit the workflow once per fold-and-candidate combination, then average the scores across folds into one row per setting.
Call tune_grid() with resamples, grid = ex_5_1, and metrics = metric_set(rmse, mae), then pass the result to collect_metrics().
Explanation: tune_grid() evaluates the workflow once per (fold x grid point) combination, here 10 x 25 = 250 evaluations, and returns a nested tibble. collect_metrics() flattens that into one row per (grid point x metric), averaging across folds and reporting the standard error so you can spot which configurations are merely lucky rather than genuinely best. Pass control = control_grid(save_pred = TRUE) if you also want the raw fold predictions, for example for residual plots.
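A sketch of the tuning loop under stated assumptions: it runs on a 2000-row sample of diamonds for speed, rebuilds the recipe, spec, folds, and grid inline, and needs the glmnet package installed. The names train, folds, and wf_enet are illustrative.

```r
library(tidymodels)
data(diamonds, package = "ggplot2")

set.seed(7)
train <- slice_sample(diamonds, n = 2000)
folds <- vfold_cv(train, v = 10)

rec <- recipe(price ~ ., data = train) |>
  step_center(all_numeric_predictors()) |>
  step_scale(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors(), one_hot = FALSE)

wf_enet <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg(penalty = tune(), mixture = tune()) |> set_engine("glmnet"))

ex_5_1 <- grid_regular(penalty(), mixture(), levels = 5)

tuned <- tune_grid(wf_enet, resamples = folds, grid = ex_5_1,
                   metrics = metric_set(rmse, mae))

# One row per grid point and metric, averaged over the 10 folds
ex_5_2 <- collect_metrics(tuned)
```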
Exercise 5.3: Identify the best penalty and finalize the workflow
Task: Continue from the tuned object: select the best combination of penalty and mixture from tuned using the RMSE metric, then finalize the workflow with those values so it can be last_fit(). Save the finalized workflow to ex_5_3.
Expected result:
#> == Workflow ==================================================
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> -- Model -----------------------------------------------------
#> Linear Regression Model Specification (regression)
#>
#> Main Arguments:
#> penalty = 0.001
#> mixture = 0.5
#>
#> Computational engine: glmnet
Difficulty: Advanced
Pick the winning hyperparameter row by the chosen metric, then stamp those values into the workflow's open placeholders.
The output of select_best() (with metric = "rmse") feeds into finalize_workflow().
Explanation: select_best() returns a one-row tibble with the winning hyperparameter combination by the named metric. finalize_workflow() substitutes those values into every tune() placeholder in the workflow, producing a workflow object that no longer contains any unresolved parameters. From here you call last_fit(ex_5_3, split = your_initial_split) to score the test set, or fit(ex_5_3, data = full_train) for production. Use select_by_one_std_err() for a more conservative pick.
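A compact sketch of the select-and-finalize step. To stay self-contained and quick it re-runs a reduced tuning pass (1000 rows, 5 folds, a 3 x 3 grid); those sizes are assumptions made for the example, not the exercise's settings.

```r
library(tidymodels)
data(diamonds, package = "ggplot2")

set.seed(7)
train <- slice_sample(diamonds, n = 1000)

wf_enet <- workflow() |>
  add_recipe(
    recipe(price ~ ., data = train) |>
      step_normalize(all_numeric_predictors()) |>
      step_dummy(all_nominal_predictors())
  ) |>
  add_model(linear_reg(penalty = tune(), mixture = tune()) |> set_engine("glmnet"))

tuned <- tune_grid(wf_enet, resamples = vfold_cv(train, v = 5),
                   grid = grid_regular(penalty(), mixture(), levels = 3),
                   metrics = metric_set(rmse))

# Pick the winning row, then substitute it for the tune() placeholders
best   <- select_best(tuned, metric = "rmse")
ex_5_3 <- finalize_workflow(wf_enet, best)
```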
Exercise 5.4: Run a Bayesian search instead of a fixed grid
Task: A senior modeller observes that the elastic net grid is wasteful when the optimum sits in a small region, and wants a Bayesian search to home in on it efficiently. Run tune_bayes() on the same wf_enet and 5-fold resamples, starting from 5 initial points and iterating 10 times to minimise RMSE, and save the result to ex_5_4.
Expected result:
#> # Tuning results
#> # 5-fold cross-validation
#> # A tibble: 15 x 5
#> splits id .metrics .notes .iter
#> <list> <chr> <list> <list> <int>
#> 1 <split> Fold1 <tibble [2 x 6]> <tibble [0 x 3]> 0
#> ...
Difficulty: Advanced
Instead of an exhaustive grid, let a surrogate model propose where to sample next, starting from a few seed points.
Use tune_bayes() with initial = 5, iter = 10, and metrics = metric_set(rmse).
Explanation: tune_bayes() fits a Gaussian process surrogate over the parameter space using the initial points, then proposes the next configuration where the GP expects the largest improvement in the target metric. The no_improve argument of control_bayes() is an early-stopping rule: with no_improve = 5, the search halts once 5 successive iterations fail to beat the running best. For 2-3 dimensional spaces grid search still wins on simplicity; in higher-dimensional spaces Bayesian search can often reach similar accuracy with far fewer fits.
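A sketch of the Bayesian search, assuming a 1000-row sample of diamonds and an inline rebuild of wf_enet; folds5 and the sample size are illustrative. The control_bayes(no_improve = 5) setting follows the early-stopping rule mentioned in the explanation, and glmnet must be installed.

```r
library(tidymodels)
data(diamonds, package = "ggplot2")

set.seed(11)
train  <- slice_sample(diamonds, n = 1000)
folds5 <- vfold_cv(train, v = 5)

wf_enet <- workflow() |>
  add_recipe(
    recipe(price ~ ., data = train) |>
      step_normalize(all_numeric_predictors()) |>
      step_dummy(all_nominal_predictors())
  ) |>
  add_model(linear_reg(penalty = tune(), mixture = tune()) |> set_engine("glmnet"))

# 5 seed points, then up to 10 surrogate-guided iterations
ex_5_4 <- tune_bayes(wf_enet, resamples = folds5,
                     initial = 5, iter = 10,
                     metrics = metric_set(rmse),
                     control = control_bayes(no_improve = 5))

show_best(ex_5_4, metric = "rmse", n = 3)
```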
Section 6. Evaluation and finalization with yardstick (4 problems)
Exercise 6.1: Compute RMSE and R-squared on a regression prediction tibble
Task: A pricing modeller has a prediction tibble with a .pred column and a truth column price after scoring a regression model and wants a single tibble with both RMSE and R-squared via a shared metric set. Use metric_set(rmse, rsq) and call it on preds with truth = price and estimate = .pred, then save the resulting metric tibble to ex_6_1.
Expected result:
#> # A tibble: 2 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 1187.
#> 2 rsq standard 0.87
Difficulty: Intermediate
Bundle the two regression scores into one reusable scorer, then apply that scorer to the prediction tibble.
Build metric_set(rmse, rsq) and call the resulting function with truth = price and estimate = .pred.
Explanation: metric_set() is a closure factory: it returns a function that accepts a data frame plus truth and estimate columns and returns the metrics as a tidy tibble. Bundling metrics like this is far cleaner than calling rmse(), rsq(), and mae() separately and bind_rows-ing them. The .estimator column distinguishes binary, micro, macro, etc.; for regression it is always standard.
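A minimal sketch on a tiny hypothetical preds table, small enough to check by hand: the errors are -10, 10, -20, and 20, so the RMSE is sqrt(250). The name reg_metrics is illustrative.

```r
library(yardstick)

# Hypothetical predictions; the real preds tibble comes from a fitted model
preds <- data.frame(
  price = c(100, 200, 300, 400),
  .pred = c(110, 190, 320, 380)
)

# metric_set() returns a function that computes both metrics in one call
reg_metrics <- metric_set(rmse, rsq)
ex_6_1 <- reg_metrics(preds, truth = price, estimate = .pred)
ex_6_1
```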
Exercise 6.2: Build a classification metric set with accuracy, sensitivity, and roc_auc
Task: A churn classifier reviewer needs a single metric set that captures accuracy, sensitivity, and ROC AUC so the same call works inside tune_grid() and on a final test set. Build the metric set with metric_set(accuracy, sensitivity, roc_auc) and apply it to cls_preds with truth = churn, estimate = .pred_class, and the probability column .pred_yes. Save the metric tibble to ex_6_2.
Expected result:
#> # A tibble: 3 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy binary 0.81
#> 2 sensitivity binary 0.72
#> 3 roc_auc binary 0.88
Difficulty: Intermediate
One scorer should cover both the hard-class metrics and the probability-based one so the same call works everywhere.
Build metric_set(accuracy, sensitivity, roc_auc) and pass truth = churn, estimate = .pred_class, and the .pred_yes column.
Explanation: yardstick separates class-prediction metrics (need estimate = .pred_class) from probability-based metrics (need the class-prob column, here .pred_yes). A single metric_set() accepts both: pass the class column as estimate and any probability columns as additional unnamed arguments. The event_level argument tells sensitivity which factor level is the positive class; without it yardstick defaults to the first factor level, which is often wrong for "yes/no" outcomes.
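A sketch on a small hypothetical cls_preds table. The factor levels put "yes" first so the default event level is the positive class; the 0.5 cut-off used to derive .pred_class is an assumption for the example.

```r
library(yardstick)

# Hypothetical churn predictions: 3 churners and 5 non-churners
cls_preds <- data.frame(
  churn     = factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
                     levels = c("yes", "no")),
  .pred_yes = c(0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1)
)
cls_preds$.pred_class <- factor(ifelse(cls_preds$.pred_yes >= 0.5, "yes", "no"),
                                levels = c("yes", "no"))

cls_metrics <- metric_set(accuracy, sensitivity, roc_auc)

# Class column goes to estimate; the probability column is passed unnamed
ex_6_2 <- cls_metrics(cls_preds, truth = churn, estimate = .pred_class, .pred_yes)
ex_6_2
```

Here 6 of 8 rows are classified correctly, 2 of the 3 churners are caught, and 14 of the 15 churner/non-churner pairs are ranked correctly, which gives the accuracy, sensitivity, and AUC respectively.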
Exercise 6.3: Plot a ROC curve from prediction probabilities
Task: A senior reviewer wants a visual diagnostic alongside the AUC number: the ROC curve for the churn predictions in cls_preds. Build the curve tibble with roc_curve() and pipe it into autoplot() to render a ggplot object, then save that plot to ex_6_3.
Expected result:
#> # A ggplot object showing the ROC curve, x = 1 - specificity, y = sensitivity,
#> # diagonal reference line indicating chance performance.
Difficulty: Advanced
First build the threshold-by-threshold sensitivity and specificity table, then hand it to the generic plotting method.
Pipe roc_curve() (with truth = churn and the .pred_yes column) into autoplot().
Explanation: roc_curve() returns a tibble of thresholds, sensitivity, and specificity at each cut point; autoplot.roc_df() is the yardstick-specific method that renders it with the chance diagonal. Pair it with pr_curve() for the precision-recall view, which is far more informative on imbalanced targets (here the 30/70 churn split is borderline). For a multiclass ROC, pass several prob columns and yardstick produces one curve per class facet.
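One possible solution, sketched on a small hypothetical cls_preds table with "yes" as the first factor level. roc_tbl is an illustrative name for the intermediate curve tibble.

```r
library(yardstick)
library(ggplot2)

# Hypothetical churn predictions
cls_preds <- data.frame(
  churn     = factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
                     levels = c("yes", "no")),
  .pred_yes = c(0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1)
)

# Threshold-by-threshold sensitivity and specificity
roc_tbl <- roc_curve(cls_preds, truth = churn, .pred_yes)

# autoplot() dispatches to yardstick's method for ROC tibbles
ex_6_3 <- autoplot(roc_tbl)
```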
Exercise 6.4: Confusion matrix and Cohen's kappa from a multiclass classifier
Task: A botanist evaluating the Species classifier on a sampled iris prediction tibble wants the 3x3 confusion matrix and Cohen's kappa together so she can report both raw counts and the agreement-adjusted score. Use conf_mat() to build the matrix object, summarise with summary(), filter to the kappa row, and save the resulting tibble to ex_6_4.
Expected result:
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 kap multiclass 0.92
Difficulty: Advanced
Build the cross-tabulation object, expand it into its full panel of metrics, and keep only the chance-adjusted agreement row.
Use conf_mat(), then summary(), then filter() to the .metric == "kap" row.
Click to reveal solution
Explanation: Cohen's kappa adjusts raw accuracy for the proportion of agreement expected by chance, which matters when class frequencies are uneven. A kappa above 0.8 is conventionally "almost perfect"; near zero means the classifier is no better than guessing the marginal proportions. summary(conf_mat) returns roughly 13 metrics in one tibble (accuracy, kappa, sensitivity, specificity, ppv, npv, mcc, f1, and more), which is the fastest way to populate a stakeholder report card.
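The conf_mat() to summary() to filter() chain, together with a by-hand check of the kappa formula, might look like the sketch below. The iris_preds tibble is a hypothetical construction (true iris labels with a handful of randomly reassigned predictions), so its kappa will not necessarily match the value shown in the expected result.

```r
library(yardstick)
library(dplyr)
library(tibble)

# Hypothetical prediction tibble: start from the true labels, then randomly
# reassign 12 of the 150 predictions to simulate an imperfect classifier.
set.seed(3)
iris_preds <- tibble(Species = iris$Species, .pred_class = iris$Species)
flip <- sample(nrow(iris_preds), 12)
iris_preds$.pred_class[flip] <- sample(levels(iris$Species), 12, replace = TRUE)

# 3x3 confusion matrix object, then its full metric panel, then the kappa row.
cm <- conf_mat(iris_preds, truth = Species, estimate = .pred_class)
kappa_row <- summary(cm) %>% filter(.metric == "kap")
kappa_row

# Cross-check: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
# agreement and p_e is the agreement expected from the marginal proportions.
tab <- cm$table
n_obs <- sum(tab)
p_o <- sum(diag(tab)) / n_obs
p_e <- sum(rowSums(tab) * colSums(tab)) / n_obs^2
kappa_manual <- (p_o - p_e) / (1 - p_e)
kappa_manual
```

The manual value should agree with the .estimate in kappa_row, which makes the "agreement beyond chance" reading of the metric concrete.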
What to do next
Pair these exercises with the related material on r-statistics.co:
- Machine Learning Exercises in R for the broader ML-in-R problem set across packages.
- caret Exercises in R if you want to compare the tidymodels stack against the older caret API on the same problem.
- Cross-Validation Exercises in R for deeper rsample drills focused only on resampling.
- Random Forest Exercises in R and XGBoost Exercises in R for engine-specific tuning practice you can plug into a tidymodels workflow.