parsnip linear_reg() in R: Fit Linear Regression Models
The parsnip linear_reg() function defines a linear regression model specification in tidymodels, ready to be fit with the lm, glmnet, stan, or keras engine. It gives you one consistent interface for ordinary least squares, penalized regression, and Bayesian linear models without rewriting the fitting call.
    linear_reg()  # default spec, lm engine
    linear_reg() |> set_engine("lm")  # ordinary least squares
    linear_reg(penalty = 0.1) |> set_engine("glmnet")  # ridge / lasso / elastic net
    linear_reg(penalty = 0.1, mixture = 1) |> set_engine("glmnet")  # pure lasso
    linear_reg() |> set_engine("stan")  # Bayesian linear model
    spec |> set_mode("regression")  # only mode linear_reg allows
    fit(spec, mpg ~ ., data = mtcars)  # train on a continuous outcome
    predict(fit, new_data)  # expected value per row
Need explanation? Read on for examples and pitfalls.
What linear_reg() does
linear_reg() is a model specification, not a fitted model. It records your intent to build a linear regression and the hyperparameters you want, but no data touches it until you call fit(). This separation lets you reuse one specification across many datasets, formulas, or resampling folds.
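The split between specification and fit can be sketched as follows. This is a minimal illustration, assuming tidymodels is installed; the object names spec, fit_small, and fit_large are made up for the example.

```r
library(tidymodels)

# One specification, with no data attached yet
spec <- linear_reg() |> set_engine("lm")

# Reuse the same specification with two different formulas
fit_small <- fit(spec, mpg ~ wt, data = mtcars)
fit_large <- fit(spec, mpg ~ wt + hp + cyl, data = mtcars)

class(spec)       # includes "model_spec"
class(fit_small)  # includes "model_fit"
```

The specification object is unchanged by either call to fit(), which is why it can be passed around freely.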
Linear regression models a continuous numeric outcome as a linear combination of predictors. It assumes the conditional mean of the response is a straight-line function of the inputs, so each coefficient reports the additive effect of a one-unit change in that predictor.
The function belongs to the tidymodels framework and ships in core parsnip, so no extension package is needed. Because parsnip standardizes the interface, the same linear_reg() code runs on the base lm engine, the penalized glmnet engine, or the Bayesian stan engine with only one line changed.
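A specification is only a blueprint; fit() turns it into a trained model object. Keeping those two steps apart is what makes tidymodels workflows reproducible across resamples and easy to swap between engines.
Loading library(tidymodels) is enough to use linear_reg(). Unlike poisson_reg(), no extension package is required for the function itself. The default engine is lm, and other available engines include glmnet, brulee, stan, and keras, plus gee and the mixed-model engines lmer and lme, which are registered by the multilevelmod extension package. Each non-default engine also needs its backing package installed.
linear_reg() syntax and arguments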
linear_reg() takes two tuning arguments and two setup verbs. The arguments control regularization, while set_engine() and set_mode() finish the specification.
The penalty argument sets the total amount of regularization applied to coefficients, on the same scale as glmnet::glmnet()'s lambda. The mixture argument blends ridge (mixture = 0) and lasso (mixture = 1) penalties, with values in between giving an elastic net. The default lm engine ignores both arguments because it fits an unpenalized model.
The mode is always regression. A linear model predicts a continuous number, so set_mode("regression") is the only legal choice. You can pass the engine through set_engine() instead of the engine argument, which is the more common tidymodels style.
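As a rough sketch of how the arguments and setup verbs combine, the two specifications below should be equivalent. The names enet_spec and enet_spec2 are illustrative only, and translate() is used here just to inspect how parsnip maps the arguments onto the engine's own call.

```r
library(tidymodels)

# Elastic net: an even blend of ridge and lasso penalties
enet_spec <- linear_reg(penalty = 0.1, mixture = 0.5) |>
  set_engine("glmnet") |>
  set_mode("regression")

# Same model, with the engine passed as an argument instead
enet_spec2 <- linear_reg(penalty = 0.1, mixture = 0.5, engine = "glmnet")

enet_spec$engine
enet_spec$mode

# Show the engine-level call template that parsnip would use at fit time
translate(enet_spec)
```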
Fit a linear model: four examples
Every example below uses the built-in mtcars dataset. Its mpg column is the continuous outcome, and wt, hp, and cyl are the predictors, which makes it a familiar testbed for linear regression.
Example 1: Fit with the default lm engine
Build the specification, then fit it to data. The lm engine fits a standard ordinary least squares regression using stats::lm() underneath.
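A minimal version of this example, assuming the three predictors named above (wt, hp, cyl); lm_spec and lm_fit are names chosen for the sketch:

```r
library(tidymodels)

# Specification: ordinary least squares via stats::lm()
lm_spec <- linear_reg() |> set_engine("lm")

# Fit: the formula and data enter only at this step
lm_fit <- fit(lm_spec, mpg ~ wt + hp + cyl, data = mtcars)

# Printing shows the underlying lm call and its coefficients
lm_fit
```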
The fitted object reports one coefficient per predictor on the original outcome scale. The intercept is the expected mpg when weight, horsepower, and cylinders are all zero, which is an extrapolation; the slopes are what carry the interpretation.
Example 2: Predict expected mpg for new rows
predict() returns a tidy tibble with one row per input row. For a regression-mode model, the default prediction type gives the conditional mean of the outcome.
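A short sketch of prediction, using the first three rows of mtcars as stand-in "new" data (new_cars and preds are illustrative names):

```r
library(tidymodels)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp + cyl, data = mtcars)

# Pretend the first three cars are unseen rows
new_cars <- mtcars[1:3, ]

# Default type for a regression model: one .pred value per row
preds <- predict(lm_fit, new_data = new_cars)
preds

# Attach predictions to the original columns
bind_cols(new_cars, preds)

# Confidence intervals for the conditional mean
predict(lm_fit, new_data = new_cars, type = "conf_int")
```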
Each output column from a parsnip model starts with .pred, which keeps prediction columns from clashing with your original data when you bind them back together with bind_cols().
Example 3: Tidy and glance the fitted model
Use broom helpers through parsnip to pull coefficients and fit statistics. tidy() returns one row per coefficient, and glance() returns a one-row model summary.
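The two helpers can be called directly on the parsnip fit, as in this sketch (coefs and stats are names picked for the example):

```r
library(tidymodels)

lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp + cyl, data = mtcars)

# One row per coefficient: estimate, std.error, statistic, p.value
coefs <- tidy(lm_fit)
coefs

# One-row summary: r.squared, adj.r.squared, sigma, AIC, and so on
stats <- glance(lm_fit)
stats
```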
Weight is the strongest predictor; each extra 1000 lb is associated with a 3.17 mpg drop, holding hp and cyl fixed. The model explains about 84 percent of the variance in mpg on this small sample.
Example 4: Fit a penalized model with glmnet
Switch to glmnet for regularized coefficients. The glmnet engine needs a non-NULL penalty, and mixture = 1 requests a pure lasso penalty that can shrink weak predictors to zero.
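A sketch of the lasso fit, assuming the glmnet package is installed. The penalty value 0.1 is arbitrary and chosen only for illustration.

```r
library(tidymodels)

# Pure lasso (mixture = 1) with a fixed, arbitrary penalty
lasso_spec <- linear_reg(penalty = 0.1, mixture = 1) |>
  set_engine("glmnet")

lasso_fit <- fit(lasso_spec, mpg ~ wt + hp + cyl, data = mtcars)

# Coefficients at the requested penalty
tidy(lasso_fit)

# Predictions are also made at the penalty stored in the specification
lasso_preds <- predict(lasso_fit, new_data = mtcars[1:3, ])
lasso_preds
```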
To tune the penalty instead of fixing it, set penalty = tune() in the specification, then pass it to tune_grid() with a resampling object like vfold_cv(). The framework searches a grid of penalty values and reports which one generalizes best on held-out folds.
linear_reg() vs other regression models
Pick the model by the type of outcome you are predicting. linear_reg() handles continuous numeric outcomes; the alternatives below cover the other cases.
| Function | Outcome type | Default engine | Use when |
|---|---|---|---|
| linear_reg() | continuous numeric | lm | Price, mpg, test score |
| poisson_reg() | non-negative counts | glm | Calls, defects, visits |
| logistic_reg() | exactly 2 classes | glm | Yes/no, churn, spam |
| multinom_reg() | 3+ unordered classes | nnet | Species, product category |
| rand_forest() | numeric or class | ranger | Non-linear effects, interactions |
Use linear_reg() when the outcome is continuous and you expect roughly linear, additive effects of the predictors. When relationships are strongly non-linear or interactions dominate, a tree-based model often fits better with less feature engineering.
Common pitfalls
Three mistakes catch most newcomers to linear_reg(). Each one below shows the problem and the fix.
The biggest is passing penalty to the wrong engine. The lm engine ignores regularization arguments entirely, so a specification like linear_reg(penalty = 0.1) |> set_engine("lm") silently fits an unpenalized OLS model. Switching the engine to glmnet is what actually applies the penalty.
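One way to see this pitfall is to compare coefficients with and without the penalty argument under the lm engine. The sketch below uses made-up object names (plain, ignored) and expects the two coefficient vectors to match.

```r
library(tidymodels)

# Plain OLS
plain <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp + cyl, data = mtcars)

# Same engine with a penalty that lm has no way to use
ignored <- linear_reg(penalty = 0.1) |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp + cyl, data = mtcars)

# Compare the underlying lm coefficients
same_coefs <- all.equal(coef(plain$fit), coef(ignored$fit))
same_coefs
```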
A categorical outcome also trips people up. linear_reg() expects a numeric response: parsnip rejects a factor outcome in regression mode, and class labels coded as 0/1 numbers fit without error but give a misleading model. Use logistic_reg() for two classes or multinom_reg() for more. Finally, fitting glmnet on unscaled predictors makes the penalty depend on each predictor's units rather than its importance; rely on step_normalize() or glmnet's standardize = TRUE default.
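For the scaling point, a recipe makes the normalization explicit instead of relying on the engine default. This is a sketch under the assumption that glmnet is installed; rec, wf, and wf_fit are illustrative names.

```r
library(tidymodels)

# Center and scale every numeric predictor before the model sees it
rec <- recipe(mpg ~ wt + hp + cyl, data = mtcars) |>
  step_normalize(all_numeric_predictors())

lasso_spec <- linear_reg(penalty = 0.1, mixture = 1) |>
  set_engine("glmnet")

# Bundle preprocessing and model so both are applied consistently
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(lasso_spec)

wf_fit <- fit(wf, data = mtcars)

# New rows are normalized with the training means and SDs automatically
wf_preds <- predict(wf_fit, new_data = mtcars[1:3, ])
wf_preds
```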
Even when the outcome can never be negative, linear_reg() can still output negative .pred values. If that bites you, switch to poisson_reg() for counts or a log transformation of the response.
Try it yourself
Try it: Fit a linear model on mtcars using only wt as the predictor, then predict mpg for the 15th row. Save the prediction to ex_pred.
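One possible solution, sketched with the default lm engine. Only ex_pred is required by the exercise; ex_fit is an extra name introduced here.

```r
library(tidymodels)

# Single-predictor model: mpg explained by weight only
ex_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt, data = mtcars)

# Row 15 of mtcars, kept as a one-row data frame
ex_pred <- predict(ex_fit, new_data = mtcars[15, ])
ex_pred
```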
Explanation: The formula mpg ~ wt drops hp and cyl from the model. Row 15 is a Cadillac Fleetwood, a heavy car at 5,250 lb (wt = 5.25, recorded in units of 1,000 lb), so the prediction lands well below the dataset's mean mpg of about 20.
Related parsnip functions
linear_reg() works alongside the rest of the parsnip model family. These functions cover the neighboring tasks in a tidymodels project.
- logistic_reg() defines a two-class logistic regression model.
- poisson_reg() defines a Poisson regression model for count outcomes.
- multinom_reg() defines a multinomial model for three or more classes.
- set_engine() chooses the computational backend for any specification.
- fit() trains a specification on data and returns a model object.
- predict() generates predictions from a fitted parsnip model.
FAQ
What package is linear_reg() in?
linear_reg() ships in core parsnip, which loads automatically with library(tidymodels). No extension package is required for the function itself, unlike poisson_reg(), which lives in poissonreg. The default engine is stats::lm(), and parsnip also registers glmnet, stan, keras, and brulee for the more specialized cases; the gee, lmer, and lme engines come from the multilevelmod extension package.
What is the difference between linear_reg() and lm()?
lm() is the base R function that fits the model; linear_reg() is a tidymodels wrapper that defines a specification and dispatches to lm() (or another engine) when you call fit(). The wrapper gives one syntax that swaps between OLS, penalized, and Bayesian fits, and plays nicely with workflows, recipes, and tune.
How do I fit ridge or lasso regression with linear_reg()?
Use linear_reg(penalty = ..., mixture = ...) with set_engine("glmnet"). Set mixture = 0 for pure ridge, mixture = 1 for pure lasso, and a value in between for an elastic net. The penalty argument controls how much shrinkage is applied, and you can replace it with tune() to search over candidate values during resampling.
Can I tune the penalty in linear_reg()?
Yes, set penalty = tune() (and optionally mixture = tune()) in the specification and use the glmnet engine. Pass the specification to tune_grid() with a resampling object such as vfold_cv(), and the framework searches a grid of penalty values. Use select_best() to pick the value with the best metric, then finalize_workflow() to lock it in before the final fit.
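The tuning steps above might look like the following sketch. It assumes glmnet is installed; the seed, the five folds, and the grid size of 10 are arbitrary choices, and mtcars is small enough that the resampled metrics will be noisy.

```r
library(tidymodels)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# Mark penalty for tuning; keep a pure lasso mixture
tune_spec <- linear_reg(penalty = tune(), mixture = 1) |>
  set_engine("glmnet")

wf <- workflow() |>
  add_formula(mpg ~ wt + hp + cyl) |>
  add_model(tune_spec)

# Evaluate 10 candidate penalty values on each fold
res <- tune_grid(wf, resamples = folds, grid = 10)

# Pick the penalty with the lowest cross-validated RMSE
best <- select_best(res, metric = "rmse")
best

# Lock the chosen penalty in and refit on the full data
final_fit <- finalize_workflow(wf, best) |>
  fit(data = mtcars)
```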
Does linear_reg() handle multiple outcomes at once?
No. linear_reg() expects a single continuous response in the model formula. For multivariate outcomes, fit one linear_reg() per response, or step outside parsnip to base R's lm(cbind(y1, y2) ~ x, data = ...) and inspect the resulting mlm object directly.
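The base R route can be sketched as below. Picking mpg and qsec as the two responses is purely illustrative.

```r
# Two responses fitted at once with base R; no parsnip involved
mlm_fit <- lm(cbind(mpg, qsec) ~ wt + hp, data = mtcars)

class(mlm_fit)  # "mlm" "lm"

# One column of coefficients per response
coef(mlm_fit)
```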
For the full argument reference, see the parsnip linear_reg() docs.