R for Marketing Analytics Exercises: 20 Real-World Problems
Twenty marketing analytics practice problems in R that mirror real work the growth team, performance marketing, and CRM analysts ship every week: RFM segmentation, A/B test sizing, multi-touch attribution, cohort retention curves, churn modelling, CLV, and funnel diagnostics. Solutions are hidden behind a click; try the problem first.
Section 1. RFM segmentation and scoring (4 problems)
Exercise 1.1: Aggregate transactions into RFM per customer
Task: A CRM analyst at a DTC brand needs the canonical RFM table for the loyalty program review. Given a small transactions tibble of user_id, txn_date, and amount, compute Recency (days since last purchase relative to a reference date 2024-06-01), Frequency (count of transactions), and Monetary (total spend) per customer. Save the result to ex_1_1.
Expected result:
#> # A tibble: 4 x 4
#> user_id R F M
#> <chr> <int> <int> <dbl>
#> 1 c001 28 3 245
#> 2 c002 95 2 180
#> 3 c003 7 4 410
#> 4 c004 152 1 60
Difficulty: Intermediate
Collapse the log to one row per customer, then derive each metric from that customer's own purchase dates and amounts.
Group by user_id and summarise(): recency from max(txn_date) subtracted from ref, frequency from n(), monetary from sum(amount).
Click to reveal solution
Explanation: RFM is the workhorse segmentation feature of every CRM stack. Recency uses max(txn_date) (the most recent purchase) so the difference against a fixed reference date is well-defined for cohort comparisons; never use Sys.Date() here because the table becomes non-reproducible the next day. Frequency is n(), not n_distinct(txn_date), since two purchases on the same day still count as two events for engagement scoring.
Exercise 1.2: Bin customers into RFM tercile scores
Task: Building on the RFM table, the CRM team wants each customer scored 1 to 3 on each dimension so they can target the top tercile. Use ntile() to assign tercile scores: highest R-score for the most recent buyers (so reverse R), highest F-score for most frequent, highest M-score for highest spenders. Save the scored tibble to ex_1_2.
Expected result:
#> # A tibble: 4 x 7
#> user_id R F M R_score F_score M_score
#> <chr> <int> <int> <dbl> <int> <int> <int>
#> 1 c001 28 3 245 2 2 2
#> 2 c002 95 2 180 2 1 1
#> 3 c003 7 4 410 3 3 3
#> 4 c004 152 1 60 1 1 1
Difficulty: Advanced
Each dimension needs a 1-to-3 rank where the best customers land in bucket 3, and recency must be flipped because more days since purchase is worse.
Inside mutate(), use ntile() with 3 buckets per dimension; negate the recency column (-R) so recent buyers score highest.
Click to reveal solution
Explanation: ntile() distributes rows as evenly as possible into N buckets, which is exactly the right primitive for RFM scoring. The -R trick flips the sign so the smallest recency (most recent buyer) lands in the top tercile; this is the single most common mistake because higher recency in days actually means worse. With more customers a quintile (5-bucket) split is more standard than terciles, but the call site is identical.
Exercise 1.3: Build the canonical 3-digit RFM cell code
Task: The retention pod wants to bucket customers into the classic 3-digit cell (e.g. "333" for VIPs, "111" for at-risk lapsed). Concatenate R_score, F_score, M_score from the previous step into a single rfm_cell character column, then count customers per cell. Save the cell-count tibble to ex_1_3.
Expected result:
#> # A tibble: 3 x 2
#> rfm_cell n
#> <chr> <int>
#> 1 111 1
#> 2 211 1
#> 3 222 1
#> 4 333 1
Difficulty: Advanced
Glue the three single-digit scores into one string, then tally how many customers fall into each combination.
Build the code with paste0(R_score, F_score, M_score) in mutate(), then count() on the new column.
Click to reveal solution
Explanation: The 3-digit cell encoding (popularised by Hughes 2005) is the standard handoff format from analytics to marketing ops because each cell maps to a distinct campaign treatment. paste0() is preferred over paste(..., sep="") for performance and readability. With quintile scoring you get 125 cells; for activation campaigns, marketers usually collapse this further to ~10 macro-segments before sending.
Exercise 1.4: Flag the "Champions" and "At Risk" segments
Task: The lifecycle marketing team needs two named segments to feed into Braze: "Champions" (R_score = 3 AND F_score = 3 AND M_score = 3) and "At Risk" (R_score = 1 AND F_score >= 2). Add a segment column to the scored tibble using case_when() with these two named segments and "Other" for everyone else. Save to ex_1_4.
Expected result:
#> # A tibble: 4 x 8
#> user_id R F M R_score F_score M_score segment
#> <chr> <int> <int> <dbl> <int> <int> <int> <chr>
#> 1 c001 28 3 245 2 2 2 Other
#> 2 c002 95 2 180 2 1 1 Other
#> 3 c003 7 4 410 3 3 3 Champions
#> 4 c004 152 1 60 1 1 1 Other
Difficulty: Intermediate
Assign a label by testing the score combination, checking the most specific rule before the broader one.
Use case_when() in mutate() with the Champions test first, then the At Risk test, then a TRUE fallback to "Other".
Click to reveal solution
Explanation: case_when() scans top-down so put the most specific rule first; "At Risk" goes second because it's broader. The standard 11-segment RFM grid (Champions, Loyal, Potential Loyalist, New, Promising, Need Attention, About to Sleep, At Risk, Cannot Lose Them, Hibernating, Lost) is a superset of this pattern. For lifecycle email, only 3 to 5 segments usually drive distinct creative; more segments means thinner tests.
Section 2. A/B testing and lift analysis (4 problems)
Exercise 2.1: Conversion rate by variant from a raw assignment log
Task: The growth team just shipped a new pricing-page hero. The experimentation platform exported a per-user log with variant (A or B) and converted (0/1). Compute the count of users, count of conversions, and conversion rate per variant from the supplied tibble. Save to ex_2_1.
Expected result:
#> # A tibble: 2 x 4
#> variant users conversions conv_rate
#> <chr> <int> <int> <dbl>
#> 1 A 500 40 0.08
#> 2 B 500 60 0.12
Difficulty: Beginner
Collapse the per-user log to one row per variant, reporting both the group size and the conversion outcome.
Group by variant and summarise() with n(), sum(converted), and mean(converted) for the rate.
Click to reveal solution
Explanation: mean() on a 0/1 vector is the conversion rate, which is shorter and faster than sum/n(). Reporting users and conversions alongside rate is non-negotiable: any stakeholder review will ask "out of how many?" because rate alone hides whether the test had power. The exact numbers from rbinom() will differ; the point is the shape and direction of the result.
Exercise 2.2: Run a two-proportion test for A vs B
Task: Take the per-variant conversion counts from the previous exercise and run prop.test() to get a p-value and 95% confidence interval on the difference in conversion rates. Extract the p-value and the lower and upper CI bounds into a one-row tibble with columns p_value, ci_low, ci_high. Save to ex_2_2.
Expected result:
#> # A tibble: 1 x 3
#> p_value ci_low ci_high
#> <dbl> <dbl> <dbl>
#> 1 0.0454 -0.0721 -0.000896
Difficulty: Intermediate
Feed the two variants' conversion counts and sample sizes into a two-proportion test, then pull the fields you need off the result.
Call prop.test() with x = conversions, n = users, and correct = FALSE; read $p.value and $conf.int.
Click to reveal solution
Explanation: prop.test() with correct = FALSE matches the standard z-test most growth platforms use (Optimizely, VWO); the default Yates continuity correction is more conservative and slightly inflates the p-value. The CI is on the difference (A minus B), so a fully negative interval means B is significantly better. Always report the CI alongside the p-value: stakeholders care about effect-size range, not just whether p < 0.05.
Exercise 2.3: Compute relative lift and a bootstrap CI
Task: The PM wants the result framed as a percentage lift of B over A with a 95% bootstrap confidence interval, because relative lift travels better in a launch deck than absolute difference. Resample with replacement 1000 times from the per-variant assignment log, compute relative lift (rate_B / rate_A) - 1 per resample, then take the 2.5th and 97.5th percentiles. Save a one-row tibble with lift_pct, ci_low_pct, ci_high_pct (all in percent) to ex_2_3.
Expected result:
#> # A tibble: 1 x 3
#> lift_pct ci_low_pct ci_high_pct
#> <dbl> <dbl> <dbl>
#> 1 41.9 0.5 100.0
Difficulty: Advanced
Resample the log many times with replacement, compute the relative lift on each resample, and read the spread off that distribution.
Use replicate() over a slice_sample(prop = 1, replace = TRUE) draw, then take quantile() at 0.025 and 0.975.
Click to reveal solution
Explanation: Bootstrap CIs are the lingua franca of growth experimentation because they make zero distributional assumptions and naturally handle low-conversion regimes where normal approximations break. A wide upper bound (100% here) is a real signal that the test was underpowered: the point estimate of +41.9% lift is almost meaningless if the interval includes near-zero. Always communicate this to stakeholders before the headline number triggers a launch decision.
Exercise 2.4: Sample size needed for a 10% relative lift detection
Task: Before running the next test, the PM asks how many users per variant the team needs to detect a 10% relative lift on a 5% baseline conversion rate at 80% power and alpha 0.05. Use power.prop.test() to compute the required sample size per group. Save a one-row tibble with baseline, target, n_per_group to ex_2_4.
Expected result:
#> # A tibble: 1 x 3
#> baseline target n_per_group
#> <dbl> <dbl> <dbl>
#> 1 0.05 0.055 31198
Difficulty: Advanced
Work backwards from the baseline rate, the lift you want to detect, the power, and alpha to get the required group size.
Call power.prop.test() with p1 = 0.05, p2 = 0.055, power = 0.8, sig.level = 0.05, then ceiling() the returned $n.
Click to reveal solution
Explanation: power.prop.test() returns the per-group sample size, so total experiment volume is double that. Notice how punishing small relative lifts are: a 10% relative lift on a 5% baseline needs ~31k per arm, but a 10% relative lift on a 30% baseline needs only ~3.5k per arm because the variance of a Bernoulli is highest near p=0.5 in absolute terms but the relative variance shrinks as the mean grows. Always sanity-check against the platform's MDE calculator.
Section 3. Multi-touch attribution (3 problems)
Exercise 3.1: Last-touch attribution from an event log
Task: The performance marketing lead wants to see how revenue would be credited under last-touch attribution: 100% credit goes to the channel of the final touch before conversion. Given the supplied per-conversion event log with conversion_id, touch_order (1 = first), channel, and revenue, compute total credited revenue per channel using last-touch rules. Save to ex_3_1.
Expected result:
#> # A tibble: 3 x 2
#> channel revenue
#> <chr> <dbl>
#> 1 organic 120
#> 2 paid 200
#> 3 referral 150
Difficulty: Advanced
Keep only the final touch of each journey, then total revenue by the channel of that closing touch.
Group by conversion_id and slice_max(touch_order, n = 1), then regroup by channel and sum() the revenue.
Click to reveal solution
Explanation: slice_max(touch_order, n = 1) is the safest way to grab the last touch per conversion; using arrange(desc(touch_order)) |> slice(1) works but is slower on large data. Last-touch over-credits closing channels (often paid search and direct) and starves upper-funnel work; that's why every multi-touch model exists. Always present last-touch alongside first-touch and a fractional model so the team can see the full credit picture, never one alone.
Exercise 3.2: Linear (fractional) attribution across all touches
Task: Now compute linear attribution: each touch in a journey gets equal credit, so a 4-touch journey worth $80 assigns $20 to each channel touched. Compute revenue credited per channel using linear rules from the same events log. Save to ex_3_2.
Expected result:
#> # A tibble: 4 x 2
#> channel revenue
#> <chr> <dbl>
#> 1 email 20
#> 2 organic 230
#> 3 paid 210
#> 4 referral 110
Difficulty: Advanced
Split each journey's revenue equally across its touches before rolling the credit up by channel.
Within group_by(conversion_id), mutate() a per-touch credit of revenue / n(), then regroup by channel and sum.
Click to reveal solution
Explanation: Linear attribution is the simplest fractional model and the standard "balanced" comparator in Google Analytics. Each touch's share is revenue / journey_length, computed inside group_by(conversion_id) before regrouping by channel. It systematically over-credits common high-volume channels (email here gets credit despite appearing once) and under-credits decisive late-funnel pushes; pair it with position-based or data-driven models for a defensible budget reallocation.
Exercise 3.3: Time-decay attribution with exponential weights
Task: The performance team wants a time-decay model: more recent touches get more credit, with weight 0.5 ^ (steps_from_last). Within each conversion, the last touch has weight 1, second-to-last has weight 0.5, third-to-last has weight 0.25, and so on. Normalise weights per conversion so they sum to 1, then multiply by revenue. Aggregate credited revenue per channel and save to ex_3_3.
Expected result:
#> # A tibble: 4 x 2
#> channel revenue
#> <chr> <dbl>
#> 1 email 5.33
#> 2 organic 211.
#> 3 paid 234.
#> 4 referral 120.
Difficulty: Advanced
Weight each touch by how close it sits to the conversion, normalise those weights within a journey, then split revenue by them.
Inside group_by(conversion_id), compute 0.5 ^ (max(touch_order) - touch_order), divide by sum() of the raw weights, and multiply by revenue.
Click to reveal solution
Explanation: Time-decay is a sensible default when the typical journey spans days (think SaaS trials, considered purchases) because it credits both the closer and the brand-builders, just at different magnitudes. The 0.5 half-life is conventional but tunable; halving it weights the closer more aggressively. Normalising per conversion so weights sum to 1 keeps the channel sums equal to total revenue, which is essential for stakeholder reconciliation.
Section 4. Cohort retention and churn (4 problems)
Exercise 4.1: Tag each customer with their acquisition month
Task: A retention analyst needs every transaction tagged with the customer's acquisition month (the calendar month of their first ever purchase) so cohort tables can be built downstream. Add an acq_month column (truncated to first of month, as Date) to the supplied transactions tibble. Save to ex_4_1.
Expected result:
#> # A tibble: 9 x 4
#> user_id txn_date amount acq_month
#> <chr> <date> <dbl> <date>
#> 1 u1 2024-01-05 100 2024-01-01
#> 2 u1 2024-02-12 90 2024-01-01
#> 3 u1 2024-04-20 110 2024-01-01
#> 4 u2 2024-02-08 80 2024-02-01
#> 5 u2 2024-03-14 70 2024-02-01
#> 6 u3 2024-01-22 150 2024-01-01
#> 7 u3 2024-04-02 140 2024-01-01
#> 8 u4 2024-03-09 60 2024-03-01
#> 9 u4 2024-05-18 65 2024-03-01
Difficulty: Intermediate
Each customer's cohort comes from their own earliest purchase, truncated to the start of that month.
Group by user_id and mutate() an acq_month with floor_date(min(txn_date), "month").
Click to reveal solution
Explanation: The acquisition cohort is the customer's intrinsic "birthday" and must be derived from the user's own minimum transaction date, not from the row's transaction date. floor_date(..., "month") from lubridate is the safest way to truncate to month-start (avoid format() then as.Date() because it loses Date type). Pre-computing this column once is a standard CRM pattern that lets every downstream join key off it.
Exercise 4.2: Build a months-since-acquisition retention curve
Task: Using the per-transaction tibble with acq_month, compute the retention rate by months-since-acquisition: for each acquisition cohort, what fraction of original cohort customers transacted in month 0, month 1, month 2, etc. Return a long tibble with columns acq_month, months_since, retention. Filter to just the 2024-01-01 cohort and save to ex_4_2.
Expected result:
#> # A tibble: 4 x 3
#> acq_month months_since retention
#> <date> <dbl> <dbl>
#> 1 2024-01-01 0 1
#> 2 2024-01-01 1 0.5
#> 3 2024-01-01 2 0
#> 4 2024-01-01 3 1
Difficulty: Advanced
Fix the cohort size at month zero, then for each later month count the distinct customers still active and divide by that fixed size.
Derive months elapsed with interval(acq_month, txn_month) %/% months(1), distinct() users before counting, and left_join() the cohort sizes.
Click to reveal solution
Explanation: Cohort retention has two traps: (1) cohort denominator must be fixed at month 0 (the size never grows), and (2) you must distinct(user_id) before counting active users per month, otherwise heavy buyers get counted multiple times. interval() %/% months(1) is the cleanest way to express "calendar months between two dates" because plain subtraction returns a difftime in days and rounds inconsistently across month lengths.
Exercise 4.3: Flag 90-day churned customers
Task: Marketing ops wants a list of customers who have not transacted in 90 days as of the reference date 2024-06-01, so they can be enrolled in a win-back flow. Compute days-since-last-purchase per user_id and add a churned boolean (TRUE if days_since_last >= 90). Save the resulting per-user tibble (sorted by days_since_last descending) to ex_4_3.
Expected result:
#> # A tibble: 4 x 3
#> user_id days_since_last churned
#> <chr> <int> <lgl>
#> 1 u2 79 FALSE
#> 2 u4 14 FALSE
#> 3 u1 42 FALSE
#> 4 u3 60 FALSE
Difficulty: Advanced
Measure each customer's gap from their last purchase to a fixed reference date, then compare it against the 90-day cutoff.
summarise() days_since_last as ref_d - max(txn_date), then mutate() a churned flag with >= 90.
Click to reveal solution
Explanation: The 90-day threshold is industry standard for ecommerce; subscription products often use shorter windows (14 to 30 days) tied to the billing cycle. The key idea is computing the gap relative to a fixed ref_d rather than Sys.Date() so the table is reproducible; this matters when the win-back campaign is built today and shipped a week later. For B2B, the threshold usually scales with median repurchase interval.
Exercise 4.4: Logistic regression to predict 90-day churn
Task: The CRM team wants a quick churn-risk score from a small training set with recency (days since last purchase), frequency (lifetime orders), and tenure_days (days since signup) as predictors. Fit a logistic regression with glm(family = binomial) predicting the binary churned label. Extract the model's coefficient table as a tibble (column names: term, estimate, p_value) and save to ex_4_4.
Expected result:
#> # A tibble: 4 x 3
#> term estimate p_value
#> <chr> <dbl> <dbl>
#> 1 (Intercept) -3.20 0.40
#> 2 recency 0.097 0.06
#> 3 frequency -0.43 0.30
#> 4 tenure_days 0.0010 0.85
Difficulty: Advanced
Fit a binary-outcome model on the three predictors, then read the coefficient table straight off the fitted object.
Call glm() with family = binomial, then pull summary(fit)$coefficients into a tibble of term, estimate, and p-value.
Click to reveal solution
Explanation: Logistic regression remains the production default for first-pass churn scoring because coefficients are interpretable (a +1 unit of recency multiplies churn odds by exp(0.097) = 1.10), and the model is calibrated out of the box, unlike most tree ensembles. Frequency's negative coefficient correctly signals that engaged customers churn less. For production lift, gradient boosted trees beat logistic by 2 to 5% AUC but require calibration; logistic is still the cleanest baseline and the easiest to debug with stakeholders.
Section 5. Customer lifetime value and revenue concentration (3 problems)
Exercise 5.1: Historical CLV per customer
Task: The finance partner wants a simple historical CLV per customer for the loyalty cohort: average order value times order frequency times average gross margin (assume 30%). Compute aov, freq, and clv = aov * freq * 0.30 per user_id from the supplied transactions tibble. Save to ex_5_1.
Expected result:
#> # A tibble: 4 x 4
#> user_id aov freq clv
#> <chr> <dbl> <int> <dbl>
#> 1 u1 100 3 90
#> 2 u2 75 2 45
#> 3 u3 145 2 87
#> 4 u4 62.5 2 37.5
Difficulty: Beginner
For each customer, combine their typical order size, how often they order, and the margin assumption.
Group by user_id and summarise() with mean(amount), n(), and a product that multiplies by 0.30.
Click to reveal solution
Explanation: Historical CLV is the simplest CLV form, fine for early-stage benchmarking but blind to expected future purchases. The 30% gross margin is a placeholder and must come from finance; using gross revenue instead of margin overstates true value by 3 to 5x. For a more defensible estimate, move to predictive CLV (BTYD models like BG/NBD plus Gamma-Gamma) once you have 12+ months of data.
Exercise 5.2: Project predicted CLV with a discount rate
Task: Building on historical CLV, project a forward 12-month CLV assuming each customer continues at their historical monthly purchase rate, with a 1% monthly discount rate applied to future cash flows. Compute monthly_freq = freq / tenure_months (assume 5 tenure months for simplicity), then sum the discounted contribution monthly_freq * aov * 0.30 / (1 + 0.01)^t for t = 1:12. Save the per-user predicted CLV tibble as ex_5_2.
Expected result:
#> # A tibble: 4 x 4
#> user_id monthly_freq pred_clv_12mo aov
#> <chr> <dbl> <dbl> <dbl>
#> 1 u1 0.6 201. 100
#> 2 u2 0.4 100. 75
#> 3 u3 0.4 194. 145
#> 4 u4 0.4 83.7 62.5
Difficulty: Intermediate
Project each customer's monthly purchase rate forward a year, shrinking each future month's contribution before summing.
Compute monthly_freq as freq / 5, then sum monthly_freq * aov * 0.30 / (1 + 0.01)^t over t = 1:12, e.g. via mapply().
Click to reveal solution
Explanation: Discounting future cash flows is mandatory in finance partner conversations because a dollar booked next year is worth less than a dollar today. The 1% monthly rate is roughly 12.7% annual, which matches typical SaaS WACC. mapply() lets the discount function take per-row inputs; for production-scale data, vectorise the whole formula directly with purrr::pmap_dbl() or pre-compute the discount factor sum and multiply once.
Exercise 5.3: Pareto check: top 20% of customers driving X% of revenue
Task: The CFO wants a one-line stat for the board deck: what percent of total revenue comes from the top 20% of customers ranked by spend. From a supplied per-customer revenue tibble, sort by revenue descending, take the top 20% of customers (by count), and compute their share of total revenue. Save a one-row tibble with top_n_users, top_revenue, total_revenue, share_pct to ex_5_3.
Expected result:
#> # A tibble: 1 x 4
#> top_n_users top_revenue total_revenue share_pct
#> <int> <dbl> <dbl> <dbl>
#> 1 20 4830 12450 38.8
Difficulty: Advanced
Rank customers by spend, isolate the top fifth by headcount, and compare their revenue against the whole base.
arrange(desc(revenue)), take the top ceiling(nrow() * 0.20) rows with slice_head(), then divide summed top revenue by total.
Click to reveal solution
Explanation: The 80/20 Pareto rule is folk wisdom rather than a constant; in practice you usually see top 20% driving anywhere from 50% to 80% of revenue. A share well below 50% suggests the customer base is unusually flat (commoditised retail), while above 80% concentration is a serious risk to flag for the CRO. Always pair this with a top-1% concentration stat: enterprise SaaS often has 1% of accounts driving 30%+ of revenue, which is not visible in the 20/80 split alone.
Section 6. Funnel analysis and campaign rollups (2 problems)
Exercise 6.1: Funnel drop-off rates between stages
Task: A growth analyst wants the classic funnel summary for a new pricing-page experiment: counts and step-over-step conversion rate between Visit, Signup, Activate, Pay. Given the supplied stage-count tibble (already aggregated), compute conversion_from_prev (this stage / previous stage) and overall_from_top (this stage / Visit). Save to ex_6_1.
Expected result:
#> # A tibble: 4 x 4
#> stage users conversion_from_prev overall_from_top
#> <chr> <int> <dbl> <dbl>
#> 1 Visit 10000 NA 1
#> 2 Signup 2400 0.24 0.24
#> 3 Activate 1080 0.45 0.108
#> 4 Pay 250 0.231 0.025
Difficulty: Intermediate
Each stage's rate depends on the row directly above it, while the overall rate depends on the very first row.
mutate() with lag(users) for the step-over-step ratio and first(users) for the overall-from-top ratio.
Click to reveal solution
Explanation: lag() gives the previous row's value, which is the canonical funnel-step ratio computation. The two ratios serve different audiences: conversion_from_prev is what PMs use to pinpoint the worst step, while overall_from_top is what finance and CEOs care about because it ties to total cohort yield. The biggest analytical mistake is comparing funnel steps without normalising for the time window: a Visit on day 1 and a Pay on day 30 measure different cohorts of users.
Exercise 6.2: Top campaigns ranked by conversions
Task: The paid-media manager exported a campaign-level summary with campaign, clicks, conversions, spend and wants the top 3 campaigns by conversions plus the cost-per-acquisition (spend / conversions) for each. Sort descending by conversions, take the top 3, and add the cpa column. Save to ex_6_2.
Expected result:
#> # A tibble: 3 x 5
#> campaign clicks conversions spend cpa
#> <chr> <int> <int> <dbl> <dbl>
#> 1 brand_us 8500 420 3200 7.62
#> 2 retgt_eu 6200 340 2100 6.18
#> 3 prosp_us 9100 280 4500 16.1
Difficulty: Beginner
Add the cost-efficiency metric first, then keep only the three best campaigns by conversion volume.
mutate() a cpa of spend / conversions, arrange(desc(conversions)), then slice_head(n = 3).
Click to reveal solution
Explanation: Ranking by raw conversions can mislead budget reallocation because a high-volume campaign with poor CPA is a worse use of spend than a low-volume campaign with strong CPA. A defensible ranking adds a CPA threshold filter or sorts by efficiency (conversions per dollar). For paid-media reviews, always present conversions, spend, and CPA together, never any single column in isolation.
What to do next
Now that you have shipped 20 marketing analytics workflows, build the supporting muscle:
- dplyr exercises in R for the verb mechanics that drive every aggregation here.
- tidyverse exercises in R to extend cohort and attribution joins across packages.
- A/B testing exercises in R for deeper experimentation power, sequential testing, and multi-arm tests.
- Logistic regression exercises in R to harden the churn-prediction modelling step.
r-statistics.co · Verifiable credential · Public URL
This document certifies mastery of
R for Marketing Analytics Mastery
Every certificate has a public verification URL that proves the holder passed the assessment. Anyone with the link can confirm the recipient and date.
484 learners have earned this certificate