ggplot2 Bar Chart Exercises in R: 18 Real-World Problems
Drill the bar chart patterns analysts actually build: counts versus pre-summarised values, dodged versus stacked, factor reordering, themes, and value labels. Eighteen problems across six sections (three per section) with hidden solutions. Try each task in the editor before clicking the reveal. Every chart uses the diamonds or mpg datasets that ship with ggplot2.
Section 1. Basics: geom_bar vs geom_col (3 problems)
Exercise 1.1: Count cars in each body class with geom_bar
Task: The mpg dataset that ships with ggplot2 includes a class column with vehicle body styles. Build a basic bar chart with geom_bar() that counts how many cars fall into each class. Save the plot object to ex_1_1.
Expected result:
#> A vertical bar chart, x = class (alphabetical), y = count.
#> Bar heights: 2seater 5, compact 47, midsize 41, minivan 11, pickup 33, subcompact 35, suv 62.
Difficulty: Beginner
Each row in mpg is one car, so you want a tally of rows per body style, not a sum of any existing column.
Map the category to x with aes(x = class) and add geom_bar(), which counts rows per category for you.
Click to reveal solution
Explanation: geom_bar() defaults to stat = "count", which tallies rows per x category for you. There is no need to summarise the data first. Use geom_bar() when you have raw observations and want ggplot to count them; reach for geom_col() when the y-value is already computed (next exercise).
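A minimal sketch of one solution that matches the explanation above, assuming ggplot2 is installed; the hidden reference solution may differ in detail.

```r
library(ggplot2)

# No y aesthetic: geom_bar() uses stat = "count" and tallies rows per class
ex_1_1 <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Inspect the computed bar heights without drawing the plot
layer_data(ex_1_1)$count
```

Printing ex_1_1 draws the chart; layer_data() is only used here to check the counts.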
Exercise 1.2: Plot pre-summarised totals with geom_col
Task: A retailer wants a chart of total diamonds inventory value by cut. Summarise diamonds to total price per cut, then plot the rolled-up table with geom_col() since the y-values are precomputed. Save the plot to ex_1_2.
Expected result:
#> A vertical bar chart of total price (USD) by cut.
#> Ideal ~74.5M tallest, Premium ~63.2M, Very Good ~48.1M, Good ~19.3M, Fair ~7.0M.
Difficulty: Beginner
The per-cut totals are already computed in the summary table, so the chart only needs to draw bars at those heights.
Pass diamond_totals to ggplot() with aes(x = cut, y = total_price) and use geom_col().
Click to reveal solution
Explanation: geom_col() is equivalent to geom_bar(stat = "identity"), but more explicit and idiomatic. When the y-aesthetic is already a computed value (a sum, average, or share), use geom_col(). The common beginner trap is mapping y and still calling geom_bar(): its default stat = "count" accepts only an x or a y aesthetic, so ggplot2 stops with an error instead of drawing the bars.
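One way to write this, sketched with dplyr for the summary step (an assumption; any method that yields one row per cut works). diamond_totals and total_price are the names used in the hint; bad and trap are illustrative names.

```r
library(ggplot2)
library(dplyr)

# One row per cut holding the summed price
diamond_totals <- diamonds %>%
  group_by(cut) %>%
  summarise(total_price = sum(price))

# y is already computed, so geom_col() draws it as-is
ex_1_2 <- ggplot(diamond_totals, aes(x = cut, y = total_price)) +
  geom_col()

# The trap: geom_bar() with a y mapping fails once the plot is built
bad <- ggplot(diamond_totals, aes(x = cut, y = total_price)) +
  geom_bar()
trap <- tryCatch(ggplot_build(bad), error = function(e) conditionMessage(e))
```

Here trap holds the error message raised by stat_count(), which is a quick way to see the failure without stopping a script.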
Exercise 1.3: Show proportions instead of raw counts
Task: Re-plot the diamonds$cut bar chart so each bar's height is the proportion of rows in that cut rather than the raw count, with all bars summing to one. Use aes(y = after_stat(prop), group = 1) inside geom_bar() and save to ex_1_3.
Expected result:
#> A bar chart of cut proportions: Fair 0.030, Good 0.091, Very Good 0.224, Premium 0.256, Ideal 0.400.
Difficulty: Intermediate
You want each bar's height to read as a fraction of all rows, with the heights summing to one across every category.
Inside aes() set y = after_stat(prop) and group = 1, then add geom_bar().
Click to reveal solution
Explanation: after_stat(prop) accesses the computed proportion column that stat_count() generates internally. The group = 1 aesthetic tells ggplot to treat all bars as one group so proportions sum across all categories (not within each). Without group = 1, each bar would be its own group and every bar would have height 1.
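The proportion mapping described above can be sketched as follows, assuming ggplot2 3.3.0 or later (where after_stat() is available).

```r
library(ggplot2)

# group = 1 pools every cut into one group, so prop is each cut's share of all rows
ex_1_3 <- ggplot(diamonds, aes(x = cut)) +
  geom_bar(aes(y = after_stat(prop), group = 1))

# The computed heights sum to one
sum(layer_data(ex_1_3)$y)
```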
Section 2. Color, fill, and themes (3 problems)
Exercise 2.1: Fill bars by category using a viridis palette
Task: Build a count bar chart of mpg cars by class, fill each bar by its own class using the discrete viridis palette, apply theme_minimal(), and hide the legend because the x-axis already labels every bar. Save to ex_2_1.
Expected result:
#> Vertical bar chart, 7 bars colored on the viridis discrete palette (purple to yellow).
#> Theme is minimal (white background, gray gridlines). No legend visible.
Difficulty: Intermediate
The color should carry the category, the theme should be clean, and the legend is redundant once the x-axis labels every bar.
Map fill = class in aes(), then add scale_fill_viridis_d(), theme_minimal(), and theme(legend.position = "none").
Click to reveal solution
Explanation: Mapping fill = class inside aes() makes the color carry information; setting fill outside aes() would apply a single static color. scale_fill_viridis_d() is the discrete variant of viridis (suitable for categorical fills). Hiding a redundant legend with theme(legend.position = "none") is a small touch that produces cleaner reports.
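A short sketch of the fill-plus-theme combination, following the hint for this exercise.

```r
library(ggplot2)

ex_2_1 <- ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar() +
  scale_fill_viridis_d() +            # discrete viridis palette, one color per class
  theme_minimal() +
  theme(legend.position = "none")     # x-axis already names every bar
```

The theme(legend.position = "none") call comes after theme_minimal() because a complete theme added later would reset it.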
Exercise 2.2: Style a brand-colored chart with a single fill
Task: The marketing team wants the diamonds$cut count chart rendered in their corporate teal (#1f7a8c) with a clean white background. Use fill = "#1f7a8c" as a static value inside geom_bar() and apply theme_classic(). Save the result to ex_2_2.
Expected result:
#> A vertical bar chart of cut counts, every bar filled solid teal (#1f7a8c).
#> theme_classic axes (black lines), no panel grid.
Difficulty: Intermediate
One fixed color for every bar means the color is decoration, not data, so it belongs outside the aesthetic mapping.
Set fill = "#1f7a8c" as an argument to geom_bar() rather than inside aes(), and add theme_classic().
Click to reveal solution
Explanation: The key distinction: fill inside aes() maps a variable to color (and creates a legend); fill outside aes() paints every bar the same static color. Hex codes work everywhere R color names do. theme_classic() removes the gray panel background and grid, which usually reads better in slide decks and printed reports than the default theme_gray().
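As a sketch, the static fill sits in geom_bar() itself rather than in aes():

```r
library(ggplot2)

ex_2_2 <- ggplot(diamonds, aes(x = cut)) +
  geom_bar(fill = "#1f7a8c") +   # constant color: set outside aes(), so no legend
  theme_classic()
```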
Exercise 2.3: Highlight one category by manual fill mapping
Task: A product manager wants to highlight only the "Premium" cut in the diamonds bar chart while greying out the rest. Build the count chart with scale_fill_manual() mapping "Premium" to "#e85d04" and every other cut to "grey70". Save to ex_2_3.
Expected result:
#> A vertical bar chart of cut counts.
#> The "Premium" bar is orange (#e85d04); Fair, Good, Very Good, Ideal are grey70.
Difficulty: Intermediate
To spotlight one category, give it a bold color and paint every other bar the same neutral grey.
Map fill = cut, then use scale_fill_manual(values = ...) with a named vector setting "Premium" to "#e85d04" and every other cut to "grey70".
Click to reveal solution
Explanation: scale_fill_manual() accepts a named vector mapping every level of the fill variable to a color. This is the standard "single-callout" idiom when you need to draw attention to one category in a comparison. An alternative is case_when() outside the plot to create a highlight flag column, then map fill to that, which scales better when you have many factor levels.
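A possible implementation of the single-callout idiom; highlight_cols is an illustrative name, and building the vector from the factor levels is one choice among several.

```r
library(ggplot2)

# Start every level at grey70, then override the one level to highlight
highlight_cols <- setNames(rep("grey70", nlevels(diamonds$cut)), levels(diamonds$cut))
highlight_cols["Premium"] <- "#e85d04"

ex_2_3 <- ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar() +
  scale_fill_manual(values = highlight_cols)
```

Deriving the names from levels() keeps the vector in step with the data if the set of levels changes.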
Section 3. Stacked, dodged, and filled bars (3 problems)
Exercise 3.1: Stack drivetrain inside each body class
Task: A used-car analyst wants to see how drivetrain (drv) splits across each vehicle class in mpg. Build a stacked bar chart with class on the x-axis, bars filled by drv, and the default position = "stack". Save the plot to ex_3_1.
Expected result:
#> Vertical stacked bar chart, x = class, y = count.
#> Each bar segmented by drv (4, f, r) with a 3-color legend.
Difficulty: Intermediate
Splitting each class bar into segments for a second category, stacked on top of each other, is ggplot's default behavior.
Add fill = drv to aes() alongside x = class and call geom_bar() with no position argument.
Click to reveal solution
Explanation: Mapping fill to a second categorical variable triggers stacking by default. Stacking is good for comparing totals between groups but bad for comparing subgroups across groups, because the segments don't start from a common baseline. If your reader's question is "which class has the most 4WD cars?" use dodging (next exercise) instead.
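The default stacking needs no position argument at all; a minimal sketch:

```r
library(ggplot2)

# fill = drv splits each class bar into drivetrain segments, stacked by default
ex_3_1 <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar()
```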
Exercise 3.2: Switch to dodged bars for side-by-side comparison
Task: Rebuild the same class by drv chart with side-by-side bars instead of stacked, so each drivetrain gets its own bar inside each class. Use position = "dodge" inside geom_bar(). Save the plot object to ex_3_2.
Expected result:
#> Vertical dodged bar chart, x = class, y = count.
#> Each class has up to 3 small bars side-by-side, one per drv value.
Difficulty: Intermediate
Instead of segments stacked on top of each other, you want each subgroup drawn as its own bar sharing a common baseline.
Keep aes(x = class, fill = drv) but pass position = "dodge" to geom_bar().
Click to reveal solution
Explanation: position = "dodge" puts each subgroup in its own bar at the same baseline, so the eye can compare drivetrain counts across classes directly. A subtle issue: when a class has no observations for some drv level, its remaining bars widen to fill the whole slot, so bar widths differ from class to class. Use position = position_dodge2(preserve = "single") if you need equal-width bars even when groups are missing.
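Both variants side by side, as a sketch; ex_3_2_even is a hypothetical name for the equal-width version and is not part of the exercise.

```r
library(ggplot2)

# Requested solution: default dodging, bars fill the slot of each class
ex_3_2 <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# Optional variant: every bar keeps the width of a single bar
ex_3_2_even <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = position_dodge2(preserve = "single"))

# Compare the drawn bar widths of the two versions
unique(round(layer_data(ex_3_2)$xmax - layer_data(ex_3_2)$xmin, 3))
unique(round(layer_data(ex_3_2_even)$xmax - layer_data(ex_3_2_even)$xmin, 3))
```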
Exercise 3.3: 100% stacked bar showing shares within each class
Task: Convert the class by drv chart into a 100% stacked bar where each bar fills the full height and segments represent the proportion of each drv within that class. Use position = "fill" and format the y-axis with scales::label_percent(). Save to ex_3_3.
Expected result:
#> A stacked bar chart where every bar reaches y = 1.0 (100%).
#> Y-axis labels read 0%, 25%, 50%, 75%, 100%. Segments colored by drv.
Difficulty: Advanced
Rescale every bar to the same full height so the segments show within-class composition rather than totals.
Use geom_bar(position = "fill") and format the axis with scale_y_continuous(labels = label_percent()).
Click to reveal solution
Explanation: position = "fill" rescales each bar to height 1 so segments become within-group proportions, which is the right view when totals differ a lot and you only care about composition. scales::label_percent() is preferred over manually multiplying by 100 and pasting "%" because it handles axis breaks and decimal precision automatically. The trade-off: you lose information about absolute totals, so pair with a count chart when the audience needs both.
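A sketch of the 100% stacked version, assuming the scales package is installed; the y-axis title is an illustrative choice.

```r
library(ggplot2)
library(scales)

ex_3_3 <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +                      # rescale each bar to height 1
  scale_y_continuous(labels = label_percent()) +     # 0.25 is shown as 25%
  labs(y = "share of class")
```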
Section 4. Ordering and factor reordering (3 problems)
Exercise 4.1: Order bars from most to least frequent with fct_infreq
Task: The default alphabetical order of mpg$class bars makes it hard to spot the most common body style. Re-plot the count chart with bars sorted from most to least frequent using forcats::fct_infreq() on class inside aes(). Save to ex_4_1.
Expected result:
#> A bar chart with x labels left to right: suv, compact, midsize, subcompact, pickup, minivan, 2seater.
#> Heights descending: 62, 47, 41, 35, 33, 11, 5.
Difficulty: Intermediate
The bars should run from the most common body style down to the rarest so the ranking is obvious at a glance.
Wrap class in fct_infreq() inside aes(x = ...) and reset the axis title with labs(x = "class").
Click to reveal solution
Explanation: fct_infreq() reorders a factor's levels by descending frequency, which is the right ordering for almost every "count by category" chart. Without it, ggplot uses the factor's existing level order (often alphabetical) which gives readers no cue about which bar is biggest. The labs(x = "class") resets the x-axis title because fct_infreq(class) would otherwise become the displayed label.
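The frequency ordering can be sketched like this, assuming forcats is installed:

```r
library(ggplot2)
library(forcats)

ex_4_1 <- ggplot(mpg, aes(x = fct_infreq(class))) +
  geom_bar() +
  labs(x = "class")   # otherwise the axis title reads "fct_infreq(class)"

# Level order produced by fct_infreq(), most frequent first
levels(fct_infreq(mpg$class))
```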
Exercise 4.2: Order bars by a numeric statistic using reorder
Task: An analyst comparing average highway mileage by class in mpg wants the bars sorted from highest to lowest mean hwy. First summarise to a per-class mean, then plot with reorder(class, -mean_hwy) inside aes() and geom_col(). Save the plot to ex_4_2.
Expected result:
#> A bar chart, x labels left to right: compact, subcompact, midsize, 2seater, minivan, suv, pickup.
#> Bar heights approximately 28.3, 28.1, 27.3, 24.8, 22.4, 18.1, 16.9.
Difficulty: Intermediate
Order the categories by the numeric value the bars represent, with the tallest bar first.
Inside aes() use reorder(class, -mean_hwy) for x with geom_col(); the minus sign sorts descending.
Click to reveal solution
Explanation: reorder(x, by) reorders the levels of x by ascending values of by (using the mean of by within each level when a level has several rows); prefix by with a minus sign for descending order. forcats::fct_reorder() is the tidyverse equivalent: it summarises with the median by default, accepts another summary function through .fun, and sorts descending with .desc = TRUE. Both work in aes(). Always reset the x-axis label with labs(x = ...) because the default label becomes the full reorder(...) expression.
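A sketch of the summarise-then-reorder pattern; class_hwy is an illustrative name and dplyr is assumed for the summary step.

```r
library(ggplot2)
library(dplyr)

# One row per class with its mean highway mileage
class_hwy <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy))

ex_4_2 <- ggplot(class_hwy, aes(x = reorder(class, -mean_hwy), y = mean_hwy)) +
  geom_col() +
  labs(x = "class", y = "mean highway mpg")
```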
Exercise 4.3: Reorder a horizontal bar chart by mean price
Task: A reporting analyst wants a horizontal bar chart of mean diamond price per cut, with the highest-priced cut at the top. Summarise to per-cut means, use fct_reorder() on cut by mean price, plot with geom_col() and coord_flip(). Save the plot to ex_4_3.
Expected result:
#> Horizontal bar chart, y axis lists cuts; Premium at top, Ideal at bottom.
#> Bar lengths approximate mean prices: Premium 4584, Fair 4359, Very Good 3982, Good 3929, Ideal 3458.
Difficulty: Advanced
For a horizontal chart you order the categorical axis by the price value, then flip the plot's orientation.
Use fct_reorder(cut, mean_price) in aes(), draw with geom_col(), and add coord_flip().
Click to reveal solution
Explanation: coord_flip() swaps the x and y axes after the plot is built, which means you order the x aesthetic for what will visually become the y axis. The "top of the chart" maps to the largest factor level after the flip, so fct_reorder() in ascending order (default) puts the largest value at the top once flipped. A modern alternative is mapping cut to y directly: aes(x = mean_price, y = fct_reorder(cut, mean_price)). That skips coord_flip() entirely.
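One way to build the flipped chart, sketched under the assumption that dplyr and forcats are available; cut_price is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(forcats)

cut_price <- diamonds %>%
  group_by(cut) %>%
  summarise(mean_price = mean(price))

# Ascending order on x becomes bottom-to-top after the flip
ex_4_3 <- ggplot(cut_price, aes(x = fct_reorder(cut, mean_price), y = mean_price)) +
  geom_col() +
  coord_flip() +
  labs(x = "cut", y = "mean price (USD)")
```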
Section 5. Labels, annotations, and coord_flip (3 problems)
Exercise 5.1: Add count labels just above each bar
Task: For the basic mpg$class count chart, add the count value above each bar using geom_text() with stat = "count", aes(label = after_stat(count)), and vjust = -0.3 so labels sit just above the bar tops. Save the plot to ex_5_1.
Expected result:
#> A bar chart of class counts with numeric labels above each bar.
#> Labels read: 5, 47, 41, 11, 33, 35, 62 above the respective bars.
Difficulty: Intermediate
Each bar needs its own count printed as text floating just above its top edge.
Add geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.3) to the bar chart.
Click to reveal solution
Explanation: stat = "count" tells geom_text() to use the same counting transformation as geom_bar(), and after_stat(count) reaches the computed count column. vjust is measured in label heights: vjust = 0 rests the label's bottom edge on the bar top, so vjust = -0.3 lifts it a further 30% of the label height. Positive values pull the label down: vjust = 1 hangs it just below the bar top, and values above 1 (for example 1.5) leave a gap inside the bar. For geom_col(), where y is already mapped, you don't need stat = "count": use aes(label = y_value) directly.
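The labelled count chart, as a minimal sketch:

```r
library(ggplot2)

ex_5_1 <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  # Second layer reuses the count stat so each label sits at its bar's height
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.3)

# Labels computed for the text layer (layer 2)
layer_data(ex_5_1, 2)$label
```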
Exercise 5.2: Format y-axis as dollars on a sales bar chart
Task: A finance analyst is presenting total diamonds revenue by cut. Build a geom_col() chart of summed price per cut, format the y-axis with scales::label_dollar() so values read in millions ($XM), and add a clear chart title. Save the plot to ex_5_2.
Expected result:
#> Bar chart of total price by cut.
#> Y-axis tick labels read like $20M, $40M, $60M (label_dollar with scale = 1e-6, suffix = "M").
#> Plot title: "Total diamond revenue by cut".
Difficulty: Intermediate
The dollar totals should display in compact millions on the axis without altering the underlying numbers.
Plot the summary with geom_col(), then scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M")) and set the title via labs().
Click to reveal solution
Explanation: label_dollar() returns a labelling function (not a finished string) which scale_y_continuous(labels = ...) calls on each break. The scale = 1e-6 multiplier rescales the displayed value (dividing by 1,000,000) and suffix = "M" appends the unit. The actual data values are unchanged. This is cleaner than mutating the data because the bar heights remain accurate for downstream filtering or faceting.
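A sketch of the axis formatting, assuming dplyr and scales are installed; cut_revenue and the axis titles are illustrative choices.

```r
library(ggplot2)
library(dplyr)
library(scales)

cut_revenue <- diamonds %>%
  group_by(cut) %>%
  summarise(total_price = sum(price))

ex_5_2 <- ggplot(cut_revenue, aes(x = cut, y = total_price)) +
  geom_col() +
  # Only the tick labels are rescaled; the plotted values stay in dollars
  scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M")) +
  labs(title = "Total diamond revenue by cut", x = "cut", y = "total price")

# The labeller is a function that can be tried on its own
label_dollar(scale = 1e-6, suffix = "M")(c(2e7, 4e7, 6e7))
```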
Exercise 5.3: Percentage labels inside a horizontal bar chart
Task: Build a horizontal bar chart of diamonds cut showing each cut's percentage share of the dataset. Compute the proportion, plot with geom_col() and coord_flip(), and add percentage labels inside each bar using geom_text() with scales::label_percent(). Save the plot to ex_5_3.
Expected result:
#> Horizontal bar chart of cut shares.
#> Each bar carries a percentage label inside, white text. Labels: 3.0%, 9.1%, 22.4%, 25.6%, 40.0%.
Difficulty: Advanced
Draw each cut's share as a horizontal bar and tuck a formatted percentage just inside each bar's end.
Use geom_col() with coord_flip(), then geom_text(aes(label = label_percent(accuracy = 0.1)(share)), hjust = 1.1, color = "white").
Click to reveal solution
Explanation: label_percent(accuracy = 0.1) returns a function; calling it on share produces the formatted strings before they reach geom_text(). Because the chart is flipped, hjust = 1.1 pulls the label slightly inside the right end of each bar (the flipped equivalent of vjust). White text on the colored bar gives strong contrast; switch to black if the bar fill is light. The pattern of calling a scales::label_*() factory inside aes(label = ...) is the cleanest way to format inline data labels.
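The inline-label pattern could look like this; cut_share is an illustrative name, and the default dark bar fill is kept so the white text stays readable.

```r
library(ggplot2)
library(dplyr)
library(scales)

# Share of all rows per cut
cut_share <- diamonds %>%
  count(cut) %>%
  mutate(share = n / sum(n))

ex_5_3 <- ggplot(cut_share, aes(x = cut, y = share)) +
  geom_col() +
  # Format the share as text first, then right-align it just inside the bar end
  geom_text(aes(label = label_percent(accuracy = 0.1)(share)),
            hjust = 1.1, color = "white") +
  scale_y_continuous(labels = label_percent()) +
  coord_flip()
```

The Fair bar is short (about 3%), so its label may not fit inside at small plot sizes; a smaller text size or an outside label for that bar is a possible adjustment.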
Section 6. End-to-end realistic workflows (3 problems)
Exercise 6.1: Top 5 categories with title and caption
Task: A category buyer wants a chart showing the top 5 most common mpg$class values ordered by count descending, with a title "Top 5 vehicle classes in the EPA fuel economy dataset" and a caption "Source: ggplot2::mpg". Save the plot to ex_6_1.
Expected result:
#> Bar chart with 5 bars: suv 62, compact 47, midsize 41, subcompact 35, pickup 33.
#> Title displayed above plot; caption "Source: ggplot2::mpg" in bottom-right.
Difficulty: Advanced
Trim the data to the five most frequent classes first, then build a titled, captioned bar chart ordered by count.
Use count(class, sort = TRUE) plus slice_max(n, n = 5), then geom_col() with fct_reorder(class, n, .desc = TRUE) and labs(title = ..., caption = "Source: ggplot2::mpg").
Click to reveal solution
Explanation: slice_max(n, n = 5) picks the top 5 rows by the n column: the first, unnamed n is the count column created by count(), and the named n = 5 is the number of rows to keep. Chaining count(class, sort = TRUE) then slice_max is the standard "top-N category" pattern. labs(x = NULL) removes the x-axis title when the labels themselves are self-explanatory. The caption argument is the right place for data source attribution; putting it in the subtitle clutters the heading.
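A sketch of the top-N workflow, assuming dplyr and forcats; top_classes is an illustrative name and the y-axis title is a guess at a sensible label.

```r
library(ggplot2)
library(dplyr)
library(forcats)

# Count rows per class, then keep the five largest counts
top_classes <- mpg %>%
  count(class, sort = TRUE) %>%
  slice_max(n, n = 5)

ex_6_1 <- ggplot(top_classes, aes(x = fct_reorder(class, n, .desc = TRUE), y = n)) +
  geom_col() +
  labs(
    title = "Top 5 vehicle classes in the EPA fuel economy dataset",
    caption = "Source: ggplot2::mpg",
    x = NULL,
    y = "count"
  )
```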
Exercise 6.2: Long-format dodged comparison with two measures
Task: Build a publication-ready bar chart comparing mean city and highway mileage by class in mpg. Pivot the summary long with tidyr::pivot_longer(), plot a dodged geom_col(), set fill colors manually, label each bar with one decimal place, and apply theme_minimal(). Save the plot to ex_6_2.
Expected result:
#> A dodged bar chart with two bars per class (cty and hwy), 14 bars total.
#> Each bar labeled with its mean to one decimal place.
#> Two-color legend (cty, hwy).
Difficulty: Advanced
Reshape the two mileage columns into one long value column so a single bar layer can compare both measures side by side.
After pivot_longer(), map fill = measure, draw geom_col(position = position_dodge(width = 0.85)), and give geom_text() a matching position_dodge.
Click to reveal solution
Explanation: The pivot-long pattern is the canonical way to compare multiple measures with a single geom_col() call: class on the x-axis, the mileage value on the y-axis (a column named mpg in this solution), and measure as the grouping fill. The matching position_dodge(width = ...) in geom_text() is required so labels align with their bars; if you only set dodge on geom_col(), the text would stack at the x-tick centers. sprintf("%.1f", mpg) formats to one decimal place; an equivalent tidyverse-style call is scales::number(mpg, accuracy = 0.1).
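A sketch of the full pipeline, assuming dplyr and tidyr; mileage_long, dodge, and the two fill colors are illustrative choices, while the measure and mpg column names follow the hint and explanation.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Per-class means, reshaped to one row per class and measure
mileage_long <- mpg %>%
  group_by(class) %>%
  summarise(cty = mean(cty), hwy = mean(hwy)) %>%
  pivot_longer(c(cty, hwy), names_to = "measure", values_to = "mpg")

# Share one position object so bars and labels are dodged identically
dodge <- position_dodge(width = 0.85)

ex_6_2 <- ggplot(mileage_long, aes(x = class, y = mpg, fill = measure)) +
  geom_col(position = dodge) +
  geom_text(aes(label = sprintf("%.1f", mpg)),
            position = dodge, vjust = -0.3, size = 3) +
  scale_fill_manual(values = c(cty = "#1f7a8c", hwy = "#e85d04")) +
  theme_minimal() +
  labs(x = "class", y = "mean mpg", fill = "measure")
```

Inside aes(), mpg refers to the value column of mileage_long, not to the mpg dataset, because columns of the plot data take precedence.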
Exercise 6.3: Lollipop alternative to a long bar chart
Task: When a bar chart has many categories, a lollipop chart often reads better. Build a horizontal lollipop showing mean hwy by class for mpg, ordered by value, using geom_segment() for the stems and geom_point() for the heads. Save to ex_6_3.
Expected result:
#> A horizontal lollipop chart: 7 thin segments running from x = 0 to the mean hwy value, each ending in a dot.
#> Y-axis lists classes ordered by mean hwy (compact at top, pickup at bottom).
Difficulty: Advanced
A lollipop replaces each solid bar with a thin stem running from zero plus a dot marking the value.
Draw the stems with geom_segment(aes(x = 0, xend = mean_hwy, y = class, yend = class)) and the heads with geom_point().
Click to reveal solution
Explanation: A lollipop encodes the same values as a horizontal bar chart but with less ink, which works well when bars would otherwise look like a thick wall of color. The trick is geom_segment() with x = 0 and xend = mean_hwy, which draws each "bar" as a thin line, then geom_point() overlays the dot. Sorting the factor with fct_reorder() is the same idiom from Section 4. Bars remain the right choice for stacked or grouped comparisons; lollipops shine for single-value ranks.
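The lollipop could be assembled as follows; this is a sketch assuming dplyr and forcats, with class_hwy as an illustrative name and the point size chosen arbitrarily.

```r
library(ggplot2)
library(dplyr)
library(forcats)

# Mean highway mileage per class, with class ordered by that mean
class_hwy <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy)) %>%
  mutate(class = fct_reorder(class, mean_hwy))

ex_6_3 <- ggplot(class_hwy, aes(y = class)) +
  # Stem: thin line from zero to the value
  geom_segment(aes(x = 0, xend = mean_hwy, yend = class)) +
  # Head: dot at the value
  geom_point(aes(x = mean_hwy), size = 3) +
  labs(x = "mean highway mpg", y = NULL)
```

Mapping class to y directly gives the horizontal layout without coord_flip().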
What to do next
Now that you have practiced the bar chart fundamentals, move into the related ggplot2 patterns:
- ggplot2 Bar Charts is the parent tutorial that covers each pattern in detail with explanations.
- ggplot2 geom_bar() vs geom_col() in R goes deeper on when to choose each geom.
- ggplot2 Histogram and Density Plot Exercises in R is the natural next step for visualizing continuous distributions.
- ggplot2 Facets Exercises in R extends your bar charts into small-multiples comparisons.