Network Analysis Exercises in R: 20 Practice Problems with igraph
Twenty practice problems that walk through real network analysis workflows in R: construction, degree and density, four centrality families, shortest paths, components, community detection, and ego subgraphs. Solutions are hidden and revealed on click.
Section 1. Building and inspecting graphs (3 problems)
Exercise 1.1: Build an undirected friendship graph from an edge list
Task: A sociologist is encoding a small office friendship survey where seven coworkers (Ada, Ben, Cleo, Dan, Eva, Finn, Gia) reported mutual friendships: Ada-Ben, Ada-Cleo, Ben-Cleo, Cleo-Dan, Dan-Eva, Eva-Finn, Finn-Gia, Gia-Eva. Build an undirected igraph object from this two-column edge list using graph_from_data_frame() and save it to ex_1_1.
Expected result:
#> IGRAPH 4a1b2c3 UN-- 7 8 --
#> + attr: name (v/c)
#> + edges from 4a1b2c3 (vertex names):
#> [1] Ada --Ben Ada --Cleo Ben --Cleo Cleo--Dan Dan --Eva Eva --Finn
#> [7] Finn--Gia Gia --Eva
Difficulty: Beginner
The two-column data frame is just a list of endpoint pairs; you need a constructor that turns those pairs into a graph plus a flag saying friendships run both ways.
Call graph_from_data_frame() on edges and pass directed = FALSE.
Click to reveal solution
Explanation: graph_from_data_frame() is the workhorse constructor when your raw data is a tidy edge list, which is how networks are usually stored in databases or CSV exports. The first two columns become the edge endpoints; any extra columns become edge attributes automatically. Passing directed = FALSE produces an undirected graph (note the UN-- flag in the summary). For directed networks (followers, citations, transactions), set directed = TRUE.
Exercise 1.2: Inspect graph topology with vcount, ecount, and is_directed
Task: Continuing from Exercise 1.1, the same sociologist now needs a quick summary for the report: the number of vertices, the number of edges, whether the graph is directed, and whether it is simple (no loops or multi-edges). Build a named list with elements n, m, directed, simple from ex_1_1 and save the list to ex_1_2.
Expected result:
#> $n
#> [1] 7
#>
#> $m
#> [1] 8
#>
#> $directed
#> [1] FALSE
#>
#> $simple
#> [1] TRUE
Difficulty: Beginner
Each element of the list answers one separate question about the graph's shape and cleanliness, so build it from four independent one-value queries.
Use vcount(), ecount(), is_directed(), and is_simple() on ex_1_1 inside a list() with names n, m, directed, simple.
Click to reveal solution
Explanation: These four predicates are the first thing you check on any graph object: shape (n, m), orientation (directed yes/no), and whether the data is clean (no self-loops, no parallel edges). is_simple() returns FALSE if the graph has either loops or multi-edges, and the common fix is simplify(g, remove.multiple = TRUE, remove.loops = TRUE). Confirming simplicity early prevents subtle metric bugs: degree counts double-count multi-edges, and shortest-path algorithms treat self-loops as zero-cost detours.
Exercise 1.3: Attach vertex attributes (department) and access them
Task: The HR team adds department metadata to the seven coworkers from Exercise 1.1: Ada and Ben are in Sales, Cleo and Dan are in Engineering, Eva is in Marketing, Finn and Gia are in Sales. Attach a dept vertex attribute to ex_1_1 and save the resulting graph as ex_1_3. Then verify by retrieving the attribute vector.
Expected result:
#> [1] "Sales" "Sales" "Engineering" "Engineering"
#> [5] "Marketing" "Sales" "Sales"
Difficulty: Intermediate
Copy the graph first, then write a per-vertex value onto the copy, lining the lookup up by vertex name so the order cannot drift.
After ex_1_3 <- ex_1_1, assign V(ex_1_3)$dept <- dept_map[V(ex_1_3)$name].
Click to reveal solution
Explanation: Vertex attributes are how you carry "real-world meaning" inside the graph object so that downstream filtering, coloring, and community comparison work without external joins. The pattern V(g)$attr <- value assigns; V(g)$attr reads. Indexing dept_map by V(g)$name is the safe alignment trick: it tolerates a different vertex order than the source data. Edge attributes follow the symmetric pattern with E(g)$weight <- ....
Section 2. Degree, density, and transitivity (3 problems)
Exercise 2.1: Compute the degree of each vertex and identify the most connected node
Task: A growth team analyzing the friendship graph from Exercise 1.1 wants to spot the most socially connected employee for a referral pilot. Compute the degree of every vertex of ex_1_1, then return a tibble with columns name and degree sorted descending by degree. Save the tibble to ex_2_1.
Expected result:
#> # A tibble: 7 x 2
#> name degree
#> <chr> <dbl>
#> 1 Cleo 3
#> 2 Eva 3
#> 3 Ada 2
#> 4 Ben 2
#> 5 Dan 2
#> 6 Finn 2
#> 7 Gia 2
Difficulty: Intermediate
Get a per-vertex connection count, then turn that named vector into a two-column table ordered from most to least connected.
Call degree(ex_1_1), build a tibble() with name = names(deg) and degree = deg, then arrange(desc(degree)).
Click to reveal solution
Explanation: Degree is the simplest and most interpretable centrality: the number of incident edges. Top-degree nodes are usually your "hub" candidates for seeding diffusion experiments, but degree alone is biased toward visibility, not structural importance, which is why we layer in betweenness and eigenvector centrality later. For directed graphs, pass mode = "in", "out", or "all" to distinguish follower count from following count.
Exercise 2.2: Plot the degree distribution and identify any power-law shape
Task: A network researcher reviewing a 200-node Barabasi-Albert preferential attachment graph wants to verify the well-known heavy-tailed degree pattern. Generate a sample BA graph with sample_pa(200, power = 1.2, m = 2, directed = FALSE) using set.seed(42) for reproducibility, then compute the degree distribution as a named numeric vector (degree -> frequency). Save the distribution to ex_2_2.
Expected result:
#> Approximate (set.seed(42)):
#> 1 2 3 4 5 6 7 8 10 12 15 22
#> 0.405 0.225 0.110 0.075 0.040 0.035 0.025 ...
Difficulty: Intermediate
You want the share of nodes at each degree value, then to label those shares with the degree they belong to.
Call degree_distribution(g_ba) and set names() to seq(0, length(ex_2_2) - 1).
Click to reveal solution
Explanation: degree_distribution() returns probabilities indexed from degree 0 upward, which is convenient for fitting a power-law tail with fit_power_law() from igraph. The Barabasi-Albert model produces a heavy-tailed distribution because new nodes preferentially attach to high-degree existing nodes, a process that shows up empirically in citation networks, web graphs, and protein interactions. If the head of the distribution looks roughly geometric instead, suspect Erdos-Renyi sampling, not preferential attachment.
Exercise 2.3: Compute edge density and global transitivity (clustering coefficient)
Task: A research team comparing two student social networks (one from a high school, one from a graduate seminar) needs the two most basic structural metrics: edge density (how saturated the graph is) and global transitivity (how often friends of friends are also friends). Compute both for ex_1_1 and return them as a named numeric vector c(density = ..., transitivity = ...) saved to ex_2_3.
Expected result:
#> density transitivity
#> 0.3809524 0.6000000
Difficulty: Intermediate
Two separate structural ratios sit side by side in one named vector: one for how saturated the edges are, one for how often triangles close.
Combine edge_density() and transitivity(ex_1_1, type = "global") inside c() with names density and transitivity.
Click to reveal solution
Explanation: Density is the ratio of observed edges to the maximum possible (n*(n-1)/2 for undirected simple graphs), useful for comparing graphs of different sizes. Global transitivity is the proportion of connected triples that are closed into triangles, a classic measure of "small-world" structure. Real social networks typically show transitivity of 0.1 to 0.6 versus near-zero for random Erdos-Renyi graphs at the same density. Pass type = "local" to get a per-vertex clustering coefficient instead.
Section 3. Centrality measures (4 problems)
Exercise 3.1: Rank vertices by betweenness centrality
Task: An information security team mapping internal communication flows in a 30-employee firm wants to identify "broker" employees through which the most cross-team messages route. Using a karate-style sample graph from make_graph("Zachary"), compute betweenness centrality for every vertex, then return a tibble with vertex, betweenness, sorted descending. Save the top-10 rows to ex_3_1.
Expected result:
#> # A tibble: 10 x 2
#> vertex betweenness
#> <int> <dbl>
#> 1 1 231.
#> 2 34 160.
#> 3 33 76.7
#> 4 3 75.9
#> 5 32 73.0
#> ...
Difficulty: Advanced
Score every vertex by how often it sits on shortest routes, then keep only the busiest handful in a sorted table.
Call betweenness(g_kar, directed = FALSE, normalized = FALSE), build a tibble(), arrange(desc(betweenness)), then slice_head(n = 10).
Click to reveal solution
Explanation: Betweenness counts the fraction of shortest paths through each vertex, so high-betweenness nodes act as bottlenecks ("brokers") whose removal would fragment the graph. In Zachary's karate club, vertices 1 (the instructor) and 34 (the president) score highest, which matches the historical split. For graphs above ~10,000 nodes the exact algorithm gets expensive; pass cutoff = k to estimate from paths of length <= k, or use estimate_betweenness().
Exercise 3.2: Compute closeness centrality on a connected component
Task: A logistics planner studying a regional courier network wants to find the depot from which the average delivery distance to all other depots is shortest. Using the same Zachary karate graph as in 3.1 (it is connected), compute normalized closeness centrality, return the top-5 vertices by closeness in a tibble with columns vertex, closeness. Save the tibble to ex_3_2.
Expected result:
#> # A tibble: 5 x 2
#> vertex closeness
#> <int> <dbl>
#> 1 1 0.569
#> 2 3 0.559
#> 3 34 0.550
#> 4 32 0.541
#> 5 9 0.516
Difficulty: Advanced
Rank vertices by how short their average trip to everyone else is, scaled so graph size does not distort the score.
Call closeness(g_kar, normalized = TRUE), then tibble() with arrange(desc(closeness)) and slice_head(n = 5).
Click to reveal solution
Explanation: Closeness is the reciprocal of mean geodesic distance to every other vertex, so high-closeness nodes can reach the whole graph in few hops. Normalization (normalized = TRUE) scales by n-1 so values are comparable across graph sizes. Closeness is poorly defined on disconnected graphs because some pairs have infinite distance; in practice, compute it on the largest connected component (use components() and induced_subgraph()). Closeness and betweenness often disagree: a node can be close to many others without being a broker.
Exercise 3.3: Eigenvector centrality and the "important-friends" intuition
Task: A marketing analyst studying an influencer network on a niche platform wants a centrality score that rewards being connected to other high-status accounts, not just having many followers. Compute eigenvector centrality on the Zachary karate graph from Exercise 3.1, then build a tibble of the top-5 vertices with columns vertex, eigen. Save to ex_3_3.
Expected result:
#> # A tibble: 5 x 2
#> vertex eigen
#> <int> <dbl>
#> 1 34 1.00
#> 2 3 0.866
#> 3 1 0.853
#> 4 2 0.770
#> 5 33 0.733
Difficulty: Advanced
You want a score that rewards being linked to other high-scoring nodes, then the five highest in a sorted table.
Take the $vector element of eigen_centrality(g_kar), build a tibble(), arrange(desc(eigen)), and slice_head(n = 5).
Click to reveal solution
Explanation: Eigenvector centrality is recursive: your score is proportional to the sum of your neighbors' scores. Mathematically it is the leading eigenvector of the adjacency matrix. It often disagrees with raw degree because a node with three well-connected friends scores higher than a node with five isolated ones. PageRank is essentially eigenvector centrality with random-restart damping, which fixes pathologies on directed graphs with dangling nodes; reach for page_rank() when handling web-style directed data.
Exercise 3.4: PageRank on a directed citation graph
Task: A library data scientist studying a small academic citation graph of seven papers (P1 cites P2, P3; P2 cites P3; P3 cites P5; P4 cites P3, P5; P5 cites P6; P6 cites P7; P7 cites P3) wants the most "influential" paper by PageRank, accounting for the directed nature of citations. Build the directed graph inline, compute PageRank with default damping 0.85, return a tibble paper, pagerank sorted descending, saved to ex_3_4.
Expected result:
#> # A tibble: 7 x 2
#> paper pagerank
#> <chr> <dbl>
#> 1 P3 0.296
#> 2 P5 0.181
#> 3 P7 0.105
#> 4 P6 0.099
#> 5 P2 0.083
#> ...
Difficulty: Intermediate
First turn the citation pairs into a graph that respects direction, then score papers with a random-walk-with-restart measure.
Build the graph with graph_from_data_frame(cites, directed = TRUE), then take the $vector of page_rank(g_cite, damping = 0.85).
Click to reveal solution
Explanation: PageRank treats edges as directional endorsements and uses a random-walk-with-restart formulation: with probability damping the walker follows an outgoing edge, otherwise jumps to a uniform random vertex. The restart fixes two failure modes that break vanilla eigenvector centrality on directed graphs: dangling sinks and disconnected strongly-connected components. P3 wins here because four other papers cite it and one of those citers (P4) is itself cited.
Section 4. Paths and connectivity (3 problems)
Exercise 4.1: Find the shortest path between two named vertices
Task: A subway operator analyzing a small transit graph wants to confirm the minimum number of transfers between stations Ada and Finn in the friendship graph from Exercise 1.1 (treating each friendship as a one-step hop). Compute the shortest path vertex sequence from Ada to Finn on ex_1_1 and save the vertex names along the path to a character vector ex_4_1.
Expected result:
#> [1] "Ada" "Cleo" "Dan" "Eva" "Finn"
Difficulty: Intermediate
Ask the graph for the route between the two named stations, then pull just the station labels strung along it.
Call shortest_paths(ex_1_1, from = "Ada", to = "Finn") and read names() of its $vpath[[1]].
Click to reveal solution
Explanation: shortest_paths() returns both vertex and edge sequences in a list; pull $vpath[[1]] for the vertex IDs and wrap in names() for the labels. The default algorithm is unweighted BFS, which gives geodesic distance in number of hops. For weighted edges (travel time, cost), set weights = E(g)$weight and igraph automatically switches to Dijkstra. When you only need the integer distance and not the path itself, distances(g, v = "Ada", to = "Finn") is much faster.
Exercise 4.2: Full pairwise distance matrix and graph diameter
Task: A network reliability engineer studying a fiber backbone wants the full shortest-path distance matrix (for capacity planning) and the diameter (the worst-case latency) of the Zachary karate graph from Exercise 3.1. Compute both and return them as a list with elements distances (the matrix) and diameter (an integer), saved to ex_4_2.
Expected result:
#> $distances
#> 1 2 3 4 ...
#> 1 0 1 1 1 ...
#> 2 1 0 1 1 ...
#> ...
#>
#> $diameter
#> [1] 5
Difficulty: Intermediate
One list element holds every pairwise hop count, the other holds the single worst-case hop count across all pairs.
Put distances(g_kar) and diameter(g_kar) into a list() named distances and diameter.
Click to reveal solution
Explanation: distances() returns an n-by-n matrix in O(n*m) for unweighted graphs (Dijkstra on weighted), and diameter() is the maximum finite entry of that matrix. The karate graph's diameter of 5 means the most distant pair is reachable in five hops, which is why "six degrees of separation" sounds plausible: small-world graphs grow logarithmically in diameter. For very large graphs, mean_distance() is cheaper and more meaningful than the worst-case diameter.
Exercise 4.3: Identify connected components and extract the largest one
Task: A fraud analyst examining a transaction network knows that suspicious clusters often appear as small disconnected islands separate from the legitimate giant component. Construct a small disconnected graph with graph_from_literal(A-B-C, D-E, F), compute its components (membership and sizes), and save the size of the largest component (as a length-1 integer) to ex_4_3.
Expected result:
#> Components found: 3
#> Component sizes: 3 2 1
#> Largest component (saved to ex_4_3):
#> [1] 3
Difficulty: Beginner
Break the graph into its disconnected pieces, then report how many vertices sit in the biggest piece.
Call components(g_disc) and take max() of its $csize vector.
Click to reveal solution
Explanation: components() returns three pieces: membership (vertex -> component ID), csize (size of each component), and no (number of components). For directed graphs, pass mode = "weak" to ignore direction or "strong" to require bidirectional reachability. The largest weakly connected component is usually the "main" graph for analysis: extract it with induced_subgraph(g, V(g)[membership == which.max(csize)]) to drop noise islands before computing centrality.
Section 5. Community detection (4 problems)
Exercise 5.1: Detect communities with the Louvain algorithm
Task: A social media team studying conversation clusters in a tightly-knit interest community needs a fast modularity-optimizing partition of the Zachary karate graph from Exercise 3.1. Run cluster_louvain() on g_kar, then save the resulting communities object to ex_5_1. Print the number of detected communities and the modularity score.
Expected result:
#> IGRAPH clustering multi level, groups: 4, mod: 0.45
#> + groups:
#> $`1` ...
Difficulty: Intermediate
Run a fast modularity-optimizing partition and keep the whole communities object it hands back.
Assign ex_5_1 <- cluster_louvain(g_kar).
Click to reveal solution
Explanation: Louvain is the default "greedy modularity" community detector for large graphs because it scales near-linearly and almost always finds the best partition. The output is a communities object you can call membership(), length() (community count), and sizes() on. Modularity around 0.3 is "weak" community structure, 0.4+ is strong, and >0.7 is essentially block-diagonal. Louvain is non-deterministic (random tie-breaking), so set a seed for reproducible reports.
Exercise 5.2: Compare community sizes with the sizes() helper
Task: A data product manager preparing a stakeholder report on the Louvain partition from Exercise 5.1 wants a clean named integer vector of community sizes for a horizontal bar chart. Extract it from ex_5_1 and save it (named by community ID) to ex_5_2.
Expected result:
#> Community sizes
#> 1 2 3 4
#> 11 12 6 5
Difficulty: Beginner
You already have the partition; you just need a named count of how many vertices landed in each group.
Call sizes(ex_5_1).
Click to reveal solution
Explanation: sizes() is just table(membership(comm)) under the hood, but it returns a table object with a tidy print method. The community IDs are arbitrary labels assigned during the optimization run, so they have no inherent meaning across runs (random tie-breaking can swap labels). For longitudinal comparisons (week-over-week communities), match communities by overlap rather than by ID with compare(comm1, comm2, method = "nmi").
Exercise 5.3: Compare two community detection algorithms with modularity
Task: A network science PhD student benchmarking algorithms on the Zachary karate graph wants to compare the modularity of three partitions: Louvain, edge betweenness (Girvan-Newman), and walktrap. Run all three with set.seed(11), collect their modularity scores into a named numeric vector, and save it to ex_5_3.
Expected result:
#> louvain edge_btw walktrap
#> 0.4449037 0.4012985 0.3532216
Difficulty: Advanced
Run three different partitioners on the same graph, then collect one quality number from each into a named vector.
Run cluster_louvain(), cluster_edge_betweenness(), and cluster_walktrap() on g_kar, then wrap modularity() of each inside c() with names louvain, edge_btw, walktrap.
Click to reveal solution
Explanation: Different algorithms optimize different objectives and have different scaling properties. Louvain optimizes modularity directly and scales to millions of edges. Edge betweenness (Girvan-Newman) iteratively removes the highest-betweenness edge and is O(n*m^2), so it is only viable below ~1000 nodes but produces interpretable hierarchical splits. Walktrap uses random-walk distance and finds tighter, more cohesive communities. On clean toy graphs they often agree; on noisy real-world data, comparing modularity (or NMI against ground truth) is the right way to choose.
Exercise 5.4: Visualize a community-colored graph plot
Task: A communications designer producing a figure for a blog post wants to render the karate graph with each Louvain community colored differently, vertices sized by degree, and the layout fixed by layout_with_fr() for reproducibility. Save the plot output to ex_5_4 (the function call invisibly returns NULL; capturing layout matrix for grading). Use set.seed(99) for layout reproducibility.
Expected result:
#> A 34 x 2 numeric matrix of Fruchterman-Reingold coordinates.
#> Plot rendered: 4 communities colored, vertex size proportional to degree.
Difficulty: Intermediate
Compute the force-directed coordinates once and save them, then feed those coordinates plus community colors into the drawing call.
Save layout_with_fr(g_kar) to ex_5_4 and pass it as the layout argument to plot() alongside vertex.color = membership(ex_5_1).
Click to reveal solution
Explanation: Separating the layout (layout_with_fr()) from the plot call is the standard idiom because force-directed layouts are stochastic; computing them once and passing the matrix in lets you produce identical figures across renders. vertex.color accepts any integer vector (one per vertex); pairing it with membership(comm) gives community-aware coloring "for free". Avoid putting raw degree into vertex.size without scaling; degrees of 1-2 produce dots smaller than the vertex label.
Section 6. Subgraphs and structural queries (3 problems)
Exercise 6.1: Extract the largest connected component as an induced subgraph
Task: A fraud analyst (continuing from Exercise 4.3) wants to run centrality and community detection on only the main legitimate cluster, discarding the small isolated islands. Build g_disc again as in 4.3, extract the largest weakly connected component as an igraph object using induced_subgraph(), and save it to ex_6_1.
Expected result:
#> IGRAPH UN-- 3 2 --
#> + attr: name (v/c)
#> + edges from ...:
#> [1] A--B B--C
Difficulty: Intermediate
Find which piece is biggest, then carve out just the vertices belonging to it together with their internal edges.
From components(g_disc), take which.max() of $csize and pass the matching V(g_disc)[membership == biggest] to induced_subgraph().
Click to reveal solution
Explanation: induced_subgraph() keeps the listed vertices and all edges among them, preserving vertex and edge attributes. Combine it with components()$membership to slice out connected pieces, or with a boolean filter V(g)[degree > 5] to keep a "core" set. The alternative subgraph() is deprecated; use induced_subgraph() for vertex-based slicing and subgraph.edges() for edge-based slicing.
Exercise 6.2: Compute one-step ego networks for a target vertex
Task: A user-research lead studying personal influence wants the immediate ego network (1-step neighborhood) around vertex 1 of the Zachary karate graph: the focal vertex plus all its direct neighbors and any edges among them. Build it from g_kar with make_ego_graph(), return the single resulting graph object, and save it to ex_6_2.
Expected result:
#> IGRAPH U--- 17 41 --
#> + edges from ...:
#> (vertex 1 plus its 16 neighbors and all edges among them)
Difficulty: Advanced
Build the local neighborhood graph around one focal vertex at radius one, then pull the single graph out of the returned list.
Call make_ego_graph(g_kar, order = 1, nodes = 1) and extract element [[1]].
Click to reveal solution
Explanation: make_ego_graph() returns a list because you can request multiple egos at once; if you pass a single node, just extract [[1]]. The order argument controls the radius: order = 1 is direct neighbors, order = 2 adds friends-of-friends. Ego networks are the building block for personal-influence studies and for breaking very large graphs into per-user sub-analyses that fit in memory. For just the vertex IDs without building the subgraph, use the lighter ego() or neighborhood().
Exercise 6.3: Filter edges by a weight attribute and rebuild a sparser graph
Task: A trading desk reviewing a co-trading network where edge weights represent USD volume in millions wants to threshold the network at >= 5 million to focus on material relationships. Build a weighted graph inline (six edges with weights c(2, 3, 5, 6, 8, 11)), then return a new igraph object containing only edges with weight >= 5. Save the sparser graph to ex_6_3.
Expected result:
#> IGRAPH UNW- 4 3 --
#> + attr: name (v/c), weight (e/n)
#> + edges from ... (vertex names):
#> [1] C--D D--E E--F
Difficulty: Intermediate
Select only the edges whose weight clears the threshold, then rebuild a graph from just those edges and drop any nodes left stranded.
Take E(g_w)[weight >= 5] and pass it to subgraph.edges(g_w, ..., delete.vertices = TRUE).
Click to reveal solution
Explanation: subgraph.edges() keeps only the listed edges and (with delete.vertices = TRUE) drops any vertices left isolated. Note the convenient E(g)[weight >= 5] syntax: igraph evaluates the expression inside the brackets against the edge attribute set, similar to dplyr::filter(). The UNW- flag in the summary confirms the graph is undirected, named, and weighted. After thresholding, always re-check components() because a sparsifying filter often splits the graph into pieces.
What to do next
- dplyr Joins Exercises: joining edge lists and vertex attribute tables.
- Data Visualization Exercises: practice plotting graphs and other complex structures.
- Spatial Analysis Exercises: geographic networks and adjacency from spatial data.
- Apply Family Exercises: vectorize per-vertex computations across centrality measures.
r-statistics.co · Verifiable credential · Public URL
This document certifies mastery of
Network Analysis Mastery
Every certificate has a public verification URL that proves the holder passed the assessment. Anyone with the link can confirm the recipient and date.
101 learners have earned this certificate