data.table funion() in R: Fast Set Union of Tables
The data.table funion() function in R computes the fast set union of two tables, returning every unique row found in either one. It is the row-wise equivalent of SQL's UNION.
funion(x, y) # union, duplicate rows removed funion(x, y, all = TRUE) # union all, duplicate rows kept funion(funion(x, y), z) # union of three tables, nested fintersect(x, y) # rows present in BOTH tables fsetdiff(x, y) # rows in x but not in y fsetequal(x, y) # TRUE if both hold the same rows union(v1, v2) # base R union for plain vectors
Need explanation? Read on for examples and pitfalls.
What funion() does
funion() returns the set union of two tables, every distinct row that appears in x, y, or both. You pass two data.tables with the same columns. The function stacks their rows and, by default, removes any duplicate so each unique row appears once.
It is data.table's answer to merging two record sets without double-counting overlaps. A typical case is combining two monthly extracts where some customers appear in both files. Base R's union() only works on vectors, so it cannot deduplicate multi-column rows. funion() fills that gap and runs as compiled C code, which matters once the tables grow large.
UNION: the deduplication runs over the full row, so a table with five columns is compared across all five at once.funion() syntax
The call takes two tables and one optional all flag. The signature is funion(x, y, all = FALSE). Here x and y are the two tables to combine, and all controls whether duplicate rows survive into the result.
| Argument | Role |
|---|---|
x |
First data.table |
y |
Second data.table, same column names and types as x |
all |
FALSE (default) drops duplicate rows; TRUE keeps them (UNION ALL) |
The function returns a new data.table and never modifies x or y in place. Unlike SQL, data.table preserves row order: rows of x come first, then the new rows from y.
The shared row (3, Chicago) appears once. The result holds five rows, not six.
funion() examples
These four examples move from a plain union to a real deduplication task. Each block builds on the dt1 and dt2 tables created above, so run them in order.
The default call removes duplicates. Switch all = TRUE when you want a straight UNION ALL that keeps every copy.
The Chicago row now appears twice, once from each table. Use this when you need the combined row count to equal nrow(x) + nrow(y).
A common job is merging two overlapping snapshots into one clean table. Here two weekly extracts share members, and funion() collapses them to the distinct set.
Six input rows collapse to four because ben and cara appear identically in both weeks.
unique(rbind(x, y)) on large data. Both produce the same deduplicated result, but funion() runs the stack and the deduplication in one optimized C pass, so it scales better as row counts climb into the millions.You can also union subsets pulled from the same source. This example converts the built-in mtcars dataset to a data.table and unions two filtered views.
Every high-mpg car is also a 4-cylinder car, so the union deduplicates the four overlapping rows down to the 11 distinct ones.
funion vs union vs rbindlist
Pick funion() for a two-table deduplicated combine, and rbindlist() for fast raw stacking of many tables. The three functions overlap, but each is tuned for a different job.
| Operation | Removes duplicates? | Inputs | Best for |
|---|---|---|---|
funion(x, y) |
Yes | Two data.tables | Distinct union of two tables |
funion(x, y, all = TRUE) |
No | Two data.tables | UNION ALL of two tables |
rbindlist(list(x, y, z)) |
No | Many tables in a list | Fast stacking, no dedup |
unique(rbind(x, y)) |
Yes | data.frames | Small data, base R only |
union(v1, v2) |
Yes | Two vectors | Single-column sets |
The decision rule is simple. Combining exactly two tables and want unique rows? Use funion(). Stacking three or more tables, or speed matters more than deduplication? Use rbindlist() and call unique() afterward only if needed.
funion(x, y) is pd.concat([x, y]).drop_duplicates(), and funion(x, y, all = TRUE) is a plain pd.concat([x, y]).Common pitfalls
Most funion() errors trace back to column shape or input type. These three mistakes account for nearly every failure.
The columns of x and y must share the same names and types. A mismatch raises an error rather than guessing an alignment.
The fix is to rename the columns so both tables describe the same fields before calling funion().
funion(funion(x, y), z), or use unique(rbindlist(list(x, y, z))).Both inputs must be data.tables. A plain data.frame is not accepted, so convert it first with as.data.table().
Try it yourself
Try it: Union two data.tables of stock keeping units, keeping every row including duplicates. Save the result to ex_union.
Click to reveal solution
Explanation: With all = TRUE, funion() keeps every row from both tables, so the identical (B2, 3) row appears twice and the result has four rows. Dropping all = TRUE would deduplicate it down to three.
Related data.table functions
funion() is one of four data.table set operations. The others handle the remaining ways to compare two row sets.
- fintersect() returns rows present in both tables.
- fsetdiff() returns rows in the first table but not the second.
fsetequal()tests whether two tables hold exactly the same rows.- rbindlist() stacks many tables fast without removing duplicates.
- uniqueN() counts the distinct rows or values in a table.
For the full reference, see the data.table set operations documentation.
FAQ
What is the difference between funion and union in R?
Base R's union() works on vectors and returns the distinct combined values of a single dimension. data.table's funion() works on whole tables and deduplicates across every column at once. If you call union() on two data.tables, data.table dispatches to funion() automatically, so the two names point to the same row-wise behavior for data.tables.
Does funion remove duplicate rows?
Yes, by default. With all = FALSE (the default), funion() returns each distinct row only once, even if it appeared in both input tables. Pass all = TRUE to keep every copy, which gives the equivalent of SQL's UNION ALL and makes the result row count equal nrow(x) + nrow(y).
What does the all argument do in funion?
The all argument controls duplicate handling. When all = FALSE, duplicate rows are collapsed to one. When all = TRUE, all copies are retained, including rows duplicated within a single input table. Use TRUE when row counts must be preserved, for example when each row represents a separate transaction.
Can funion combine data.tables with different columns?
No. funion() requires both tables to have the same column names and the same column types. A mismatch raises an error rather than attempting an alignment. Rename or reorder columns so the two tables describe identical fields before calling funion().
Is funion faster than rbind?
For a deduplicated union, funion() is faster than unique(rbind(x, y)) because it combines the stack and the deduplication into one optimized C pass. If you only need to stack tables without removing duplicates, rbindlist() is the fastest option and also accepts more than two tables at once.