Import Any Data Format Into R: CSV, Excel, JSON, and 12 Others

Almost every real R project starts with "read this file and give me a data frame." This guide shows you the one function to reach for by format, CSV, Excel, JSON, TSV, fixed-width, SPSS, Stata, SAS, Parquet, RDS, and more, plus the arguments that rescue broken imports.

By Selva Prabhakaran · Published May 24, 2026 · Last updated May 24, 2026

How do you import a CSV file into R?

CSV is the workhorse format, flat, text-based, universal. R gives you three main choices: base read.csv(), tidyverse readr::read_csv(), and data.table::fread(). All three return a data frame; they differ in speed, defaults, and output class.

Rreadcsv with readr basics

library(readr) library(writexl) sales <- read_csv("sales.csv") sales #> # A tibble: 6 x 4 #> date product units price #> <date> <chr> <dbl> <dbl> #> 1 2026-01-05 widget 12 9.99 #> 2 2026-01-06 gizmo 4 24.50 #> 3 2026-01-07 widget 8 9.99 #> 4 2026-01-08 sprocket 15 4.25 #> 5 2026-01-09 gizmo 2 24.50 #> 6 2026-01-10 widget 10 9.99

read_csv() auto-detects column types, parses dates, returns a tibble, and is about 10× faster than base read.csv(). It also prints a column-spec summary so you can catch type surprises early.

Key Insight

Use read_csv() for most work, fread() when speed matters on huge files (100 MB+), and base read.csv() only when you can't install packages. The differences are in defaults, not capability.

Try it: Load a CSV with the readr package and check its column types with spec().

RExercise: Inline CSV type spec

# tiny inline example library(readr) ex_text <- "id,name,score\n1,Alice,92\n2,Bob,87" ex_df <- read_csv(ex_text) spec(ex_df)

Click to reveal solution

RInline CSV type spec solution

library(readr) ex_text <- "id,name,score\n1,Alice,92\n2,Bob,87" ex_df <- read_csv(ex_text) spec(ex_df) #> cols( #> id = col_double(), #> name = col_character(), #> score = col_double() #> )

read_csv() accepts a literal string as input when it contains a newline, so you can test parsing without writing a file. spec() returns the column specification read_csv() inferred, use it to verify types before pointing at a large file.

What arguments fix broken CSV imports?

Real-world CSVs are rarely clean. These five arguments solve 90% of import headaches.

REuropean decimals with readdelim

# Handle European-style decimal commas and semicolons library(readr) read_delim("a;b\n1,5;2,3\n3,1;4,7", delim = ";", locale = locale(decimal_mark = ",")) #> # A tibble: 1 x 2 #> a b #> <dbl> <dbl> #> 1 1.5 2.3

RSkip rows and force types

# Skip junk header rows, force column types read_csv("junk\njunk\na,b\n1,x\n2,y", skip = 2, col_types = "ic") #> # A tibble: 2 x 2 #> a b #> <int> <chr> #> 1 1 x #> 2 2 y

RTreat custom strings as NA

# Treat custom strings as NA read_csv("a,b\n1,good\n-999,missing\n3,N/A", na = c("-999", "N/A")) #> # A tibble: 3 x 2 #> a b #> <dbl> <chr> #> 1 1 good #> 2 NA missing #> 3 3 <NA>

The core arguments: delim, locale, skip, col_types, na. Memorize these and you'll handle 99% of real CSVs without resorting to text editors.

Tip

col_types uses a compact string syntax: "icldDTc" means integer, character, logical, double, Date, POSIXct, character. Or pass cols(x = col_double(), y = col_character()) for named columns.

Try it: Parse a CSV where missing values are encoded as "NA", "NULL", and "-".

RExercise: Multiple NA strings

read_csv("id,score\n1,85\n2,NA\n3,NULL\n4,-", na = c("NA", "NULL", "-"))

Click to reveal solution

RMultiple NA strings solution

library(readr) read_csv("id,score\n1,85\n2,NA\n3,NULL\n4,-", na = c("NA", "NULL", "-")) #> # A tibble: 4 x 2 #> id score #> <dbl> <dbl> #> 1 1 85 #> 2 2 NA #> 3 3 NA #> 4 4 NA

Passing a character vector to na tells read_csv() to treat every one of those tokens as a missing value. Because all three sentinels resolve to NA, the score column ends up as a clean numeric, without the na argument the column would have been read as character.

How do you read Excel files?

Excel files (.xlsx, .xls) need the readxl package, installed with tidyverse, but you call it directly. It handles multi-sheet workbooks and preserves cell types better than any CSV export would.

RRead Excel sheets and ranges

library(readxl) # read the first sheet df <- read_excel("data.xlsx") # read a specific sheet by name products <- read_excel("data.xlsx", sheet = "Products") # list all sheets first excel_sheets("data.xlsx") #> [1] "Products" "Orders" "Customers" # read a specific range q1 <- read_excel("data.xlsx", sheet = "Orders", range = "A1:E100")

No external dependencies (no Java, no libreoffice), readxl is pure C++ under the hood, so it works the same on Mac, Linux, and Windows.

Warning

Excel's date columns come through as numeric days-since-1900 if the cell format is wrong. read_excel() catches this for properly-formatted cells, but if dates arrive as numbers like 44562, convert with as.Date(44562, origin = "1899-12-30").

Try it: List the sheets in a hypothetical workbook with excel_sheets().

RExercise: List Excel sheet names

# excel_sheets("myfile.xlsx") # returns a character vector of sheet names

Click to reveal solution

RList Excel sheet names solution

library(readxl) # Write a two-sheet workbook to a temp file so we can read it back tmp <- tempfile(fileext = ".xlsx") writexl::write_xlsx(list(Products = head(iris, 2), Orders = head(mtcars, 2)), tmp) excel_sheets(tmp) #> [1] "Products" "Orders"

excel_sheets() opens the workbook and returns a character vector of sheet names in their stored order, call this before read_excel() whenever you don't control the file, so you can pass the right sheet = argument instead of guessing.

How do you import JSON into R?

The jsonlite package parses JSON into a data frame when the shape is tabular, or a nested list when it isn't. It's fast and handles both local files and API responses.

RParse tabular JSON with fromJSON

library(jsonlite) json_text <- '[ {"id": 1, "name": "Ann", "active": true}, {"id": 2, "name": "Bo", "active": false}, {"id": 3, "name": "Cal", "active": true} ]' fromJSON(json_text) #> id name active #> 1 1 Ann TRUE #> 2 2 Bo FALSE #> 3 3 Cal TRUE

When the JSON is nested, fromJSON() returns a list of lists/data frames that you navigate with $. Pass flatten = TRUE to unnest embedded objects into columns automatically.

RFlatten nested JSON objects

nested <- '[ {"id": 1, "profile": {"age": 30, "city": "NYC"}}, {"id": 2, "profile": {"age": 25, "city": "LA"}} ]' fromJSON(nested, flatten = TRUE) #> id profile.age profile.city #> 1 1 30 NYC #> 2 2 25 LA

Try it: Parse a tiny JSON array of three objects and check the class of the result.

RExercise: JSON class check

library(jsonlite) ex_json <- '[{"a":1},{"a":2},{"a":3}]' class(fromJSON(ex_json))

Click to reveal solution

RJSON class check solution

library(jsonlite) ex_json <- '[{"a":1},{"a":2},{"a":3}]' result <- fromJSON(ex_json) class(result) #> [1] "data.frame" result #> a #> 1 1 #> 2 2 #> 3 3

fromJSON() auto-simplifies a JSON array of flat objects into a data.frame because every element shares the same keys. If any object had a different shape or a nested value, the result would fall back to a list, which is why you should always class() the result before piping it downstream.

How do you read data from other statistical software (SPSS, Stata, SAS)?

Migrants from other stats software can keep their existing files. The haven package reads (and writes) SPSS .sav, Stata .dta, and SAS .sas7bdat files directly.

RRead SPSS, Stata, and SAS

library(haven) # SPSS spss_df <- read_sav("survey.sav") # Stata stata_df <- read_dta("panel.dta") # SAS sas_df <- read_sas("registry.sas7bdat")

haven preserves value labels, variable labels, and missing-value markers, which is critical when you're collaborating with SPSS or Stata users. Use labelled::to_factor() to convert labelled columns to regular R factors when you need them.

Note

For old Excel .xls, fixed-width, and Matlab .mat files, use readxl::read_xls(), readr::read_fwf(), and R.matlab::readMat() respectively. Each format has a dedicated package; the tidyverse ecosystem keeps them consistent.

Try it: Check what class read_sav() returns (conceptually, it's a tibble with haven_labelled columns).

RExercise: SPSS read class

# class(read_sav("myfile.sav")) # typically: c("tbl_df", "tbl", "data.frame")

Click to reveal solution

RSPSS read class solution

library(haven) # Write an SPSS file from a built-in dataset, then read it back tmp <- tempfile(fileext = ".sav") write_sav(head(iris), tmp) class(read_sav(tmp)) #> [1] "tbl_df" "tbl" "data.frame"

read_sav() returns a tibble (tbl_df) so dplyr verbs work directly on it. Individual columns that carried SPSS value labels gain an extra haven_labelled class, check them with class(df$col) and convert with haven::as_factor() when you need plain R factors.

How do you handle big files with data.table::fread()?

When your file is hundreds of megabytes or you're iterating many times, fread() from the data.table package is the fastest tool in R. It auto-detects delimiters, types, and headers with a single call.

Rfread for big CSVs

library(data.table) big <- fread("large.csv") class(big) #> [1] "data.table" "data.frame"

fread() returns a data.table (a high-performance subclass of data frame). If you want a plain data frame or tibble, wrap the call: as.data.frame(fread(...)) or tibble::as_tibble(fread(...)).

Rfread with select and nrows

# Read only specific columns, memory saver on wide files fread("large.csv", select = c("date", "value")) # Read first 1000 rows for a peek fread("large.csv", nrows = 1000) # Handle a custom delimiter fread("large.txt", sep = "|")

For files over 1 GB, fread() will often be 5-10× faster than read_csv() and use less memory.

Try it: Use fread() on inline text with text = ....

RExercise: fread on inline text

library(data.table) fread(text = "x,y\n1,10\n2,20\n3,30")

Click to reveal solution

Rfread inline text solution

library(data.table) fread(text = "x,y\n1,10\n2,20\n3,30") #> x y #> 1: 1 10 #> 2: 2 20 #> 3: 3 30

The text = argument lets fread() parse a literal string just like it would a file path, which is the fastest way to prototype a call without touching disk. The result is a data.table, the leading 1:, 2:, 3: on each row are the data.table row index, not a real column.

How do you save and load R-native formats (RDS, RData)?

When the source of data is another R session, use R-native formats. saveRDS()/readRDS() save one object and let the caller name it on load. save()/load() save multiple named objects and restore them under their original names.

RsaveRDS and save workspace

# Save a single object saveRDS(mtcars, "mtcars.rds") cars2 <- readRDS("mtcars.rds") identical(mtcars, cars2) #> [1] TRUE # Save multiple objects x <- 1:10 y <- letters[1:5] save(x, y, file = "workspace.RData") rm(x, y) load("workspace.RData") x #> [1] 1 2 3 4 5 6 7 8 9 10

readRDS() is the safer pattern, you control what variable the object gets bound to, so there's no risk of silently overwriting something in your workspace. Prefer it for any single-object serialization.

Tip

For very large data frames, arrow::write_parquet() and arrow::read_parquet() are worth learning. Parquet is columnar, compressed, and readable from Python, Spark, and most data tools, making it the best "exchange format" for modern data work.

Try it: Save a small vector to an RDS file and read it back.

RExercise: Round-trip saveRDS

saveRDS(c(1, 2, 3), tempfile(fileext = ".rds")) # readRDS(path) returns c(1, 2, 3)

Click to reveal solution

RRound-trip saveRDS solution

path <- tempfile(fileext = ".rds") saveRDS(c(1, 2, 3), path) readRDS(path) #> [1] 1 2 3

Capturing the tempfile() path in a variable is the key move, otherwise you write the RDS to one random path and try to read from a different one. saveRDS() serializes exactly one object and readRDS() returns it so the caller can bind it to any variable name they like.

Practice Exercises

Exercise 1: Multi-sheet Excel

Given a workbook with sheets "Q1", "Q2", "Q3", "Q4", read all four sheets and combine into one data frame with a quarter column.

Show solution

RCombine quarterly sheets solution

library(readxl) library(dplyr) sheets <- c("Q1", "Q2", "Q3", "Q4") all_data <- lapply(sheets, function(s) { df <- read_excel("data.xlsx", sheet = s) df$quarter <- s df }) combined <- bind_rows(all_data)

Exercise 2: Dirty CSV rescue

A CSV has 3 junk rows, semicolon delimiters, comma decimals, and "NULL" as missing. Read it correctly.

Show solution

RMessy European CSV solution

library(readr) read_delim("messy.csv", delim = ";", skip = 3, locale = locale(decimal_mark = ","), na = "NULL")

Exercise 3: JSON to data frame

Parse this JSON into a data frame with columns id, name, tags (a list-column).

RStarter JSON with nested tags

json <- '[ {"id": 1, "name": "Ann", "tags": ["r", "stats"]}, {"id": 2, "name": "Bo", "tags": ["sql"]} ]'

Show solution

RParse JSON with list-column solution

library(jsonlite) df <- fromJSON(json, simplifyDataFrame = TRUE) df #> id name tags #> 1 1 Ann r, stats #> 2 2 Bo sql str(df$tags) #> List of 2 #> $ : chr [1:2] "r" "stats" #> $ : chr "sql"

Putting It All Together

A realistic import pipeline: detect the file extension, route to the right reader, and return a tibble.

REnd-to-end importany dispatcher

library(readr) library(readxl) library(jsonlite) library(tools) import_any <- function(path) { ext <- tolower(file_ext(path)) switch(ext, csv = read_csv(path, show_col_types = FALSE), tsv = read_tsv(path, show_col_types = FALSE), xlsx = read_excel(path), xls = read_excel(path), json = as_tibble(fromJSON(path)), rds = as_tibble(readRDS(path)), stop("Unsupported extension: ", ext) ) } # Simulated usage tmp <- tempfile(fileext = ".csv") write_csv(head(mtcars), tmp) import_any(tmp) #> # A tibble: 6 x 11 #> mpg cyl disp hp drat wt qsec vs am gear carb #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 #> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 #> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 #> 4 21.4 6 258 110 3.08 3.22 19.4 0 0 3 1 #> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 #> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1

One function, six formats, zero manual branching at the call site. This is the kind of small utility that pays for itself the first week.

Summary

Format	Function	Package
CSV	`read_csv()`	readr
TSV	`read_tsv()`	readr
Custom delimiter	`read_delim()`	readr
Excel (.xlsx, .xls)	`read_excel()`	readxl
JSON	`fromJSON()`	jsonlite
SPSS	`read_sav()`	haven
Stata	`read_dta()`	haven
SAS	`read_sas()`	haven
Fixed-width	`read_fwf()`	readr
Big CSV (fast)	`fread()`	data.table
R single object	`readRDS()`	base
R multi-object	`load()`	base
Parquet	`read_parquet()`	arrow

References

readr package documentation, modern CSV/TSV/fwf reader
readxl package documentation, Excel without Java
jsonlite documentation, JSON parsing
haven package documentation, SPSS/Stata/SAS
data.table::fread, fastest CSV reader

Continue Learning

R Data Frames: Every Operation You'll Need, what to do with the data after importing.
dplyr filter() and select(), subset your imported data.
R Data Types: Which Type Is Your Variable?, understand the column types readers produce.

{% endraw %}

Navigate

Tidyverse packages

Deep dives

Wrangling & EDA

Statistics

Machine Learning

Time Series

By Industry

Reporting & Apps

Levels

Import Any Data Format Into R: CSV, Excel, JSON, and 12 Others

How do you import a CSV file into R?

What arguments fix broken CSV imports?

How do you read Excel files?

How do you import JSON into R?

How do you read data from other statistical software (SPSS, Stata, SAS)?

How do you handle big files with data.table::fread()?

How do you save and load R-native formats (RDS, RData)?

Practice Exercises

Exercise 1: Multi-sheet Excel

Exercise 2: Dirty CSV rescue

Exercise 3: JSON to data frame

Putting It All Together

Summary

References

Continue Learning

Further Reading