Import Any Data Format Into R: CSV, Excel, JSON, and 12 Others
Almost every real R project starts with "read this file and give me a data frame." This guide shows you the one function to reach for by format, CSV, Excel, JSON, TSV, fixed-width, SPSS, Stata, SAS, Parquet, RDS, and more, plus the arguments that rescue broken imports.
How do you import a CSV file into R?
CSV is the workhorse format, flat, text-based, universal. R gives you three main choices: base read.csv(), tidyverse readr::read_csv(), and data.table::fread(). All three return a data frame; they differ in speed, defaults, and output class.
read_csv() auto-detects column types, parses dates, returns a tibble, and is about 10× faster than base read.csv(). It also prints a column-spec summary so you can catch type surprises early.
read_csv() for most work, fread() when speed matters on huge files (100 MB+), and base read.csv() only when you can't install packages. The differences are in defaults, not capability.Try it: Load a CSV with the readr package and check its column types with spec().
Click to reveal solution
read_csv() accepts a literal string as input when it contains a newline, so you can test parsing without writing a file. spec() returns the column specification read_csv() inferred, use it to verify types before pointing at a large file.
What arguments fix broken CSV imports?
Real-world CSVs are rarely clean. These five arguments solve 90% of import headaches.
The core arguments: delim, locale, skip, col_types, na. Memorize these and you'll handle 99% of real CSVs without resorting to text editors.
col_types uses a compact string syntax: "icldDTc" means integer, character, logical, double, Date, POSIXct, character. Or pass cols(x = col_double(), y = col_character()) for named columns.Try it: Parse a CSV where missing values are encoded as "NA", "NULL", and "-".
Click to reveal solution
Passing a character vector to na tells read_csv() to treat every one of those tokens as a missing value. Because all three sentinels resolve to NA, the score column ends up as a clean numeric, without the na argument the column would have been read as character.
How do you read Excel files?
Excel files (.xlsx, .xls) need the readxl package, installed with tidyverse, but you call it directly. It handles multi-sheet workbooks and preserves cell types better than any CSV export would.
No external dependencies (no Java, no libreoffice), readxl is pure C++ under the hood, so it works the same on Mac, Linux, and Windows.
read_excel() catches this for properly-formatted cells, but if dates arrive as numbers like 44562, convert with as.Date(44562, origin = "1899-12-30").Try it: List the sheets in a hypothetical workbook with excel_sheets().
Click to reveal solution
excel_sheets() opens the workbook and returns a character vector of sheet names in their stored order, call this before read_excel() whenever you don't control the file, so you can pass the right sheet = argument instead of guessing.
How do you import JSON into R?
The jsonlite package parses JSON into a data frame when the shape is tabular, or a nested list when it isn't. It's fast and handles both local files and API responses.
When the JSON is nested, fromJSON() returns a list of lists/data frames that you navigate with $. Pass flatten = TRUE to unnest embedded objects into columns automatically.
Try it: Parse a tiny JSON array of three objects and check the class of the result.
Click to reveal solution
fromJSON() auto-simplifies a JSON array of flat objects into a data.frame because every element shares the same keys. If any object had a different shape or a nested value, the result would fall back to a list, which is why you should always class() the result before piping it downstream.
How do you read data from other statistical software (SPSS, Stata, SAS)?
Migrants from other stats software can keep their existing files. The haven package reads (and writes) SPSS .sav, Stata .dta, and SAS .sas7bdat files directly.
haven preserves value labels, variable labels, and missing-value markers, which is critical when you're collaborating with SPSS or Stata users. Use labelled::to_factor() to convert labelled columns to regular R factors when you need them.
.xls, fixed-width, and Matlab .mat files, use readxl::read_xls(), readr::read_fwf(), and R.matlab::readMat() respectively. Each format has a dedicated package; the tidyverse ecosystem keeps them consistent.Try it: Check what class read_sav() returns (conceptually, it's a tibble with haven_labelled columns).
Click to reveal solution
read_sav() returns a tibble (tbl_df) so dplyr verbs work directly on it. Individual columns that carried SPSS value labels gain an extra haven_labelled class, check them with class(df$col) and convert with haven::as_factor() when you need plain R factors.
How do you handle big files with data.table::fread()?
When your file is hundreds of megabytes or you're iterating many times, fread() from the data.table package is the fastest tool in R. It auto-detects delimiters, types, and headers with a single call.
fread() returns a data.table (a high-performance subclass of data frame). If you want a plain data frame or tibble, wrap the call: as.data.frame(fread(...)) or tibble::as_tibble(fread(...)).
For files over 1 GB, fread() will often be 5-10× faster than read_csv() and use less memory.
Try it: Use fread() on inline text with text = ....
Click to reveal solution
The text = argument lets fread() parse a literal string just like it would a file path, which is the fastest way to prototype a call without touching disk. The result is a data.table, the leading 1:, 2:, 3: on each row are the data.table row index, not a real column.
How do you save and load R-native formats (RDS, RData)?
When the source of data is another R session, use R-native formats. saveRDS()/readRDS() save one object and let the caller name it on load. save()/load() save multiple named objects and restore them under their original names.
readRDS() is the safer pattern, you control what variable the object gets bound to, so there's no risk of silently overwriting something in your workspace. Prefer it for any single-object serialization.
arrow::write_parquet() and arrow::read_parquet() are worth learning. Parquet is columnar, compressed, and readable from Python, Spark, and most data tools, making it the best "exchange format" for modern data work.Try it: Save a small vector to an RDS file and read it back.
Click to reveal solution
Capturing the tempfile() path in a variable is the key move, otherwise you write the RDS to one random path and try to read from a different one. saveRDS() serializes exactly one object and readRDS() returns it so the caller can bind it to any variable name they like.
Practice Exercises
Exercise 1: Multi-sheet Excel
Given a workbook with sheets "Q1", "Q2", "Q3", "Q4", read all four sheets and combine into one data frame with a quarter column.
Show solution
Exercise 2: Dirty CSV rescue
A CSV has 3 junk rows, semicolon delimiters, comma decimals, and "NULL" as missing. Read it correctly.
Show solution
Exercise 3: JSON to data frame
Parse this JSON into a data frame with columns id, name, tags (a list-column).
Show solution
Putting It All Together
A realistic import pipeline: detect the file extension, route to the right reader, and return a tibble.
One function, six formats, zero manual branching at the call site. This is the kind of small utility that pays for itself the first week.
Summary
| Format | Function | Package |
|---|---|---|
| CSV | read_csv() |
readr |
| TSV | read_tsv() |
readr |
| Custom delimiter | read_delim() |
readr |
| Excel (.xlsx, .xls) | read_excel() |
readxl |
| JSON | fromJSON() |
jsonlite |
| SPSS | read_sav() |
haven |
| Stata | read_dta() |
haven |
| SAS | read_sas() |
haven |
| Fixed-width | read_fwf() |
readr |
| Big CSV (fast) | fread() |
data.table |
| R single object | readRDS() |
base |
| R multi-object | load() |
base |
| Parquet | read_parquet() |
arrow |
References
- readr package documentation, modern CSV/TSV/fwf reader
- readxl package documentation, Excel without Java
- jsonlite documentation, JSON parsing
- haven package documentation, SPSS/Stata/SAS
- data.table::fread, fastest CSV reader
Continue Learning
- R Data Frames: Every Operation You'll Need, what to do with the data after importing.
- dplyr filter() and select(), subset your imported data.
- R Data Types: Which Type Is Your Variable?, understand the column types readers produce.
Further Reading
- Apache Arrow in R: Read Parquet Files & Run Fast In-Memory Analytics
- R Data Import Exercises: 17 read_csv() and fread() Practice Problems
- readr vs read.csv vs fread in R: Which Data Import Function Is Fastest?
- readr read_csv() in R: Read CSV Files Into Tibbles
- readr read_delim() in R: Read Any Delimited Text File
- readr read_file() in R: Read a Whole File Into a String
- readr read_fwf() in R: Read Fixed-Width Files
- readr read_lines() in R: Read a File Into Lines
- readr read_log() in R: Read Web Server Log Files
- readr read_table() in R: Read Whitespace-Separated Files
- readr read_tsv() in R: Read Tab-Separated Files Fast
- readr col_types in R: Control Column Types on Import
- readr cols() in R: Set Column Types When Reading Data
- readr parse_date() in R: Convert Strings to Date Objects
- readr parse_number() in R: Extract Numbers From Text
- readr read_rds() in R: Load a Saved R Object
- readr write_csv() in R: Export Data Frames to CSV
- readr write_delim() in R: Write Data to Delimited Files
- readr write_lines() in R: Write Text Lines to a File
- readr write_rds() in R: Save Objects to .rds Files
- readr write_tsv() in R: Export Data to TSV Files
- readr locale() in R: Control Date and Number Parsing
- readr parse_factor() in R: Parse Text Into a Factor
- readxl anchored() in R: Read a Cell Block by Corner
- readxl cell_cols() in R: Read Specific Excel Columns
- readxl cell_limits() in R: Read a Precise Cell Range
- readxl cell_rows() in R: Read Specific Rows From Excel
- readxl excel_sheets() in R: List Excel Sheet Names
- readxl read_excel() in R: Import Excel Files to Tibbles
- readxl read_xls() in R: Read Legacy .xls Excel Files
- readxl read_xlsx() in R: Read Modern .xlsx Files
- haven read_dta() in R: Import Stata .dta Files
- haven read_sas() in R: Import SAS Data Files
- haven read_sav() in R: Import SPSS .sav Files
- haven write_dta() in R: Export Data Frames to Stata
- haven write_sas() in R: Export Data Frames to SAS Files
- haven write_sav() in R: Export Data Frames to SPSS Files
- readxl Cells vs Rows in R: Slice Excel Imports
- readxl skip vs range in R: Which One to Use
- arrow read_feather() in R: Read Feather Files Fast
- arrow read_parquet() in R: Read Parquet Files Fast
- arrow write_parquet() in R: Save Data Frames to Parquet
- haven as_factor() in R: Convert Labelled Data to Factors
- haven labelled() in R: Create Labelled Survey Vectors
- haven zap_formats() in R: Clear SPSS and Stata Formats
- haven zap_labels() in R: Remove Value Labels from Data
- jsonlite fromJSON() in R: Parse JSON Into R Objects
- jsonlite stream_in() in R: Read Newline-Delimited JSON
- jsonlite toJSON() in R: Convert R Objects to JSON
- arrow open_dataset() in R: Query Multi-File Datasets
- fst read_fst() in R: Read .fst Data Files Fast
- fst write_fst() in R: Fast Save Data Frames to Disk
- qs qsave() in R: Save R Objects Fast to Disk