Understanding data import and export in R: working with CSV and Excel files | R bloggers


Introduction

When learning R, most people focus on functions, models, and visualizations. However, many real-world problems start much earlier, in the data import phase, and end much later, with exporting the results.

If data is misread, no statistical method can save the analysis.

In this post we will focus on the logic of data import and export in R using CSV and Excel files. Instead of memorizing functions, we build a mental model of how R handles files.

Why importing and exporting data is important

Data analysis is a workflow:

Data source → Import → Analysis → Results → Export → Sharing

Errors are common in the import phase:

  • wrong separators,

  • incorrect decimal separators,

  • incorrect file paths,

  • silently converted data types.

The result?
A model that runs perfectly — on the wrong data.

CSV vs Excel: not a competition

Before we touch R, we need to clarify the difference between file formats.

CSV files

  • Plain text files

  • Lightweight and fast

  • Universally supported

  • One table per file

  • No formatting, just data

Example:

total_bill,tip,sex
16.99,1.01,Female

Excel files

  • Binary format (.xlsx, technically a zipped set of XML files)

  • Can contain multiple sheets

  • Store structure and presentation together

  • Widely used for reporting and sharing

Key idea:
CSV is a data transport format.
Excel is a communication format.

Working directory: where R actually looks

One of the most common beginner mistakes has nothing to do with R syntax.

R does not search your entire computer for files. It only looks in the working directory.

The getwd() command shows where R is currently looking.

If a file exists on your computer but not in this folder, R behaves as if the file does not exist.
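A minimal sketch of checking where R is looking before reading anything. It assumes the file name tips.csv used later in this post; the "data" folder in the last step is hypothetical and only illustrates how a path can be built.

```r
# Show the folder R treats as its working directory.
wd <- getwd()
print(wd)

# List the CSV files R can see there (the result may be empty).
csv_here <- list.files(wd, pattern = "\\.csv$")
print(csv_here)

# Check for one specific file before trying to read it.
tips_found <- file.exists("tips.csv")
print(tips_found)

# Build a path without typing slashes by hand; "data" is a hypothetical folder.
tips_path <- file.path(wd, "data", "tips.csv")
print(tips_path)
```

If the file is somewhere else, either pass its full path to the reading function or change the working directory with setwd().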

This is why errors like:

cannot open the connection

usually indicate a path problem, not a coding problem.
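The following sketch reproduces that situation on purpose, assuming a file name that does not exist in R's temporary folder. It captures the error message rather than stopping, then shows the kind of check that usually resolves it. The file name is made up for illustration.

```r
# A file name that should not exist in R's temporary folder.
missing_file <- file.path(tempdir(), "no_such_file.csv")

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  suppressWarnings(read.csv(missing_file)),
  error = function(e) conditionMessage(e)
)
print(msg)

# The fix is to check the path first, not to rewrite the reading code.
if (!file.exists(missing_file)) {
  message("File not found: ", missing_file,
          " (working directory: ", getwd(), ")")
}
```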

The example dataset: tips

In this post we use a single dataset: tips.

  • Data on tipping in restaurants

  • Small and easy to understand

  • Contains numerical and categorical variables

  • Ideal for demonstrating import/export logic

Data source:
https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv

Reading CSV files: the core logic

When R reads a CSV file, it needs answers to four questions:

  1. How are columns separated?

  2. Is the first row a header?

  3. What is the decimal separator?

  4. How should text be interpreted?

These answers are provided via function arguments.

read.table(): the foundation

All CSV-reading functions in base R are built on read.table().

tips <- read.table(
  file = "tips.csv",
  header = TRUE,
  sep = ",",
  dec = ".",
  stringsAsFactors = FALSE
)

Understanding this function means you understand CSV import in R.

read.csv() and its assumptions

read.csv() is just a shortcut for a common case:

tips <- read.csv("tips.csv")

This works perfectly – if the assumptions match the file.
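To make those assumptions visible, here is a small sketch that reads the same file twice: once with the shortcut and once with the defaults of read.csv() spelled out for read.table(). The three-line sample file is illustrative only, not the full tips dataset.

```r
# Write a small illustrative sample to a temporary file.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "total_bill,tip,sex",
  "16.99,1.01,Female",
  "10.34,1.66,Male"
), csv_file)

# The shortcut ...
a <- read.csv(csv_file)

# ... and the same assumptions written out for read.table().
b <- read.table(
  csv_file,
  header = TRUE,      # first row holds column names
  sep = ",",          # columns are separated by commas
  quote = "\"",       # only double quotes delimit text
  dec = ".",          # decimals use a point
  fill = TRUE,        # short rows are padded with NA
  comment.char = ""   # '#' is treated as data, not as a comment
)

# Both calls should produce the same data frame.
print(identical(a, b))
```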

The dangerous part? R may not generate an error even if the assumptions are wrong.

The most dangerous mistakes are silent mistakes.
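A short sketch of one such silent mistake, using an illustrative semicolon-separated sample with point decimals. read.csv() looks for commas, finds none, and returns a result anyway.

```r
# Illustrative sample: semicolons between columns, points as decimals.
semi_file <- tempfile(fileext = ".csv")
writeLines(c(
  "total_bill;tip;sex",
  "16.99;1.01;Female",
  "10.34;1.66;Male"
), semi_file)

# read.csv() assumes commas and returns without complaint.
wrong <- read.csv(semi_file)
print(dim(wrong))   # rows and columns actually read

# Stating the separator explicitly recovers the intended table.
right <- read.csv(semi_file, sep = ";")
print(dim(right))
```

Comparing the two dim() results is usually enough to notice the problem: the first read collapses every row into a single column.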

read.csv2() and regional differences

In many European datasets:

total_bill;tip;sex
16,99;1,01;Female

read.csv2() was designed for this structure.

tips2 <- read.csv2("tips_semicolon.csv")

Important nuance:
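Even if decimals use points, read.csv2() can still appear to work in some cases (for example, when the columns hold only whole numbers), but this is not guaranteed: point-decimal values may be read as text instead of numbers.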
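The sketch below contrasts the two situations with illustrative sample files: one in the European style that read.csv2() expects, and one that keeps semicolons but writes decimals with points. File contents and variable names are made up for the example.

```r
# European-style sample: semicolons between columns, commas as decimals.
eu_file <- tempfile(fileext = ".csv")
writeLines(c("total_bill;tip;sex",
             "16,99;1,01;Female",
             "10,34;1,66;Male"), eu_file)
eu <- read.csv2(eu_file)
str(eu)      # total_bill and tip come back numeric

# Same separators, but decimals written with points.
mixed_file <- tempfile(fileext = ".csv")
writeLines(c("total_bill;tip;sex",
             "16.99;1.01;Female",
             "10.34;1.66;Male"), mixed_file)
mixed <- read.csv2(mixed_file)
str(mixed)   # no error, but check the column types carefully

# Overriding dec keeps the semicolon logic and fixes the decimals.
fixed <- read.csv2(mixed_file, dec = ".")
str(fixed)
```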

Correct approach:

Always inspect the file structure before choosing the function.
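One way to inspect a file is to read its first lines as plain text before parsing it. The sketch below does that and makes a rough guess at the separator by counting characters in the header line. The sample file and the guessing rule are illustrative assumptions, not a general-purpose detector.

```r
# Illustrative file; in practice this would be the file you were given.
raw_file <- tempfile(fileext = ".csv")
writeLines(c("total_bill;tip;sex", "16,99;1,01;Female"), raw_file)

# Look at the first lines as plain text before parsing anything.
first_lines <- readLines(raw_file, n = 5)
print(first_lines)

# Count candidate separators in the header line.
header <- first_lines[1]
n_comma <- lengths(regmatches(header, gregexpr(",", header, fixed = TRUE)))
n_semi  <- lengths(regmatches(header, gregexpr(";", header, fixed = TRUE)))

# Rough guess: whichever candidate appears more often in the header.
guess <- if (n_semi > n_comma) ";" else ","
print(guess)
```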

Writing CSV files from R

Data analysis rarely ends in R. Results are shared as files.

Writing comma-separated CSV

write.csv(tips, "tips_comma.csv", row.names = FALSE)

Writing semicolon-separated CSV

write.csv2(tips, "tips_semicolon.csv", row.names = FALSE)

Choosing the right format depends on who will read the file next.
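To see what the two functions actually put on disk, this sketch writes the same small data frame both ways and prints the raw text. The sample rows are illustrative, not the full tips dataset.

```r
sample_tips <- data.frame(
  total_bill = c(16.99, 10.34),
  tip = c(1.01, 1.66),
  sex = c("Female", "Male"),
  stringsAsFactors = FALSE
)

comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
write.csv(sample_tips, comma_file, row.names = FALSE)
write.csv2(sample_tips, semi_file, row.names = FALSE)

# Compare the raw text each function produced.
print(readLines(comma_file))
print(readLines(semi_file))

# Each file reads back cleanly with its matching reader.
back_comma <- read.csv(comma_file, stringsAsFactors = FALSE)
back_semi  <- read.csv2(semi_file, stringsAsFactors = FALSE)
print(all.equal(back_comma, back_semi))
```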

Why we still need Excel

CSV is technically superior in many ways. Yet Excel remains dominant in practice.

Why?

Excel is not an analysis tool, but it is a powerful delivery tool.

Working with Excel in R: openxlsx

The openxlsx package enables Excel operations without the need for Excel itself.

Write a simple Excel file

library(openxlsx)  # provides write.xlsx() and read.xlsx()
write.xlsx(tips, "tips.xlsx", sheetName = "tips")

Read from Excel

tips_excel <- read.xlsx("tips.xlsx", sheet = 1)
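A round-trip sketch that writes a small data frame to Excel and reads it back, so the result can be compared with the original. It assumes the openxlsx package is installed (the guard skips the example otherwise) and uses illustrative sample rows and a temporary file.

```r
# openxlsx must be installed; the guard skips the example if it is not.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  sample_tips <- data.frame(
    total_bill = c(16.99, 10.34),
    tip = c(1.01, 1.66),
    sex = c("Female", "Male"),
    stringsAsFactors = FALSE
  )
  xlsx_file <- tempfile(fileext = ".xlsx")

  # Write one sheet, then read it back by name.
  openxlsx::write.xlsx(sample_tips, xlsx_file, sheetName = "tips")
  back <- openxlsx::read.xlsx(xlsx_file, sheet = "tips")

  # Compare structure, not just the first rows.
  str(back)
  print(all.equal(back, sample_tips))
}
```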

Multiple sheets: a mini report

Excel shines at organizing related tables.

summary_tips <- aggregate(tip ~ day, data = tips, mean)

wb <- createWorkbook()

addWorksheet(wb, "Raw Data")
writeData(wb, "Raw Data", tips)

addWorksheet(wb, "Summary")
writeData(wb, "Summary", summary_tips)

saveWorkbook(wb, "tips_report.xlsx", overwrite = TRUE)

One file.

Multiple views.

Clean structure.
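When a workbook has several sheets, it helps to list them and read by name rather than by position. A minimal sketch, assuming openxlsx is installed; the workbook contents are illustrative stand-ins for the report above.

```r
if (requireNamespace("openxlsx", quietly = TRUE)) {
  # Illustrative two-sheet workbook in a temporary file.
  report_file <- tempfile(fileext = ".xlsx")
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, "Raw Data")
  openxlsx::writeData(wb, "Raw Data", data.frame(tip = c(1.01, 1.66)))
  openxlsx::addWorksheet(wb, "Summary")
  openxlsx::writeData(wb, "Summary", data.frame(mean_tip = 1.335))
  openxlsx::saveWorkbook(wb, report_file, overwrite = TRUE)

  # List the sheets, then read one by name rather than by position.
  sheets <- openxlsx::getSheetNames(report_file)
  print(sheets)
  summary_back <- openxlsx::read.xlsx(report_file, sheet = "Summary")
  print(summary_back)
}
```

Reading by name protects against someone reordering the sheets in Excel later.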

Common mistakes to look out for

Most errors are not caused by R, but by assumptions:

  • Incorrect working directory

  • Wrong separator (sep)

  • Wrong decimal separator (dec)

  • Reading the wrong Excel sheet

  • Unintentionally overwriting files
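For the last item, one possible safeguard is a small wrapper that refuses to overwrite an existing file unless told to. safe_write_csv() is a hypothetical helper written for this sketch, not a base R function.

```r
# Hypothetical helper: refuse to overwrite unless asked to.
safe_write_csv <- function(x, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    stop("File already exists: ", path,
         " (set overwrite = TRUE to replace it)")
  }
  write.csv(x, path, row.names = FALSE)
  invisible(path)
}

out_file <- tempfile(fileext = ".csv")

# First write succeeds because the file does not exist yet.
safe_write_csv(data.frame(tip = 1.01), out_file)

# Second write is blocked, so the original content survives.
second <- tryCatch(
  safe_write_csv(data.frame(tip = 9.99), out_file),
  error = function(e) "blocked"
)
print(second)
```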

A healthy habit after every import:

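head(tips)     # first rows: do the columns look right?
str(tips)      # column types: numbers read as numbers?
summary(tips)  # value ranges: anything implausible?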
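The same habit can be turned into an explicit check. The sketch below uses a hypothetical helper, check_import(), that compares an imported table against the columns and types you expect. Both example data frames are made up to show a clean import and one where a wrong decimal separator left the numbers as text.

```r
# Hypothetical helper: compare an imported table against what you expect.
check_import <- function(df, expected_cols, numeric_cols) {
  problems <- character(0)
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }
  for (col in intersect(numeric_cols, names(df))) {
    if (!is.numeric(df[[col]])) {
      problems <- c(problems, paste("not numeric:", col))
    }
  }
  problems
}

# A clean import, and one where decimals were misread as text.
good <- data.frame(total_bill = 16.99, tip = 1.01, sex = "Female")
bad  <- data.frame(total_bill = "16,99", tip = "1,01", sex = "Female")

cols <- c("total_bill", "tip", "sex")
nums <- c("total_bill", "tip")
print(check_import(good, cols, nums))
print(check_import(bad, cols, nums))
```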

Final thoughts

If you can import data correctly and export your results reliably, you have already crossed one of the most important thresholds in data analysis.

For additional discussion, you may also find this article useful:
https://medium.com/p/e730f4a84b3b


Extended version on Medium:
https://medium.com/@Fatih.Tuzen/understanding-data-import-and-export-in-r-working-with-csv-and-excel-files-6322e61049b2

