Introduction
When learning R, most people focus on functions, models, and visualizations. However, many real-world problems start much earlier, in the data import phase, and end much later, with exporting results.
If data is misread, no statistical method can save the analysis.
In this post we will focus on the logic of data import and export in R, using CSV and Excel files. Instead of memorizing functions, we build a mental model of how R handles files.
Why importing and exporting data is important
Data analysis is a workflow:
Data source → Import → Analysis → Results → Export → Sharing
Errors are common in the import phase:
- wrong separators,
- incorrect decimal separators,
- incorrect file paths,
- silently converted data types.
The result?
A model that runs perfectly — on the wrong data.
CSV vs Excel: not a competition
Before we touch R, we need to clarify the difference between file formats.
CSV files
- Plain text files
- Lightweight and fast
- Universally supported
- One table per file
- No formatting, just data
Example:
total_bill,tip,sex
16.99,1.01,Female
Excel files
- Binary format (.xlsx)
- Can contain multiple sheets
- Store structure and presentation together
- Widely used for reporting and sharing
Key idea:
CSV is a data transport format.
Excel is a communication format.
Working directory: where R actually looks
One of the most common beginner mistakes has nothing to do with R syntax.
R does not search your entire computer for files. It only looks in the working directory.
getwd()
This command shows where R is currently looking.
If a file exists on your computer but not in this folder, R behaves as if the file does not exist.
This is why errors like:
cannot open the connection
usually indicate a path problem, not a coding problem.
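As a minimal sketch of checking the path before reading, the snippet below uses only base R. The file name "tips.csv" is the one used in this post; whether it exists on your machine is an assumption the code tests rather than takes for granted.

```r
# Where R is currently looking
wd <- getwd()
print(wd)

# File name used in this post; it may or may not exist in your working directory
target <- "tips.csv"

# A relative path is resolved against the working directory
full_path <- file.path(wd, target)
print(full_path)

# Check first, instead of waiting for "cannot open the connection"
if (file.exists(target)) {
  message("Found: ", full_path)
} else {
  message("Not found in the working directory: ", full_path)
}
```

If the file is reported as not found, either move it into the working directory or pass a full path to the reading function.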
The example dataset: tips
In this post we use a single dataset: tips.
- Data on tipping in restaurants
- Small and easy to understand
- Contains numerical and categorical variables
- Ideal for demonstrating import/export logic
Data source:
https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv
Reading CSV files: the core logic
When R reads a CSV file, it needs answers to four questions:
- How are columns separated?
- Is the first row a header?
- What is the decimal separator?
- How should text be interpreted?
These answers are provided via function arguments.
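One way to answer these questions is to look at the raw text before parsing it. The sketch below writes a small hypothetical sample (three lines shaped like the tips data) to a temporary file so it runs anywhere; `count_char()` is an illustrative helper, not a base R function.

```r
# Hypothetical sample file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("total_bill,tip,sex", "16.99,1.01,Female", "10.34,1.66,Male"), path)

# Look at the raw text first: separator, header, and decimal mark are all visible
first_lines <- readLines(path, n = 2)
print(first_lines)

# Illustrative helper: count occurrences of a character in one line
count_char <- function(line, ch) {
  lengths(regmatches(line, gregexpr(ch, line, fixed = TRUE)))
}
n_comma <- count_char(first_lines[1], ",")
n_semicolon <- count_char(first_lines[1], ";")

# Simple guess: whichever candidate separator appears more often in the header
sep_guess <- if (n_semicolon > n_comma) ";" else ","

# Pass the answers to the four questions as arguments
tips_small <- read.table(
  file = path,
  header = TRUE,              # first row is a header
  sep = sep_guess,            # column separator
  dec = ".",                  # decimal separator seen in the raw lines
  stringsAsFactors = FALSE    # keep text as character
)
str(tips_small)
```

The separator guess is deliberately naive; for real files, reading the first few lines yourself is usually the more reliable check.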
read.table(): The Foundation
All CSV reading functions in base R are built on read.table().
tips <- read.table(
  file = "tips.csv",
  header = TRUE,
  sep = ",",
  dec = ".",
  stringsAsFactors = FALSE
)
Understanding this function means you understand CSV import in R.
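To make the relationship concrete, the sketch below reads the same file twice: once with the read.csv() shortcut and once with read.table() using the defaults that read.csv() documents (comma separator, point decimal, header, double-quote quoting, fill, no comment character). The sample file is hypothetical and written to a temporary location.

```r
# Hypothetical sample file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("total_bill,tip,sex", "16.99,1.01,Female", "10.34,1.66,Male"), path)

# The shortcut
via_shortcut <- read.csv(path)

# The same call spelled out with read.table()
via_read_table <- read.table(
  file = path,
  header = TRUE,
  sep = ",",
  quote = "\"",
  dec = ".",
  fill = TRUE,
  comment.char = ""
)

# Both routes should produce the same data frame
print(identical(via_shortcut, via_read_table))
```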
read.csv() and its assumptions
read.csv() is just a shortcut for a common case:
tips <- read.csv("tips.csv")
This works perfectly if the assumptions match the file.
The dangerous part? R may not raise an error even if the assumptions are wrong.
The most dangerous mistakes are silent mistakes.
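A small illustration of a silent mistake, assuming a hypothetical semicolon-separated file: read.csv() expects commas, finds none, and returns a single column without any error or warning.

```r
# Hypothetical semicolon-separated file with point decimals
path <- tempfile(fileext = ".csv")
writeLines(c("total_bill;tip;sex", "16.99;1.01;Female", "10.34;1.66;Male"), path)

# Wrong assumption: comma separator. No error is raised.
wrong <- read.csv(path)
print(dim(wrong))     # rows and columns as R understood them
print(names(wrong))

# Correct assumption: semicolon separator
right <- read.csv(path, sep = ";")
print(dim(right))
str(right)
```

Checking `dim()` and `names()` right after import is what exposes the problem here; the call itself gives no hint.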
read.csv2() and regional differences
In many European datasets, columns are separated by semicolons and decimals use commas:
total_bill;tip;sex
16,99;1,01;Female
read.csv2() is designed for this structure.
tips2 <- read.csv2("tips_semicolon.csv")
Important nuance:
Even if decimals use points, read.csv2() can still work in some cases – but this is not guaranteed.
Correct approach:
Always inspect the file structure before choosing the function.
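The sketch below follows that approach on a hypothetical file with semicolons and decimal commas: inspect the raw lines, then compare read.csv2() with a call whose decimal setting does not match the file.

```r
# Hypothetical European-style file: semicolon separator, comma decimals
path <- tempfile(fileext = ".csv")
writeLines(c("total_bill;tip;sex", "16,99;1,01;Female", "10,34;1,66;Male"), path)

# Step 1: inspect the structure
print(readLines(path, n = 2))

# Step 2: choose the function that matches it
tips2 <- read.csv2(path)
str(tips2)       # total_bill and tip are numeric

# Same separator but the wrong decimal mark: no error, yet the numbers are not numeric
mismatch <- read.table(path, header = TRUE, sep = ";", dec = ".")
str(mismatch)
```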
Writing CSV files from R
Data analysis rarely ends in R. Results are shared as files.
Writing comma-separated CSV
write.csv(tips, "tips_comma.csv", row.names = FALSE)
Writing semicolon-separated CSV
write.csv2(tips, "tips_semicolon.csv", row.names = FALSE)
Choosing the right format depends on who will read the file next.
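As a quick sanity check, a round trip shows what each function actually writes. The data frame `tips_small` below is a hypothetical two-row stand-in for the tips data, and the files go to temporary paths.

```r
# Hypothetical two-row stand-in for the tips data
tips_small <- data.frame(
  total_bill = c(16.99, 10.34),
  tip = c(1.01, 1.66),
  sex = c("Female", "Male"),
  stringsAsFactors = FALSE
)

comma_path <- tempfile(fileext = ".csv")
semi_path <- tempfile(fileext = ".csv")

write.csv(tips_small, comma_path, row.names = FALSE)
write.csv2(tips_small, semi_path, row.names = FALSE)

# Compare the raw text of the two files
print(readLines(comma_path))
print(readLines(semi_path))

# Read each file back with the matching function and compare with the original
back_comma <- read.csv(comma_path, stringsAsFactors = FALSE)
back_semi <- read.csv2(semi_path, stringsAsFactors = FALSE)
print(all.equal(back_comma, tips_small))
print(all.equal(back_semi, tips_small))
```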
Why we still need Excel
CSV is technically superior in many ways. Yet Excel remains dominant in practice.
Why?
Excel is not an analysis tool, but it is a powerful delivery tool.
Working with Excel in R: openxlsx
The openxlsx package enables Excel operations without the need for Excel itself.
Write a simple Excel file
library(openxlsx)
write.xlsx(tips, "tips.xlsx", sheetName = "tips")
Read from Excel
tips_excel <- read.xlsx("tips.xlsx", sheet = 1)
Multiple sheets: a mini report
Excel shines at organizing related tables.
summary_tips <- aggregate(tip ~ day, data = tips, mean)
wb <- createWorkbook()
addWorksheet(wb, "Raw Data")
writeData(wb, "Raw Data", tips)
addWorksheet(wb, "Summary")
writeData(wb, "Summary", summary_tips)
saveWorkbook(wb, "tips_report.xlsx", overwrite = TRUE)
One file.
Multiple views.
Clean structure.
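To confirm that such a report contains what you expect, the workbook can be read back sheet by sheet. This sketch assumes the openxlsx package is installed and uses a hypothetical four-row data frame and a temporary file instead of the full tips data.

```r
library(openxlsx)

# Hypothetical stand-in for the tips data
tips_small <- data.frame(
  tip = c(1.01, 1.66, 3.50, 3.31),
  day = c("Sun", "Sun", "Sat", "Sat"),
  stringsAsFactors = FALSE
)
summary_small <- aggregate(tip ~ day, data = tips_small, mean)

# Build the two-sheet workbook in a temporary location
report_path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
addWorksheet(wb, "Raw Data")
writeData(wb, "Raw Data", tips_small)
addWorksheet(wb, "Summary")
writeData(wb, "Summary", summary_small)
saveWorkbook(wb, report_path, overwrite = TRUE)

# List the sheets, then read one back by name rather than by position
sheet_names <- getSheetNames(report_path)
print(sheet_names)

summary_back <- read.xlsx(report_path, sheet = "Summary")
print(summary_back)
```

Reading by sheet name instead of index is a small safeguard against picking up the wrong sheet if the workbook is later reordered.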
Common mistakes to look out for
Most errors are not caused by R, but by assumptions:
- Incorrect working directory
- Wrong separator (sep)
- Wrong decimal separator (dec)
- Reading the wrong Excel sheet
- Unintentionally overwriting files
A healthy habit after every import:
head(data)
str(data)
summary(data)
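The same habit can be turned into an explicit check. The function below is a hypothetical helper (`check_import()` is not part of base R or any package) that reports when the column count or numeric columns differ from what you expected.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_import <- function(df, numeric_cols, expected_ncol) {
  problems <- character(0)
  if (ncol(df) != expected_ncol) {
    problems <- c(problems, sprintf("expected %d columns, found %d",
                                    as.integer(expected_ncol), ncol(df)))
  }
  for (col in numeric_cols) {
    if (!col %in% names(df)) {
      problems <- c(problems, sprintf("column '%s' is missing", col))
    } else if (!is.numeric(df[[col]])) {
      problems <- c(problems, sprintf("column '%s' is %s, not numeric",
                                      col, class(df[[col]])[1]))
    }
  }
  problems
}

# A correctly imported table and one where decimal commas were read as text
good <- data.frame(total_bill = c(16.99, 10.34), tip = c(1.01, 1.66),
                   sex = c("Female", "Male"), stringsAsFactors = FALSE)
bad <- data.frame(total_bill = c("16,99", "10,34"), tip = c("1,01", "1,66"),
                  sex = c("Female", "Male"), stringsAsFactors = FALSE)

good_problems <- check_import(good, c("total_bill", "tip"), 3)
bad_problems <- check_import(bad, c("total_bill", "tip"), 3)
print(good_problems)
print(bad_problems)
```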
Final thoughts
If you can:
you have already crossed one of the most important thresholds in data analysis.
For additional discussion, you may also find this article useful:
https://medium.com/p/e730f4a84b3b
Extended version on Medium:
https://medium.com/@Fatih.Tuzen/understanding-data-import-and-export-in-r-working-with-csv-and-excel-files-6322e61049b2


