Simplify research data sharing with R | R bloggers

[This article was first published on openwashdata, and kindly contributed to R-bloggers.]

Working with Water, Sanitation and Hygiene (WASH) researchers in multiple resource-constrained countries, we have found that valuable datasets often remain underutilized. This is often due to limited familiarity with FAIR (Findable, Accessible, Interoperable, Reusable) data practices (Wilkinson et al. 2016). As part of the academic community, we recognize that research extends beyond traditional metrics such as citations and publications. The demanding work of generating, collecting, and cleaning data often goes unacknowledged, leaving many contributors without credit. As part of GHE’s Open Science project openwashdata, we conducted surveys among participants from our network of employees who were interested in participating in a Data Science for Open WASH Data course. The survey data reveal suboptimal data storage practices among WASH researchers, with many still relying on methods that hinder portability and interoperability (see Figure 1).

The survey data also show that programming skill levels vary (see Figure 2), with many researchers having limited experience with R specifically. This highlights the need for easy-to-use tools that do not require extensive programming knowledge. A primary barrier is the lack of accessible tools that simplify data publication and distribution using open source software. This challenge motivated the creation of washr, an R package that streamlines the process of transforming raw data into publication-ready data packages using devtools utilities.
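To make "data package" concrete, here is a minimal sketch of the files such a package contains, written with base R only. It does not use washr or devtools, whose functions are not shown in this post; the package name `demowash` and the dataset `latrines` are hypothetical and exist purely for illustration.

```r
# Minimal sketch of an R data package layout, using base R only.
# Assumption: "demowash" and "latrines" are made-up names for illustration.

pkg_dir <- file.path(tempdir(), "demowash")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "R"), showWarnings = FALSE)

# A small cleaned dataset standing in for real survey data.
latrines <- data.frame(
  site_id = c("A01", "A02", "A03"),
  users_per_day = c(42L, 17L, 63L),
  has_handwashing = c(TRUE, FALSE, TRUE)
)

# DESCRIPTION holds the package-level metadata (title, version, license).
description <- data.frame(
  Package = "demowash",
  Title = "Example WASH Survey Data",
  Version = "0.0.1",
  Description = "Hypothetical dataset used to illustrate a data package.",
  License = "CC BY 4.0",
  LazyData = "true"
)
write.dcf(description, file.path(pkg_dir, "DESCRIPTION"))

# Datasets shipped with a package live in data/ as .rda files.
save(latrines,
     file = file.path(pkg_dir, "data", "latrines.rda"),
     compress = "bzip2")

# Roxygen-style comments in R/ document the dataset.
writeLines(c(
  "#' Latrine usage at three example sites",
  "#'",
  "#' @format A data frame with 3 rows and 3 variables.",
  "\"latrines\""
), file.path(pkg_dir, "R", "latrines.R"))

# Show the resulting file tree.
list.files(pkg_dir, recursive = TRUE)
```

Tools in the devtools ecosystem automate these same steps, which is why they are a natural foundation for a package that hides the details from users with little R experience.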

In addition to washr, we developed a comprehensive guide to publishing data as an online book with R and Quarto. This resource provides step-by-step instructions for creating data packages, including automatically generating websites where datasets are available for download in CSV and XLSX formats. The guide also covers version control with Git and GitHub, and DOI generation via Zenodo integration.
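As a rough illustration of the download formats mentioned above, the sketch below exports one data frame to CSV with base R and, if the writexl package happens to be installed, to XLSX as well. This is not taken from the guide; the dataset `boreholes`, its columns, and the output folder are assumptions made for the example.

```r
# Hypothetical cleaned dataset; column names are illustrative only.
boreholes <- data.frame(
  borehole_id = c("B1", "B2"),
  depth_m = c(35.5, 48.0),
  functional = c(TRUE, FALSE)
)

out_dir <- file.path(tempdir(), "downloads")
dir.create(out_dir, showWarnings = FALSE)

# CSV export needs only base R.
csv_path <- file.path(out_dir, "boreholes.csv")
write.csv(boreholes, csv_path, row.names = FALSE)

# XLSX export relies on an add-on package; writexl is one option.
# The step is skipped when the package is not installed.
xlsx_path <- file.path(out_dir, "boreholes.xlsx")
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(boreholes, xlsx_path)
}

# Reading the CSV back checks that the values survive the round trip.
roundtrip <- read.csv(csv_path)
```

Offering both formats lets spreadsheet users and programmers work from the same published data.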

Based on user feedback and recognition of the broader academic community’s need for accessible open data tools, we developed fairenough, an enhanced R package designed for more efficient data publishing workflows with minimal user input. It provides a complete pipeline for creating R data packages with the following features:

Single-command pipeline: Create a complete R data package with one command, through an automated and interactive workflow that goes from cleaned data to finished package and website.
Granular control options: Individual wrapper functions for each step, with the option to overwrite existing documentation and optional verbose messages about the process in the console.

Compared to washr, this new iteration minimizes required user input by reusing all information already provided and suggesting content wherever possible. For example, fairenough uses LLMs through the ellmer package to automatically generate data dictionaries. We also plan to provide a detailed guide to working with fairenough.
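For readers unfamiliar with data dictionaries, the following base R sketch shows the kind of table involved: one row per variable, with its name, type, and a description. It is not fairenough's implementation and makes no LLM calls; `make_dictionary()` and the `samples` dataset are hypothetical, and the description column is left empty as the part a person would write or an LLM would draft.

```r
# Build a data dictionary skeleton from a data frame (base R only).
# Assumption: make_dictionary() is a hypothetical helper, not a fairenough function.
make_dictionary <- function(data, dataset_name) {
  data.frame(
    dataset = dataset_name,
    variable_name = names(data),
    # First class of each column, e.g. "character", "numeric", "Date".
    variable_type = unname(vapply(data, function(col) class(col)[1], character(1))),
    # Count of missing values per column.
    n_missing = unname(vapply(data, function(col) sum(is.na(col)), integer(1))),
    # Left empty: to be written by a person or drafted by an LLM.
    description = NA_character_,
    stringsAsFactors = FALSE
  )
}

# Hypothetical water-quality samples used only to exercise the helper.
samples <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  ph = c(7.1, NA, 6.8),
  collected_on = as.Date(c("2024-03-01", "2024-03-02", "2024-03-05"))
)

dictionary <- make_dictionary(samples, "samples")
print(dictionary)
```

Everything except the description column can be derived mechanically from the data, which is what makes the descriptions the natural place to ask for suggestions.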

By automating metadata generation, ensuring proper documentation, enabling version control, and facilitating DOI assignment through Zenodo, fairenough directly addresses each of the FAIR principles. It makes data Findable through extensive metadata, Accessible via the R data package and download options on the website, Interoperable by providing data and metadata in machine-readable formats, and Reusable with clear licensing and attribution.

We were very happy to have the opportunity to present fairenough to the public last December at the LatinR conference, where the project received encouraging feedback. Our proposal was accepted as a lightning talk in which we demonstrated how to create an R data package and a website in just a few minutes! We were lucky enough to share the (virtual) stage with other R enthusiasts who also presented interesting new tools. Discovering existing efforts for open science and reproducibility from different perspectives also enriches the development of fairenough. It was especially motivating to participate in a space where we can reach Spanish- and Portuguese-speaking communities and get their feedback. We believe that lack of familiarity with open data and open science practices is a significant barrier to their adoption, which is why reaching a larger and more diverse audience has also become part of our mission.
