Open once, close automatically: A resource connection pattern for R

Hanging database connections rarely cause direct data leaks, but are a common cause of outages, reduced performance and availability incidents. In regulated or public sector systems, disruptions in availability can have legal and contractual consequences. Properly managing connections is therefore not only a technical problem, but also a reliability problem.

https://devclass.com/2020/03/27/github-reveals-database-infrastructure-was-the-villain-behind-february-spate-of-outages-again/

The third incident on February 25 again involved ProxySQL, when “active database connections exceeded a critical threshold that changed the behavior of this new infrastructure. Because connections remained above the critical threshold after recovery, the system reverted to a degraded state.”

These types of errors may seem operational, but in production environments they can lead to SLA violations, customer impact and potential regulatory risks – especially when availability and resilience are part of compliance requirements (e.g. Article 32 of the GDPR for systems that process personal data). Properly managing connections – opening them only when necessary and ensuring deterministic cleanup – is therefore not just ‘good hygiene’ in R programming, but a practical aspect of building robust, reliable systems.

These risks are also relevant in the R ecosystem, where database access is common in long-running processes such as Shiny applications, Plumber APIs, and scheduled data pipelines. Although R is often associated with academic analysis, it is also widely used in government and industrial production systems. For example, in Shiny apps, any user session can open database connections, and poorly managed connections can quickly build up and deplete backend resources. Because these applications are typically long-lived, small connection leaks can silently degrade performance or cause outages. Good connection management is therefore not only a good practice in R; it is essential for building reliable, scalable and production-ready applications.

This post introduces a pattern using around on.exit() that makes connections safe, automatic and worry-free.

When developing an R package, one generally strives to divide code into functions to make maintenance easier. For example, when developing a Shiny app, it is practical to (a) package it as an R package and (b) organize the code into different files within the folder. R/ folder and further in various functions.

When working with data from databases, the first thing you need to do to access this data is to open a connection to the database in question.

# example for opening a database connection
db <- DBI::dbConnect(drv = odbc::odbc(), "database")

For example, let’s say we want to write a function to encapsulate data access like this:

get_data <- function() {
  db <- DBI::dbConnect(drv = odbc::odbc(), "database")  # open ressource connection
  df <- dbGetQuery(conn = db, statement = "SELECT ...") # get data
  DBI::dbDisconnect(db)                                 # close ressource connection
  return(df)                                            # return result
}

The disadvantages of this approach are:

each time we receive data a connection is opened and closed, while we may want to keep a connection open for further data access
data cannot be returned directly as a result of DBI::dbGetQuery() (in R, the value of the last statement in a function is what the function returns
when writing different data access functions that follow this pattern, we have to repeat the same lines of code. For many such functions this can be prone to errors and potentially opening up the many parallel connections

A simple way to handle this would be to remove the connection part from the function and simply assume that the connection object is present (perhaps by placing it somewhere accessible, for example in a special package environment). This means we can no longer directly test with our function outside the app, so we once again have to explicitly open and close connections multiple times.

From the R documentation of function on.exit:

on.exit records the expression given as an argument to be executed when the current function exits (naturally or as a result of an error). This is useful for resetting graphics parameters or performing other cleanup actions.

This is a versatile function of R functions and we can use it to build a connection function that will close the connection again. Essentially, we want to build a function that can do something like this:

get_data <- function() {
  connect()
  DBI::dbGetQuery(...)
}

To make this work, connect() must register the R expression that will close the open connection when it calls the function endings, in R language this is called the parent frame and (unsurprisingly) it can be retrieved via the function parent.frame(). Equipped with these two features we can now create our connect() function:

connect <- function() {
  caller <- parent.frame()
  # If connection already exists in this frame, reuse it
  if (exists(".db", envir = caller, inherits = FALSE)) {
    return(get(".db", envir = caller))
  }
  # Create new connection
  db <- dbConnect(odbc::odbc(), "mot")
  # Store it in the caller's frame
  assign(".db", db, envir = caller)
  # Ensure cleanup when the calling function exits
  do.call(
    what = on.exit,
    args = list(
      quote({
        if (exists(".db", inherits = FALSE)) {
          dbDisconnect(.db)
          rm(.db)
        }
      }),
      add = TRUE
    ),
    envir = caller
  )
  db
}

Also note in the function definition above that in order to be able to reuse an open connection, the connection Object (db) must be stored somewhere. Returning it as a result of the function does not guarantee that it will be assigned to a symbol. That’s problematic because anything not tied to a symbol in an R session is subject to R’s garbage collection. Therefore, it is also assigned to a variable called .db in the call frame.

One problem with this implementation is that it can be converted into a name collision: We don’t know if there would already exist a symbol in the calling frame that – perhaps coincidentally – has the same name .dbbut perhaps represents something completely different.

To fix this, we can explicitly create a closure that is, a function that carries its own environment. For more details see Hadley Wickham’s explanation here.

I’ll give a toy example here, where our connection object is represented by a random string of three letters. Setting this variable to NULL is intended to indicate that the connection to the source is being closed. You can replace these lines with your code to actually connect and disconnect the source.

connect <- local({
  # Private environment, created once
  state <- new.env(parent = emptyenv())
  function() {
    caller <- parent.frame()
    if (identical(caller, .GlobalEnv)) {
      warning("connect() was called from Global Environment, cannot perform automatic disconnect")
    }

    # If connection already exists, reuse it
    if (exists("db", envir = state, inherits = FALSE)) {
      cat("connection already exists", state$db, '\n')
    } else { 
      # Create connection
      state$db <- paste0("db_", paste0(sample(letters, 3), collapse = ''))
      # Ensure cleanup when the *calling* function exits
      cleanup <- function() {
        if (exists("db", envir = state)) {
          # dbDisconnect(state$db)
          cat("disconnecting", state$db, '\n')
          state$db <- NULL
          rm("db", envir = state)
        }
      }
      # do.call(on.exit, list(quote(cleanup()), add = TRUE), envir = caller)
      do.call(on.exit, list(
      substitute(FUN(), list(FUN = cleanup)), add = TRUE), envir = caller)
      cat("new connection", state$db, '\n')
    }
  }
})

foo <- function() {
  cat("inside foo\n")
  connect()
  bar()
}

bar <- function() {
  cat("inside bar\n")
  goo()
}

goo <- function() {
  cat("inside goo\n")
  connect()
}

foo()
## inside foo
## new connection db_kpt 
## inside bar
## inside goo
## connection already exists db_kpt 
## disconnecting db_kpt
goo()
## inside goo
## new connection db_npc 
## disconnecting db_npc

This pattern uses a closure to hold the connection state and on.exit() to register clearing logic in the caller’s frame. Apply substitute() to the cleanup function is important, because our function definition only lives within the created closure, not in the frame of the parent. What substitute() does is replace the symbol (FUN) by our cleanup functions, unevaluated parsing tree. This in itself is a closure (a function plus an associated environment) that retains access to its private state even when it is evaluated later. The result is a secure, idempotent connection manager that works transparently through nested function calls without introducing leaks or requiring explicit disassembly.

No global status, no name clashes and no burden for the caller.

Remark that, when connect() is carried out in the Global Environmentthe expression specified for on.exit() is evaluated and it is the caller’s responsibility to manually disconnect.

connect()
new connection db_ykc
Warning:
In connect() : connect() was called from Global Environment, cannot perform automatic disconnect

#Open #close #automatically #resource #connection #pattern #bloggers

Open once, close automatically: A resource connection pattern for R | R bloggers

Related

Like this:

Related

Similar Posts

The Swedes haven’t found it yet: this gem is just a ferry ride away

Colorful in Bukhara, Uzbekistan – crafts, art and dance

Leave a Reply Cancel reply

Related

Share this:

Like this:

Related

Similar Posts

Leave a Reply Cancel reply