Open once, close automatically: A resource connection pattern for R | R bloggers

Open once, close automatically: A resource connection pattern for R | R bloggers

Hanging database connections rarely cause direct data leaks, but are a common cause of outages, reduced performance and availability incidents. In regulated or public sector systems, disruptions in availability can have legal and contractual consequences. Properly managing connections is therefore not only a technical problem, but also a reliability problem.

https://devclass.com/2020/03/27/github-reveals-database-infrastructure-was-the-villain-behind-february-spate-of-outages-again/

The third incident on February 25 again involved ProxySQL, when “active database connections exceeded a critical threshold that changed the behavior of this new infrastructure. Because connections remained above the critical threshold after recovery, the system reverted to a degraded state.”

These types of errors may seem operational, but in production environments they can lead to SLA violations, customer impact and potential regulatory risks – especially when availability and resilience are part of compliance requirements (e.g. Article 32 of the GDPR for systems that process personal data). Properly managing connections – opening them only when necessary and ensuring deterministic cleanup – is therefore not just ‘good hygiene’ in R programming, but a practical aspect of building robust, reliable systems.

These risks are also relevant in the R ecosystem, where database access is common in long-running processes such as Shiny applications, Plumber APIs, and scheduled data pipelines. Although R is often associated with academic analysis, it is also widely used in government and industrial production systems. For example, in Shiny apps, any user session can open database connections, and poorly managed connections can quickly build up and deplete backend resources. Because these applications are typically long-lived, small connection leaks can silently degrade performance or cause outages. Good connection management is therefore not only a good practice in R; it is essential for building reliable, scalable and production-ready applications.

This post introduces a pattern using around on.exit() that makes connections safe, automatic and worry-free.

From the R documentation of function on.exit:

on.exit records the expression given as an argument to be executed when the current function exits (naturally or as a result of an error). This is useful for resetting graphics parameters or performing other cleanup actions.

This is a versatile function of R functions and we can use it to build a connection function that will close the connection again. Essentially, we want to build a function that can do something like this:

get_data <- function() {
  connect()
  DBI::dbGetQuery(...)
}

To make this work, connect() must register the R expression that will close the open connection when it calls the function endings, in R language this is called the parent frame and (unsurprisingly) it can be retrieved via the function parent.frame(). Equipped with these two features we can now create our connect() function:

connect <- function() {
  caller <- parent.frame()
  # If connection already exists in this frame, reuse it
  if (exists(".db", envir = caller, inherits = FALSE)) {
    return(get(".db", envir = caller))
  }
  # Create new connection
  db <- dbConnect(odbc::odbc(), "mot")
  # Store it in the caller's frame
  assign(".db", db, envir = caller)
  # Ensure cleanup when the calling function exits
  do.call(
    what = on.exit,
    args = list(
      quote({
        if (exists(".db", inherits = FALSE)) {
          dbDisconnect(.db)
          rm(.db)
        }
      }),
      add = TRUE
    ),
    envir = caller
  )
  db
}

Also note in the function definition above that in order to be able to reuse an open connection, the connection Object (db) must be stored somewhere. Returning it as a result of the function does not guarantee that it will be assigned to a symbol. That’s problematic because anything not tied to a symbol in an R session is subject to R’s garbage collection. Therefore, it is also assigned to a variable called .db in the call frame.

One problem with this implementation is that it can be converted into a name collision: We don’t know if there would already exist a symbol in the calling frame that – perhaps coincidentally – has the same name .dbbut perhaps represents something completely different.

To fix this, we can explicitly create a closure that is, a function that carries its own environment. For more details see Hadley Wickham’s explanation here.

I’ll give a toy example here, where our connection object is represented by a random string of three letters. Setting this variable to NULL is intended to indicate that the connection to the source is being closed. You can replace these lines with your code to actually connect and disconnect the source.

connect <- local({
  # Private environment, created once
  state <- new.env(parent = emptyenv())
  function() {
    caller <- parent.frame()
    if (identical(caller, .GlobalEnv)) {
      warning("connect() was called from Global Environment, cannot perform automatic disconnect")
    }

    # If connection already exists, reuse it
    if (exists("db", envir = state, inherits = FALSE)) {
      cat("connection already exists", state$db, '\n')
    } else { 
      # Create connection
      state$db <- paste0("db_", paste0(sample(letters, 3), collapse = ''))
      # Ensure cleanup when the *calling* function exits
      cleanup <- function() {
        if (exists("db", envir = state)) {
          # dbDisconnect(state$db)
          cat("disconnecting", state$db, '\n')
          state$db <- NULL
          rm("db", envir = state)
        }
      }
      # do.call(on.exit, list(quote(cleanup()), add = TRUE), envir = caller)
      do.call(on.exit, list(
      substitute(FUN(), list(FUN = cleanup)), add = TRUE), envir = caller)
      cat("new connection", state$db, '\n')
    }
  }
})

foo <- function() {
  cat("inside foo\n")
  connect()
  bar()
}

bar <- function() {
  cat("inside bar\n")
  goo()
}

goo <- function() {
  cat("inside goo\n")
  connect()
}

foo()
## inside foo
## new connection db_kpt 
## inside bar
## inside goo
## connection already exists db_kpt 
## disconnecting db_kpt
goo()
## inside goo
## new connection db_npc 
## disconnecting db_npc

This pattern uses a closure to hold the connection state and on.exit() to register clearing logic in the caller’s frame. Apply substitute() to the cleanup function is important, because our function definition only lives within the created closure, not in the frame of the parent. What substitute() does is replace the symbol (FUN) by our cleanup functions, unevaluated parsing tree. This in itself is a closure (a function plus an associated environment) that retains access to its private state even when it is evaluated later. The result is a secure, idempotent connection manager that works transparently through nested function calls without introducing leaks or requiring explicit disassembly.

No global status, no name clashes and no burden for the caller.

Remark that, when connect() is carried out in the Global Environmentthe expression specified for on.exit() is evaluated and it is the caller’s responsibility to manually disconnect.

connect()
new connection db_ykc
Warning:
In connect() : connect() was called from Global Environment, cannot perform automatic disconnect


#Open #close #automatically #resource #connection #pattern #bloggers

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *