In this week's episode of the series “Hidden Monads in R”, I will explore the vector aspect of R data structures and see how flatmap processing can be quite useful.
Flat map? Aren't all maps flat?
The Nobel Prize organization offers an API with information about the prizes and laureates. The data can be downloaded as a JSON file, which is what I did. I read the file and examine one of the entries below.
# Source: http://api.nobelprize.org/v1/prize.json
prizes <- jsonlite::fromJSON("./prize.json", simplifyDataFrame = FALSE)[["prizes"]]
str(prizes[[11]])
## List of 3
## $ year : chr "2023"
## $ category : chr "physics"
## $ laureates:List of 3
## ..$ :List of 5
## .. ..$ id : chr "1026"
## .. ..$ firstname : chr "Pierre"
## .. ..$ surname : chr "Agostini"
## .. ..$ motivation: chr "\"for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter\""
## .. ..$ share : chr "3"
## ..$ :List of 5
## .. ..$ id : chr "1027"
## .. ..$ firstname : chr "Ferenc"
## .. ..$ surname : chr "Krausz"
## .. ..$ motivation: chr "\"for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter\""
## .. ..$ share : chr "3"
## ..$ :List of 5
## .. ..$ id : chr "1028"
## .. ..$ firstname : chr "Anne"
## .. ..$ surname : chr "L’Huillier"
## .. ..$ motivation: chr "\"for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter\""
## .. ..$ share : chr "3"
Let’s say that I want a character vector that contains the full names of the Nobel Prize winners in medicine since 2020. First, I can come up with a function that builds such a vector from a single entry (I know, this one is physics).
who_got_it <- function(prize) {
  laureates <- vapply(
    X = prize[["laureates"]],
    FUN = \(l) c(l[["surname"]] %||% "", l[["firstname"]] %||% ""),
    FUN.VALUE = c("Doe", "John")
  )
  trimws(paste(laureates[2, ], laureates[1, ]))
}
who_got_it(prizes[[11]])
## [1] "Pierre Agostini" "Ferenc Krausz" "Anne L’Huillier"
To achieve my goal, I just have to filter the list, and lapply the function on the matching entries.
(medicine_since_2020 <- Filter(
  f = \(p) p[["category"]] == "medicine" & as.numeric(p[["year"]]) >= 2020,
  x = prizes
) |>
  lapply(who_got_it)
)
## [[1]]
## [1] "Victor Ambros" "Gary Ruvkun"
##
## [[2]]
## [1] "Katalin Karikó" "Drew Weissman"
##
## [[3]]
## [1] "Svante Pääbo"
##
## [[4]]
## [1] "David Julius" "Ardem Patapoutian"
##
## [[5]]
## [1] "Harvey Alter" "Michael Houghton" "Charles Rice"
Neat! But I want them in a single vector. So I have one more step at the end.
unlist(medicine_since_2020)
## [1] "Victor Ambros" "Gary Ruvkun" "Katalin Karikó"
## [4] "Drew Weissman" "Svante Pääbo" "David Julius"
## [7] "Ardem Patapoutian" "Harvey Alter" "Michael Houghton"
## [10] "Charles Rice"
Yes, it’s that simple. This is a flatmap process for vectors, and it is a composition of one map step and one flatten step (lapply and unlist in this case). It looks almost silly to write a flatmap function; after all, it is not that difficult to put together by hand. But it is used often, and it is easy to get subtly wrong, so a dedicated function saves time and reduces errors. In this case, to be correct, I should have used unlist(recursive = FALSE). Otherwise nested lists would be flattened all the way down as well, and that would be wrong.
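To make the difference concrete, here is a small sketch on made-up data (the `mapped` list and its fields are hypothetical, not taken from the Nobel Prize file). It assumes the mapping step returned a list of records for each input, so flattening should remove only one level.

```r
# Hypothetical toy data: each input element was mapped to a list of records,
# so the mapped result is a list of lists of lists.
mapped <- list(
  list(list(id = "1", name = "a"), list(id = "2", name = "b")),
  list(list(id = "3", name = "c"))
)

# The default recursive = TRUE flattens all the way down to atomic values:
# the records are dissolved into one character vector of 6 fields.
flat_all <- unlist(mapped)
length(flat_all)
is.character(flat_all)

# recursive = FALSE removes exactly one level of nesting:
# the result is a list of 3 records, each still a list with id and name.
flat_one <- unlist(mapped, recursive = FALSE)
length(flat_one)
flat_one[[3]][["name"]]
```

With character vectors, as in the laureates example, both calls give the same result, which is why the shortcut went unnoticed there.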
Laboratory experiments are often performed in plastic plates with 96 wells, arranged in 8 rows (labeled A-H) and 12 columns (labeled 1-12). Each well is a separate micro experiment (labeled A1-H12). Let’s generate well labels for such a data set!
rows <- LETTERS[1:8]
columns <- 1:12 |> sprintf(fmt = "%02i")
So all we have to do is combine one vector of values with the other, using the handy paste0() function, right? Wrong.
paste0(rows, columns) |> noquote()
## [1] A01 B02 C03 D04 E05 F06 G07 H08 A09 B10 C11 D12
We only get 12 values instead of 96, because the vectors are paired element by element and the shorter one (the letters) is recycled as needed. That is often what you want, so it works this way for a good reason. But in this case we would rather have every row combined with every column.
Some readers may have already started daydreaming of nested for loops (please don't). More experienced R programmers would probably reach for expand.grid() or rep(rows, each = length(columns)) to match the vectors, and then paste() them together. But R is a versatile language and there are many paths to the same destination. A functional R programmer could just take flatmap off the shelf, and here is how.
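For comparison, the two conventional routes mentioned above could look roughly like this. It is a sketch only: the variable names `labels_rep`, `labels_grid` and `grid` are my own, and `rows` and `columns` are redefined so the snippet runs on its own.

```r
rows <- LETTERS[1:8]
columns <- sprintf("%02i", 1:12)

# Option 1: repeat each row label 12 times; `columns` is then recycled
# 8 times by paste0(), which gives all 96 combinations.
labels_rep <- paste0(rep(rows, each = length(columns)), columns)

# Option 2: expand.grid() varies its first argument fastest, so `columns`
# goes first to get row-major order (A01, A02, ..., H12).
grid <- expand.grid(column = columns, row = rows, stringsAsFactors = FALSE)
labels_grid <- paste0(grid$row, grid$column)

length(labels_rep)
identical(labels_rep, labels_grid)
```

Both work, but each needs a moment of thought about ordering and recycling, which is the bookkeeping that flatmap hides.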
For purely didactic reasons, let’s define a non-vectorized paste function, called paste01. It takes a single value and a character vector and returns a character vector: the combination of the value with each member of the vector.
$$paste01 :: Str \rightarrow [Str] \rightarrow [Str]$$
paste01 <- \(x, y) { stopifnot(length(x) == 1L); paste0(x, y)}
paste01(rows[1], columns)
## [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11" "A12"
When we map this function over our rows vector, we almost get what we need.
$$lapply(paste01) :: [Str] \rightarrow [Str] \rightarrow [[Str]]$$
lapply(rows, paste01, columns) |> head(3L)
## [[1]]
## [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11" "A12"
##
## [[2]]
## [1] "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10" "B11" "B12"
##
## [[3]]
## [1] "C01" "C02" "C03" "C04" "C05" "C06" "C07" "C08" "C09" "C10" "C11" "C12"
It is a list of vectors, so we have to flatten it. Yup, it’s a flatmap.
$$unlist(lapply(paste01)) :: [Str] \rightarrow [Str] \rightarrow [Str]$$
unlist(lapply(rows, paste01, columns)) |>
noquote()
## [1] A01 A02 A03 A04 A05 A06 A07 A08 A09 A10 A11 A12 B01 B02 B03 B04 B05 B06 B07
## [20] B08 B09 B10 B11 B12 C01 C02 C03 C04 C05 C06 C07 C08 C09 C10 C11 C12 D01 D02
## [39] D03 D04 D05 D06 D07 D08 D09 D10 D11 D12 E01 E02 E03 E04 E05 E06 E07 E08 E09
## [58] E10 E11 E12 F01 F02 F03 F04 F05 F06 F07 F08 F09 F10 F11 F12 G01 G02 G03 G04
## [77] G05 G06 G07 G08 G09 G10 G11 G12 H01 H02 H03 H04 H05 H06 H07 H08 H09 H10 H11
## [96] H12
Tada!
A flatmap function for vectors can therefore be defined. It takes:
- A vector of values
- A function that turns one of those values into a vector (possibly of a different type)
The output is a vector of that second type.
$$flatmap :: [a] \rightarrow (a \rightarrow [b]) \rightarrow [b]$$
flatmap <- function(X, FUN, ..., USE.NAMES = TRUE) {
  # unlist() spells its names argument in lower case: use.names
  unlist(lapply(X, FUN, ...), recursive = FALSE, use.names = USE.NAMES)
}
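A quick usage sketch, applied to the well labels from earlier. The definitions of flatmap and paste01 are repeated so the snippet runs on its own; the `wells` name and the second example with seq_len() are my additions for illustration.

```r
flatmap <- function(X, FUN, ..., USE.NAMES = TRUE) {
  unlist(lapply(X, FUN, ...), recursive = FALSE, use.names = USE.NAMES)
}
paste01 <- \(x, y) { stopifnot(length(x) == 1L); paste0(x, y) }

rows <- LETTERS[1:8]
columns <- sprintf("%02i", 1:12)

# Map paste01 over the rows and flatten one level in a single call.
wells <- flatmap(rows, paste01, columns)
length(wells)
head(wells, 3)

# The function may return vectors of different lengths for each input;
# the results are simply concatenated in order.
flatmap(1:3, seq_len)
```

The second call shows that the lengths of the pieces do not have to match, which is what separates flatmap from a plain element-wise map.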
Debrief
Such a function can also be defined as an infix operator, and could take the form of %>>=%, for example. If that looks familiar, it is no coincidence: flatmap is the bind operation of the vector monad.
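One possible sketch of such an operator follows. Only the name %>>=% comes from the text; the two-argument body and the example lambdas are assumptions made for illustration.

```r
# A minimal infix flatmap: left-hand side is the vector,
# right-hand side is a function returning a vector for each element.
`%>>=%` <- function(X, FUN) {
  unlist(lapply(X, FUN), recursive = FALSE)
}

# The lambdas are wrapped in parentheses so that a following operator
# is not absorbed into the function body.
step1 <- c("A", "B") %>>=% (\(r) paste0(r, 1:2))
step1

# Operators of the %...% family are left-associative, so steps chain naturally.
step2 <- c("A", "B") %>>=% (\(r) paste0(r, 1:2)) %>>=% (\(s) c(s, tolower(s)))
step2
```

Chaining is where the infix form pays off: each step receives the flattened output of the previous one.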
Earlier, I argued that yet another infix operator is not strictly necessary, and instead I made a function wrapper.
The same can be done here! R already has a very similar wrapper, base::Vectorize(), which only needs a little tweak: unlist()-ing the results. It is so trivial that I will not even write it here.
What excites me much more is the possibility of combining the two ideas: dealing with NAs and flat mapping in a single bind wrapper, so that a function could focus purely on its logic and leave the rest to the “expert” wrapper. As usual, a little more exploration is needed.