Small uselessly usual R-functions-Markov Babbler | R-Bloggers

Small uselessly usual R-functions-Markov Babbler | R-Bloggers

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (You can report problems here about the content on this page)


Do you want to share your content on R-bloggers? Click here if you have a blog, or here If you don’t.

This is called, yes, you guessed it after Markov chains. The Babbler is there to connect the simplicity of useless R function.

It is a simple calculation of the likelihood of words that chains and signs, chained words, reminds of Markov chain (although this is not!).

The core is tokenization of words, counting the appearances and calculating the opportunities.

markov_babbler <- function(text, order = 2, n = 50, by_word = TRUE) {
  tokens <- if (by_word) str_split(text, "\\s+")[[1]] else unlist(str_split(text, ""))
  tokens <- tokens[tokens != ""]
  
  #add the removal of full stops,....
  token <- c('I', 'I am', 'to', 'all', 'Oh')
  
  df <- data.frame(
    from = sapply(seq_len(length(tokens) - order), function(i) paste(tokens[i:(i + order - 1)], collapse = " ")),
    to = tokens[(order + 1):length(tokens)],
    stringsAsFactors = FALSE
  )
  
  probs <- df %>%
    group_by(from, to) %>%
    summarise(freq = n(), .groups = "drop") %>%
    group_by(from) %>%
    mutate(prob = freq / sum(freq))
  
  current <- sample(unique(probs$from), 1)
  output <- unlist(str_split(current, " "))
  
  for (i in seq_len(n)) {
    next_word <- probs %>% filter(from == current)
    if (nrow(next_word) == 0) break
    next_token <- sample(next_word$to, 1, prob = next_word$prob)
    output <- c(output, next_token)
    current <- paste(tail(output, order), collapse = " ")
  }

With this in mind I took Red Ridding Hood (Brother Grimm) and connected the story to the position. In both English and Slovenian languages.

Playing with useless statistics is fun. Fussy pleasure 🙂

And no function is complete with little GGPLOT for drawing the words network.

  g <- graph_from_data_frame(probs %>% filter(freq > 1), directed = TRUE)
  plot <- ggraph(g, layout = "fr") +
    geom_edge_link(aes(edge_alpha = prob, edge_width = prob), color = "firebrick") +
    geom_node_label(aes(label = name), size = 4, repel = TRUE) +
    theme_void() +
    labs(title = "Markov Chain: Token Transitions")

As always, the full code is available on Github in Useless_r_function repository. The sample file in this repository is here (file name: Markov_Babbler.R). Check the repository for future updates.

Happy R coding and stay healthy!


#Small #uselessly #usual #RfunctionsMarkov #Babbler #RBloggers

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *