A comment about the Dirichlet distribution R-Bloggers

[This article was first published on R Works, and kindly contributed to R-bloggers]. (You can report problems here about the content on this page)

Do you want to share your content on R-bloggers? Click here if you have a blog, or here If you don’t.

In 1839, the gifted mathematician Peter Gustav Lejeune Dirichlet Was attached to the Philosophy department at the University of Berlin, which worked for less than full wage, although in 1832 he had joined the Prussian Academy of Sciences. At that time it was to become a “complete professor” at the university. Habilitation thesis Lecture in Latin. Apparently the facility of Dirichlet with the Latin was not the task, so like many competent “deputy professors” today, Dirichlet took a side performance to maintain his family. He gave mathematics at a military school. Anyway, I wander. It was around that time that Dirichlet started working on a problem in celestial mechanics in which it was integrally involved:

Here it is attracted to a point Where Is the power of attraction And Is the Euclidic standard.

After a supernatural insightful series -manipulations in detail by Gupta and RichardsDirichlet arrived at the next integral

that you will recognize as the beta function, the normalizing constant for the Dirichlet -Distribution:

unemployed and variety

Where

And

The Dirichlet distribution is a multivariate generalization of the beta distribution that is often used in Bayesian statistics as an earlier distribution for categorical and multinomial distributions. I illustrated this use of the Dirichlet in one Previous post While building a Bayesian model for a Markov chain with three states. The Dirichlet distribution is remarkable because the 18th and 19th -century work brings together in analysis, as illustrated by the Gamma, beta and Digamma functions with early 20th ideas from geometry and topology (the Simplex) and modern Bayesian statistics.

De (2) -Sublex

A simplex is a generalization of the idea from a triangle to several dimensions. Informal in K measurements, a simplex is the simplest polygon that is the convex hull of its K -corner points. The vectors that include the simplex must be non-negative and sum are up to 1. So a simplex is a natural way to give that sum to 1 in multidimensional spaces.

The support for the three-dimensional Dirichlet distribution, the points on which the distribution is defined, is a (2)-suxe the triangular subset of a two-dimensional plane that the Euclidean axes (1.0.0), (0.1.0) and (0.0.1). (Orient the triangle in the interactive plot below so that the reference surface is at the top and the point points down and you will see how the axes are in line.)

R packages used in this post

library(ggplot2)
library(gganimate)
library(dplyr)
library(magick)
library(MCMCpack)  # for rdirichlet
library(gtools)  # for ddirichlet
#library(patchwork)  # for combining plots
library(threejs)
library(extraDistr)

Show the code

set.seed(42)

# Sample from Dirichlet distribution over 3 categories
n_samples <- 2000
alpha <- c(1, 1, 1)  # uniform prior over the simplex
samples <- rdirichlet(n_samples, alpha)

# 3D coordinates: each row is (x, y, z)
x <- samples[,1]
y <- samples[,2]
z <- samples[,3]

# Visualize using threejs scatterplot
scatterplot3js(x = x, y = y, z = z,
               color = "steelblue",
               size = 0.2,
               bg = "black",
               main = "2-simplex",
               axisLabels = c(
                 "(1,0,0)",
                 "(0,1,0)",
                 "(0,0,1)"
               ))

When The Dirichlet -density is symmetrical around the center of the simplex, . In the special case when The density is uniform over the simplex. When everything The density is concentrated on the corners of the Simplex, and when The density is concentrated in the middle of the Simplex with the most mass concentrated on a few values.

How the Symetric Dirichlet distribution changes like changes

The next animation, which projects the above plot on two dimensions, shows how the Dirichlet distribution changes as the common value of called the concentration parameter, moves systematically from (1,1,1), the uniform distribution, to (0.1,1,1,0,1).

Code for Helper functions

# This function generates the data frame for the animation
generate_dirichlet_animation_data <- function(alpha1_values, alpha2_values, alpha3_values, n_samples = 2000) {
  library(MCMCpack)
  
  # Triangle vertices
  v1 <- c(1, 0)
  v2 <- c(0, 1)
  v3 <- c(0, 0)
  
  # Projection function
  project_to_triangle <- function(x1, x2, x3) {
    x <- x1 * v1[1] + x2 * v2[1] + x3 * v3[1]
    y <- x1 * v1[2] + x2 * v2[2] + x3 * v3[2]
    data.frame(x = x, y = y)
  }
  
  # Generate animation data
  animation_data <- do.call(rbind, lapply(seq_along(alpha1_values), function(i) {
    alpha <- c(alpha1_values[i], alpha2_values[i], alpha3_values[i])
    samples <- rdirichlet(n_samples, alpha)
    projected <- project_to_triangle(samples[,1], samples[,2], samples[,3])
    projected$alpha1 <- alpha[1]
    projected$alpha2 <- alpha[2]
    projected$alpha3 <- alpha[3]
    projected$frame <- i
    projected
  }))
  
  return(animation_data)
}

# This function creates the animated plot
plot_dirichlet_evolution <- function(animation_data) {
  library(ggplot2)
  library(gganimate)
  library(grid)  # for arrow units
  
  # Triangle vertices
  v1 <- c(1, 0)
  v2 <- c(0, 1)
  v3 <- c(0, 0)
  
  # Label positions
  label_df <- data.frame(
    x = c(v1[1], v2[1], v3[1]),
    y = c(v1[2], v2[2], v3[2]),
    label = c("(1,0,0)", "(0,1,0)", "(0,0,1)"),
    nudge_x = c(-0.08, 0.1, 0),
    nudge_y = c(0, 0, 0.08)
  )
  
  # Triangle outline
  triangle_df <- data.frame(
    x = c(v1[1], v2[1], v3[1], v1[1]),
    y = c(v1[2], v2[2], v3[2], v1[2])
  )
  
  # Build plot
  p_animation <- ggplot(animation_data, aes(x = x, y = y)) +
    geom_point(alpha = 0.3, size = 0.8, color = "steelblue") +
    geom_density_2d(color = "red", alpha = 0.7) +
    geom_path(data = triangle_df, aes(x = x, y = y), color = "black", size = 1) +
    annotate("segment", x = 1, y = 0, xend = 0, yend = 1,
             arrow = arrow(length = unit(0.2, "cm")),
             color = "darkgreen", size = 1.2) +
    xlim(0, 1) + ylim(0, 1) +
    geom_text(data = label_df,
              aes(x = x, y = y, label = label),
              nudge_x = label_df$nudge_x,
              nudge_y = label_df$nudge_y,
              size = 4, fontface = "bold", color = "black") +
    labs(
      title = "Dirichlet Prior Evolution",
      subtitle = "",
      x = "Projected X",
      y = "Projected Y",
      caption = "Transition from uniform to concentrated distribution"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 16, hjust = 0.5),
      plot.subtitle = element_text(size = 14, hjust = 0.5),
      axis.title = element_text(size = 12),
      plot.caption = element_text(size = 10, hjust = 0.5)
    ) +
    transition_states(frame,
                      transition_length = 1,
                      state_length = 2) +
    ease_aes('sine-in-out')
  
  return(p_animation)
}

#| message: false
#| warning: false
#| code-fold: true
#| code-summary: "Animation Code"
#| 
set.seed(42)

# Parameters
alpha1_values <- seq(1, 0.1, by = -0.01)  # starts high, ends low
alpha2_values <- seq(1, 0.1, by = -0.01)   # starts low, ends high
alpha3_values <- seq(1, 0.1, by = -0.01)   # starts low, ends high

animation_data <- generate_dirichlet_animation_data(alpha1 = alpha1_values, 
                                                    alpha2 = alpha2_values, 
                                                    alpha3 = alpha3_values, 
                                                    n_samples = 2000)

plot_dirichlet_evolution(animation_data)

These next two suddenly, the first and final frames of the animation, clearly show how the density moves from uniform spread over the simplex to concentrated on the corners of the simplex. When modeling the development of a Multi-State Markov chain, as I did in the post I referred above, it is customary to select a uniform Dirichlet . However, if you believe that the process is likely to be uniform among the states, then a prior with Can be suitable. If you had a reason to believe that the process would start concentrated on certain situations, you could investigate using an asymmetrical distribution by setting different values for the . The code that controls these animations can be useful.

Show the code

ggplot(subset(animation_data, frame == 1), aes(x = x, y = y)) +
  geom_point(alpha = 0.3, color = "darkblue") +
  ggtitle("Initial Frame: Uniform Prior: alpha = (1,1,1)")

Show the code

  ggplot(subset(animation_data, frame == max(animation_data$frame)), aes(x = x, y = y)) +
  geom_point(alpha = 0.3, color = "red") +
  ggtitle("Final Frame: Concetrated Prior: alpha = (.1,.1,.1)")

This concentration of density as Increases are very clear in this next simulation as Moves from (0.1, 0.1, 0.1) to (10.0, 10.0, 10.0). Here we see the distribution that focuses on the average, = (1/3, 1/3, 1/3).

Show the code

#Define alpha trajectories
alpha1 <- seq(.1, 10, by = 0.1)
alpha2 <- seq(.1, 10, by = 0.1)
alpha3 <- seq(.1, 10, by = 0.1)

animation_data <- generate_dirichlet_animation_data(alpha1, alpha2, alpha3, n_samples = 2000)

anim2 <- plot_dirichlet_evolution(animation_data)
anim2

Note that the animation continues (0.5, 0.5, 0.5), what the Jeffreys earlier For the Dirichlet distribution.

Variation and differential entropy

The Wikipedia article for it Dirichlet -Distribution Prominently shows the differential entropy of the distribution:

Where And have been defined above and Is the Digamma function. (This comparison has activated my mention of the Digamma function above.) But keep in mind that differential entropy defined as: For continuous distributions is not the same as Shannon’s entropy For discreet distributions and has no similar interpretation. Differential entropy can, among other things, be negative, not invariant under a change of variables, and probably does not meet any intuition that you may have developed about maximum entropy. The plot in the bottom left shows the behavior of the differential entropy for the symmetrical direct distribution that we have considered as if Moves from (0.1, 0.1, 0.1) to (5.0, 5.0, 5.0). Note that the entropy continues to increase beyond the point that corresponds to the uniform distribution over the Simplex.

Show the code

# Define the number of dimensions for the symmetric Dirichlet distribution
n <- 3

# Define a function to calculate the entropy of a symmetric Dirichlet distribution
dirichlet_entropy <- function(alpha, n) {
  # This formula is for a symmetric Dirichlet distribution
  # where all parameters a_i are equal to alpha.
  # The formula uses the digamma function (psi) and the log-gamma function (lgamma).
  # H(alpha) = log(Beta(alpha)) + (n * alpha - n) * psi(n * alpha) - n * (alpha - 1) * psi(alpha)
  # log(Beta(alpha)) = n * lgamma(alpha) - lgamma(n * alpha)
  # The final formula is simplified for symmetric case.
  
  # lgamma(x) is the log-gamma function
  # digamma(x) is the psi function, the first derivative of lgamma(x)
  
  entropy_val <- lgamma(n * alpha) - n * lgamma(alpha) +
                 (n * alpha - n) * digamma(n * alpha) -
                 n * (alpha - 1) * digamma(alpha)
                 
  return(entropy_val)
}

# Create a sequence of alpha values from 0.1 to 5
alpha_values <- seq(0.1, 5, by = 0.01)

# Calculate the entropy for each alpha value
entropy_values <- sapply(alpha_values, dirichlet_entropy, n = n)

# Create a data frame for plotting
plot_data <- data.frame(alpha = alpha_values, entropy = entropy_values)

# Plot the entropy values
ggplot(plot_data, aes(x = alpha, y = entropy)) +
  geom_line(size = 1.2, color = "steelblue") +
  labs(
    title = "Entropy of a Symmetric Dirichlet Distribution",
    subtitle = paste("for n =", n, "dimensions"),
    x = "Alpha (Concentration Parameter)",
    y = "Entropy"
  ) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "red", size = 1) +
  annotate("text", x = 1.1, y = max(entropy_values) * 0.9,
           label = "Alpha = 1 (Uniform Distribution)",
           color = "red", hjust = 0, size = 4) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

Show the code

k <- seq(.1, 10, by = 0.1)

mean_dir <- numeric(length(k))
var_dir <- numeric(length(k))
for (i in seq_along(k)) {
  alpha <- k[i]
  sum_alpha <- alpha * 3
  mean_dir[i] <- alpha / sum_alpha
  var_dir[i] <- (alpha * (sum_alpha - alpha)) / (sum_alpha^2 * (sum_alpha + 1))
}
df <- data.frame(k = k, mean = mean_dir, variance = var_dir)
ggplot(df, aes(x = k, y = variance)) +
  geom_line(color = "blue", size = 1) +
  labs(
    title = "Variance of Dirichlet Distribution vs Alpha Parameter",
    x = "Alpha Parameter (k)",
    y = "Variance"
  )

The plot on the right shows the variance of the Dirichlet distribution as a function that increases . We see that the variance is decreasing to zero as Increases. This is reflected in the second animation above that shows the distribution that focuses on the average. Uncertainty goes to zero, but differential entropy shoots to infinity. If you are nevertheless intrigued by differential entropy, you may want to take a look at the references that I have included below.

References

#comment #Dirichlet #distribution #RBloggers

A comment about the Dirichlet distribution R-Bloggers

De (2) -Sublex

How the Symetric Dirichlet distribution changes like changes

Variation and differential entropy

References

Related

Like this:

Related

Similar Posts

Travel News Market 2025 – the travel industry’s annual event

Outboxing of Sellpy Finds!

Leave a Reply Cancel reply

De (2) -Sublex

How the Symetric Dirichlet distribution changes like <img decoding="async" src="https://latex.codecogs.com/png.latex?%5Calpha"/> changes

Variation and differential entropy

References

Related

Share this:

Like this:

Related

Similar Posts

Leave a Reply Cancel reply

How the Symetric Dirichlet distribution changes like changes