Wasserstein Distance
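Wasserstein Distance: For probability distributions $\mu, \nu \in P(\Omega)$ on a metric space $\Omega$ with distance metric $D$, and for $p \geq 1$, the $p$-Wasserstein distance is defined as:
$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\Omega \times \Omega} D(x, y)^p \, d\pi(x, y) \right)^{1/p},$$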
where $P(\Omega)$ represents the space of all probability distributions on $\Omega$. The set $\Pi(\mu,\nu)$ contains all joint distributions $\pi$ on $\Omega \times \Omega$ whose marginals are $\mu$ and $\nu$. Intuitively, this distance measures the minimum ``work'' needed to transform $\mu$ into $\nu$, where the cost of moving a unit of mass between two points is the $p$-th power of the distance $D$ between them.
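As a minimal sketch of this definition, consider the special case $\Omega = \mathbb{R}$ with $D(x, y) = |x - y|$ and two empirical distributions with the same number of equally weighted samples. In that case the optimal coupling pairs the sorted samples in order, so the infimum can be evaluated directly. The function name `wasserstein_1d` and the equal-sample-size restriction are assumptions made for illustration; they are not part of the text above.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two 1-D empirical distributions.

    Assumes both samples have the same size and uniform weights, so the
    optimal coupling matches the i-th smallest point of x to the i-th
    smallest point of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # Average transport cost D(x, y)^p under the sorted matching,
    # then take the 1/p root.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

# Shifting every sample by 3 moves each unit of mass a distance of 3.
mu_samples = [0.0, 1.0, 2.0]
nu_samples = [5.0, 3.0, 4.0]  # order does not matter; samples are sorted
print(wasserstein_1d(mu_samples, nu_samples, p=2))
```

For general metric spaces or unequal weights, the infimum is a linear program over couplings and needs a dedicated optimal transport solver; the sorted matching is specific to the one-dimensional case.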
Wasserstein Barycenter: Given $N$ distributions $\{\nu_1,\dots,\nu_N\} \subset \mathbb{P} \subseteq P(\Omega)$, a Wasserstein barycenter $\bar{\mu}$ solves:
$$\bar{\mu} \in \arg\min_{\mu \in \mathbb{P}} \frac{1}{N} \sum_{i=1}^{N} W_p^p(\mu, \nu_i),$$
where $\mathbb{P}$ represents a subset of distributions in $P(\Omega)$, and $W_p^p(\mu, \nu_i)$ denotes the $p$-Wasserstein distance (raised to power $p$) between $\mu$ and each distribution $\nu_i$. This formulation yields a central, representative distribution that summarizes a collection of distributions with respect to the geometry that $D$ induces on $\Omega$.
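To make the barycenter objective concrete, the sketch below again restricts to $\Omega = \mathbb{R}$, $p = 2$, and empirical distributions with equal sample sizes and uniform weights. Under those assumptions the barycenter's quantile function is the average of the input quantile functions, which reduces to averaging the sorted samples position by position. The names `w2_squared`, `barycenter_1d`, and `objective` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def w2_squared(x, y):
    """Squared 2-Wasserstein distance between equal-size 1-D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean((x - y) ** 2))

def barycenter_1d(samples):
    """2-Wasserstein barycenter of equal-size, uniformly weighted 1-D samples.

    Averages the order statistics across the input distributions, which
    is the quantile-averaging solution valid in one dimension.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    return sorted_samples.mean(axis=0)

def objective(mu, samples):
    """Barycenter objective: mean of W_2^2(mu, nu_i) over the inputs."""
    return float(np.mean([w2_squared(mu, nu) for nu in samples]))

nus = [[0.0, 1.0, 2.0], [4.0, 5.0, 6.0]]
bary = barycenter_1d(nus)
print(bary)                      # support points of the barycenter
print(objective(bary, nus))      # objective at the barycenter
print(objective(nus[0], nus))    # objective at one of the inputs, for comparison
```

Note that the barycenter here is a translated copy of the inputs rather than a mixture of them: it preserves the shape of the distributions while interpolating their location.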

We demonstrate the ability of the Wasserstein barycenter to condense the core characteristics of distributions, and compare its effectiveness with KL divergence and Maximum Mean Discrepancy (MMD).