
Hamiltonian Monte Carlo

Job Feldbrugge

The Hamiltonian or Hybrid Monte Carlo (HMC) method is an elegant way to sample nontrivial probability distributions based on physical intuition. The algorithm was originally proposed by Simon Duane, Anthony Kennedy, Brian Pendleton, and Duncan Roweth in 1987 for calculations in lattice quantum chromodynamics (see the paper). These notes serve as a brief introduction to the core of the HMC method for sampling distributions, inspired by the review “A Conceptual Introduction to Hamiltonian Monte Carlo” by Michael Betancourt.

Routines for sampling the classical probability distributions – including the uniform, Poisson, and Gaussian distributions – are nowadays implemented in most programming languages. However, one often encounters more intricate distributions that one may want to sample. In these cases, Monte Carlo methods shine, allowing one to transform samples from the classical distributions into samples of the target distribution \(\pi(q)\). The Hamiltonian Monte Carlo method is an efficient implementation of this idea, borrowing insights from Hamiltonian dynamics in physics.

The Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is a conceptual procedure to sample a target distribution \(\pi\). It consists of two steps. First, starting from the sample \(q\), we generate a candidate \(q'\). Next, we evaluate the acceptance probability \(a(q'|q)\) and check whether this number lies above a sample from a uniform distribution on the interval \([0,1]\). If the candidate is accepted, we repeat this process from \(q'\). If the candidate is rejected, we start the process over from \(q\).

Let’s assume we have a sample \(q\) and design a procedure to generate a candidate sample \(q'\), where the transition probability of \(q'\) conditioned on \(q\) is given by \(\mathbb{Q}(q'|q)\). The candidate \(q'\) is accepted with probability

\[a(q'|q) = \text{min}\left(1, \frac{\mathbb{Q}(q|q') \pi(q')}{\mathbb{Q}(q'|q)\pi(q)}\right)\,.\]

The Metropolis-Hastings algorithm is an implementation of detailed balance: the resulting transition probability \(\mathbb{T}(q'|q) = a(q'|q)\,\mathbb{Q}(q'|q)\) from \(q\) to \(q'\) is reversible, satisfying the equation

\[\mathbb{T}(q'|q)\pi(q) = \mathbb{T}(q|q')\pi(q')\,.\]

Traditionally, the candidate sample was obtained by performing a random walk, defined by the transition probability

\[\mathbb{Q}(q'|q) = \mathcal{N}(q'|q,\Sigma)\,,\]

with the normal distribution \(\mathcal{N}(x|\mu,\Sigma)\) defined by the mean \(\mu\) and covariance matrix \(\Sigma\). When the transition probability \(\mathbb{Q}(q'|q)\) is symmetric, the acceptance probability is independent of the mechanism with which we obtained the candidate \(q'\), i.e.,

\[a(q'|q) = \text{min}\left(1, \frac{\pi(q')}{\pi(q)}\right)\,.\]
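
As an illustration, here is a minimal sketch of a random-walk Metropolis sampler in Python, assuming a one-dimensional target specified by its log-density; the function `log_pi`, the step size `sigma`, and the standard-normal example target are placeholders for this sketch, not part of the method itself.

```python
import numpy as np

def metropolis(log_pi, q0, sigma=1.0, n_samples=10_000, rng=None):
    """Random-walk Metropolis sampler for a target with log-density log_pi."""
    rng = np.random.default_rng() if rng is None else rng
    q = q0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Symmetric Gaussian proposal: Q(q'|q) = N(q'|q, sigma^2).
        q_new = q + sigma * rng.normal()
        # Accept with probability a(q'|q) = min(1, pi(q')/pi(q)),
        # evaluated in log space for numerical stability.
        if np.log(rng.uniform()) < log_pi(q_new) - log_pi(q):
            q = q_new
        samples[i] = q
    return samples

# Example: sample a standard normal target, pi(q) ∝ exp(-q^2/2).
samples = metropolis(lambda q: -0.5 * q**2, q0=0.0)
```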

The Hamiltonian Monte Carlo algorithm

The Hamiltonian Monte Carlo algorithm is a specific method to find candidate samples \(q'\) given the sample \(q\), taking the target distribution into account to improve the acceptance probability. In Hamiltonian Monte Carlo, we interpret the variable \(q\) as the position of a non-relativistic particle and introduce an auxiliary variable \(p\) that we will associate with the momentum of this particle. On phase-space, consisting of the pair \((q,p)\), we define the probability distribution

\[\pi(q,p) = \pi(p|q)\pi(q)\,,\]

where the conditional probability \(\pi(p|q)\) will be selected below to make contact with Hamiltonian dynamics. In a sense, the introduction of the function \(\pi(p|q)\) gives us the elbow room to construct a powerful proposal mechanism \(\mathbb{Q}(q'|q)\) while optimizing the acceptance probability.

Now, writing

\[\pi(q,p) = \mathcal{A}\, e^{-H(q,p)}\,,\]

for some function \(H\) and normalization constant \(\mathcal{A}\), we observe that the acceptance probability on phase-space assumes the form

\[a(q', p'|q,p) = \text{min}\left(1, \frac{\mathbb{Q}(q,p|q',p') }{\mathbb{Q}(q',p'|q,p)} e^{-H(q',p')+H(q,p)}\right)\,.\]

Hence, if we design a sampling mechanism for the candidate \((q',p')\) that (1) is symmetric, \(\mathbb{Q}(q,p|q',p')=\mathbb{Q}(q',p'|q,p)\), and (2) preserves the function \(H\), we will always accept the candidate \(q'\)! Since these are exactly the properties of a particle in Hamiltonian mechanics, we interpret \(H\) as the Hamiltonian of the particle,

\[H(q,p) = \frac{1}{2} p^T M^{-1} p +V(q)\,,\]

in terms of the kinetic energy with the mass matrix \(M\) and the potential energy \(V(q)\). Now, substituting the definition of \(H\) and absorbing the normalization constant, we find

\[H(q,p) = - \ln \pi(q,p) =- \ln \pi(p|q) - \ln \pi(q)\,,\]

leading on the one hand to the conditional probability

\[\pi(p|q) \propto e^{-\frac{1}{2}p^T M^{-1} p}\,,\]

and on the other hand to the potential energy of the particle

\[V(q) = - \ln \pi(q)\,.\]
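
For example, for a standard normal target \(\pi(q) \propto e^{-q^2/2}\), the potential energy becomes, up to an irrelevant additive constant,

\[V(q) = \frac{1}{2} q^2\,,\]

so the fictitious particle moves in a harmonic oscillator potential.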

Now that we have connected the world of probability theory with theoretical physics, we lift the sample \(q\) to a point \((q,p)\) in phase-space by sampling \(\pi(p|q)\). Next, we move our particle forward by solving the Hamilton equations

\[ \frac{\mathrm{d} q}{\mathrm{d} t} = \frac{\partial H(q,p)}{\partial p}\,,\quad \frac{\mathrm{d} p}{\mathrm{d} t} = -\frac{\partial H(q,p)}{\partial q}\,.\]

This system of differential equations is time reversible upon reversing the momentum \(p \mapsto -p\) and preserves the Hamiltonian, since

\[\frac{\mathrm{d}H(q,p)}{\mathrm{d}t}=\frac{\partial H(q,p)}{\partial q} \frac{\mathrm{d} q}{\mathrm{d} t} +\frac{\partial H(q,p)}{\partial p} \frac{\mathrm{d} p}{\mathrm{d} t} = \frac{\partial H(q,p)}{\partial q} \frac{\partial H(q,p)}{\partial p} - \frac{\partial H(q,p)}{\partial p} \frac{\partial H(q,p)}{\partial q} =0\,.\]

The implementation

The implementation consists of two steps. Given a sample \(q\) of the target distribution \(\pi(q)\), we first sample the momentum from the normal distribution

\[p \sim \mathcal{N}(0, M)\,.\]

The mass matrix \(M\) is a positive definite symmetric matrix that we can freely tune for our problem. We evolve the particle in phase-space \((q,p)\) by solving the Hamilton equations for a propagation time \(T\), where \(T\) is a free parameter. As approximate conservation of the Hamiltonian is more important than the accuracy of the solution itself, HMC is traditionally implemented with a symplectic integrator. In particular, we use the second-order leapfrog scheme. Starting from \((q_0,p_0)=(q,p)\), we update the position and momentum following the steps

\[ \begin{align} p_{n+\frac{1}{2}} &= p_n - \frac{\epsilon}{2} \frac{\partial V}{\partial q}(q_n)\,,\\ q_{n+1} &= q_n + \epsilon\, M^{-1} p_{n+\frac{1}{2}}\,,\\ p_{n+1} &= p_{n+\frac{1}{2}} - \frac{\epsilon}{2} \frac{\partial V}{\partial q}(q_{n+1})\,, \end{align} \]

with \((q_n,p_n)=(q(\epsilon n), p(\epsilon n))\), where \(\epsilon = T/L\) is the integration step size for \(L\) steps, until we reach \((q_L,p_L)\).
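
A minimal sketch of the leapfrog scheme in Python follows, with the consecutive momentum half-steps of adjacent iterations fused into full steps; the gradient function `grad_V` is a placeholder, and the inverse mass matrix defaults to the identity for simplicity.

```python
import numpy as np

def leapfrog(q, p, grad_V, eps, L, M_inv=None):
    """Integrate the Hamilton equations for L leapfrog steps of size eps."""
    M_inv = np.eye(len(q)) if M_inv is None else M_inv
    q, p = q.astype(float), p.astype(float)  # work on copies
    p -= 0.5 * eps * grad_V(q)      # initial momentum half-step
    for _ in range(L - 1):
        q += eps * M_inv @ p        # full position step
        p -= eps * grad_V(q)        # two fused momentum half-steps
    q += eps * M_inv @ p            # final position step
    p -= 0.5 * eps * grad_V(q)      # final momentum half-step
    return q, p
```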

Next, we evaluate the difference in Hamiltonian

\[\Delta H = H(q_L,p_L) - H(q_0,p_0)\]

and the associated acceptance probability

\[a(q_L, p_L|q_0,p_0) = \text{min}\left(1, \frac{\mathbb{Q}(q_0,p_0|q_L,p_L) }{\mathbb{Q}(q_L,p_L|q_0,p_0)} e^{-\Delta H}\right)\,.\]

The approximate solution of the Hamilton equations with the leapfrog method is a deterministic process, i.e.,

\[\mathbb{Q}(q',p'|q_0,p_0) = \delta(q'-q_L)\delta(p'-p_L)\,.\]

Consequently, the ratio of the transition probabilities vanishes, i.e.,

\[\frac{\mathbb{Q}(q_0,p_0|q_L,p_L) }{\mathbb{Q}(q_L,p_L|q_0,p_0)} = \frac{0}{\infty}=0\,.\]

The particle always moves from \((q_0,p_0)\) to \((q_L,p_L)\) but almost never backwards. We can solve this problem by flipping the momentum \(p \mapsto -p\), making the transition reversible, i.e.,

\[\mathbb{Q}(q',p'|q_0,p_0) = \delta(q'-q_L)\delta(p'+p_L)\,.\]

Now the acceptance probability reduces to

\[a(q_L, -p_L|q_0,p_0) = \text{min}\left(1, e^{-\Delta H}\right)\,.\]

The point \(q'=q_L\) is accepted by drawing a number \(r\) from a uniform distribution on \([0,1]\) and checking whether \(r< a(q_L, -p_L|q_0,p_0)\). Note that the flip in momentum does not change the Hamiltonian, as the kinetic energy is quadratic in \(p\). We repeat these steps until we have adequately sampled the target distribution \(\pi(q)\).
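
Putting the pieces together, here is a sketch of one full HMC update, using the `leapfrog` function above and assuming for simplicity a unit mass matrix, so that \(H(q,p) = \frac{1}{2}p\cdot p + V(q)\); the potential `V`, its gradient `grad_V`, and the parameter values in the usage example are illustrative placeholders.

```python
import numpy as np

def hmc_step(q, V, grad_V, eps, L, rng):
    """One Hamiltonian Monte Carlo update of the sample q (unit mass matrix)."""
    p = rng.normal(size=q.shape)                   # sample p ~ N(0, M) with M = I
    q_new, p_new = leapfrog(q, p, grad_V, eps, L)  # evolve for time T = eps * L
    # Difference in Hamiltonian; the momentum flip p -> -p leaves the
    # quadratic kinetic energy, and hence Delta H, unchanged.
    dH = V(q_new) + 0.5 * p_new @ p_new - V(q) - 0.5 * p @ p
    # Accept the candidate with probability min(1, exp(-Delta H)).
    if np.log(rng.uniform()) < -dH:
        return q_new
    return q

# Example: sample a two-dimensional standard normal, V(q) = |q|^2 / 2.
rng = np.random.default_rng(0)
q = np.zeros(2)
chain = []
for _ in range(1000):
    q = hmc_step(q, lambda x: 0.5 * x @ x, lambda x: x, eps=0.1, L=20, rng=rng)
    chain.append(q)
```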

For optimal performance, it is common to set the inverse mass matrix \(M^{-1}\) equal to the covariance of the variable \(q\) under the target distribution. The integration time \(T\) is set such that we explore the complete support of the distribution. The number of steps \(L\) is taken large enough to ensure the change in Hamiltonian \(\Delta H\) is small.
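
A short sketch of this tuning step, reusing the chain from the previous example; note that the momentum must then be drawn from \(\mathcal{N}(0, M)\) with \(M\) the inverse of the estimated covariance.

```python
# Estimate the covariance of q from warmup samples and use it as the
# inverse mass matrix; the momentum is then drawn from N(0, M).
warmup = np.asarray(chain)
M_inv = np.cov(warmup, rowvar=False)
M = np.linalg.inv(M_inv)
p = rng.multivariate_normal(np.zeros(len(M)), M)
```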

Generalizations

The discussion above shows a bare minimum implementation of the Hamiltonian Monte Carlo algorithm. Over the last few years, several extensions and generalizations have been proposed. In particular, these include Riemannian Hamiltonian Monte Carlo, where the mass matrix \(M\) depends on the position \(q\), and the No-U-Turn Sampler (NUTS). For more details, see the review “A Conceptual Introduction to Hamiltonian Monte Carlo” by Michael Betancourt and references therein.