Markov Chain Monte Carlo (MCMC)
The Metropolis-Hastings algorithm is a Markov Chain Monte Carlo (MCMC) method for obtaining a sequence of random samples whose distribution converges to a target probability distribution from which direct sampling is difficult.
Introduction
This article assumes the reader understands the following:
- Markov Chains and their properties:
- ergodic property
- stationary distribution
- detailed balance
- Monte Carlo Methods
Problem:
- approximate or sample from target distribution 𝜋
Solution:
- Markov Chain idea: given an ergodic transition matrix 𝑇, there exists a unique stationary distribution 𝜋
- Metropolis Algorithm idea: given a target distribution 𝜋, construct an ergodic transition matrix 𝑇 whose stationary distribution is 𝜋
- the ergodic theorem then states that samples drawn from this Markov chain 𝑇 approximate the target distribution 𝜋
Metropolis Algorithm:
- given current state 𝑥, sample next state 𝑥’ from a proposal distribution 𝐐(𝑥 → 𝑥')
- accept next state 𝑥’ with acceptance probability 𝐀(𝑥 → 𝑥')
- if accepted, move to 𝑥'
- otherwise stay at 𝑥
- repeat 𝑛 times (see the sketch after this list)
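As a sketch, this loop maps directly to code. Here `sample_q` and `accept_prob` are assumed caller-supplied stand-ins for 𝐐 and 𝐀; the names are illustrative, not canonical:

```python
import random

def metropolis_hastings(x0, sample_q, accept_prob, n):
    """Generic accept/reject loop of the algorithm above.

    sample_q(x) draws a proposal x' ~ Q(x -> .);
    accept_prob(x, x_new) returns A(x -> x').
    Both are caller-supplied (illustrative assumptions).
    """
    x = x0
    samples = []
    for _ in range(n):
        x_new = sample_q(x)                    # sample x' from Q(x -> x')
        if random.random() < accept_prob(x, x_new):
            x = x_new                          # accepted: move to x'
        # otherwise stay at x
        samples.append(x)
    return samples
```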
From the algorithm, the transition function 𝑇 is defined as:
if 𝑥 ≠ 𝑥':
- $𝑇(𝑥 → 𝑥’) = 𝐐(𝑥 → 𝑥’)𝐀(𝑥 → 𝑥’)$
if 𝑥 = 𝑥':
- $𝑇(𝑥 → 𝑥) = 𝐐(𝑥 → 𝑥) + 𝛴_{𝑥≠𝑥’} [𝐐(𝑥 → 𝑥’)(1 - 𝐀(𝑥 → 𝑥’))]$
construct the acceptance probability 𝐀 such that detailed balance holds for 𝑇 (detailed balance implies that 𝜋 is a stationary distribution of 𝑇, so the ergodic theorem above applies):
- $𝜋(𝑥’)𝑇(𝑥’ → 𝑥) = 𝜋(𝑥)𝑇(𝑥 → 𝑥’)$
- $𝜋(𝑥’)𝐐(𝑥’ → 𝑥)𝐀(𝑥’ → 𝑥) = 𝜋(𝑥)𝐐(𝑥 → 𝑥’)𝐀(𝑥 → 𝑥’)$
- $\frac{𝐀(𝑥 → 𝑥’)}{𝐀(𝑥’ → 𝑥)} = \frac{𝜋(𝑥’)𝐐(𝑥’ → 𝑥)}{𝜋(𝑥)𝐐(𝑥 → 𝑥’)}$
- $𝐀(𝑥 → 𝑥’) = 𝑚𝑖𝑛(1, \frac{𝜋(𝑥’)𝐐(𝑥’ → 𝑥)}{𝜋(𝑥)𝐐(𝑥 → 𝑥’)})$
- $𝐀(𝑥 → 𝑥’) = 𝑚𝑖𝑛(1, \frac{\frac{𝜋’(𝑥’)}{𝑍} 𝐐(𝑥’ → 𝑥)}{\frac{𝜋’(𝑥)}{𝑍} 𝐐(𝑥 → 𝑥’)})$
- $𝐀(𝑥 → 𝑥’) = 𝑚𝑖𝑛(1, \frac{𝜋’(𝑥’) 𝐐(𝑥’ → 𝑥)}{𝜋’(𝑥) 𝐐(𝑥 → 𝑥’)})$
- (1) by definition of detailed balance (stated for 𝑥 ≠ 𝑥’; the 𝑥 = 𝑥’ case holds trivially)
- (4) this choice of 𝐀 satisfies the ratio in (3): if the ratio is at most 1, then 𝐀(𝑥 → 𝑥’) equals it and 𝐀(𝑥’ → 𝑥) = 1; if it exceeds 1, the roles flip, so the required ratio holds either way
- (5) because 𝜋(𝑥) is assumed to be hard to compute due to its normalizing constant 1/𝑍, we rewrite it as $𝜋(𝑥) = \frac{𝜋’(𝑥)}{𝑍}$, where 𝜋’ is the unnormalized density
- (6) the normalizing constants cancel, making 𝐀(𝑥 → 𝑥’) easy to compute (see the sketch below)
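A minimal sketch of this final acceptance rule, assuming `pi_u` evaluates the unnormalized density 𝜋’ and `q(a, b)` evaluates 𝐐(𝑎 → 𝑏) (both names are illustrative assumptions):

```python
def acceptance(x, x_new, pi_u, q):
    """A(x -> x') computed from the unnormalized density pi_u = pi'
    and the proposal density q(a, b) = Q(a -> b); Z has cancelled."""
    ratio = (pi_u(x_new) * q(x_new, x)) / (pi_u(x) * q(x, x_new))
    return min(1.0, ratio)
```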
Choice of Proposal Distribution 𝐐
𝐐 must be reversible:
- 𝐐(𝑥 → 𝑥’) > 0 if and only if 𝐐(𝑥’ → 𝑥) > 0, so every proposed move can be reversed and the acceptance ratio is well defined
opposing forces:
- 𝐐 should be spread out to improve mixing
- but the more spread out 𝐐 is, the lower the acceptance probability, which in turn slows down mixing
Random Walk Metropolis
If 𝐐 is chosen to be symmetric (i.e. 𝐐(𝑥 → 𝑥’) = 𝐐(𝑥’ → 𝑥) for all 𝑥’ and 𝑥), then the acceptance probability 𝐀 becomes:
- $𝐀(𝑥 → 𝑥’) = 𝑚𝑖𝑛(1, \frac{𝜋(𝑥’)𝐐(𝑥’→ 𝑥)}{𝜋(𝑥)𝐐(𝑥 → 𝑥’)})$
- $𝐀(𝑥 → 𝑥’) = 𝑚𝑖𝑛(1, \frac{𝜋(𝑥’)}{𝜋(𝑥)})$
now (a runnable sketch follows this list):
- if 𝑥’ has density greater than or equal to 𝑥’s density (i.e. 𝜋(𝑥’) ≥ 𝜋(𝑥)) then we will always accept 𝑥'
- if 𝑥’ has density less than 𝑥’s density (i.e. 𝜋(𝑥’) < 𝜋(𝑥)) then we accept 𝑥’ with probability $\frac{𝜋(𝑥’)}{𝜋(𝑥)}$
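Putting the two cases together, here is a minimal runnable sketch of random walk Metropolis; the standard-normal target 𝜋’ and the Gaussian step size are assumptions chosen purely for illustration:

```python
import math
import random

def pi_u(x):
    """Unnormalized target pi'(x): a standard normal up to its constant
    (an illustrative assumption)."""
    return math.exp(-0.5 * x * x)

def random_walk_metropolis(x0, step, n):
    x, samples = x0, []
    for _ in range(n):
        x_new = x + random.gauss(0.0, step)  # symmetric Gaussian proposal
        # Q cancels by symmetry, so A = min(1, pi'(x') / pi'(x))
        if random.random() < min(1.0, pi_u(x_new) / pi_u(x)):
            x = x_new
        samples.append(x)
    return samples

samples = random_walk_metropolis(x0=0.0, step=1.0, n=10_000)
```

The `step` parameter controls how spread out 𝐐 is, which is exactly the mixing trade-off discussed above.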
Independent Metropolis-Hastings Proposal
If 𝐐 is chosen to be independent of the current state (i.e. 𝐐(𝑥 → 𝑥’) = 𝐐(𝑥’)), then the acceptance probability becomes (a sketch follows below):
- $𝐀(𝑥 → 𝑥’) = 𝑚𝑖𝑛(1, \frac{𝜋(𝑥’)𝐐(𝑥’→ 𝑥)}{𝜋(𝑥)𝐐(𝑥 → 𝑥’)})$
- $𝐀(𝑥 → 𝑥’) = 𝑚𝑖𝑛(1, \frac{𝜋(𝑥’)𝐐(𝑥)}{𝜋(𝑥)𝐐(𝑥’)})$
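A corresponding sketch with an independent proposal; the wide-Gaussian 𝐐 and standard-normal target are illustrative assumptions (in practice 𝐐 must cover the support of 𝜋 reasonably well):

```python
import math
import random

SIGMA = 3.0  # proposal spread; assumed wide enough to cover pi's support

def q(x):
    """Independent proposal density Q(x'): a wide Gaussian (assumption)."""
    return math.exp(-0.5 * (x / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

def pi_u(x):
    """Unnormalized target pi'(x) (standard normal, for illustration)."""
    return math.exp(-0.5 * x * x)

def independent_mh(x0, n):
    x, samples = x0, []
    for _ in range(n):
        x_new = random.gauss(0.0, SIGMA)  # draw x' ~ Q, ignoring current x
        # A = min(1, pi'(x') Q(x) / (pi'(x) Q(x')))
        a = (pi_u(x_new) * q(x)) / (pi_u(x) * q(x_new))
        if random.random() < min(1.0, a):
            x = x_new
        samples.append(x)
    return samples
```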