The Metropolis-Hastings algorithm is a Markov Chain Monte Carlo (MCMC) method widely used for generating samples from a probability distribution, especially when direct sampling is challenging or impractical. The primary goal of the Metropolis-Hastings algorithm is to simulate a Markov chain whose stationary distribution matches the desired probability distribution. This is achieved by proposing candidate samples from a proposal distribution and accepting or rejecting them based on a carefully designed acceptance criterion. We will first look at the basics of Markov chains, then carry these ideas over to the Metropolis-Hastings algorithm, and implement it in Python.
Markov Chain Basics
Firstly, let us define a Markov chain: a first-order Markov chain is a series of random variables $\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \dots, \mathbf{z}^{(M)}$ such that the following conditional independence property holds for $m \in \{2, \dots, M\}$ $$p(\mathbf{z}^{(m)} | \mathbf{z}^{(m-1)}, \mathbf{z}^{(m-2)}, \dots, \mathbf{z}^{(1)}) = p(\mathbf{z}^{(m)} | \mathbf{z}^{(m-1)})$$ With this, we can specify the Markov chain by giving the probability distribution of the initial variable $p(\mathbf{z}^{(1)})$ together with the transition probabilities $T(\mathbf{z}^{(m)} \rightarrow \mathbf{z}^{(m+1)}) = p(\mathbf{z}^{(m+1)} | \mathbf{z}^{(m)})$
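To make these definitions concrete, here is a small sketch (the two-state chain and its transition matrix are invented purely for illustration) that propagates the marginal distribution forward step by step using the sum rule introduced below; repeated application converges to a fixed distribution:

```python
import numpy as np

# Hypothetical two-state chain: T[i, j] = p(z^(m+1) = j | z^(m) = i)
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Initial distribution p(z^(1)): start deterministically in state 0
p = np.array([1.0, 0.0])

# Propagate the marginal forward: p(z^(m+1)) is p(z^(m)) times T
for _ in range(50):
    p = p @ T

print(p)  # the marginal settles to a fixed (stationary) distribution
```

Note how the marginal stops changing after enough steps; this limiting distribution is exactly the stationary distribution discussed next.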
Using the Markov property, we can readily calculate the marginal probability of a particular variable by using the sum rule $$p(\mathbf{z}^{(m+1)}) = \sum_{\mathbf{z}^{(m)}} p(\mathbf{z}^{(m+1)} | \mathbf{z}^{(m)}) p(\mathbf{z}^{(m)})$$ Let's now define the concept of a stationary distribution: a distribution is stationary with respect to a Markov chain if each step of the chain leaves that distribution invariant. More concretely: $$p(\mathbf{z}) = \sum_{\mathbf{z}'} T(\mathbf{z}' \rightarrow \mathbf{z}) p(\mathbf{z}')$$ To ensure that $p(\mathbf{z})$ is a stationary distribution of a Markov chain, we can design the transition probabilities so that the detailed balance property is satisfied $$p(\mathbf{z}) T(\mathbf{z} \rightarrow \mathbf{z}') = p(\mathbf{z}') T(\mathbf{z}' \rightarrow \mathbf{z})$$ To see this, notice that $$\sum_{\mathbf{z}'} p(\mathbf{z}') T(\mathbf{z}' \rightarrow \mathbf{z}) = \sum_{\mathbf{z}'} p(\mathbf{z}) T(\mathbf{z} \rightarrow \mathbf{z}') = p(\mathbf{z}) \sum_{\mathbf{z}'} T(\mathbf{z} \rightarrow \mathbf{z}') = p(\mathbf{z})$$ where the last step uses the fact that the transition probabilities out of any state sum to one. We will use this property to design a Markov chain whose stationary distribution is the desired sampling distribution.
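The following sketch checks this argument numerically. The three-state target distribution is invented for illustration, and the transition matrix is built with a Metropolis-style rule (propose a different state uniformly, accept with probability $\min(1, p_j / p_i)$), which guarantees detailed balance by construction:

```python
import numpy as np

# Hypothetical target distribution over three states
p = np.array([0.5, 0.3, 0.2])

# Build transition probabilities satisfying detailed balance:
# propose j != i uniformly, accept with min(1, p[j] / p[i])
n = len(p)
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = min(1.0, p[j] / p[i]) / (n - 1)
    T[i, i] = 1.0 - T[i].sum()  # remaining mass stays at state i

# Detailed balance: p[i] * T[i, j] equals p[j] * T[j, i] for all i, j
flows = p[:, None] * T
print(np.allclose(flows, flows.T))

# Invariance follows by summing over the source state
print(np.allclose(p @ T, p))
```

Both checks print `True`: the probability flow between every pair of states is balanced, and consequently one step of the chain leaves $p$ unchanged.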
Metropolis-Hastings algorithm
The Metropolis-Hastings algorithm works as follows: at step $t$, with current state $\mathbf{z}^{(t)}$, we draw a candidate $\mathbf{z}^*$ from a proposal distribution $q_k(\mathbf{z} | \mathbf{z}^{(t)})$, and then accept the sample $\mathbf{z}^*$ with probability $A_k(\mathbf{z}^*, \mathbf{z}^{(t)})$, where $$A_k(\mathbf{z}^*, \mathbf{z}^{(t)}) = \min\left(1, \frac{\tilde{p}(\mathbf{z}^{*}) \, q_k(\mathbf{z}^{(t)} | \mathbf{z}^*)}{\tilde{p}(\mathbf{z}^{(t)}) \, q_k(\mathbf{z}^* | \mathbf{z}^{(t)})}\right)$$ Here $\tilde{p}$ denotes the unnormalised target density; the normalising constant cancels in the ratio, which is what makes the algorithm useful when that constant is intractable. It can be proven that the stationary distribution of this Markov chain is the targeted sampling distribution. First, notice that for $\mathbf{z}' \neq \mathbf{z}$ the transition probability is $$T(\mathbf{z} \rightarrow \mathbf{z}') = q_k(\mathbf{z}' | \mathbf{z}) A_k(\mathbf{z}', \mathbf{z})$$ This corresponds to first proposing $\mathbf{z}'$ and then accepting it as a valid sample.
This transition probability satisfies the detailed balance property, because $$p(\mathbf{z}^{(t)}) q_k(\mathbf{z}^* | \mathbf{z}^{(t)}) A_k(\mathbf{z}^*, \mathbf{z}^{(t)}) = \min(p(\mathbf{z}^{*}) q_k(\mathbf{z}^{(t)} | \mathbf{z}^*), p(\mathbf{z}^{(t)}) q_k(\mathbf{z}^* | \mathbf{z}^{(t)})) = p(\mathbf{z}^{*}) q_k(\mathbf{z}^{(t)} | \mathbf{z}^*) A_k(\mathbf{z}^{(t)}, \mathbf{z}^*)$$ For continuous state spaces, a common choice of proposal is a Gaussian centred on the current state: $q_k(\mathbf{z}^* | \mathbf{z}^{(t)}) = \mathcal{N}(\mathbf{z}^* | \mathbf{z}^{(t)}, \sigma^2)$. Since this proposal is symmetric, the $q_k$ factors cancel in the acceptance ratio, and the method reduces to the original Metropolis algorithm.
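A single update with this Gaussian proposal can be sketched as follows. The function name is illustrative, and log densities are used for numerical stability (the acceptance ratio becomes a difference of logs); because the proposal is symmetric, the $q_k$ terms are omitted:

```python
import numpy as np

def metropolis_hastings_step(z, log_p_tilde, sigma, rng):
    """One MH update with a symmetric Gaussian proposal N(z, sigma^2).

    log_p_tilde: log of the unnormalised target density.
    The symmetric proposal makes the q terms cancel in the
    acceptance ratio, so only the target densities appear.
    """
    z_star = rng.normal(loc=z, scale=sigma)
    log_a = log_p_tilde(z_star) - log_p_tilde(z)
    # Accept with probability min(1, exp(log_a))
    if np.log(rng.uniform()) < log_a:
        return z_star  # accept the proposal
    return z           # reject: the chain stays at the current state
```

For instance, iterating this step with `log_p_tilde = lambda x: -0.5 * x * x` (an unnormalised standard Gaussian) produces samples whose mean and standard deviation approach 0 and 1.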
Python Implementation
We will sample from an exponential distribution using the Metropolis-Hastings algorithm applied to its unnormalised probability density function. The full implementation is available at https://github.com/SonPhatTranDeveloper/basic-sampling-algorithms/blob/main/metropolis-hastings-sampling.ipynb
import numpy as np

# Define parameters
TOTAL_SAMPLES = 100000
PROPOSAL_VARIANCE = 2
LAMBDA = 1  # rate parameter of the target exponential distribution

def unnormalised_exponential_distribution_pdf(z, lam):
    # Unnormalised exponential density: exp(-lam * z) for z >= 0, else 0
    return np.exp(-lam * z) if z >= 0 else 0.0

def normal_distribution_pdf(z, mean, variance):
    # Gaussian density N(z | mean, variance)
    return np.exp(-0.5 * (z - mean) ** 2 / variance) / np.sqrt(2 * np.pi * variance)

# Initialise the first state
z = 1

# Start generating samples
samples = []

# Until we have TOTAL_SAMPLES samples
while len(samples) < TOTAL_SAMPLES:
    # Draw a proposal from a Gaussian centred on the current state;
    # its variance must match the one used in the acceptance ratio below
    z_sample = np.random.normal(loc=z, scale=np.sqrt(PROPOSAL_VARIANCE))

    # Compute the acceptance probability
    a = min(1, (unnormalised_exponential_distribution_pdf(z_sample, LAMBDA)
                * normal_distribution_pdf(z, z_sample, PROPOSAL_VARIANCE))
               / (unnormalised_exponential_distribution_pdf(z, LAMBDA)
                  * normal_distribution_pdf(z_sample, z, PROPOSAL_VARIANCE)))

    # Accept with probability a; on rejection, record the current state
    # again, since discarding rejections would bias the samples
    uniform_variable = np.random.uniform(low=0, high=1)
    if uniform_variable < a:
        z = z_sample
    samples.append(z)
The histogram of the generated values approximates the exponential distribution's PDF; the agreement improves with more samples, a well-tuned proposal variance, and discarding an initial burn-in period.
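As a sanity check, one can rerun the sampler compactly in log space (names such as `log_p_tilde` are illustrative) and measure the gap between the normalised histogram and the true density $\lambda e^{-\lambda z}$:

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 1.0             # assumed rate parameter of the target exponential
STEP_STD = np.sqrt(2.0)  # proposal standard deviation

def log_p_tilde(z):
    # Log of the unnormalised exponential density; -inf outside the support
    return -LAMBDA * z if z >= 0 else -np.inf

# Metropolis sampler: the Gaussian proposal is symmetric, so q cancels
z = 1.0
samples = []
for _ in range(200000):
    z_star = rng.normal(z, STEP_STD)
    if np.log(rng.uniform()) < log_p_tilde(z_star) - log_p_tilde(z):
        z = z_star
    samples.append(z)
samples = np.array(samples[10000:])  # discard an initial burn-in period

# Compare the normalised histogram with the density lambda * exp(-lambda z)
counts, edges = np.histogram(samples, bins=40, range=(0.0, 5.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
true_pdf = LAMBDA * np.exp(-LAMBDA * centres)
print(np.max(np.abs(counts - true_pdf)))
```

The printed maximum deviation shrinks as the chain length grows, and the sample mean approaches the exponential mean $1 / \lambda$.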