The Metropolis-Hastings algorithm is a Markov Chain Monte Carlo (MCMC) method widely used for generating samples from a probability distribution, especially when direct sampling is challenging or impractical. The primary goal of the Metropolis-Hastings algorithm is to simulate a Markov chain whose stationary distribution matches the desired probability distribution. This is achieved by proposing candidate samples from a proposal distribution and accepting or rejecting them based on a carefully designed acceptance criterion. We will first look at the basics of Markov chains, then carry these ideas over to the Metropolis-Hastings algorithm, and implement it in Python.
Markov Chain Basics
Firstly, let us define a Markov chain: a first-order Markov chain is a series of random variables $\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \dots, \mathbf{z}^{(M)}$ such that the following conditional independence property holds for $m \in \{2, \dots, M\}$ $$p(\mathbf{z}^{(m)} | \mathbf{z}^{(m-1)}, \mathbf{z}^{(m-2)}, \dots, \mathbf{z}^{(1)}) = p(\mathbf{z}^{(m)} | \mathbf{z}^{(m-1)})$$ With this, we can specify the Markov chain by giving the probability distribution of the initial variable $p(\mathbf{z}^{(1)})$ together with the transition probabilities $T(\mathbf{z}^{(m)} \rightarrow \mathbf{z}^{(m+1)}) = p(\mathbf{z}^{(m+1)} | \mathbf{z}^{(m)})$
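To make these definitions concrete, here is a small sketch (the two-state chain and its transition matrix are invented purely for illustration) that propagates the marginal distribution forward step by step using the sum rule introduced below; repeated application converges to a fixed distribution:

```python
import numpy as np

# Hypothetical two-state chain: T[i, j] = p(z^(m+1) = j | z^(m) = i)
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Initial distribution p(z^(1)): start deterministically in state 0
p = np.array([1.0, 0.0])

# Propagate the marginal forward: p(z^(m+1)) is p(z^(m)) times T
for _ in range(50):
    p = p @ T

print(p)  # the marginal settles to a fixed (stationary) distribution
```

Note how the marginal stops changing after enough steps; this limiting distribution is exactly the stationary distribution discussed next.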
Using the Markov property, we can readily calculate the marginal probability of a particular variable by using the sum rule $$p(\mathbf{z}^{(m+1)}) = \sum_{\mathbf{z}^{(m)}} p(\mathbf{z}^{(m+1)} | \mathbf{z}^{(m)}) p(\mathbf{z}^{(m)})$$ Let's now define the concept of a stationary distribution: a distribution is stationary with respect to a Markov chain if each step of the chain leaves that distribution invariant. More concretely: $$p(\mathbf{z}) = \sum_{\mathbf{z}'} T(\mathbf{z}' \rightarrow \mathbf{z}) p(\mathbf{z}')$$ To ensure that $p(\mathbf{z})$ is a stationary distribution of a Markov chain, we can design the transition probabilities so that the detailed balance property is satisfied $$p(\mathbf{z}) T(\mathbf{z} \rightarrow \mathbf{z}') = p(\mathbf{z}') T(\mathbf{z}' \rightarrow \mathbf{z})$$ To see this, notice that $$\sum_{\mathbf{z}'} p(\mathbf{z}') T(\mathbf{z}' \rightarrow \mathbf{z}) = \sum_{\mathbf{z}'} p(\mathbf{z}) T(\mathbf{z} \rightarrow \mathbf{z}') = p(\mathbf{z}) \sum_{\mathbf{z}'} T(\mathbf{z} \rightarrow \mathbf{z}') = p(\mathbf{z})$$ where the last step uses the fact that the transition probabilities out of any state sum to one. We will use this property to design a Markov chain whose stationary distribution is the desired sampling distribution.
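The following sketch checks this argument numerically. The three-state target distribution is invented for illustration, and the transition matrix is built with a Metropolis-style rule (propose a different state uniformly, accept with probability $\min(1, p_j / p_i)$), which guarantees detailed balance by construction:

```python
import numpy as np

# Hypothetical target distribution over three states
p = np.array([0.5, 0.3, 0.2])

# Build transition probabilities satisfying detailed balance:
# propose j != i uniformly, accept with min(1, p[j] / p[i])
n = len(p)
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = min(1.0, p[j] / p[i]) / (n - 1)
    T[i, i] = 1.0 - T[i].sum()  # remaining mass stays at state i

# Detailed balance: p[i] * T[i, j] equals p[j] * T[j, i] for all i, j
flows = p[:, None] * T
print(np.allclose(flows, flows.T))

# Invariance follows by summing over the source state
print(np.allclose(p @ T, p))
```

Both checks print `True`: the probability flow between every pair of states is balanced, and consequently one step of the chain leaves $p$ unchanged.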
Metropolis-Hastings algorithm
The Metropolis-Hastings algorithm works as follows: at step $t$, with current state $\mathbf{z}^{(t)}$, we draw a candidate $\mathbf{z}^*$ from a proposal distribution $q_k(\mathbf{z} | \mathbf{z}^{(t)})$, and then accept the sample $\mathbf{z}^*$ with probability $A_k(\mathbf{z}^*, \mathbf{z}^{(t)})$, where $$A_k(\mathbf{z}^*, \mathbf{z}^{(t)}) = \min\left(1, \frac{\tilde{p}(\mathbf{z}^{*}) \, q_k(\mathbf{z}^{(t)} | \mathbf{z}^*)}{\tilde{p}(\mathbf{z}^{(t)}) \, q_k(\mathbf{z}^* | \mathbf{z}^{(t)})}\right)$$ Here $\tilde{p}$ denotes the unnormalised target density; the normalising constant cancels in the ratio, which is what makes the algorithm useful when that constant is intractable. It can be proven that the stationary distribution of this Markov chain is the targeted sampling distribution. First, notice that for $\mathbf{z}' \neq \mathbf{z}$ the transition probability is $$T(\mathbf{z} \rightarrow \mathbf{z}') = q_k(\mathbf{z}' | \mathbf{z}) A_k(\mathbf{z}', \mathbf{z})$$ This corresponds to first proposing $\mathbf{z}'$ and then accepting it as a valid sample.
This transition probability satisfies the detailed balance property, because $$p(\mathbf{z}^{(t)}) q_k(\mathbf{z}^* | \mathbf{z}^{(t)}) A_k(\mathbf{z}^*, \mathbf{z}^{(t)}) = \min(p(\mathbf{z}^{*}) q_k(\mathbf{z}^{(t)} | \mathbf{z}^*), p(\mathbf{z}^{(t)}) q_k(\mathbf{z}^* | \mathbf{z}^{(t)})) = p(\mathbf{z}^{*}) q_k(\mathbf{z}^{(t)} | \mathbf{z}^*) A_k(\mathbf{z}^{(t)}, \mathbf{z}^*)$$ For continuous state spaces, a common choice of proposal is a Gaussian centred on the current state: $q_k(\mathbf{z}^* | \mathbf{z}^{(t)}) = \mathcal{N}(\mathbf{z}^* | \mathbf{z}^{(t)}, \sigma^2)$. Since this proposal is symmetric, the $q_k$ factors cancel in the acceptance ratio, and the method reduces to the original Metropolis algorithm.
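A single update with this Gaussian proposal can be sketched as follows. The function name is illustrative, and log densities are used for numerical stability (the acceptance ratio becomes a difference of logs); because the proposal is symmetric, the $q_k$ terms are omitted:

```python
import numpy as np

def metropolis_hastings_step(z, log_p_tilde, sigma, rng):
    """One MH update with a symmetric Gaussian proposal N(z, sigma^2).

    log_p_tilde: log of the unnormalised target density.
    The symmetric proposal makes the q terms cancel in the
    acceptance ratio, so only the target densities appear.
    """
    z_star = rng.normal(loc=z, scale=sigma)
    log_a = log_p_tilde(z_star) - log_p_tilde(z)
    # Accept with probability min(1, exp(log_a))
    if np.log(rng.uniform()) < log_a:
        return z_star  # accept the proposal
    return z           # reject: the chain stays at the current state
```

For instance, iterating this step with `log_p_tilde = lambda x: -0.5 * x * x` (an unnormalised standard Gaussian) produces samples whose mean and standard deviation approach 0 and 1.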
Python Implementation
We will sample from an exponential distribution using the Metropolis-Hastings algorithm applied to its unnormalised probability density function. The full implementation is available at https://github.com/SonPhatTranDeveloper/basic-sampling-algorithms/blob/main/metropolis-hastings-sampling.ipynb
import numpy as np

# Define parameters
TOTAL_SAMPLES = 100000
PROPOSAL_VARIANCE = 2
LAMBDA = 1  # rate parameter of the target exponential distribution

def unnormalised_exponential_distribution_pdf(z, lam):
    # Unnormalised exponential density: exp(-lam * z) for z >= 0, else 0
    return np.exp(-lam * z) if z >= 0 else 0.0

def normal_distribution_pdf(z, mean, variance):
    # Gaussian density N(z | mean, variance)
    return np.exp(-0.5 * (z - mean) ** 2 / variance) / np.sqrt(2 * np.pi * variance)

# Initialise the first state
z = 1

# Start generating samples
samples = []

# Until we have TOTAL_SAMPLES samples
while len(samples) < TOTAL_SAMPLES:
    # Draw a proposal from a Gaussian centred on the current state;
    # its variance must match the one used in the acceptance ratio below
    z_sample = np.random.normal(loc=z, scale=np.sqrt(PROPOSAL_VARIANCE))

    # Compute the acceptance probability
    a = min(1, (unnormalised_exponential_distribution_pdf(z_sample, LAMBDA)
                * normal_distribution_pdf(z, z_sample, PROPOSAL_VARIANCE))
               / (unnormalised_exponential_distribution_pdf(z, LAMBDA)
                  * normal_distribution_pdf(z_sample, z, PROPOSAL_VARIANCE)))

    # Accept with probability a; on rejection, record the current state
    # again, since discarding rejections would bias the samples
    uniform_variable = np.random.uniform(low=0, high=1)
    if uniform_variable < a:
        z = z_sample
    samples.append(z)
The histogram of the generated values approximates the exponential distribution's PDF; the agreement improves with more samples, a well-tuned proposal variance, and discarding an initial burn-in period.
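As a sanity check, one can rerun the sampler compactly in log space (names such as `log_p_tilde` are illustrative) and measure the gap between the normalised histogram and the true density $\lambda e^{-\lambda z}$:

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 1.0             # assumed rate parameter of the target exponential
STEP_STD = np.sqrt(2.0)  # proposal standard deviation

def log_p_tilde(z):
    # Log of the unnormalised exponential density; -inf outside the support
    return -LAMBDA * z if z >= 0 else -np.inf

# Metropolis sampler: the Gaussian proposal is symmetric, so q cancels
z = 1.0
samples = []
for _ in range(200000):
    z_star = rng.normal(z, STEP_STD)
    if np.log(rng.uniform()) < log_p_tilde(z_star) - log_p_tilde(z):
        z = z_star
    samples.append(z)
samples = np.array(samples[10000:])  # discard an initial burn-in period

# Compare the normalised histogram with the density lambda * exp(-lambda z)
counts, edges = np.histogram(samples, bins=40, range=(0.0, 5.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
true_pdf = LAMBDA * np.exp(-LAMBDA * centres)
print(np.max(np.abs(counts - true_pdf)))
```

The printed maximum deviation shrinks as the chain length grows, and the sample mean approaches the exponential mean $1 / \lambda$.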