Conor Lawless

Scientific Computing

data science, dynamic simulation modelling, genomics, interactive visualisation, dashboards, image & video analysis

If you’d like me to work on projects of any size, please get in touch
e: cnr.lwlss@gmail.com
t: @cnrlwlss


Visualising correlation in an AR(1) process

Autoregression models are a type of stochastic, dynamic process. They are a mathematical representation of some value that varies with time, where the variation includes a random, unpredictable component. Using a computer to generate (pseudo-)random numbers, we can generate a set of simulated values across time that are consistent with this kind of model. Stochastic simulations aim to capture the random component in the process and so are usually different every time they’re run.

The simplest type of autoregression model, the so-called AR(1) process, is interesting because it is flexible enough to include both one-dimensional Brownian motion and white Gaussian noise as special cases. An AR(1) process is plotted above. You can change the model parameters $\alpha$ and $\sigma$ (both described below) using the sliders beneath the plot. Similarly, you can zoom in to or out of the start of the process using the “Horizontal zoom” slider.

During Bayesian inference by Gibbs sampling, for example, we often need to investigate chains of MCMC output. It is important to verify that there is no appreciable correlation within these MCMC chains before assuming that samples from the chains correspond to unbiased samples from the posterior distribution. One way to do this is to treat the chains as time series, plot them, and then visually check the resulting dynamic process for evidence of autocorrelation. Visual inspection of simulated output from AR(1) processes can help us to build an intuitive sense for whether a stochastic time series is autocorrelated.
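
A numerical check can complement the visual one. The sketch below estimates the autocorrelation of a sequence at a given lag. The function name `sample_autocorrelation` and the two test sequences are illustrative assumptions, not part of any MCMC library, and real chains would take the place of the simulated lists:

```python
import random

def sample_autocorrelation(chain, lag):
    """Estimate the autocorrelation of a sequence at a given positive lag."""
    n = len(chain)
    mean = sum(chain) / n
    # Sum of squared deviations from the mean
    var_sum = sum((x - mean) ** 2 for x in chain)
    # Sum of products of deviations separated by `lag` steps
    cov_sum = sum((chain[t] - mean) * (chain[t + lag] - mean)
                  for t in range(n - lag))
    return cov_sum / var_sum

random.seed(1)  # fixed seed, only so that the example is repeatable

# Independent samples: no autocorrelation expected
independent = [random.gauss(0.0, 1.0) for _ in range(5000)]

# A deliberately correlated sequence: each value carries over
# 90% of the previous value before fresh noise is added
correlated = [0.0]
for _ in range(4999):
    correlated.append(0.9 * correlated[-1] + random.gauss(0.0, 1.0))

print(sample_autocorrelation(independent, 1))  # expected to be close to 0
print(sample_autocorrelation(correlated, 1))   # expected to be close to 0.9
```

An estimate near zero at small lags is consistent with independent samples, although it does not prove independence.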

AR(1) is a stochastic process $X$ varying with discrete time $t$, where the value of $X$ at time $t$ (written $X_t$) is proportional to the previous value ($X_{t-1}$), adjusted by the addition of a noise term ($\epsilon_t$). $\alpha$ is the constant of proportionality and it controls the degree of autocorrelation in the process:

$X_t = \alpha X_{t-1} + \epsilon_t$

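where $\epsilon_t \sim N(0,\sigma)$, that is, each $\epsilon_t$ is an independent Gaussian sample with mean $0$ and standard deviation $\sigma$.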
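
To see why $\alpha$ controls autocorrelation, it can help to unroll the recursion. What follows is a sketch of the standard derivation, assuming that the noise terms are independent, that $\sigma$ is the standard deviation of $\epsilon_t$, and that $|\alpha| < 1$. Substituting the expression for $X_{t-1}$ into that for $X_t$ repeatedly gives:

$X_t = \alpha^t X_0 + \sum_{k=1}^{t} \alpha^{t-k} \epsilon_k$

With $X_0$ fixed, the variance is the sum of the variances of the independent noise terms, which converges as $t$ grows:

$\mathrm{Var}(X_t) = \sigma^2 \sum_{j=0}^{t-1} \alpha^{2j} \to \frac{\sigma^2}{1-\alpha^2}$

Once the influence of $X_0$ has died away, the correlation between values $k$ steps apart is:

$\mathrm{Corr}(X_t, X_{t+k}) = \alpha^k$

So for $\alpha = 0.5$, for example, the correlation falls to $0.25$ after two steps and $0.125$ after three, while for $\alpha$ close to $1$ it decays very slowly.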

A sensible algorithm to simulate $t_{max}$ steps from $X$ would be to start off by defining $X_0$ and then, for each subsequent $t \in \{1,\ldots,t_{max}\}$, use your computer to generate a (pseudo-)random value for $\epsilon_t$, and use that to generate $X_t$, following the expression above. Some example Python code:

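import random

alpha = 0.5   # constant of proportionality (degree of autocorrelation)
sigma = 1.0   # standard deviation of the noise term
tmax = 100    # number of steps to simulate after X[0]

# Pre-allocate storage for X[0], ..., X[tmax]
X = [None] * (tmax + 1)

X[0] = 0.0  # initial value
for t in range(1, len(X)):
    # Scaled previous value plus a fresh Gaussian noise sample
    X[t] = alpha * X[t-1] + random.gauss(0, sigma)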

Another way to do the same thing is to generate and store some random samples from the standard Gaussian distribution separately, then reuse those same samples to explore the effect of different values of $\alpha$ or $\sigma$. This approach is what makes the plot above possible: we can explore what the AR(1) process looks like for different values of $\alpha$, with everything else held the same:

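import random

tmax = 100

# Standard Gaussian samples, drawn once and reused across simulations
eps = [random.gauss(0.0, 1.0) for _ in range(tmax + 1)]

def simulate(eps, alpha=1.0, sigma=1.0):
    """Simulate an AR(1) process driven by the pre-generated noise eps."""
    X = [None] * len(eps)
    X[0] = 0.0  # initial value; eps[0] is not used
    for t in range(1, len(X)):
        # Scaling a standard Gaussian sample by sigma gives
        # noise with standard deviation sigma
        X[t] = alpha * X[t-1] + sigma * eps[t]
    return X

# Same noise, two extreme values of alpha
X_brownian_motion = simulate(eps, alpha=1.0, sigma=1.0)
X_white_noise = simulate(eps, alpha=0.0, sigma=1.0)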

A third way might be to accept that computers are completely deterministic machines and that the random numbers your favourite software tool generates are, in fact, pseudo-random and completely reproducible. However, I won’t go into this now. I will probably write another short post about this issue soon.
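
As a brief illustration of that reproducibility, here is a minimal sketch using the standard library's `random.Random` class. The helper name `draw_noise` and the seed values are arbitrary choices made for this example:

```python
import random

def draw_noise(n, seed):
    """Return n standard Gaussian samples from a generator seeded with `seed`."""
    # A separate generator object, so the global random state is left untouched
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

first = draw_noise(5, seed=42)
second = draw_noise(5, seed=42)
third = draw_noise(5, seed=43)

print(first == second)  # same seed, so the two sequences are identical
print(first == third)   # a different seed gives a different sequence
```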

It’s interesting to think about what would happen if $|\alpha|>1$. In this case, the process becomes dominated by exponential growth (a kind of numerical explosion!) which rapidly becomes much greater in magnitude than the noise component. The recursion still has the AR(1) form, but the resulting process is explosive and no longer looks obviously stochastic. As such, I’ve restricted the adjustable range for $\alpha$ in the interactive plot above so that it lies between $0$ and $1$.
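
The explosion is easy to demonstrate numerically. In the sketch below, the value $\alpha = 1.1$, the seed and the number of steps are arbitrary choices for illustration:

```python
import random

random.seed(0)  # fixed seed so that repeated runs match
alpha = 1.1     # illustrative value with |alpha| > 1
sigma = 1.0
tmax = 200

X = [0.0]
for t in range(1, tmax + 1):
    X.append(alpha * X[-1] + random.gauss(0.0, sigma))

# Late in the run, each step is almost exactly a multiplication by alpha,
# because the noise (of size around sigma) is tiny relative to X itself
ratios = [X[t] / X[t - 1] for t in range(tmax - 4, tmax + 1)]

print(abs(X[-1]))  # very large compared with sigma
print(ratios)      # each ratio is very close to alpha
```

Once the magnitude of $X_t$ dwarfs $\sigma$, successive values differ by a factor that is practically indistinguishable from $\alpha$, which is why the trajectory looks like a smooth exponential curve.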

Many of the interesting changes in shape that take place while varying $\alpha$ occur for values very close to $1$. To highlight that, I have applied a nonlinear transformation between the slider position and the value of the $\alpha$ parameter used to draw the plot. Note that the actual $\alpha$ parameter value is reported at the top of the plot.
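
One possible transformation of this kind is sketched below. This is a hypothetical mapping chosen for illustration, not necessarily the one used for the plot: the function name `slider_to_alpha` and the `power` parameter are assumptions.

```python
def slider_to_alpha(position, power=3.0):
    """Map a slider position in [0, 1] to alpha in [0, 1].

    Hypothetical mapping: alpha = 1 - (1 - position) ** power.
    For power > 1, alpha changes slowly near position = 1,
    which gives finer control over values of alpha close to 1.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0 and 1")
    return 1.0 - (1.0 - position) ** power

for position in [0.0, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(position, slider_to_alpha(position))
```

With `power=3.0`, the upper half of the slider's travel covers only the values of $\alpha$ between $0.875$ and $1$.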

Setting $\alpha = 1$ recovers highly autocorrelated one-dimensional Brownian motion (the so-called Wiener process), sampled at discrete times. Setting $\alpha = 0$ gives completely independent samples from the Normal distribution (white Gaussian noise). It’s interesting to note that, by eye, it is quite difficult to differentiate between the process generated when $\alpha$ is as high as 0.5 and that when there is no autocorrelation ($\alpha = 0$). It might be worth bearing this in mind when visually inspecting MCMC output for correlation.