Acor, statistical analysis of a time series



Estimating the mean of a time series: statistical error bars

Markov chain Monte Carlo produces a time series of correlated values that are to be used to estimate the expected value of something. A statistical error bar is an estimate of how far the mean of the time series is from the actual mean of the process -- what you would get if the series were infinitely long. Closely related to the statistical error bar is the autocorrelation time. The statistical error bar of the mean is the standard deviation of the time series divided by the effective sample size. The effective sample size is L/tau, where L is the length of the time series and tau is the autocorrelation time.

Monte Carlo practitioners should know the autocorrelation time of their processes because the results can be striking. An autocorrelation time tau = 100 means that a time series of 10,000 samples, which seems like a lot, has an effective sample size of only 100.

The autocorrelation time in Markov chain Monte Carlo depends both on the Markov chain and on the quantity being measured. This distinguishes it from other measures of mixing rates, such as the spectral gap or the time to achieve a close approximation to the invariant measure in the total variation sense. Both of those measures give upper bounds for autocorrelation time. As the Sokal notes explain (link on the left), it is the integrated autocorrelation time that gives the most important information for Monte Carlo estimation, the size of the statistical error bar.

Accuracy and failure

Estimating autocorrelation time is not that easy to do well. This program is not terribly accurate. For example, a run of length 4 Million (effective sample size about 200,000) gave an estimate tau = 19.13 when the exact answer was 19. It would an interesting research project to explore the best possible estimate of the autocorrelation time and the error bar. It seems likely that there is a better way than this code.

The code gives up and prints an error message if it decides that the sequence length, n, is less than ten times (an adjustable parameter equal to ten out of the box) the autocorrelation time.

Software description

The C++ software (link on the left) is very simple. It either reads numbers from an ASCII file or is called from another procedure. It computes the autocorrelation time using the Kubo formula (see the code or Sokal's notes -- link on the left) and a trick to shorten the series if the autocorrelation time is too long. The running time is O(L) for a time series of length L, because it does not use the FFT to compute the autocovariance function.