All is not well with this recipe for handling the noise. The discrepancy
between the measured data and the model predictions can be thought of as a
residual vector in a multidimensional data space. We have forced the length
to be right, but what about the direction? True residuals should be random, i.e.,
the residual vector should be uniformly distributed on the sphere of
constant $\chi^2$. But since we are maximising entropy on this sphere,
there will be a bias towards that direction which points along the
gradient of the entropy function. This shows in the maps as a systematic
deviation tending to lower the peaks and raise the ``baseline'', i.e., the
parts of the image near zero. To lowest order, this can
be rectified by
adding back the residual vector found by the algorithm. This does not take
care of the invisible distribution which the MEM has produced from the
residuals, but it is the best we can do. Even in the practice of CLEAN,
residuals are added back for similar reasons.
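To see where this bias comes from, here is a minimal sketch in notation introduced
purely for this purpose (the symbols $f$, $S$, $A$, $D_k$, $\sigma_k$ and $\lambda$
are our own, not taken from the discussion above). Maximising the entropy $S(f)$ of
the image $f$ subject to a fixed value of
\[
\chi^2(f)=\sum_k\frac{\bigl(D_k-(Af)_k\bigr)^2}{\sigma_k^2}
\]
(with measured data $D_k$, noise levels $\sigma_k$, and $A$ the operator predicting
data from the image) leads, via a Lagrange multiplier $\lambda$, to the condition
$\nabla S(\hat f)=\lambda\,\nabla\chi^2(\hat f)$ at the solution $\hat f$. Writing
the residuals as $r_k=D_k-(A\hat f)_k$, the gradient of $\chi^2$ is
\[
\frac{\partial\chi^2}{\partial f_j}\bigg|_{\hat f}
=-2\sum_k\frac{A_{kj}\,r_k}{\sigma_k^2},
\]
so the residuals are tied to the entropy gradient through
$\sum_k A_{kj}r_k/\sigma_k^2=-(1/2\lambda)\,\partial S/\partial f_j$, instead of
pointing in a random direction on the sphere. Adding the residuals back to the map
therefore undoes this systematic deviation to first order.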
The term ``bias'' is used by statisticians to describe the following phenomenon: we estimate some quantity, and even after averaging over a large number of trials the result is not the noise-free value. The noise has been ``rectified'' by the non-linear algorithm and shows itself as a systematic error. There are suggestions for controlling this bias by imposing the right distribution and spatial correlations on the residuals. These are likely to be algorithmically complex but deserve exploration. They could still leave one with some subtle bias, since one cannot really solve for the noise. But to a follower of Bayes, bias is not necessarily a bad thing. What is a prior but an expression of prejudice? Perhaps the only way to avoid bias is to stop at publishing a list of the measured visibility values with their errors. Perhaps the only truly open mind is an empty mind!
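As a concrete, deliberately over-simplified illustration, the sketch below is not
the MEM algorithm: it replaces the non-linearity by a bare positivity constraint,
$\max(0,\cdot)$, applied to noisy measurements of a pixel whose true value is zero
(the noise level $\sigma$ and trial count are arbitrary choices).
\begin{verbatim}
import numpy as np

# Toy demonstration of bias: a positivity constraint stands in
# for the non-linear algorithm; the true pixel value is zero.
rng = np.random.default_rng(42)
sigma, n_trials = 1.0, 100_000

measurements = sigma * rng.standard_normal(n_trials)  # truth (0) + noise
estimates = np.maximum(measurements, 0.0)             # the "rectification"

print(measurements.mean())  # ~ 0: the linear average is unbiased
print(estimates.mean())     # ~ sigma/sqrt(2*pi) ~ 0.40: systematic error
\end{verbatim}
However many trials one averages over, the rectified estimate stays about
$\sigma/\sqrt{2\pi}$ above the noise-free value: the noise has been converted into
a systematic raising of the near-zero parts, in the same spirit as the baseline
effect described earlier.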