## Saturday, February 19, 2011

### Quantized Redshifts. V. Exploring the PSD with Simple Datasets

The ready availability of the Fast Fourier Transform (FFT) in numerics libraries on a variety of computing platforms has turned the FFT into a tool available to many researchers.  However, the availability of a tool does not mean that its behavior is understood.  Too many researchers seem to use the FFT as if it blindly 'finds frequencies' and that these frequencies are physically relevant to their problem of interest.

As I've demonstrated in earlier posts in this series, the fundamental idea behind the Fourier series & transforms is that ALL functions can be decomposed in waves of various frequencies - therefore EVERY SIGNAL will exhibit frequencies in a PSD.  The question becomes which frequencies are physically relevant and not an artifact of the data selection and distribution.

An important issue in working with the PSD is to understand just how it behaves with different types of known data as input.  Generally, if your data contains a clean, sinusoidal signal, the PSD will generate nice, sharp peaks at the frequency of the signal.  But as we will see below, even a little bit of deviation from the perfect, infinite signal changes the PSD.

A note on normalization:  For comparisons, I want to maintain a matching scale between analytically computed power spectra using Fourier transforms with the power spectra using the FFT.  I have adopted the convention of the previous post (Quantized Redshifts. IV. The Fourier Transform, FFT & PSD) for analytic computations.  I have also included a multiplicative factor for normalizing the PSD generated by the FFT.  In Python using the numpy library, this can be expressed as

>>> import numpy as np
>>> from numpy import fft

FFT computation.  nbins=number of data bins
>>> ff=fft.fft(dataf,nbins)
>>> psd=np.abs(ff)**2/nbins

Analytic computation
>>> psdref=(nbins/(2.0*np.pi))*np.abs(fftref)**2

PSD of Pure Sinusoidal Signal:
First we'll consider the basic case of the sine wave.  The ideal sine wave has infinite in duration in time, t, with some well-defined frequency, \omega_0.  It is expressed mathematically by

But in the real world, we cannot have a sine wave of infinite duration.  All data runs are of finite duration in time and space.  Therefore, we will define a sine wave of finite duration, starting at a time \tau and lasting for time, T.  Mathematically, this can be expressed as:

We can compute the Fourier transform, and compute the FFT of this system, and we plot them below.
 Click to enlarge

In this example, the infinite sine wave produces a PSD that is a delta function (wikipedia) at the frequency of the wave, which I've designated with red in the graphic above.  Note that in the above plot, we are plotting the frequency, f, not the angular frequency, \omega.  But the PSD generated by the FFT (blue) and the analytic function (green) are slightly different.  We see this more clearly in the plot below where we plot the amplitude of the PSD on a logarithmic scale.

 Click to enlarge
Here we see that the peak is still at 10Hz, but power leaks into other frequencies, above and below the input frequency.  We also see that the two different methods of generating the PSD generate slightly different results.  With the normalization chosen, the amplitudes at the higher frequencies differ by a factor of ten.

PSD of A Couple of (Independent) Sine Waves
Next we'll consider the superposition of two overlapping sine waves of finite duration.  Because the Fourier Transform is a linear transformation, the Fourier transform of the sum of the Fourier transform of the two input waves.  However, when we compute the PSD, we must still compute the product with the total Fourier transform.  This means that the PSD of two signals is not necessarily equal to the sum of the PSDs of the individual signals.
 Click to Enlarge
Again, we will plot the power on a logarithmic scale so we can see how the two computational techniques vary.

Note that while in this example, the PSD is equal to the sum of the PSDs of the two input signals, this will not always the case for reasons we will see next week.

PSD of a Square Pulse
Finally, we consider a rectangular pulse, which has a constant amplitude, different from zero, for some finite duration, T.  To make the inquiry more general, I'll also consider offset from time t=0 by a time \tau.

For the analytic part of the analysis, we compute the Fourier Transform of this function:
And now we compute the power spectrum using the function above, and the FFT.  The results are plotted below.
 Click to Enlarge
Rather than plotting the full range of the frequency scale for this test, I've zoomed in on the lower frequencies to reveal how the power is distributed in a series of broad peaks.  We even see a slight difference in the locations of the peaks when comparing the analytic PSD and the PSD computed from the FFT.

Here again are frequencies in the power spectrum from an input that is clearly perfectly uniform within its boundaries, it has no 'internal structure'.

This is a very counter-intuitive result.  What generates all these peaks?

Whatever the origin of the peaks, it means we must be very careful when claiming peaks in a PSD represent actual internal structure in a dataset.  To make sure we understand what our analysis techniques are doing, we must test the null hypothesis (Wikipedia) - what kind of PSD is produced if we have a uniform region subjected to the same sampling as our data?

Here's another question.  Suppose we change the offset value, \tau.  How will the Fourier Transform change?  How will the PSD change?  Suppose the signal is not a square pulse, will the answers change?  How is this question relevant to cosmological power spectral analysis?

Some General Observations
• We see that the power spectrum computed by the Fourier Transform and the FFT are indeed different, but still exhibit many of the same qualitative features, such as the locations of local peak values.  Some of this difference is due to aliasing, in the Discrete Fourier Transform (DFT).
• Narrow features in time or space are broad in frequency and wave number.  Well-defined frequencies correspond to infinite waves.  Broad structures in spatial dimensions generate low-frequencies in the power spectra.
I've laid the basic mathematical groundwork on what the power spectral density (PSD) is, and how it is computed.

Enough of the introductions.  I've now covered enough material to explore some specific errors.  Time to get to the meat of our story...

Next Weekend: Common Power Spectral Density "Bloopers"