In this assignment, we will look at the basics of spectral analysis. As complex-valued or bivariate data is quite common in the earth sciences, we will work with horizontal velocity data from an oceanographic current meter.
We'll be working with data from the m1244 mooring in the Labrador Sea, which you can get in NetCDF form here. (This is included in the full distribution of the course materials.)
Many thanks to Jan-Adrian Kallmyr for helping with translating the Matlab tutorial into Python.
import warnings
warnings.filterwarnings('ignore') #suppress some warnings about future code changes
import netCDF4 as nc
import xarray as xr
import numpy as np
import datetime #https://docs.python.org/3/library/datetime.html
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.signal as sg #Package for signal analysis
import scipy.ndimage as si #Another package for signal analysis
#from scipy.interpolate import interp1d #for converting cell to grid-centered coordinates
#from scipy import stats #Used for 2D binned statistics
#from mpl_toolkits.axes_grid1 import make_axes_locatable #For plotting interior colobars
#import cartopy.crs as ccrs
import spectrum
#from scipy import signal, ndimage
from scipy import fft as spfft
from scipy.fft import fft
from scipy.stats import chi2
from scipy.special import digamma
#function for coverting Matlab's datenum into Python's datetime
#from https://gist.github.com/victorkristof/b9d794fe1ed12e708b9d
#with modifications to support array input and output
def datenum_to_datetime(datenum):
"""
Convert Matlab datenum into Python datetime.
Args:
datenum: Date in datenum format
Returns:
Datetime object corresponding to datenum.
"""
date=[]
for day in datenum:
date.append(datetime.datetime.fromordinal(int(day)) \
+ datetime.timedelta(days=day%1) \
- datetime.timedelta(days=366))
date=np.array(date)
return date
plt.rcParams["figure.figsize"] = (10,6.5) #set default figure size
plt.rcParams['font.size'] = 14
plt.rcParams['font.family'] = 'serif'
Let's take a quick look at the dataset.
datadir = "../data/" #choose this appropriate for your system
filename = "m1244.nc"
ds = xr.open_dataset(datadir+filename) #Load the dataset
ds = xr.Dataset.transpose(ds) #Enforce the convection that time is in rows
print(ds)
<xarray.Dataset> Dimensions: (time: 7371, depth: 4) Dimensions without coordinates: time, depth Data variables: lat float64 ... lon float64 ... num (time) float64 ... depths (depth) float64 ... u (time, depth) float64 ... v (time, depth) float64 ... t (time, depth) float64 ... p (time, depth) float64 ...
The first point to know is that the spectrum is not something that can be computed. Instead, it is estimated. The true spectrum is a theoretical object that can't be computed unless you have an infinite amount of time and perfect sampling. The things that can be computed are called spectral estimates or estimated spectra.
Firstly, we will look at the modulus-squared Fourier transform. It is common for this to be referred to as ‘the spectrum’. However, this terminology is incorrect and misleading. Instead, the modulus-squared Fourier transform is a type of spectral estimate called the periodogram. It is known, actually, to be a very poor spectral estimate for reasons we will learn about in the course notes.
#date = datenum_to_datetime(ds["num"].date) #date as a datetime object
cv = ds['u'][:,3].data +1j*ds['v'][:,3].data #complex-valued velocity at deepest depth
fig, ax = plt.subplots(1, 1)
ax.semilogy(np.abs(fft(cv-np.mean(cv)))**2,linewidth=0.75)
ax.autoscale(enable=True, tight=True)
ax.set_ylim(1e2, 1e8)
fig.tight_layout()
plt.xlabel('Fourier Coefficient Number')
plt.ylabel('Power Spectral Density')
plt.title('Periodogram Rotary Spectral Estimate');
The x-axis is simply the index number of the terms in the squared discrete Fourier transform.
Here we have set the y-axis to be logarithmic. This is often useful in dealing with spectra. Note also that we have removed the mean prior to taking the fft, which minimizes broadband bias from the zero frequency.
In the array output by fft, positive frequencies (circles rotating in a counterclockwise sense) are on the left, while negative frequencies (circles rotating in a clockwise sense) are on the right, appearing in reverse order as we have discussed in class. Thus the highest resolvable frequency, called the Nyquist frequency, is in the middle.
You can see clearly that the periodogram roughly has a certain symmetry if you reflect it about the middle. However, it is not completely symmetric because, with a complex-valued signal such as velocity, the twin frequencies (positive and negative rotations at the same absolute frequency) are not required to cancel.
We will discuss the nature of the features we see here shortly. For the moment, we compare this with the periodogram of just the eastward velocity, the real part of our complex-valued velocity time series:
fig, ax = plt.subplots(1, 1)
ax.semilogy(np.abs(fft(np.real(cv-np.mean(cv))))**2,linewidth=0.75)
ax.autoscale(enable=True, tight=True)
ax.set_ylim(1e2, 1e8)
fig.tight_layout()
plt.xlabel('Fourier Coefficient Number')
plt.ylabel('Power Spectral Density')
plt.title('Periodogram Spectral Estimate of the Eastward Velocity');
Now, the periodogram is perfectly symmetric about the center or Nyquist frequency. As discussed in the course notes, twin frequencies occur in conjugate pairs, with the same magnitudes but reversed phase. Because of this, two complex exponentials rotating in opposite directions cancel to yield a real-valued, phase-shifted sinuosoid. The symmetry we see in the periodogram is a reflection of the fact that the time series is real-valued.
Adding two oppositely-rotating circles in general leads to an ellipse. When the magnitude of one circle vanishes the result is a circle. When the magnitude of both circles are identical, the result is a line, and this is the case for a real-valued time series.
There are several inconvenient aspects of presenting the periodogram in this way. Firstly, it is a little bit of a hassle to figure out where the frequencies are. Secondly, it is a bit odd to see the positive and negative frequencies meet in the middle at the Nyquist. Thirdly, the y-axis should be normalized in a meaningful way.
We'll make a new plot that addresses these issues.
dt = ds["num"][1].data - ds["num"][0].data #sampling interval in days
f, S = sg.periodogram(cv-np.mean(cv), fs=1/dt) #fs = sampling frequency (cyclic)
omega = f*2*np.pi #convert cyclic frequency to radian frequency
fig, ax = plt.subplots(1, 1)
ax.semilogy( omega[np.where(omega>=0)],S[np.where(omega>=0)],linewidth=0.75)#plot positive side
ax.semilogy(-omega[np.where(omega<=0)],S[np.where(omega<=0)],linewidth=0.75)#plot negative side
ax.autoscale(enable=True, tight=True)
ax.set_ylim(1e-2, 1e3)
fig.tight_layout()
ax.legend(['Positive Frequencies','Negative Frequencies'])
plt.xlabel('Frequency (radians/day)')
plt.ylabel('Power Spectral Density (cm$^2$/s$^2$ days)')
plt.title('Periodogram Rotary Spectral Estimate');
Here the two-sided periodgram has been split into two one-sided portions, called the positive and negative rotary spectra, each containing roughly half as many data points as the original time series. Their structure is still hard to see at the moment; we will get to that later.
We have also normalized the y-axis in a sensible way. The spectrum integrates to the variance, so we would like our spectral estimate to do so, too. For the two-sided periodgram with an odd number of points, using radian frequency, the correct formula to recover the variance is
(1/2/np.pi)*(omega[1]-omega[0])*np.sum(S) #value of the summed spectrum
94.5525803225848
where the omega[1]-omega[0] is a differential, as would appear in an intergal. When we compare this to the directly computed variance, we find
np.std(cv)**2
94.55258032258482
verifying that our spectral estimate is correctly normalized to obtain the variance as computed in the time domain. These numbers imply a standard deviation of $\sigma=\sqrt{94.5}=9.7$ cm/s.
This explains the units of the spectrum. Its units must the square of whatever the units are of your time series, in this case cm/s, divided by the frequency units, in this case 1/days (recall that radians are dimensionless). This is because the spectrum is a partitioning of variance across frequencies.
Next we put down some markers of some meaningful frequencies: the Coriolis frequency at the latitude of the mooring location---around which the oceanic internal wave field is concentrated---and eight prominent tidal frequencies.
#define functions to return the Coriolis and tidal frequencies
def corfreq(lat):
"""
The Coriolis frequency in rad / day at a given latitude.
Args:
lat: Latitude in degree
Returns:
The Coriolis frequecy at latitude lat
"""
omega=7.2921159e-5;
return 2*np.sin(lat*2*np.pi/360)*omega*(3600)*24;
def tidefreq():
"""
Eight major tidal frequencies in rad / day. See Gill (1982) page 335.
Args:
None
Returns:
An array of eight frequencies
"""
return 24*2*np.pi/np.array([327.85,25.8194,24.0659,23.9344,12.6584,12.4206,12.0000,11.9673])
fig, ax = plt.subplots(1, 1)
ax.semilogy( omega[np.where(omega>=0)],S[np.where(omega>=0)],linewidth=0.75)#plot positive side
ax.semilogy(-omega[np.where(omega<=0)],S[np.where(omega<=0)],linewidth=0.75)#plot negative side
ax.autoscale(enable=True, tight=True)
ax.set_ylim(1e-2, 1e3)
fig.tight_layout()
ax.vlines(tidefreq(),ax.get_ylim()[0],ax.get_ylim()[1],linestyle=":", color="black")
ax.vlines(corfreq(ds["lat"].data),ax.get_ylim()[0],ax.get_ylim()[1],linestyle="--", color="gray",linewidth=2)
ax.legend(['Positive Frequencies','Negative Frequencies'])
plt.xlabel('Frequency (radians/day)')
plt.ylabel('Power Spectral Density (cm$^2$/s$^2$ days)')
plt.title('Periodogram Rotary Spectral Estimate');
The tidal frequencies appear in three groups. From right to left, there are four semidiurnal tidal frequencies, three diurnal frequencies, and one low-frequency tide, the Mf lunal fortnightly tide at about one cycle per 13.6 days.
As we have discussed, when a frequency is expressed in units of radians per unit time, it is called a radian or angular frequency. When a frequency is expressed in units of cycles per unit time, it is called a cyclic frequency. This is more easily seen in how one writes sinusoid or a complex exponential:
\begin{equation} e^{i \omega t}~~~(\mathrm{radian}) \quad\quad \mathrm{vs.}\quad\quad e^{2\pi i f t}~~~(\mathrm{cyclic}) \end{equation}Thus the relationship between the radian frequency $\omega$ and the cyclic frequency $f$ is $f=\omega/2\pi$.
I find it useful to work with both types of frequencies. Radian frequencies are convenient for theoretical expressions. However, cyclic frequencies are more intutive and therefore to be preferred when plotting or quoting values.
So now we redo the above plot in cyclic frequency, and covert from radians per day to cycles per day.
f, S = sg.periodogram(cv-np.mean(cv), fs=1/dt) #fs = sampling frequency (cyclic)
fig, ax = plt.subplots(1, 1)
ax.semilogy( f[np.where(f>=0)],S[np.where(f>=0)],linewidth=0.75)#plot positive side
ax.semilogy(-f[np.where(f<=0)],S[np.where(f<=0)],linewidth=0.75)#plot negative side
ax.autoscale(enable=True, tight=True)
ax.set_ylim(1e-2, 1e3)
fig.tight_layout()
ax.vlines(tidefreq()/2/np.pi,ax.get_ylim()[0],ax.get_ylim()[1],linestyle=":", color="black")
ax.vlines(corfreq(ds["lat"].data)/2/np.pi,ax.get_ylim()[0],ax.get_ylim()[1],linestyle="--", color="gray",linewidth=2)
plt.legend(['Positive Frequencies','Negative Frequencies'])
plt.xlabel('Frequency (cycles/day)')
plt.ylabel('Power Spectral Density (cm$^2$/s$^2$ days)')
plt.title('Periodogram Rotary Spectral Estimate');