Variance Ellipses

In this lab, you'll be working with current meter data, that is, observations of the horizontal ocean currents made from a fixed point. The main analysis tool we will learn are variance ellipses, a fundamental tool for analyzing velocity datasets or any kind of bivariate (or complex-valued) data. You'll see their relationship to the two-dimensional histogram, and will learn some associated analysis and visualization ideas, in particular the value of finding a suitable coordinate rotaiton.

For this assignment, we are going to use a mooring from the Labrador Current on the west side of the Labrador Sea known as the ‘m1244’ mooring. Please download my version of it here and put it on your Matlab search path if you have not done so already. (This is included in the full distribution of the course materials.)

This notebook requires my jlab toolbox to be installed. You will also need to have set up Jupyter Lab to work with Matlab, following these instructions.

A Quick Look at the Data

Now let's load and examine the data.

The data is organized as a structure with different fields. The field variables are as follows:

depths --- Depths of 4 different currents meters, in meters
num -- date in Matlab's "datenum" format
p --- Pressure in decibar
t --- Temperature in Centigrade
cv --- Complex velocity u+iv in cm/s, u = eastward, v = northward

Ordinarily, you would access these with the notation "m1244.cv", and so forth. However, jLab has a function called "use" that will map all the fields in a structure into variables in memory.

As you can see all of the fields now exist as variables. This makes it convenient to have different datasets with the same variable names, and swap out which one we are working with at the moment, thus keeping our variable names short and consistent.

Before we proceed let's find out the sampling interval and duration of the time series.

So we have a roughly 1.7 year record of the currents sampled every two hours.

Let's take a look at the currents the deepest depth. Note that I often like to use year.fraction as a simple time axis for time periods of a year or more.

Here a mean value is readily apparent in both the eastward and northward components. In addition, we see a lot of small-scale variability, possibly with a greater amplitude in the eastward component. Hints of multiple timescales of variability are present, with a fine noise-like variability superposed on somewhat longer timescales.

It is a little difficult to make sense of this plot, however, because it is not rotated into the most useful coordinate system. Let's figure out a sensible rotation angle.

Choosing a Coordinate Rotation

To do this, we will look at the progressive vector diagram. For a current measurement at a point, the progressive vector diagram is defined as a plot of its time integral. In other words, the progressive vector diagram shows the displacement that would occur if a particle were advected with the given currents.

Here, the currents appear to be directed due southeast. However, there is a problem with this plot: the x and y axes, which are the same physical quantity of displacement, correspond to different physical lengths! If we re-scaled the figure window, the direction of the mean flow would appear to change. Thus our perspective on the comparison between the eastward and northward currents is distorted.

This problem is fixed by setting the data aspect ratio. Not setting the data aspect ratio correctly is actually a common problem, even, regrettably, in many published papers.

We'll now remake the above figure with the correct aspect ratio.

Now we see a strong mean flow at all depths directed to the east-southeast. This matches what we saw above with the line plot, where the positive mean of the eastward currents is roughly twice as large as the negative mean of the southward currents.

Rotating by the Mean Flow Direction

It's natural to rotate our current meter data so that the east-southeastward direction of the mean flow corresponds to the first velocity component, and the direction normal to that to the second velocity component. In other words, we will rotate our $(u,v)$ data to become $)\tilde u,\tilde v)$ with $\tilde u$ corresponding to the ‘downstream’ direction and $\tilde v$ corresponds to the ‘cross-stream’ direction. We will take a little time to do this so we understand how rotations work.

Firstly we find the direction of the mean flow at the deepest current meter, where the flow is strongest.

So the angle is about 30 degrees clockwise from due east. That looks about right!

Rotation of a complex-valued number $z=u+iv$ are straightforward. To rotate $z=|z|e^{i\varphi}$ through some angle $\phi$, we simply multiply by $e^{i\phi}$. This leads to a new version of $z$ , denoted $\tilde z =\tilde u+ i\tilde v=|z|e^{i(\varphi+\phi)}$.

In this case, we want to choose the rotation angle $\phi$ as the negative of the angle of the mean flow. Let's look at the mean real and imaginary components of the velocity before and after this rotation.

This shows that after the rotation, the mean of the cross-stream velocity (the imaginary part) is zero, as expected. Now let's re-plot the progressive vector diagram.

At first glance, this plot appears rather boring. But actually, boring can be a good sign because it means we've found a way to look at our dataset in such a way that it simplifies!

Let's return to the line plot we made earlier, but now make it for the rotated velocity data.

The downstream and cross-stream components are seen to be distinctly different. The cross-stream flow oscillates about a mean of zero, which is by construction, and also presents higher-frequency variability than does the downstream flow. Or, perhaps more accuately, it appears that the cross-stream flow is lacking an intermediate-timescale component that is present in the downstream flow.

Let's take a closer look at the timescales present in this dataset with a simple lowpass filter. We'll use a 24-point filter, which corresponds to a 2-day running mean since the sampling interval is 2 hours.

Here we can see clearly that the downstream flow presents an intermediate-timescale variability that is not present in the cross-stream flow.

Our rotation, designed just based on the mean, has thus also revealed distinctions in variability. The variability of the flow is now seen as being anisotropic, that is, lacking the property of being the same in all directions. This was not visible before the rotation because the downsteam and cross-stream components were mixed.

In general, finding ways to "rotate" a dataset, perhaps in an abstract way, in such a way that variability presents anisotropic structure is a quite simple yet powerful approach we can use to unlock its information content.

Here, the different timescales between the two components is not what one would expect if the variability were entirely due to the advection of eddies past the mooring; the timescales in that case would be the same. So this plot informs the physical hypotheses we would frame regarding the nature of the variability.

Two-Dimensional Histograms

Next we will summarize the statistics of the velocity through looking at its properties on the $u,v$ plane. First, we'll make a simple line plot of $\tilde u$ versus $\tilde v$---a type of plot known as a hodograph---for the rotated versions of the currents we created earlier. We'll work with the lowpassed version of the time series for the moment.

The mean flow is plotted for reference. Note that the time integral of this plot is the progressive vector diagram.

At first glance, this plot is not very informative. It basically tells us that there is varability about the mean, with greater variability along the downstream axis then the cross-stream axis, which we already knew. But we can use statsitics to nicely summarize what we are seeing.

To do this we will look at two-dimensional distributions of the velocity. Firstly, we plot the two-dimensional histogram at the deepest depth.

This plots shows the number of observations within each bin, which we have specified to be $1\times 1$ cm/s bins ranging from -50 to 50 cm/s in both axes. The white bins are never observed.

The histogram of the velocity has a roughly elliptical shape, elongated along the x-axis. We now have a picture of a "cloud" of possibilities, with, generally speaking, bins closer to the location of the mean flow occurring more frequently.

Note this plot contains more information than was visible in the line plot, because while the line plot also showed us the shape of the distribution, it did not provide us with information as to the relative frequency of occurrence of velocity values.

Adjusting 2D Histogram Plots

Two-dimensional statistics are an exceedingly powerly analysis tool. Before proceeding, let's take a look at how we can choose our settings to get the most out of them.

First, you'll want to pay attention to your choice of bin size.

Here, the bins have been chosen to be too coarse, so we can't see much structure, and the plot has a blocky look to it.

Here, the bins have been chosen to be too fine, so the "cloud" of points has been reduced to a mist, and again, we can't see much. Thus, you'll want to play around with your bin sizes to choose one that seems to reveal the most structure. The 1 cm/s bins we used earlier seem ideal for this dataset.

You'll also want to pay attention to your choice of color scale.

If you see a splotch of all one color in your dataset, one likely explanation is that the color axes limits have been set inappropriately, wiping out some of the information. Thus you'll want to check the color axes and see if an improvement could be made.

Finally, one useful trick is to plot not the number of observations per bin, but rather the logarithm of the number of observations, like so: