class: center, middle
.title[Statistics and Structures:
Examples from Oceanography]
.author[Jonathan Lilly] .institution[Planetary Science Institute, Tucson, Arizona]
.date[December 14, 2024]
.note[Created with [{Liminal}](https://github.com/jonathanlilly/liminal) using [{Remark.js}](http://remarkjs.com/) + [{Markdown}](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) + [{KaTeX}](https://katex.org)]

---
class: center

##Multi-Scale Variability in Earth’s Oceans
Sea surface height slope magnitude from a global simulation by H. Simmons, University of Alaska Fairbanks.

---
class: center

## The Global Surface Drifter Dataset
A dataset of freely drifting instruments tracking the surface currents, with `$\sim\!30$`K instruments and `$\sim\!50$`M measurements.

Known as *Lagrangian trajectories*.

Multi-scale information, not readily accessed. Need statistical tools!

---
class: center

## Coherent Eddies, A Feature of Interest
To make an eddy, stir your coffee and multiply by about 10 million.

Very important features for the ocean circulation! How can we study these from Lagrangian trajectories?

---
class: left

##Step 1: A Conceptual Model

A model for Lagrangian trajectories with eddies:
`\[z(t)=x(t) + \mathrm{i} y(t) =z_\epsilon(t) + \boxed{z_\star(t)}.\]`

Stochastic background portion $z\_\epsilon(t)$ plus oscillatory portion $z\_\star(t)$.

The oscillatory portion $z\_\star(t)$ is modeled as a *modulated ellipse*, a bivariate generalization of the notion of an AM/FM signal:
`\[z_\star(t)=\mathrm{e}^{\mathrm{i}\theta(t)}\left[a(t)\cos \phi(t)+\mathrm{i}b(t)\sin \phi(t)\right].\]`
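The modulated ellipse can be made concrete with a minimal Python sketch. The function name `modulated_ellipse` and all parameter values below are illustrative assumptions, not taken from the talk or from any dataset, and the stochastic background is omitted.

```python
import numpy as np

def modulated_ellipse(theta, a, b, phi):
    """Evaluate z = exp(i*theta) * [a*cos(phi) + i*b*sin(phi)] on sampled arrays."""
    return np.exp(1j * theta) * (a * np.cos(phi) + 1j * b * np.sin(phi))

# Illustrative parameters only (hypothetical, arbitrary units).
t = np.linspace(0.0, 100.0, 2001)                      # time
a = 10.0 * np.exp(-t / 200.0)                          # slowly decaying semi-major axis
b = 0.8 * a                                            # semi-minor axis; its sign sets the rotation sense
theta = 2 * np.pi * t / 400.0                          # slowly precessing orientation
phi = 2 * np.pi * (t / 5.0 + 0.5 * (t / 100.0) ** 2)   # phase with a drifting frequency

z_star = modulated_ellipse(theta, a, b, phi)
x, y = z_star.real, z_star.imag                        # the two trajectory components
```

Because `$|z_\star|^2=a^2\cos^2\phi+b^2\sin^2\phi$`, the distance from the ellipse center always lies between `$|b|$` and `$a$`, which gives a quick sanity check on any synthetic trajectory.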
---
class: center

##Step 2: A Structure Detector & Extractor

An application of the wavelet transform to identify curves, called *ridges*, that trace out frequency-modulated oscillations.

---
class: left

##Recovering Modulated Oscillations

The wavelet transform is a tool for *recovering* or *estimating* the properties of a modulated oscillation immersed in background noise.

The idea is to project the time series onto an oscillatory test signal, or *wavelet*, to find a “best fit” frequency at each moment. The fits are then chained together into a continuous curve called a *ridge*.
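For a scalar signal, the "best fit at each moment" idea can be sketched as follows. This is a simplified illustration under stated assumptions: the wavelet is taken to be a generalized Morse wavelet in its standard frequency-domain form, the transform is computed with an FFT, and the ridge is approximated by the scale of largest transform magnitude at each time rather than by explicitly chaining local maxima. The names `morse_freq`, `wavelet_transform`, and `ridge_frequency` are hypothetical.

```python
import numpy as np

def morse_freq(omega, beta=3.0, gamma=3.0):
    """Generalized Morse wavelet in the frequency domain (assumed wavelet choice).

    Zero for omega <= 0 and scaled so that its peak value is 2.
    """
    psi = np.zeros_like(omega, dtype=float)
    pos = omega > 0
    amp = 2.0 * (np.e * gamma / beta) ** (beta / gamma)
    psi[pos] = amp * omega[pos] ** beta * np.exp(-omega[pos] ** gamma)
    return psi

def wavelet_transform(x, dt, scales, beta=3.0, gamma=3.0):
    """FFT-based transform; row i holds the projection onto the wavelet at scales[i]."""
    omega = 2 * np.pi * np.fft.fftfreq(x.size, d=dt)
    X = np.fft.fft(x)
    return np.array([np.fft.ifft(X * morse_freq(s * omega, beta, gamma)) for s in scales])

def ridge_frequency(w, scales, beta=3.0, gamma=3.0):
    """Best-fit frequency at each time: the scale of largest |w|, mapped to frequency."""
    peak = (beta / gamma) ** (1.0 / gamma)  # frequency where the unscaled wavelet peaks
    return peak / scales[np.argmax(np.abs(w), axis=0)]

# Illustrative test signal: a chirp from 0.2 to 0.4 rad per sample in white noise.
rng = np.random.default_rng(0)
n, dt = 2048, 1.0
t = np.arange(n) * dt
omega_true = 0.2 + 0.2 * t / (n * dt)        # instantaneous frequency of the chirp
phase = 0.2 * t + 0.1 * t**2 / (n * dt)      # time integral of omega_true
x = np.cos(phase) + 0.1 * rng.standard_normal(n)

freqs = np.geomspace(0.1, 1.0, 60)           # candidate frequencies, rad per sample
scales = (3.0 / 3.0) ** (1.0 / 3.0) / freqs  # matching scales for beta = gamma = 3
w = wavelet_transform(x, dt, scales)
omega_hat = ridge_frequency(w, scales)       # estimated frequency along the ridge
```

Away from the edges of the record, `omega_hat` should track `omega_true` to within roughly the spacing of the frequency grid. A per-time global maximum is the simplest choice; it would not separate several simultaneous oscillations, which is one reason to chain local maxima instead.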
For a vector-valued signal `$\mathbf{x}_o(t)=\begin{bmatrix}x_1(t) & x_2(t) & \cdots & x_N(t)\end{bmatrix}^T$` in noise `$\mathbf{x}_\epsilon(t)$`, `$\mathbf{x}(t)=\mathbf{x}_o(t)+\mathbf{x}_\epsilon(t)$`, define the wavelet transform
`\[\mathbf{w}(t ,s) \equiv \int_{-\infty}^{\infty} \frac{1}{s} \psi^*\left(\frac{\tau-t}{s}\right)\,\mathbf{x}(\tau)\,\mathrm{d} \tau\]`
(also a vector) and then find the *wavelet ridges* `$s(t)$` from
`\[\frac{\partial}{\partial s}\, \left\|\mathbf{w}(t ,s)\right\| = 0,\quad\quad \frac{\partial^2}{\partial s^2}\, \left\|\mathbf{w}(t ,s)\right\| < 0.\]`
The oscillation is estimated simply by `$\widehat{\mathbf{x}_o}(t)\equiv\Re\left\{\mathbf{w}(t,s(t))\right\}.$`

---
class: center

##Application to the Gulf of Mexico
Problem in application to large datasets: false positives.

---
class: center

## A Noise Dataset & Null Hypothesis
To assess significance, a noise dataset is created.

Isotropic velocity spectrum $S\_{\varepsilon\varepsilon}(\omega) \equiv \min\left\\\{\widehat S\_{++}(\omega),\widehat S\_{--}(\omega) \right\\\}$.

Same variance, initial location, and mean velocity as the original data, but no quasi-oscillatory structures.

---
class: center

## Significance Test Using Survival Function
Signals are expected to be both long-duration and near-circular.

Define a significance parameter as $X\equiv L \overline{\zeta}^{\,4}$.

$L$ = ridge duration, $\overline{\zeta}$ = time-averaged polarization $\frac{2a|b|}{a^2+b^2}$.

Compare survival function `$\widehat{\mathcal{S}}_{X}(x;\omega)$` with that of noise `$\widehat{\mathcal{S}}_{X}^\varepsilon(x;\omega)$`:
`\[\rho_{X}(x;\omega)\equiv \frac{\widehat{\mathcal{S}}_{X}^\varepsilon(x;\omega)}{\widehat{\mathcal{S}}_{X}(x;\omega)}\]`

---
class: center

## Ridge Properties Before Editing
Remove inertial oscillations (below -0.5) on physical grounds.

Remove all ridges significant at less than the 90% level.

---
class: center

## Ridge Properties After Editing
Major asymmetries: (i) intense submesoscale cyclones; (ii) mesoscale cyclones; (iii) anticyclonic Loop Current Eddies.

---
class: center

##Application to the Gulf of Mexico
After applying the wavelet ridge analysis followed by the significance test, we obtain the following $\Longrightarrow$.

---
class: center

## $|Ro|\gt 1/6$ Cyclones Colored by $Ro\_\star$
---
class: center

## $|Ro|\gt 1/6$ Cyclones Colored by $L\_\star$
---
class: left

##Analysis of Modulated Oscillations in Large, Noisy Datasets

This analysis method consists of three aspects:

1. Extracting oscillatory features using multivariate wavelet ridge analysis. .cite[[{Lilly and Olhede (2012a)}](https://jmlilly.net/papers/lilly12-itsp-cp.pdf)]

2. Assessing statistical confidence through comparison with a null hypothesis. .cite[[{Lilly and Pérez-Brunius (2021b)}](https://npg.copernicus.org/articles/28/181/2021/npg-28-181-2021.pdf)]

3. Estimating bias arising from amplitude and frequency modulation. .cite[[{Lilly and Olhede (2010)}](https://jmlilly.net/papers/lilly10-itit_cp.pdf)]

We have applied this method to study oceanic vortices in Lagrangian trajectories, but its applicability is likely much broader.

---
class: center

## Satellite Observations of the Oceans
The Jason-class satellite altimeter measures the height of the ocean surface with an accuracy of a few mm (!!), producing global maps every 9.92 days for the past 32 years.

These measurements give key insight into the ocean currents.

---
class: center

## Detection of Isolated Anomalies
A few dozen coefficients (center) capture all of the structure in this data; the residual (right) appears to be random noise.

---
class: left

## A Model for a Localized Signal

The type of signal we expect for eddies observed by along-track data consists of short-duration, isolated ‘bursts’ or ‘impulses’, that is, events which may be represented as a wavelet or the temporal integral of a wavelet. This type of signal may be appropriate for other data as well.

This suggests a model for a real, univariate signal of the form
`\begin{equation}\label{signalmodel} x(t) = \sum_{n=1}^N \Re\left\{c_n \psi\left(\frac{t-t_n}{\rho_n}\right)\right\} +x_\epsilon(t) \end{equation}`
consisting of superpositions of amplitude-scaled, stretched, and phase-shifted versions of some basis signal `$\psi(t)$`, called the *element*.

It is assumed that the different realizations of the element are sufficiently separated in time and frequency such that they do not interfere, in a way that will be made precise later.

---
class: center

## Localized Wavelets
These are generalized Morse wavelets, `$\psi_{\beta,\gamma}(t)$`.

The method is suitable for events that themselves resemble wavelets, or the temporal integral of a wavelet (the `$\beta=0$` case).

---
class: left

## Signal Model

Our signal model, using the generalized Morse wavelets, is
`\begin{equation}\label{morseelementmodel} x(t) = \sum_{n=1}^N \Re \left\{ c_n \psi_{\mu,\gamma}\left(\frac{t-t_n}{\rho_n}\right)\right\} +x_\epsilon(t) \end{equation}`
where `$\mu$` is the most suitable value of the order or `$\beta$` parameter.

The `$n$`th event is then characterized by coefficient `$c_n$`, time `$t_n$`, and scale `$\rho_n$`. We wish to *estimate* these unknown quantities.

We then take the wavelet transform with an order `$\beta$` wavelet in the same `$\gamma$` family, leading to
`\begin{equation}\label{transformofelementmodel} w_{\beta,\gamma}(\tau,s)=\frac{1}{2}\sum_{n=1}^N c_n\int_{-\infty}^{\infty} \frac{1}{s} \psi_{\beta,\gamma}^*\left(\frac{t-\tau}{s}\right)\psi_{\mu,\gamma}\left(\frac{t-t_n}{\rho_n}\right)\,d t+\varepsilon_{\beta,\gamma}(\tau,s). \end{equation}`

Owing to the properties of the generalized Morse wavelets, the integral in the above equation has a closed-form expression in terms of a rescaled `$\psi_{\beta+\mu,\gamma}(t)$` wavelet.
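The transform of the signal model can be explored numerically with a minimal sketch. It assumes the standard frequency-domain form of the generalized Morse wavelets, `$\Psi_{\beta,\gamma}(\omega)=a_{\beta,\gamma}\,\omega^\beta \mathrm{e}^{-\omega^\gamma}$` for `$\omega>0$`, a single noise-free event with `$c_n=1$`, and hypothetical names and parameter values throughout. With the `$1/s$` normalization, a short calculation from that form suggests that the transform modulus peaks at `$\tau=t_n$` and `$s=\rho_n\left[\beta/(\mu+1)\right]^{1/\gamma}$`, which the sketch lets one check.

```python
import numpy as np

def morse_freq(omega, beta, gamma):
    """Generalized Morse wavelet in the frequency domain (assumed standard form).

    Zero for omega <= 0 and scaled so that its peak value is 2.
    """
    psi = np.zeros_like(omega, dtype=float)
    pos = omega > 0
    amp = 2.0 * (np.e * gamma / beta) ** (beta / gamma)
    psi[pos] = amp * omega[pos] ** beta * np.exp(-omega[pos] ** gamma)
    return psi

def event_maximum(t_n=1000.0, rho_n=20.0, mu=2.0, beta=2.0, gamma=2.0, n=2048, dt=1.0):
    """Transform one noise-free event and return the (time, scale) of the largest |w|."""
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)
    # Element psi_{mu,gamma}((t - t_n) / rho_n), built from its Fourier transform.
    spectrum = rho_n * morse_freq(rho_n * omega, mu, gamma) * np.exp(-1j * omega * t_n)
    x = np.real(np.fft.ifft(spectrum) / dt)         # signal model with c_n = 1, no noise
    scales = rho_n * np.linspace(0.3, 2.0, 171)     # illustrative scale grid
    X = np.fft.fft(x)
    w = np.array([np.fft.ifft(X * morse_freq(s * omega, beta, gamma)) for s in scales])
    i, j = np.unravel_index(np.argmax(np.abs(w)), w.shape)
    return j * dt, scales[i]

tau_hat, s_hat = event_maximum()
s_expected = 20.0 * (2.0 / (2.0 + 1.0)) ** (1.0 / 2.0)  # rho_n * (beta / (mu + 1))**(1 / gamma)
```

Here `tau_hat` should fall at the event time and `s_hat` within one grid step of `s_expected`. Note that the maximum does not sit at `$s=\rho_n$` itself, so recovering the event scale from a detected maximum requires a known rescaling of this kind.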
---
class: left

## Event Detection

We then find *isolated maxima* of the transform, that is, time-scale `$(\tau,s)$` points at which
`\begin{multline} \quad\quad\quad\quad\frac{\partial}{\partial \tau}\left|w_{\beta,\gamma}(\tau,s)\right| =\frac{\partial}{\partial s}\left|w_{\beta,\gamma}(\tau,s)\right|=0, \\\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\frac{\partial^2}{\partial \tau^2}\left|w_{\beta,\gamma}(\tau,s)\right|<0,\quad\frac{\partial^2}{\partial s^2}\left|w_{\beta,\gamma}(\tau,s)\right|<0.\label{maxconditions} \end{multline}`

Because we have a simple expression for `$w_{\beta,\gamma}(\tau,s)$`, we can relate the event properties `$c_n$`, `$t_n$`, and `$\rho_n$` to the value of `$w_{\beta,\gamma}(\tau,s)$` at a maximum. This lets us work backwards from the maxima to estimates of the event properties.

---
class: center

##An Example of Events in White Noise
The events are based on the `$\psi_{2,2}(t)$` wavelet in this case.

Grey dots are all maxima, black are significant and isolated.

---
class: center

##A Significance Test
Again we look at the rate at which events occur in data vs. noise.

The noise model in this case is colored noise of arbitrary slope.

The noise distribution is found via Monte Carlo simulation. But, exploiting similarity, we need only simulate a five-vector at each scale; we don't have to simulate a time series and then transform.

---
class: left

##Analysis of Localized Features in Large, Noisy Datasets

This method for identifying isolated signal anomalies is termed *element analysis*; see .cite[[{Lilly (2017)}](https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.2016.0776)].

It consists of several steps:

1. Choice of element, a function matching the proposed signals
2. Event detection via maxima of the wavelet transform
3. Rejection of statistically insignificant events
4. Rejection of non-isolated events

Again it is expected that this method could be more broadly applicable.

---
class: left

##Conclusions

In the study of oceanic vortices, the need emerges for rigorous methods for analyzing (i) modulated oscillations and (ii) impulsive, localized signals.

Methods for both problems have been devised by combining the continuous wavelet transform with the null hypothesis of a colored noise process.

The survival function ratio is identified as a useful means of assessing the *density* of detections compared with the null, and therefore statistical significance.

A general lesson from these efforts is the central importance of a suitable structural model (here, a modulated oscillation, or something which itself resembles a wavelet) as a basis for framing statistical questions.

All papers and software available at [{www.jmlilly.net}](https://jmlilly.net). Thanks!