# POLYSMOOTH is the jMap module of jLab.

``` POLYSMOOTH  Smoothing scattered 2D data with local polynomial fitting.

POLYSMOOTH generates a map from scattered data in two dimensions---
either on the plane or on the sphere---using a local least squares fit
to a polynomial.

A numerically efficient algorithm is used that avoids explicit loops.
Also, the data are pre-sorted so that different mapping parameters can
be tried out at little computational expense.

A reference paper for this function is currently in preparation.  In
the meantime, for more details see the following book:

Fan and Gijbels, 1996.  Local polynomial modelling and its
applications. Chapman and Hall.
__________________________________________________________________

Smoothing on the plane

Let's say we have an array Z of data is at locations X,Y, where X,Y,
and Z are all arrays of the same size.  The problem is to obtain a
mapped field ZHAT on some regular grid specified by arrays XO and YO.

Calling POLYSMOOTH is a two-step process:

[DS,XS,YS,ZS]=TWODSORT(X,Y,Z,XO,YO,CUTOFF);
ZHAT=POLYSMOOTH(DS,XS,YS,ZS,B,P);

Firstly, TWODSORT which returns ZS, a 3D array of data values at each
grid point, sorted by increasing distance DS, and the corresponding
positions XS and YS. Here XO and YO are arrays specifying the bin
center locations of the grid.  See TWODSORT for more details.

Here CUTOFF determines the maximum distance included in the sorting
and should be chosen to be greater than B.

Secondly, POLYSMOOTH fits a Pth order polynomial at each gridpoint
within a neighborhood specified by the "bandwidth" B.

Data and grid point locations are presumed to have the same units as
the bandwidth B (e.g., kilometers).

P may be chosen as P=0 (fit to a constant), P=1 (fit to a plane), or
else P=2 (fit to a parabolic surface).

POLYSMOOTH can also be used for data on the sphere, as described next.
__________________________________________________________________

Smoothing on the sphere

POLYSMOOTH supports a local polynomial fit on the sphere.  As before
this is a two-step process:

[DS,XS,YS,ZS]=SPHERESORT(LAT,LON,Z,LATO,LONO,CUTOFF);
ZHAT=POLYSMOOTH(DS,XS,YS,ZS,B,P,'sphere')

Firstly one calls SPHERESORT, which is the analogue of TWODSORT for the
sphere.  LATO and LONO are arrays specifying the bin center locations
of the grid.  See SPHERESORT for more details.

Secondly, POLYSMOOTH fits a Pth order polynomial at each gridpoint
within a neighborhood specified by the "bandwidth" B.  The bandwidth
here should have units of kilometers.

Note that SPHERESORT and POLYSMOOTH both assume the sphere to be the
radius of the earth, as specified by the function RADEARTH.
__________________________________________________________________

Multiple bandwidths

Note that the bandwidth B may be an array, in which case ZHAT, as well
as the other output fields described below, will have LENGTH(B) 'pages'
along their third dimensions.

This can result in a significant speed improvement compared with
calling POLYSMOOTH by looping over multiple bandwith values.  The
reason is that POLYSMOOTH performs a computationally expensive initial
sort to remove non-finite values from the grid.  In calling POLYSMOOTH
with an array-valued B, this sort only need to be performed one time.
__________________________________________________________________

Additional options

A number of different options are possible, as descibed below.  These
are specified with trailing string arguments, which can be given in any
order provided they are after the numeric arguments.
__________________________________________________________________

Weighting function

POLYSMOOTH weights the data points in the vicinity of each grid point
according to some decaying function of distance.

1) Parabolic weighting

POLYSMOOTH(...,'epa'), the default behavior, uses the Epanechnikov
kernel --- a fancy name for a parabolic weighting function.  This
function vanishes outside a radius of the bandwidth B.

The Epanechnikov kernel has the numerical advantage that distances from
far away points do not have to be computed, and is therefore preferred.

2) Gaussian weighting:

POLYSMOOTH(...,'gau') uses a Gaussian weighting function instead, again
vanishing outside of radius B, and with a standard deviation of B/3.
That is, the points more than 3-sigma from the center are set to zero.
__________________________________________________________________

Additional output arguments

[ZHAT,R,N,C]=POLYSMOOTH(...) returns the weighted average distance R of
all the data points used at each grid point, the number N of such data
points, and the condition number C.  All are the same size as ZHAT.

R is computed using the same weighting function as for the mapping.
Large R values mean that a typical data point contribution was far from
the current grid point, indicated that ZHAT is potentially biased.

For uniform data spacing with the Epanechnikov kernel, the weighted
mean radius R is 0.533*B, while for the Gaussian kernel it is 0.410*B.

Small numbers N of enclosed data points means that little averaging was
involved, indicating that the values of ZHAT may be more subject to
variance, and/or less representative, than at points where N is large.

Note that R includes the effect of the additional weighting factor, if
input as described below, whereaas N does not.

The matrix condition number C is computed by COND.  At points where C
is large, the least squares solution may be unstable, and one should
consider reverting to a lower-order solution.  Note that for zeroth-
order fit with P=0, the matrix condition number is equal to one.
__________________________________________________________________

Estimated derivatives

As a part of the least squares problem, derivatives of order P and
lower are estimated as a part of the Pth-order fit.

[ZHAT,R,N,C,ZX,ZY]=POLYSMOOTH(...) with P>1 returns the estimated X and
Y, or zonal and meridional, derivatives of Z.

[ZHAT,R,N,C,ZX,ZY,ZXX,ZXY,ZYY]=POLYSMOOTH(...) with P=2 similarly
returns the estimated second partial derivatives of Z.

Derivative fields are set to values of NaNs if they are not explicitly
computed, so for example, [ZHAT,R,N,C,ZX,ZY]=POLYSMOOTH(...) with P=0
will lead to ZX and ZY being arrays of NaNs of the same size as ZHAT.
__________________________________________________________________

Variable bandwidth

ZHAT=POLYSMOOTH(...,N,P,'variable') uses the parameter N instead of a
fixed bandwidth.  The bandwidth now varies over the grid such that N
points fall within one bandwidth distance from every grid point.

[ZHAT,R,B,C]=POLYSMOOTH(...,N,P,'variable') returns as its third output
argument the bandwidth B at each gridpoint rthat than the total number
of data points N.  All other output arguments are unchanged.

The variable bandwidth algorithm can give good results when the data
spacing is highly uneven, particularly when used with a higher-order
(P=1 or P=2) fit.

Derivative fields may also be output in this case, as described above.

When using the variable bandwidth method, be aware that the bandwidth
necessary to include N points may turn out to be larger than CUTOFF,
the maximum distance value given to TWODSORT or SPHERESORT.
__________________________________________________________________

Weighted data points

POLYSMOOTH can incorporate an additional weighting factor on the data
points. Let W be an array of positive values the same size as the data
array Z.  Each data point is then treated as if it were W data points.

To use weighted data points, call TWODSORT or SPHERESORT with W added:

[DS,XS,YS,ZS,WS]=TWODSORT(X,Y,Z,W,XO,YO,CUTOFF);
or [DS,XS,YS,ZS,WS]=SPHERESORT(LAT,LON,Z,W,LATO,LONO,CUTOFF);

followed by    ZHAT=POLYSMOOTH(DS,XS,YS,ZS,WS,B,P);
or    ZHAT=POLYSMOOTH(DS,XS,YS,ZS,WS,B,P,'sphere');

for data distributed on the plane or on the sphere, respectively.

For very large datasets, this may be useful in first condensing the
data into a manageable size, or alternatively it may be used to weight
observations according to a measure of their presumed quality.
__________________________________________________________________

Algorithms

POLYSMOOTH has several algorithm choices for optimal performance, both
of which give identical answers.

1) Speed optimization

The default algorithm is optimized for speed but uses a great deal of
memory.  It solves the least squares problem for all grid points
simultaneously (by directly solving matrix inversions---see MATINV.)

This can be greater than an order of magnitude faster than the obvious
approach of looping over the grid points.

2) Memory optimization

ZHAT=POLYSMOOTH(DS,XS,YS,ZS,B,P,'memory') performs an explicit loop.
This is mostly used for testing, purposes, but may also be useful if
the dataset is so large that memory becomes a limiting factor.
__________________________________________________________________

'polysmooth --t' runs some tests.
'polysmooth --f' generates some sample figures.
'polysmooth --f2' with jData installed generates the figure shown
above, which may require a relatively powerful computer.

Usage:  [ds,xs,ys,zs]=twodsort(x,y,z,xo,yo,cutoff);
zhat=polysmooth(ds,xs,ys,zs,b,p);
[zhat,R,N,C]=polysmooth(ds,xs,ys,zs,b,p);
--or--
[ds,xs,ys,zs]=spheresort(lat,lon,z,w,lato,lono,cutoff);
[zhat,R,N,C]=polysmooth(ds,xs,ys,zs,b,p,'sphere');
__________________________________________________________________
This is part of JLAB --- type 'help jlab' for more information
(C) 2008--2017 J.M. Lilly --- type 'help jlab_license' for details
```