BasisSmoother#

class skfda.preprocessing.smoothing.BasisSmoother(basis, *, smoothing_parameter=1.0, weights=None, regularization=None, output_points=None, method='svd', return_basis=False)[source]#

Transform raw data to a smooth functional form.

Takes functional data in discrete form and approximates it by the closest function that can be generated by the basis.

The fit is computed by minimizing the penalized sum of squared errors [RS05-5-2-6]:

\[PENSSE(c) = (y - \Phi c)' W (y - \Phi c) + \lambda c'Rc\]

where \(y\) is the vector or matrix of observations, \(\Phi\) the matrix whose columns are the basis functions evaluated at the sampling points, \(c\) the coefficient vector or matrix to be estimated, \(\lambda\) a smoothness parameter and \(c'Rc\) the matrix representation of the roughness penalty \(\int \left[ L( x(s)) \right] ^2 ds\) where \(L\) is a linear differential operator.

Each element of \(R\) has the following closed form:

\[R_{ij} = \int L\phi_i(s) L\phi_j(s) ds\]

Differentiating the first expression with respect to \(c\) and setting the result to zero yields the closed form of the estimated coefficient matrix:

\[\hat{c} = \left( \Phi' W \Phi + \lambda R \right)^{-1} \Phi' W y\]

This matrix equation is solved using the Cholesky method for least squares problems. If this method raises a rounding-error warning, you may want to use the QR factorisation, which is more numerically stable despite being more expensive to compute [RS05-5-2-8].
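For illustration, the closed form above can be evaluated directly with NumPy. The following is only a sketch, assuming an identity weight matrix \(W\), no roughness penalty (\(\lambda = 0\)) and a hand-built \(\Phi\) whose columns mimic the normalized Fourier basis used in the Examples below; BasisSmoother performs the equivalent computation internally with the chosen factorisation method.

>>> import numpy as np
>>> t = np.linspace(0, 1, 5)
>>> y = np.sin(2 * np.pi * t) + np.cos(2 * np.pi * t) + 2
>>> Phi = np.column_stack([                   # basis functions evaluated at
...     np.ones_like(t),                      # the sampling points: constant,
...     np.sqrt(2) * np.sin(2 * np.pi * t),   # first sine and first cosine,
...     np.sqrt(2) * np.cos(2 * np.pi * t),   # L2-normalized on (0, 1)
... ])
>>> c_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)  # W = I, lambda = 0
>>> np.round(c_hat, 2).tolist()
[2.0, 0.71, 0.71]

The resulting coefficients match those obtained with BasisSmoother in the Examples section.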

Parameters:
  • basis (Basis) – Basis used.

  • weights (Optional[NDArrayFloat]) – Matrix to weight the observations. Defaults to the identity matrix.

  • smoothing_parameter (float) – Smoothing parameter. Trying several values on a logarithmic scale is suggested; the score() method can be used to compare them (see the last example in the Examples section). If 0, no smoothing is performed. Defaults to 1.

  • regularization (Optional[L2Regularization[FDataGrid]]) – Regularization object. This allows the penalization of complicated models, which applies additional smoothing. Defaults to None, meaning that no additional smoothing takes place.

  • method (LstsqMethod) – Algorithm used for computing the coefficients by least squares. The admitted values are ‘cholesky’, ‘qr’ and ‘svd’, for the Cholesky, QR and SVD factorisation methods respectively, or a callable similar to the lstsq function. The default is ‘svd’, which is the most robust but least performant option.

  • output_points (Optional[GridPointsLike]) – The output points. If omitted, the input points are used (see the resampling sketch in the Examples section). If return_basis is True, this parameter is ignored.

  • return_basis (bool) – If False (the default), returns the smoothed data as an FDataGrid, like the other smoothers. If True, returns an FDataBasis object.

Examples

By default, this smoother returns a FDataGrid, like the other smoothers:

>>> import numpy as np
>>> import skfda
>>> t = np.linspace(0, 1, 5)
>>> x = np.sin(2 * np.pi * t) + np.cos(2 * np.pi * t) + 2
>>> x
array([ 3.,  3.,  1.,  1.,  3.])
>>> fd = skfda.FDataGrid(data_matrix=x, grid_points=t)
>>> basis = skfda.representation.basis.FourierBasis((0, 1), n_basis=3)
>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(basis)
>>> fd_smooth = smoother.fit_transform(fd)
>>> fd_smooth.data_matrix.round(2)
array([[[ 3.],
        [ 3.],
        [ 1.],
        [ 1.],
        [ 3.]]])
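The output_points parameter resamples the smooth representation on a different grid. A small sketch, reusing the objects defined above; the finer grid of nine points is arbitrary:

>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(
...     basis,
...     output_points=np.linspace(0, 1, 9),
... )
>>> fd_fine = smoother.fit_transform(fd)
>>> fd_fine.data_matrix.shape
(1, 9, 1)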

However, the parameter return_basis can be used to return the data in basis form; with the default settings no extra smoothing is applied:

>>> fd = skfda.FDataGrid(data_matrix=x, grid_points=t)
>>> basis = skfda.representation.basis.FourierBasis((0, 1), n_basis=3)
>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(
...     basis,
...     method='cholesky',
...     return_basis=True,
... )
>>> fd_basis = smoother.fit_transform(fd)
>>> fd_basis.coefficients.round(2)
array([[ 2.  , 0.71, 0.71]])
>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(
...     basis,
...     method='qr',
...     return_basis=True,
... )
>>> fd_basis = smoother.fit_transform(fd)
>>> fd_basis.coefficients.round(2)
array([[ 2.  , 0.71, 0.71]])
>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(
...     basis,
...     method='svd',
...     return_basis=True,
... )
>>> fd_basis = smoother.fit_transform(fd)
>>> fd_basis.coefficients.round(2)
array([[ 2.  , 0.71, 0.71]])
>>> smoother.hat_matrix().round(2)
array([[ 0.43,  0.14, -0.14,  0.14,  0.43],
       [ 0.14,  0.71,  0.29, -0.29,  0.14],
       [-0.14,  0.29,  0.71,  0.29, -0.14],
       [ 0.14, -0.29,  0.29,  0.71,  0.14],
       [ 0.43,  0.14, -0.14,  0.14,  0.43]])
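The hat matrix maps the observed values to the fitted values at the output points. As a quick check (not part of the original example set), applying it to x reproduces the observations here, since x lies exactly in the span of the basis:

>>> np.allclose(smoother.hat_matrix() @ x, x)
True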

We can penalize approximations that are not smooth enough using some kind of regularization:

>>> from skfda.misc.regularization import L2Regularization
>>> from skfda.misc.operators import LinearDifferentialOperator
>>>
>>> fd = skfda.FDataGrid(data_matrix=x, grid_points=t)
>>> basis = skfda.representation.basis.FourierBasis((0, 1), n_basis=3)
>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(
...     basis,
...     method='cholesky',
...     regularization=L2Regularization(
...         LinearDifferentialOperator([0.1, 0.2]),
...     ),
...     return_basis=True,
... )
>>> fd_basis = smoother.fit_transform(fd)
>>> fd_basis.coefficients.round(2)
array([[ 2.04,  0.51,  0.55]])
>>> fd = skfda.FDataGrid(data_matrix=x, grid_points=t)
>>> basis = skfda.representation.basis.FourierBasis((0, 1), n_basis=3)
>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(
...     basis,
...     method='qr',
...     regularization=L2Regularization(
...         LinearDifferentialOperator([0.1, 0.2]),
...     ),
...     return_basis=True,
... )
>>> fd_basis = smoother.fit_transform(fd)
>>> fd_basis.coefficients.round(2)
array([[ 2.04,  0.51,  0.55]])
>>> fd = skfda.FDataGrid(data_matrix=x, grid_points=t)
>>> basis = skfda.representation.basis.FourierBasis((0, 1), n_basis=3)
>>> smoother = skfda.preprocessing.smoothing.BasisSmoother(
...     basis,
...     method='svd',
...     regularization=L2Regularization(
...         LinearDifferentialOperator([0.1, 0.2]),
...     ),
...     return_basis=True,
... )
>>> fd_basis = smoother.fit_transform(fd)
>>> fd_basis.coefficients.round(2)
array([[ 2.04,  0.51,  0.55]])
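As suggested for smoothing_parameter, candidate values are usually compared on a logarithmic scale using the generalized cross-validation criterion returned by score(). The following sketch (reusing fd and basis from above; the grid of candidate values is arbitrary) simply collects the scores, from which the best-scoring value can then be picked:

>>> scores = {}
>>> for lam in np.logspace(-4, 0, 5):
...     smoother = skfda.preprocessing.smoothing.BasisSmoother(
...         basis,
...         smoothing_parameter=lam,
...         regularization=L2Regularization(
...             LinearDifferentialOperator(2),
...         ),
...     )
...     _ = smoother.fit(fd)
...     scores[lam] = smoother.score(fd, fd)  # GCV criterion for this value
>>> len(scores)
5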

References

[RS05-5-2-6]

Ramsay, J., Silverman, B. W. (2005). How spline smooths are computed. In Functional Data Analysis (pp. 86-87). Springer.

[RS05-5-2-8]

Ramsay, J., Silverman, B. W. (2005). Spline smoothing as an augmented least squares problem. In Functional Data Analysis (pp. 86-87). Springer.

Methods

fit(X[, y])

Compute the hat matrix for the desired output points.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

hat_matrix([input_points, output_points])

Compute the hat matrix for the given input and output points.

score(X, y)

Return the generalized cross validation (GCV) score.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X[, y])

Smooth the data.

fit(X, y=None)[source]#

Compute the hat matrix for the desired output points.

Parameters:
  • X (FDataGrid) – The data to smooth.

  • y – Ignored.

Returns:

self

Return type:

BasisSmoother

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

hat_matrix(input_points=None, output_points=None)[source]#

Compute the hat matrix for the given input and output points.

Parameters:
  • input_points (Union[ArrayLike, Sequence[ArrayLike]] | None) –

  • output_points (Union[ArrayLike, Sequence[ArrayLike]] | None) –

Return type:

ndarray[Any, dtype[float64]]

score(X, y)[source]#

Return the generalized cross validation (GCV) score.

Parameters:
  • X (FDataGrid) – The data to smooth.

  • y (FDataGrid) – The target data. Typically the same as X.

Returns:

Generalized cross validation score.

Return type:

float

set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas"}, default=None) –

Configure output of transform and fit_transform.

  • ”default”: Default output format of a transformer

  • ”pandas”: DataFrame output

  • ”polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
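A minimal sketch of the nested-parameter syntax, assuming a scikit-learn Pipeline with an arbitrary step name "smoother" and the same Fourier basis as in the Examples above:

>>> import skfda
>>> from sklearn.pipeline import Pipeline
>>> basis = skfda.representation.basis.FourierBasis((0, 1), n_basis=3)
>>> pipe = Pipeline([
...     ("smoother", skfda.preprocessing.smoothing.BasisSmoother(basis)),
... ])
>>> _ = pipe.set_params(smoother__smoothing_parameter=0.5)
>>> pipe.get_params()["smoother__smoothing_parameter"]
0.5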

transform(X, y=None)[source]#

Smooth the data.

Parameters:
  • X (FDataGrid) – The data to smooth.

  • y – Ignored.

Returns:

Smoothed data.

Return type:

FData