MissingValuesInterpolation

class skfda.preprocessing.missing.MissingValuesInterpolation

Class to interpolate missing values.

Missing values are represented as NaNs. They are interpolated from nearby values with valid data. Note that this may be a poor choice if there are large contiguous portions of the function with missing values, as some of them would be inferred from very far away points.
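
The caveat about large gaps can be illustrated with a minimal NumPy sketch. It is not the library's implementation: `fill_linear` is a hypothetical helper that fills NaNs by plain linear interpolation with `np.interp`, and the sine curve is made-up test data. It compares the worst-case error when one point is missing with the error when almost the whole curve is missing.

```python
import numpy as np

# Made-up test data: one period of a sine sampled at 11 points.
t = np.linspace(0, 1, 11)
truth = np.sin(2 * np.pi * t)


def fill_linear(t, y):
    """Hypothetical helper: fill NaNs by linear interpolation of valid points."""
    missing = np.isnan(y)
    filled = y.copy()
    filled[missing] = np.interp(t[missing], t[~missing], y[~missing])
    return filled


# A single missing point, surrounded by valid neighbours.
small_gap = truth.copy()
small_gap[3] = np.nan

# A large contiguous gap: only the two endpoints remain valid.
large_gap = truth.copy()
large_gap[1:10] = np.nan

small_error = np.max(np.abs(fill_linear(t, small_gap) - truth))
large_error = np.max(np.abs(fill_linear(t, large_gap) - truth))
print(small_error, large_error)
```

In this sketch the single missing point is recovered with an error of about 0.18, while the large gap is filled with a nearly flat line between the two endpoints and misses the curve by about 0.95.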

Examples

NaNs in scalar-valued univariate functions can be interpolated:

>>> from skfda import FDataGrid
>>> from skfda.preprocessing.missing import MissingValuesInterpolation
>>> import numpy as np

>>> X = FDataGrid([
...     [1, 2, np.nan, 4],
...     [5, np.nan, 7, 8],
...     [9, 10, np.nan, 12],
... ])
>>> nan_interp = MissingValuesInterpolation()
>>> X_transformed = nan_interp.fit_transform(X)
>>> X_transformed.data_matrix[..., 0]
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.]])

For vector-valued functions each coordinate is interpolated independently:

>>> X = FDataGrid(
...     [
...         [
...             (1, 5),
...             (2, np.nan),
...             (np.nan, 7),
...             (4, 8),
...         ],
...         [
...             (9, 13),
...             (10, np.nan),
...             (np.nan, np.nan),
...             (12, 16),
...         ],
...     ],
...     grid_points=np.linspace(0, 1, 4)
... )
>>> nan_interp = MissingValuesInterpolation()
>>> X_transformed = nan_interp.fit_transform(X)
>>> X_transformed.data_matrix 
array([[[  1.,  5.],
        [  2.,  6.],
        [  3.,  7.],
        [  4.,  8.]],
       [[  9., 13.],
        [ 10., 14.],
        [ 11., 15.],
        [ 12., 16.]]])
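
What "independently" means here can be sketched with plain NumPy. This is an illustration under assumptions, not the code used by the class: `interpolate_nans_per_coordinate` is a hypothetical helper, and it uses `np.interp` for the one-dimensional linear interpolation. Each output coordinate is filled using only its own valid values, so a NaN in one coordinate does not affect the other.

```python
import numpy as np


def interpolate_nans_per_coordinate(grid, values):
    """Hypothetical helper: fill NaNs in an array of shape (n_points, n_coords).

    Each column (one coordinate of the vector-valued function) is
    interpolated linearly from its own valid entries only.
    """
    filled = np.asarray(values, dtype=float).copy()
    for coord in range(filled.shape[1]):
        column = filled[:, coord]  # view into `filled`
        missing = np.isnan(column)
        column[missing] = np.interp(
            grid[missing], grid[~missing], column[~missing],
        )
    return filled


# Second sample of the example above, on the same grid points.
grid = np.linspace(0, 1, 4)
sample = np.array([
    [9, 13],
    [10, np.nan],
    [np.nan, np.nan],
    [12, 16],
])
filled = interpolate_nans_per_coordinate(grid, sample)
print(filled)
```

The third grid point is missing in both coordinates, yet each coordinate is still recovered from its own neighbours, which matches the second sample in the output above.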

For multivariate functions, such as surfaces, all dimensions are considered. This is currently done using LinearNDInterpolator, which triangulates the space and performs linear barycentric interpolation:

>>> X = FDataGrid(
...     [
...         [
...             [1, 2, 3, 4],
...             [5, np.nan, 7, 8],
...             [10, 10, np.nan, 10],
...             [13, 14, 15, 16],
...         ],
...     ],
...     grid_points=(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
... )
>>> nan_interp = MissingValuesInterpolation()
>>> X_transformed = nan_interp.fit_transform(X)
>>> X_transformed.data_matrix[..., 0]
array([[[  1.,   2.,   3.,   4.],
        [  5.,   6.,   7.,   8.],
        [ 10.,  10.,  11.,  10.],
        [ 13.,  14.,  15.,  16.]]])
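
The idea behind this step can be sketched directly with SciPy's `scipy.interpolate.LinearNDInterpolator`. This is a minimal sketch, not the class's internal code: the variable names are illustrative, and the surface is a plane chosen so that linear barycentric interpolation reproduces it exactly whichever triangulation is picked.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Illustrative 4x4 grid on [0, 1] x [0, 1] with a planar surface
# whose rows are 1..4, 5..8, 9..12 and 13..16.
grid = np.linspace(0, 1, 4)
xx, yy = np.meshgrid(grid, grid, indexing="ij")
true_surface = 1 + 12 * xx + 3 * yy

# Knock out two interior points.
surface = true_surface.copy()
surface[1, 1] = np.nan
surface[2, 2] = np.nan

# Build the interpolator from the valid points only.
valid = ~np.isnan(surface)
points = np.column_stack([xx[valid], yy[valid]])
interpolator = LinearNDInterpolator(points, surface[valid])

# Evaluate it at the coordinates of the missing points.
missing_points = np.column_stack([xx[~valid], yy[~valid]])
filled = surface.copy()
filled[~valid] = interpolator(missing_points)
print(filled)
```

Points outside the convex hull of the valid data cannot be triangulated; `LinearNDInterpolator` returns its `fill_value` (NaN by default) there, so this sketch only removes interior points.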

Methods

`fit`(X[, y])
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)

fit(X, y=None)

Parameters:

self (SelfType) –
X (Input) –
y (Target | None) –

Return type:

SelfType

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
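
The `<component>__<parameter>` form can be sketched with a small scikit-learn pipeline. The estimators and step names below (`"scale"`, `"clf"`) are generic stand-ins chosen for illustration only; they are not part of this class, and the same naming rule applies to any estimator placed in a `Pipeline`.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative pipeline with two named steps.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Update parameters of the nested steps via <component>__<parameter>.
pipeline.set_params(clf__C=10.0, scale__with_mean=False)

# The nested values are visible through get_params() under the same keys.
print(pipeline.get_params()["clf__C"])
print(pipeline.named_steps["scale"].with_mean)
```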

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(X)

Parameters: X (T) –
Return type: T