MissingValuesInterpolation
- class skfda.preprocessing.missing.MissingValuesInterpolation
Class to interpolate missing values.
Missing values are represented as NaNs. They are interpolated from nearby values with valid data. Note that this may be a poor choice if there are large contiguous portions of the function with missing values, as some of them would be inferred from very far away points.
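One way to act on this caveat is to measure how long the gaps are before interpolating. The sketch below uses only NumPy; `longest_nan_run` is a hypothetical helper written for illustration, not part of scikit-fda, and any threshold on the gap length is left to the reader.

```python
import numpy as np


def longest_nan_run(values):
    """Return the length of the longest run of consecutive NaNs in a 1-D array.

    Hypothetical helper for illustration; not part of scikit-fda.
    """
    is_nan = np.isnan(np.asarray(values, dtype=float))
    longest = current = 0
    for flag in is_nan:
        # Extend the current run on a NaN, reset it on a valid value.
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


curve = np.array([1.0, np.nan, np.nan, np.nan, 5.0, np.nan, 7.0])
gap = longest_nan_run(curve)  # longest gap, counted in grid points
```

A long gap relative to the number of grid points suggests that the interpolated values in its interior rest on distant observations.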
Examples
It is possible to interpolate NaNs in scalar-valued univariate functions:
>>> from skfda import FDataGrid
>>> from skfda.preprocessing.missing import MissingValuesInterpolation
>>> import numpy as np

>>> X = FDataGrid([
...     [1, 2, np.nan, 4],
...     [5, np.nan, 7, 8],
...     [9, 10, np.nan, 12],
... ])
>>> nan_interp = MissingValuesInterpolation()
>>> X_transformed = nan_interp.fit_transform(X)
>>> X_transformed.data_matrix[..., 0]
array([[  1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.],
       [  9.,  10.,  11.,  12.]])
For vector-valued functions each coordinate is interpolated independently:
>>> X = FDataGrid(
...     [
...         [
...             (1, 5),
...             (2, np.nan),
...             (np.nan, 7),
...             (4, 8),
...         ],
...         [
...             (9, 13),
...             (10, np.nan),
...             (np.nan, np.nan),
...             (12, 16),
...         ],
...     ],
...     grid_points=np.linspace(0, 1, 4)
... )
>>> nan_interp = MissingValuesInterpolation()
>>> X_transformed = nan_interp.fit_transform(X)
>>> X_transformed.data_matrix
array([[[  1.,   5.],
        [  2.,   6.],
        [  3.,   7.],
        [  4.,   8.]],
       [[  9.,  13.],
        [ 10.,  14.],
        [ 11.,  15.],
        [ 12.,  16.]]])
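The per-coordinate behaviour can be imitated with NumPy alone. The sketch below is not scikit-fda's implementation: it assumes plain linear interpolation via `np.interp`, and `interpolate_coordinates` is a hypothetical helper name. It is only meant to show what "each coordinate is interpolated independently" means on the data from the example above.

```python
import numpy as np


def interpolate_coordinates(grid_points, data_matrix):
    """Fill NaNs in an array of shape (n_samples, n_points, n_coords).

    Hypothetical helper: every coordinate of every sample is treated as a
    separate 1-D curve and filled by linear interpolation (an assumption).
    """
    filled = np.array(data_matrix, dtype=float)  # work on a copy
    for sample in filled:                        # view into `filled`
        for k in range(sample.shape[1]):
            values = sample[:, k]                # view of one coordinate
            missing = np.isnan(values)
            # np.interp needs at least one valid point per coordinate.
            values[missing] = np.interp(
                grid_points[missing],
                grid_points[~missing],
                values[~missing],
            )
    return filled


grid = np.linspace(0, 1, 4)
data = [
    [(1, 5), (2, np.nan), (np.nan, 7), (4, 8)],
    [(9, 13), (10, np.nan), (np.nan, np.nan), (12, 16)],
]
filled = interpolate_coordinates(grid, data)
```

Under that assumption, the NaNs in the second coordinate of the second sample are filled from 13 and 16 only; the first coordinate plays no role.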
For multivariate functions, such as surfaces, all dimensions are considered. This is currently done using LinearNDInterpolator, which triangulates the space and performs linear barycentric interpolation:
>>> X = FDataGrid(
...     [
...         [
...             [1, 2, 3, 4],
...             [5, np.nan, 7, 8],
...             [10, 10, np.nan, 10],
...             [13, 14, 15, 16],
...         ],
...     ],
...     grid_points=(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
... )
>>> nan_interp = MissingValuesInterpolation()
>>> X_transformed = nan_interp.fit_transform(X)
>>> X_transformed.data_matrix[..., 0]
array([[[  1.,   2.,   3.,   4.],
        [  5.,   6.,   7.,   8.],
        [ 10.,  10.,  11.,  10.],
        [ 13.,  14.,  15.,  16.]]])
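To see what the triangulation-based approach involves, the same surface can be filled by calling SciPy's `LinearNDInterpolator` directly. This is a rough sketch of the idea, not the code scikit-fda runs; the variable names are illustrative. On a regular grid the Delaunay triangulation is not unique, so the filled values are not guaranteed to match the output shown above digit for digit.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

t = np.linspace(0, 1, 4)
surface = np.array(
    [
        [1, 2, 3, 4],
        [5, np.nan, 7, 8],
        [10, 10, np.nan, 10],
        [13, 14, 15, 16],
    ],
    dtype=float,
)

# Coordinates of every grid node, in the same order as surface.ravel().
xx, yy = np.meshgrid(t, t, indexing="ij")
points = np.column_stack([xx.ravel(), yy.ravel()])
values = surface.ravel()
known = ~np.isnan(values)

# Triangulate the nodes with valid data, then evaluate at the missing nodes.
interpolator = LinearNDInterpolator(points[known], values[known])
filled = values.copy()
filled[~known] = interpolator(points[~known])
filled = filled.reshape(surface.shape)
```

Each filled value is a convex combination of the three valid values at the vertices of the triangle containing the missing node, so it always lies within the range of the observed data.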
Methods
fit(X[, y])
fit_transform(X[, y])        Fit to data, then transform it.
get_metadata_routing()       Get metadata routing of this object.
get_params([deep])           Get parameters for this estimator.
set_output(*[, transform])   Set output container.
set_params(**params)         Set the parameters of this estimator.
transform(X)
- fit(X, y=None)
- Parameters:
self (SelfType) –
X (Input) –
y (Target | None) –
- Return type:
SelfType
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray of shape (n_samples, n_features_new)
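The contract described here is the generic scikit-learn one: `fit_transform(X)` gives the same result as `fit(X)` followed by `transform(X)`. The sketch below illustrates this with scikit-learn's `StandardScaler` as a stand-in transformer, chosen only so that the example needs nothing beyond NumPy and scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# One call that fits and transforms.
one_step = StandardScaler().fit_transform(X)

# The equivalent two-step form.
two_steps = StandardScaler().fit(X).transform(X)

same = np.allclose(one_step, two_steps)
```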
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) –
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
New in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
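The `<component>__<parameter>` form is easiest to see on a small scikit-learn `Pipeline`. The sketch below uses two standard scikit-learn estimators; the step names `"scale"` and `"clf"` are arbitrary labels chosen for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Update parameters of the nested steps through the pipeline itself.
pipeline.set_params(clf__C=0.5, scale__with_mean=False)

# get_params(deep=True) reports nested parameters under the same names.
params = pipeline.get_params(deep=True)
```

Because `set_params` returns the estimator itself, calls can be chained, for example `pipeline.set_params(clf__C=0.5).fit(X, y)`.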