SmoothingParameterSearch

class skfda.preprocessing.smoothing.validation.SmoothingParameterSearch(estimator, param_values, *, param_name='smoothing_parameter', scoring=None, n_jobs=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan)

Chooses the best smoothing parameter and performs smoothing.

Performs the smoothing of an FDataGrid object, choosing the best parameter from a given list using a cross-validation scoring method.

Note

This is similar to fitting a scikit-learn GridSearchCV over the data, using the scoring method as the scorer.
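
Conceptually, the search turns param_name and param_values into a one-parameter grid, scores every candidate and keeps the one with the highest score (scikit-learn scorers follow a "greater is better" convention, which is why the scores in the examples are negative). The following is a minimal sketch of that selection step under this assumption, not the library's implementation: `grid_search` and `score_for` are hypothetical names, and the stand-in scorer simply returns the mean test scores reported in the first example below.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid


def grid_search(score_for, param_name, param_values):
    """Score each candidate value and pick the best one (hypothetical helper)."""
    # One-parameter grid, e.g. [{"kernel_estimator__n_neighbors": 2}, ...].
    candidates = list(ParameterGrid({param_name: list(param_values)}))
    mean_test_score = np.array([score_for(**params) for params in candidates])
    best_index = int(np.argmax(mean_test_score))  # greater is better
    return {
        "params": candidates,
        "mean_test_score": mean_test_score,
        "best_params_": candidates[best_index],
        "best_score_": float(mean_test_score[best_index]),
    }


# Stand-in scorer: returns the mean test scores reported in the first example.
REPORTED_SCORES = {2: -11.67, 3: -12.37}


def score_for(kernel_estimator__n_neighbors):
    return REPORTED_SCORES[kernel_estimator__n_neighbors]


result = grid_search(score_for, "kernel_estimator__n_neighbors", [2, 3])
print(result["best_params_"])  # {'kernel_estimator__n_neighbors': 2}
print(result["best_score_"])   # -11.67
```

The keys of the returned dictionary are chosen to echo the cv_results_, best_params_ and best_score_ attributes used in the examples; the real object exposes more information.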

Parameters:
  • estimator (smoother estimator) – scikit-learn compatible smoother.

  • param_values (iterable) – iterable containing the values to test for the parameter given by param_name (smoothing_parameter by default).

  • scoring (scoring method) – scoring method used to measure the performance of the smoothing. If None (the default), the score method of the estimator is used.

  • n_jobs (int or None, optional (default=None)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the scikit-learn Glossary for more details.

  • pre_dispatch (int, or string, optional) –

    Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

    • None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs.

    • An int, giving the exact number of total jobs that are spawned.

    • A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’.

  • verbose (integer) – Controls the verbosity: the higher, the more messages.

  • error_score ('raise' or numeric) – Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error. Default is np.nan.

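  • param_name (str) – Name of the estimator parameter to search over. Parameters of nested components use the <component>__<parameter> syntax, as in 'kernel_estimator__n_neighbors'. Default is 'smoothing_parameter'.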
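
The n_jobs, pre_dispatch and error_score parameters follow scikit-learn's conventions. The sketch below illustrates how they interact when candidate values are scored in parallel with joblib. It is an illustration only: `evaluate_candidate`, `evaluate_all` and `toy_score` are hypothetical names, threads are used just to keep the example light, and the actual dispatching is done inside scikit-learn.

```python
import math
import warnings

from joblib import Parallel, delayed
from sklearn.exceptions import FitFailedWarning


def evaluate_candidate(score_fn, value, error_score=math.nan):
    """Score one candidate, applying the documented error_score rules."""
    try:
        return score_fn(value)
    except Exception as exc:
        if error_score == "raise":
            raise
        # Numeric error_score: warn and record that value instead.
        warnings.warn(f"Scoring failed for {value!r}: {exc}", FitFailedWarning)
        return error_score


def evaluate_all(score_fn, param_values, n_jobs=None,
                 pre_dispatch="2*n_jobs", error_score=math.nan):
    """Score all candidates; results keep the order of param_values."""
    parallel = Parallel(n_jobs=n_jobs, pre_dispatch=pre_dispatch,
                        prefer="threads")
    return parallel(delayed(evaluate_candidate)(score_fn, v, error_score)
                    for v in param_values)


def toy_score(value):
    """Hypothetical scorer that fails for non-positive values."""
    if value <= 0:
        raise ValueError("value must be positive")
    return -float(value) ** 2


with warnings.catch_warnings():
    warnings.simplefilter("ignore", FitFailedWarning)
    scores = evaluate_all(toy_score, [1, 2, 0], n_jobs=2)
print(scores)  # [-1.0, -4.0, nan]
```

With error_score="raise", the ValueError from toy_score would propagate instead of being replaced by a number.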

Examples

Creates an FDataGrid object of the function \(y=x^2\) and performs smoothing by means of the k-nearest neighbours method.

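>>> import numpy as np
>>> import skfda
>>> from skfda.preprocessing.smoothing import KernelSmoother
>>> from skfda.preprocessing.smoothing.validation import SmoothingParameterSearch
>>> from skfda.misc.hat_matrix import KNeighborsHatMatrix
>>> x = np.linspace(-2, 2, 5)
>>> fd = skfda.FDataGrid(x ** 2, x)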
>>> grid = SmoothingParameterSearch(
...         KernelSmoother(
...             kernel_estimator=KNeighborsHatMatrix()),
...         [2,3],
...         param_name='kernel_estimator__n_neighbors')
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-11.67, -12.37])
>>> round(grid.best_score_, 2)
-11.67
>>> grid.best_params_['kernel_estimator__n_neighbors']
2
>>> grid.best_estimator_.hat_matrix().round(2)
array([[ 0.5 , 0.5 , 0.  , 0.  , 0.  ],
       [ 0.33, 0.33, 0.33, 0.  , 0.  ],
       [ 0.  , 0.33, 0.33, 0.33, 0.  ],
       [ 0.  , 0.  , 0.33, 0.33, 0.33],
       [ 0.  , 0.  , 0.  , 0.5 , 0.5 ]])
>>> grid.transform(fd).round(2)
FDataGrid(
    array([[[ 2.5 ],
            [ 1.67],
            [ 0.67],
            [ 1.67],
            [ 2.5 ]]]),
    grid_points=(array([-2., -1.,  0.,  1.,  2.]),),
    domain_range=((-2.0, 2.0),),
    ...)
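
The printed hat matrix and smoothed values can be reproduced with uniform weights over the nearest input points, where every point tied with the k-th nearest distance is also included (this is why the interior rows above average three points for n_neighbors=2). The sketch below is built only from those printed values; `knn_hat_matrix` is a hypothetical helper, not the KNeighborsHatMatrix implementation.

```python
import numpy as np


def knn_hat_matrix(input_points, output_points, n_neighbors):
    """Uniform k-NN smoothing weights, including ties at the k-th distance."""
    dist = np.abs(output_points[:, None] - input_points[None, :])
    # Distance from each output point to its k-th nearest input point.
    kth = np.sort(dist, axis=1)[:, n_neighbors - 1]
    inside = dist <= kth[:, None] + 1e-12  # tolerance for floating-point ties
    return inside / inside.sum(axis=1, keepdims=True)


x = np.linspace(-2, 2, 5)
y = x ** 2
hat = knn_hat_matrix(x, x, n_neighbors=2)
print(hat.round(2))
print((hat @ y).round(2))  # values 2.5, 1.67, 0.67, 1.67, 2.5
```

Each row of the matrix sums to one, so the smoothed value at a point is the plain average of the selected observations.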

Other validation methods can be used, such as leave-one-out cross-validation or generalized cross-validation with different penalization functions.

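>>> from skfda.preprocessing.smoothing.validation import (
...     LinearSmootherGeneralizedCVScorer,
...     LinearSmootherLeaveOneOutScorer,
...     akaike_information_criterion,
...     finite_prediction_error,
...     rice,
...     shibata,
... )
>>> grid = SmoothingParameterSearch(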
...         KernelSmoother(
...             kernel_estimator=KNeighborsHatMatrix()),
...         [2,3],
...         param_name='kernel_estimator__n_neighbors',
...         scoring=LinearSmootherLeaveOneOutScorer())
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-4.2, -5.5])
>>> grid = SmoothingParameterSearch(
...         KernelSmoother(
...             kernel_estimator=KNeighborsHatMatrix()),
...         [2,3],
...         param_name='kernel_estimator__n_neighbors',
...         scoring=LinearSmootherGeneralizedCVScorer(
...                         akaike_information_criterion))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([ -9.35, -10.71])
>>> grid = SmoothingParameterSearch(
...         KernelSmoother(
...             kernel_estimator=KNeighborsHatMatrix()),
...         [2,3],
...         param_name='kernel_estimator__n_neighbors',
...         scoring=LinearSmootherGeneralizedCVScorer(
...                         finite_prediction_error))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([ -9.8, -11. ])
>>> grid = SmoothingParameterSearch(
...         KernelSmoother(
...             kernel_estimator=KNeighborsHatMatrix()),
...         [2,3],
...         param_name='kernel_estimator__n_neighbors',
...         scoring=LinearSmootherGeneralizedCVScorer(shibata))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-7.56, -9.17])
>>> grid = SmoothingParameterSearch(
...         KernelSmoother(
...             kernel_estimator=KNeighborsHatMatrix()),
...         [2,3],
...         param_name='kernel_estimator__n_neighbors',
...         scoring=LinearSmootherGeneralizedCVScorer(rice))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-21. , -16.5])
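
For a linear smoother with hat matrix H, the leave-one-out error can be computed without refitting, through the standard shortcut (y_i - ŷ_i) / (1 - H_ii). The sketch below applies it to k-NN weights built with the same hypothetical tie-including rule that reproduces the printed hat matrix, and it recovers the leave-one-out scores reported above (-4.2 and -5.5). It also lists textbook forms of the penalization functions as functions of u = tr(H)/n; that they match the skfda functions of the same names is an assumption, and the sketch does not try to reproduce the generalized cross-validation scores.

```python
import numpy as np


def knn_hat_matrix(points, n_neighbors):
    """Uniform k-NN weights on the input points, ties at the k-th distance included."""
    dist = np.abs(points[:, None] - points[None, :])
    kth = np.sort(dist, axis=1)[:, n_neighbors - 1]
    inside = dist <= kth[:, None] + 1e-12
    return inside / inside.sum(axis=1, keepdims=True)


def loo_score(y, hat):
    """Negative mean squared leave-one-out error of y_hat = hat @ y."""
    loo_residuals = (y - hat @ y) / (1 - np.diag(hat))
    return -np.mean(loo_residuals ** 2)


# Textbook penalizations of u = trace(H) / n (assumed to match skfda's).
PENALTIES = {
    "gcv_classical": lambda u: (1 - u) ** -2,
    "akaike_information_criterion": lambda u: np.exp(2 * u),
    "finite_prediction_error": lambda u: (1 + u) / (1 - u),
    "shibata": lambda u: 1 + 2 * u,
    "rice": lambda u: 1 / (1 - 2 * u),  # only defined for u < 0.5
}

x = np.linspace(-2, 2, 5)
y = x ** 2
for k in (2, 3):
    hat = knn_hat_matrix(x, k)
    u = np.trace(hat) / len(x)
    penalties = {name: round(float(f(u)), 3) for name, f in PENALTIES.items()}
    print(k, round(loo_score(y, hat), 2), penalties)
```

Here u is 0.4 for n_neighbors=2 and 1/3 for n_neighbors=3: smaller neighbourhoods give a larger trace, so every penalty grows as the fit becomes rougher.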

Different output points can also be used. In that case, the score is still computed from the smoothed values at the input points, so the scores match those of the first example:

>>> output_points = np.linspace(-2, 2, 9)
>>> grid = SmoothingParameterSearch(
...         KernelSmoother(
...             kernel_estimator=KNeighborsHatMatrix(),
...             output_points=output_points),
...         [2,3],
...         param_name='kernel_estimator__n_neighbors')
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-11.67, -12.37])
>>> grid.transform(fd).data_matrix.round(2)
array([[[ 2.5 ],
        [ 2.5 ],
        [ 1.67],
        [ 0.5 ],
        [ 0.67],
        [ 0.5 ],
        [ 1.67],
        [ 2.5 ],
        [ 2.5 ]]])
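
With output points, the smoothing weights form a rectangular matrix whose rows correspond to the output points, while the score is still based on the square matrix at the input points (consistent with the unchanged scores above). The sketch below rebuilds both with the same hypothetical tie-including k-NN rule used for illustration and reproduces the nine transformed values printed above; it is not the library's code.

```python
import numpy as np


def knn_hat_matrix(input_points, output_points, n_neighbors):
    """Uniform k-NN weights; rows are output points, columns input points."""
    dist = np.abs(output_points[:, None] - input_points[None, :])
    kth = np.sort(dist, axis=1)[:, n_neighbors - 1]
    inside = dist <= kth[:, None] + 1e-12  # keep every point tied at the k-th distance
    return inside / inside.sum(axis=1, keepdims=True)


x = np.linspace(-2, 2, 5)
y = x ** 2
output_points = np.linspace(-2, 2, 9)

# Square matrix at the input points: the one the score is based on.
hat_in = knn_hat_matrix(x, x, n_neighbors=2)
# Rectangular 9 x 5 matrix: the one used to evaluate at the output points.
hat_out = knn_hat_matrix(x, output_points, n_neighbors=2)

print(hat_in.shape, hat_out.shape)  # (5, 5) (9, 5)
print((hat_out @ y).round(2))
```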

Methods

decision_function(X)

Call decision_function on the estimator with the best found parameters.

fit(X[, y, groups])

Run fit with all sets of parameters.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

inverse_transform(Xt)

Call inverse_transform on the estimator with the best found params.

predict(X)

Call predict on the estimator with the best found parameters.

predict_log_proba(X)

Call predict_log_proba on the estimator with the best found parameters.

predict_proba(X)

Call predict_proba on the estimator with the best found parameters.

score(X[, y])

Return the score on the given data, if the estimator has been refit.

score_samples(X)

Call score_samples on the estimator with the best found parameters.

set_fit_request(*[, groups])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Call transform on the estimator with the best found parameters.

decision_function(X)

Call decision_function on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports decision_function.

Parameters:

X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.

Returns:

y_score – Result of the decision function for X based on the estimator with the best found parameters.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_classes) or (n_samples, n_classes * (n_classes-1) / 2)

fit(X, y=None, groups=None, **fit_params)

Run fit with all sets of parameters.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples, n_output) or (n_samples,), default=None) – Target relative to X for classification or regression; None for unsupervised learning.

  • groups (NDArrayInt | None) – Group labels for the samples, forwarded to the cross-validation splitter. Default is None.

  • **fit_params (dict of str -> object) –

    Parameters passed to the fit method of the estimator, the scorer, and the CV splitter.

    If a fit parameter is an array-like whose length is equal to n_samples, it will be split across CV groups along with X and y. For example, the sample_weight parameter is split because len(sample_weight) == len(X).
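
The splitting rule for fit parameters can be pictured as follows: any array-like with one entry per sample is indexed together with X and y, and everything else is passed on unchanged. This is a simplified sketch with a hypothetical helper (`split_fit_params`), not scikit-learn's internal code.

```python
import numpy as np


def split_fit_params(fit_params, n_samples, indices):
    """Index per-sample fit parameters with the given split indices."""
    split = {}
    for name, value in fit_params.items():
        is_per_sample = (
            not isinstance(value, str)
            and hasattr(value, "__len__")
            and len(value) == n_samples
        )
        split[name] = np.asarray(value)[indices] if is_per_sample else value
    return split


sample_weight = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # one weight per sample
train_indices = np.array([0, 2, 4])
params = split_fit_params(
    {"sample_weight": sample_weight, "verbose": True},
    n_samples=5,
    indices=train_indices,
)
print(params["sample_weight"])  # [1. 3. 5.]
print(params["verbose"])        # True
```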

Returns:

self – Instance of fitted estimator.

Return type:

object

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

New in version 1.4.

Returns:

routing – A MetadataRouter encapsulating routing information.

Return type:

MetadataRouter

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

inverse_transform(Xt)

Call inverse_transform on the estimator with the best found params.

Only available if the underlying estimator implements inverse_transform and refit=True.

Parameters:

Xt (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.

Returns:

X – Result of the inverse_transform function for Xt based on the estimator with the best found parameters.

Return type:

{ndarray, sparse matrix} of shape (n_samples, n_features)

predict(X)

Call predict on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict.

Parameters:

X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.

Returns:

y_pred – The predicted labels or values for X based on the estimator with the best found parameters.

Return type:

ndarray of shape (n_samples,)

predict_log_proba(X)

Call predict_log_proba on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict_log_proba.

Parameters:

X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.

Returns:

y_pred – Predicted class log-probabilities for X based on the estimator with the best found parameters. The order of the classes corresponds to that in the fitted attribute classes_.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_classes)

predict_proba(X)

Call predict_proba on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict_proba.

Parameters:

X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.

Returns:

y_pred – Predicted class probabilities for X based on the estimator with the best found parameters. The order of the classes corresponds to that in the fitted attribute classes_.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_classes)

score(X, y=None, **params)

Return the score on the given data, if the estimator has been refit.

This uses the score defined by scoring where provided, and the best_estimator_.score method otherwise.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples, n_output) or (n_samples,), default=None) – Target relative to X for classification or regression; None for unsupervised learning.

  • **params (dict) –

    Parameters to be passed to the underlying scorer(s).

    New in version 1.4.

    Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.

Returns:

score – The score defined by scoring if provided, and the best_estimator_.score method otherwise.

Return type:

float

score_samples(X)

Call score_samples on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports score_samples.

New in version 0.24.

Parameters:

X (iterable) – Data to predict on. Must fulfill input requirements of the underlying estimator.

Returns:

y_score – The result of calling best_estimator_.score_samples on X.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, groups='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • groups (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for groups parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
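
The <component>__<parameter> syntax is also what param_name relies on in the examples above (kernel_estimator__n_neighbors). A minimal sketch of how such a name can be resolved, using plain attributes and hypothetical stand-in classes; scikit-learn's real set_params additionally validates the names and delegates to each component's own set_params.

```python
class HatMatrixStub:
    """Hypothetical stand-in for a hat matrix estimator."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors


class SmootherStub:
    """Hypothetical stand-in for a smoother with a nested component."""

    def __init__(self, kernel_estimator):
        self.kernel_estimator = kernel_estimator


def set_nested_param(obj, name, value):
    """Follow a '<component>__<parameter>' path and set the final attribute."""
    *path, attr = name.split("__")
    target = obj
    for component in path:
        target = getattr(target, component)
    setattr(target, attr, value)
    return obj


smoother = SmootherStub(kernel_estimator=HatMatrixStub())
set_nested_param(smoother, "kernel_estimator__n_neighbors", 3)
print(smoother.kernel_estimator.n_neighbors)  # 3
```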

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transform(X)

Call transform on the estimator with the best found parameters.

Only available if the underlying estimator supports transform and refit=True.

Parameters:

X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.

Returns:

Xt – X transformed in the new space based on the estimator with the best found parameters.

Return type:

{ndarray, sparse matrix} of shape (n_samples, n_features)

Examples using skfda.preprocessing.smoothing.validation.SmoothingParameterSearch

Kernel Smoothing