SmoothingParameterSearch
- class skfda.preprocessing.smoothing.validation.SmoothingParameterSearch(estimator, param_values, *, param_name='smoothing_parameter', scoring=None, n_jobs=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan)
Chooses the best smoothing parameter and performs smoothing.
Smooths an FDataGrid object, choosing the best parameter from a given list using a cross-validation scoring method.
Note
This is similar to fitting a scikit-learn GridSearchCV over the data, using the cv_method as a scorer.
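The search described in the note can be pictured as a plain loop over the candidate values. The following is a minimal sketch in plain Python, not the skfda or scikit-learn implementation: `loo_score` and `select_best` are hypothetical names, the smoother is a simple moving average chosen only for illustration, and larger scores are treated as better, as in scikit-learn.

```python
def loo_score(values, width):
    """Negative mean squared error when each point is predicted from the
    other points inside a centred window of the given odd width."""
    half = width // 2
    errors = []
    for i, target in enumerate(values):
        neighbours = [
            values[j]
            for j in range(max(0, i - half), min(len(values), i + half + 1))
            if j != i  # leave the point itself out
        ]
        prediction = sum(neighbours) / len(neighbours)
        errors.append((target - prediction) ** 2)
    return -sum(errors) / len(errors)


def select_best(param_values, score):
    """Score every candidate and return (best_value, all_scores)."""
    scores = {value: score(value) for value in param_values}
    best = max(scores, key=scores.get)
    return best, scores


noisy = [0.0, 1.2, 1.8, 3.1, 3.9, 5.2, 5.8, 7.1]
best_width, scores = select_best([3, 5, 7], lambda w: loo_score(noisy, w))
print(best_width, scores)
```

The class adds parallel evaluation, error handling, and a final refit of the winning estimator on top of this basic idea.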
- Parameters:
estimator (smoother estimator) – scikit-learn compatible smoother.
param_values (iterable) – iterable containing the values to test for smoothing_parameter.
scoring (scoring method) – scoring method used to measure the performance of the smoothing. If None (the default) the score method of the estimator is used.
n_jobs (int or None, optional (default=None)) – Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn Glossary for more details.
pre_dispatch (int, or string, optional) –
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
An int, giving the exact number of total jobs that are spawned
A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
verbose (integer) – Controls the verbosity: the higher, the more messages.
error_score ('raise' or numeric) – Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error. Default is np.nan.
param_name (str) – Name of the estimator parameter whose values are searched. Defaults to 'smoothing_parameter'.
Examples
Creates an FDataGrid object of the function \(y=x^2\) and performs smoothing by means of the k-nearest neighbours method.
>>> import numpy as np
>>> import skfda
>>> from skfda.preprocessing.smoothing import KernelSmoother
>>> from skfda.preprocessing.smoothing.validation import (
...     SmoothingParameterSearch,
... )
>>> from skfda.misc.hat_matrix import KNeighborsHatMatrix
>>> x = np.linspace(-2, 2, 5)
>>> fd = skfda.FDataGrid(x ** 2, x)
>>> grid = SmoothingParameterSearch(
...     KernelSmoother(
...         kernel_estimator=KNeighborsHatMatrix()),
...     [2,3],
...     param_name='kernel_estimator__n_neighbors')
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-11.67, -12.37])
>>> round(grid.best_score_, 2)
-11.67
>>> grid.best_params_['kernel_estimator__n_neighbors']
2
>>> grid.best_estimator_.hat_matrix().round(2)
array([[ 0.5 ,  0.5 ,  0.  ,  0.  ,  0.  ],
       [ 0.33,  0.33,  0.33,  0.  ,  0.  ],
       [ 0.  ,  0.33,  0.33,  0.33,  0.  ],
       [ 0.  ,  0.  ,  0.33,  0.33,  0.33],
       [ 0.  ,  0.  ,  0.  ,  0.5 ,  0.5 ]])
>>> grid.transform(fd).round(2)
FDataGrid(
    array([[[ 2.5 ],
            [ 1.67],
            [ 0.67],
            [ 1.67],
            [ 2.5 ]]]),
    grid_points=(array([-2., -1., 0., 1., 2.]),),
    domain_range=((-2.0, 2.0),),
    ...)
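The hat matrix and smoothed values printed above can be reproduced by hand. The sketch below is plain Python and is not skfda's implementation: `knn_hat_matrix` and `apply_hat_matrix` are hypothetical helpers, and the tie rule (every point no farther away than the k-th nearest one gets equal weight) is an assumption inferred from the printed rows, where interior points receive three neighbours even though n_neighbors is 2.

```python
def knn_hat_matrix(points, k):
    """Uniform k-nearest-neighbours weights, one row per evaluation point.

    Assumption: ties at the k-th smallest distance are all included.
    """
    rows = []
    for target in points:
        distances = [abs(target - p) for p in points]
        radius = sorted(distances)[k - 1]  # distance to the k-th nearest point
        weights = [1.0 if d <= radius else 0.0 for d in distances]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def apply_hat_matrix(hat, values):
    """Smoothed values are the matrix-vector product of hat and values."""
    return [sum(w * v for w, v in zip(row, values)) for row in hat]


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [p ** 2 for p in x]
hat = knn_hat_matrix(x, k=2)
smoothed = apply_hat_matrix(hat, y)
print([round(v, 2) for v in smoothed])  # [2.5, 1.67, 0.67, 1.67, 2.5]
```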
Other validation methods can be used, such as leave-one-out cross-validation or generalized cross-validation with other penalization functions.
>>> from skfda.preprocessing.smoothing.validation import (
...     LinearSmootherGeneralizedCVScorer,
...     LinearSmootherLeaveOneOutScorer,
...     akaike_information_criterion,
...     finite_prediction_error,
...     rice,
...     shibata,
... )
>>> grid = SmoothingParameterSearch(
...     KernelSmoother(
...         kernel_estimator=KNeighborsHatMatrix()),
...     [2,3],
...     param_name='kernel_estimator__n_neighbors',
...     scoring=LinearSmootherLeaveOneOutScorer())
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-4.2, -5.5])
>>> grid = SmoothingParameterSearch(
...     KernelSmoother(
...         kernel_estimator=KNeighborsHatMatrix()),
...     [2,3],
...     param_name='kernel_estimator__n_neighbors',
...     scoring=LinearSmootherGeneralizedCVScorer(
...         akaike_information_criterion))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([ -9.35, -10.71])
>>> grid = SmoothingParameterSearch(
...     KernelSmoother(
...         kernel_estimator=KNeighborsHatMatrix()),
...     [2,3],
...     param_name='kernel_estimator__n_neighbors',
...     scoring=LinearSmootherGeneralizedCVScorer(
...         finite_prediction_error))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([ -9.8, -11. ])
>>> grid = SmoothingParameterSearch(
...     KernelSmoother(
...         kernel_estimator=KNeighborsHatMatrix()),
...     [2,3],
...     param_name='kernel_estimator__n_neighbors',
...     scoring=LinearSmootherGeneralizedCVScorer(shibata))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-7.56, -9.17])
>>> grid = SmoothingParameterSearch(
...     KernelSmoother(
...         kernel_estimator=KNeighborsHatMatrix()),
...     [2,3],
...     param_name='kernel_estimator__n_neighbors',
...     scoring=LinearSmootherGeneralizedCVScorer(rice))
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-21. , -16.5])
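The leave-one-out scores above (-4.2 and -5.5) can be checked by hand with the usual shortcut for linear smoothers, which divides each residual by 1 - H_ii instead of refitting once per point. The sketch is plain Python rather than skfda code: `knn_hat_matrix`, `loo_score`, and `PENALTIES` are hypothetical names, the tie rule of the hat matrix is an assumption inferred from the printed example, and the penalization formulas are the textbook ones, which may differ in detail from the library's.

```python
import math


def knn_hat_matrix(points, k):
    """Uniform k-NN weights; ties at the k-th distance are all included
    (an assumption, see the lead-in)."""
    rows = []
    for target in points:
        distances = [abs(target - p) for p in points]
        radius = sorted(distances)[k - 1]
        weights = [1.0 if d <= radius else 0.0 for d in distances]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def loo_score(hat, values):
    """Negative mean of ((y_i - yhat_i) / (1 - H_ii)) ** 2."""
    total = 0.0
    for i, row in enumerate(hat):
        fitted = sum(w * v for w, v in zip(row, values))
        total += ((values[i] - fitted) / (1.0 - row[i])) ** 2
    return -total / len(values)


# Textbook penalization factors as functions of t = trace(H) / n.
PENALTIES = {
    "default_gcv": lambda t: (1 - t) ** -2,
    "akaike_information_criterion": lambda t: math.exp(2 * t),
    "finite_prediction_error": lambda t: (1 + t) / (1 - t),
    "shibata": lambda t: 1 + 2 * t,
    "rice": lambda t: (1 - 2 * t) ** -1,
}

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [p ** 2 for p in x]
results = {}
for k in (2, 3):
    hat = knn_hat_matrix(x, k)
    t = sum(hat[i][i] for i in range(len(x))) / len(x)
    factors = {name: f(t) for name, f in PENALTIES.items()}
    results[k] = (loo_score(hat, y), factors)
    print(k, round(results[k][0], 2))  # prints "2 -4.2", then "3 -5.5"
```

With n_neighbors=2, trace(H)/n is 0.4, so the factors are about 2.778, 2.226, 2.333, 1.8 and 5.0. The generalized cross-validation scores printed above for that candidate (-11.67, -9.35, -9.8, -7.56 and -21.0) stand in the same proportions.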
Different output points can also be used. In that case the value used as a target is still the smoothed value at the input points:
>>> output_points = np.linspace(-2, 2, 9)
>>> grid = SmoothingParameterSearch(
...     KernelSmoother(
...         kernel_estimator=KNeighborsHatMatrix(),
...         output_points=output_points),
...     [2,3],
...     param_name='kernel_estimator__n_neighbors')
>>> _ = grid.fit(fd)
>>> np.array(grid.cv_results_['mean_test_score']).round(2)
array([-11.67, -12.37])
>>> grid.transform(fd).data_matrix.round(2)
array([[[ 2.5 ],
        [ 2.5 ],
        [ 1.67],
        [ 0.5 ],
        [ 0.67],
        [ 0.5 ],
        [ 1.67],
        [ 2.5 ],
        [ 2.5 ]]])
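When the output points differ from the input points, the hat matrix becomes rectangular, with one row per output point and one column per input point. The following plain-Python sketch reproduces the nine values above under the same assumed tie rule (all points no farther than the k-th nearest are weighted equally). `knn_hat_matrix` is a hypothetical helper, not an skfda function.

```python
def knn_hat_matrix(input_points, output_points, k):
    """One row of uniform k-NN weights per output point (ties included)."""
    rows = []
    for target in output_points:
        distances = [abs(target - p) for p in input_points]
        radius = sorted(distances)[k - 1]
        weights = [1.0 if d <= radius else 0.0 for d in distances]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [p ** 2 for p in x]
output_points = [-2.0 + 0.5 * i for i in range(9)]

hat = knn_hat_matrix(x, output_points, k=2)  # 9 rows, 5 columns
smoothed = [sum(w * v for w, v in zip(row, y)) for row in hat]
print([round(v, 2) for v in smoothed])
# [2.5, 2.5, 1.67, 0.5, 0.67, 0.5, 1.67, 2.5, 2.5]
```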
Methods
decision_function(X) – Call decision_function on the estimator with the best found parameters.
fit(X[, y, groups]) – Run fit with all sets of parameters.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
inverse_transform(Xt) – Call inverse_transform on the estimator with the best found params.
predict(X) – Call predict on the estimator with the best found parameters.
predict_log_proba(X) – Call predict_log_proba on the estimator with the best found parameters.
predict_proba(X) – Call predict_proba on the estimator with the best found parameters.
score(X[, y]) – Return the score on the given data, if the estimator has been refit.
score_samples(X) – Call score_samples on the estimator with the best found parameters.
set_fit_request(*[, groups]) – Request metadata passed to the fit method.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Call transform on the estimator with the best found parameters.
- decision_function(X)
Call decision_function on the estimator with the best found parameters.
Only available if refit=True and the underlying estimator supports decision_function.
- Parameters:
X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.
- Returns:
y_score – Result of the decision function for X based on the estimator with the best found parameters.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_classes) or (n_samples, n_classes * (n_classes-1) / 2)
- fit(X, y=None, groups=None, **fit_params)
Run fit with all sets of parameters.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples, n_output) or (n_samples,), default=None) – Target relative to X for classification or regression; None for unsupervised learning.
**params (dict of str -> object) –
Parameters passed to the fit method of the estimator, the scorer, and the CV splitter.
If a fit parameter is an array-like whose length is equal to num_samples then it will be split across CV groups along with X and y. For example, the sample_weight parameter is split because len(sample_weights) = len(X).
fit_params (Any) –
- Returns:
self – Instance of fitted estimator.
- Return type:
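The splitting rule for fit parameters described above can be pictured as follows. This is a plain-Python sketch of the idea rather than scikit-learn's code: `split_fit_params` is a hypothetical helper, and only lists and tuples are treated as array-likes here.

```python
def split_fit_params(fit_params, n_samples, indices):
    """Index per-sample parameters with a fold's indices; pass others whole."""
    selected = {}
    for name, value in fit_params.items():
        per_sample = (
            isinstance(value, (list, tuple)) and len(value) == n_samples
        )
        selected[name] = [value[i] for i in indices] if per_sample else value
    return selected


X = ["a", "b", "c", "d"]
params = {"sample_weight": [0.1, 0.2, 0.3, 0.4], "tol": 1e-3}
train_indices = [0, 2, 3]
fold_params = split_fit_params(params, len(X), train_indices)
print(fold_params)  # {'sample_weight': [0.1, 0.3, 0.4], 'tol': 0.001}
```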
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
New in version 1.4.
- Returns:
routing – A MetadataRouter encapsulating routing information.
- Return type:
MetadataRouter
- get_params(deep=True)
Get parameters for this estimator.
- inverse_transform(Xt)
Call inverse_transform on the estimator with the best found params.
Only available if the underlying estimator implements inverse_transform and refit=True.
- Parameters:
Xt (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.
- Returns:
X – Result of the inverse_transform function for Xt based on the estimator with the best found parameters.
- Return type:
{ndarray, sparse matrix} of shape (n_samples, n_features)
- predict(X)
Call predict on the estimator with the best found parameters.
Only available if refit=True and the underlying estimator supports predict.
- Parameters:
X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.
- Returns:
y_pred – The predicted labels or values for X based on the estimator with the best found parameters.
- Return type:
ndarray of shape (n_samples,)
- predict_log_proba(X)
Call predict_log_proba on the estimator with the best found parameters.
Only available if refit=True and the underlying estimator supports predict_log_proba.
- Parameters:
X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.
- Returns:
y_pred – Predicted class log-probabilities for X based on the estimator with the best found parameters. The order of the classes corresponds to that in the fitted attribute classes_.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_classes)
- predict_proba(X)
Call predict_proba on the estimator with the best found parameters.
Only available if refit=True and the underlying estimator supports predict_proba.
- Parameters:
X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.
- Returns:
y_pred – Predicted class probabilities for X based on the estimator with the best found parameters. The order of the classes corresponds to that in the fitted attribute classes_.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_classes)
- score(X, y=None, **params)
Return the score on the given data, if the estimator has been refit.
This uses the score defined by scoring where provided, and the best_estimator_.score method otherwise.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples, n_output) or (n_samples,), default=None) – Target relative to X for classification or regression; None for unsupervised learning.
**params (dict) –
Parameters to be passed to the underlying scorer(s).
New in version 1.4: Only available if enable_metadata_routing=True. See Metadata Routing User Guide for more details.
- Returns:
score – The score defined by scoring if provided, and the best_estimator_.score method otherwise.
- Return type:
- score_samples(X)
Call score_samples on the estimator with the best found parameters.
Only available if refit=True and the underlying estimator supports score_samples.
New in version 0.24.
- Parameters:
X (iterable) – Data to predict on. Must fulfill input requirements of the underlying estimator.
- Returns:
y_score – The result of the best_estimator_.score_samples method.
- Return type:
ndarray of shape (n_samples,)
- set_fit_request(*, groups='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
groups (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for groups parameter in fit.
self (SmoothingParameterSearch) –
- Returns:
self – The updated object.
- Return type:
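The four request options listed for set_fit_request can be pictured with a small decision function. This is a plain-Python illustration of the documented behaviour, not scikit-learn's routing code: `resolve_request` is a hypothetical helper, and the ValueError is a stand-in for whatever exception the library raises.

```python
def resolve_request(request, name, provided):
    """Return the keyword arguments forwarded to fit for one metadata name.

    request  -- True, False, None, or an alias string
    name     -- the parameter name expected by fit (e.g. "groups")
    provided -- metadata the caller passed to the meta-estimator
    """
    if request is True:
        # Forward the metadata if given; silently skip it otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        return {}
    if request is None:
        if name in provided:
            raise ValueError(f"{name!r} was passed but is not requested")
        return {}
    # A string is an alias: the caller supplies the metadata under the
    # alias and it reaches fit under the original name.
    return {name: provided[request]} if request in provided else {}


print(resolve_request(True, "groups", {"groups": [0, 0, 1]}))
# {'groups': [0, 0, 1]}
print(resolve_request("fold_id", "groups", {"fold_id": [0, 1, 1]}))
# {'groups': [0, 1, 1]}
```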
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
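The <component>__<parameter> naming is the same convention used by param_name='kernel_estimator__n_neighbors' in the examples above. A minimal sketch of how such a name can be resolved is given below. The classes `Kernel` and `Smoother` and the helper `set_nested_param` are hypothetical stand-ins, not skfda or scikit-learn objects, and the real set_params performs additional validation.

```python
class Kernel:
    def __init__(self, n_neighbors=1):
        self.n_neighbors = n_neighbors


class Smoother:
    def __init__(self, kernel_estimator):
        self.kernel_estimator = kernel_estimator


def set_nested_param(obj, name, value):
    """Follow each '__'-separated component, then set the final attribute."""
    *path, leaf = name.split("__")
    for component in path:
        obj = getattr(obj, component)
    setattr(obj, leaf, value)


smoother = Smoother(Kernel())
set_nested_param(smoother, "kernel_estimator__n_neighbors", 3)
print(smoother.kernel_estimator.n_neighbors)  # 3
```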
- transform(X)
Call transform on the estimator with the best found parameters.
Only available if the underlying estimator supports transform and refit=True.
- Parameters:
X (indexable, length n_samples) – Must fulfill the input assumptions of the underlying estimator.
- Returns:
Xt – X transformed in the new space based on the estimator with the best found parameters.
- Return type:
{ndarray, sparse matrix} of shape (n_samples, n_features)