RKHSVariableSelection#

class skfda.preprocessing.dim_reduction.variable_selection.RKHSVariableSelection(n_features_to_select=1)[source]#

Reproducing kernel variable selection.

This is a filter variable selection method for binary classification problems. Given a fixed number \(d\) of variables to select, it searches for the points \(t_1, \ldots, t_d\) whose associated variables \(X(t_1), \ldots, X(t_d)\) maximize the separation between the class means in the reduced space, measured using the Mahalanobis distance

\[\phi(t_1, \ldots, t_d) = m_{t_1, \ldots, t_d}^T K_{t_1, \ldots, t_d}^{-1} m_{t_1, \ldots, t_d}\]

where \(m_{t_1, \ldots, t_d}\) is the difference of the mean functions of both classes evaluated at points \(t_1, \ldots, t_d\) and \(K_{t_1, \ldots, t_d}\) is the common covariance function evaluated at the same points.
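For a fixed set of points, the criterion reduces to a finite-dimensional squared Mahalanobis distance and can be evaluated directly. A minimal sketch in NumPy, using illustrative numbers for the discretized mean difference and covariance (not values produced by the library):

```python
import numpy as np

# Hypothetical discretization: mean difference and common covariance
# evaluated at three candidate points (illustrative numbers only).
m = np.array([0.5, 1.0, 0.5])          # class-mean difference at t1, t2, t3
K = np.array([[0.25, 0.10, 0.05],
              [0.10, 0.50, 0.10],
              [0.05, 0.10, 0.75]])     # common covariance at the same points

# phi = m^T K^{-1} m; solving the linear system is preferred over forming
# an explicit inverse for numerical stability.
phi = m @ np.linalg.solve(K, m)
print(phi > 0)  # True: phi is a squared Mahalanobis distance
```

Larger values of \(\phi\) correspond to better-separated class means at the chosen points.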

With a fixed value of \(d\), this method is optimal for variable selection in Gaussian binary classification problems with the same covariance in both classes (homoscedasticity), provided that all possible combinations of points are examined: among all possible selections of \(t_1, \ldots, t_d\), the one that maximizes \(\phi(t_1, \ldots, t_d)\) also minimizes the optimal misclassification error over all classification problems of the reduced dimensionality. For a longer discussion of the optimality and consistency of this method, we refer the reader to the original article [1].

In practice, the points are selected one at a time using a greedy approach, so this optimality is not always guaranteed.
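The greedy scheme can be sketched in plain NumPy: at each step, the candidate point that maximizes \(\phi\) together with the already-selected points is added. This is an illustrative reimplementation on synthetic discretized data, not the library's internal code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic discretized trajectories on a grid of p points; class 1 gets
# a peak-like mean shift over columns 20..29 (illustrative data only).
n, p = 500, 50
X0 = rng.standard_normal((n, p))
X1 = rng.standard_normal((n, p))
X1[:, 20:30] += np.linspace(0.0, 2.0, 10)

m = X1.mean(axis=0) - X0.mean(axis=0)             # class-mean difference
Xc = np.vstack([X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)])
K = Xc.T @ Xc / (Xc.shape[0] - 2)                 # pooled covariance estimate

# Greedy forward selection: at each step, add the point that maximizes phi.
selected = []
remaining = list(range(p))
for _ in range(3):
    best_j, best_phi = remaining[0], -np.inf
    for j in remaining:
        idx = selected + [j]
        phi = m[idx] @ np.linalg.solve(K[np.ix_(idx, idx)], m[idx])
        if phi > best_phi:
            best_j, best_phi = j, phi
    selected.append(best_j)
    remaining.remove(best_j)

print(selected[0])  # the largest mean shift is near column 29
```

Each iteration costs one small linear solve per remaining candidate, so the greedy search scales linearly in the number of grid points per selected feature.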

Parameters:

n_features_to_select (int) – number of features to select.

Examples

>>> from skfda.preprocessing.dim_reduction import variable_selection
>>> from skfda.datasets import make_gaussian_process
>>> import skfda
>>> import numpy as np

We create trajectories from two classes, one with zero mean and the other with a peak-like mean. Both have Brownian covariance.

>>> n_samples = 10000
>>> n_features = 200
>>>
>>> def mean_1(t):
...     return (np.abs(t - 0.25)
...             - 2 * np.abs(t - 0.5)
...             + np.abs(t - 0.75))
>>>
>>> X_0 = make_gaussian_process(n_samples=n_samples // 2,
...                             n_features=n_features,
...                             random_state=0)
>>> X_1 = make_gaussian_process(n_samples=n_samples // 2,
...                             n_features=n_features,
...                             mean=mean_1,
...                             random_state=1)
>>> X = skfda.concatenate((X_0, X_1))
>>>
>>> y = np.zeros(n_samples)
>>> y[n_samples // 2:] = 1

Select the relevant points to distinguish the two classes:

>>> rkvs = variable_selection.RKHSVariableSelection(
...                               n_features_to_select=3)
>>> _ = rkvs.fit(X, y)
>>> point_mask = rkvs.get_support()
>>> points = X.grid_points[0][point_mask]
>>> np.allclose(points, [0.25, 0.5, 0.75], rtol=1e-2)
True

Apply the learned dimensionality reduction:

>>> X_dimred = rkvs.transform(X)
>>> len(X.grid_points[0])
200
>>> X_dimred.shape
(10000, 3)
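Once transform has produced a plain (n_samples, n_features_to_select) array, any multivariate classifier can consume it. A minimal sketch with synthetic data standing in for the reduced array (the nearest-centroid rule below is an illustration, not part of skfda):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the output of rkvs.transform(X): 3 selected features,
# with class 1 shifted away from class 0 (synthetic data).
Xr0 = rng.standard_normal((500, 3))
Xr1 = rng.standard_normal((500, 3)) + 1.5
X_dimred = np.vstack([Xr0, Xr1])
y = np.repeat([0, 1], 500)

# Nearest-centroid classification on the reduced representation.
centroids = np.stack([X_dimred[y == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(X_dimred[:, None, :] - centroids[None], axis=-1)
y_pred = dists.argmin(axis=1)
accuracy = (y_pred == y).mean()
```

In the homoscedastic Gaussian setting that motivates the method, a linear rule on the selected points is the natural downstream classifier.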

References

[1] Berrendero, J. R., Cuevas, A. and Torrecilla, J. L. (2018). On the use of reproducing kernel Hilbert spaces in functional classification. Journal of the American Statistical Association, 113(523), 1210–1218.

Methods

fit(X, y)

Fit the model using X as training data and y as class labels.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

get_support([indices])

Get a mask, or integer index, of the features selected.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X[, Y])

Reduce X to the selected features.

fit(X, y)[source]#

Fit the model using X as training data and y as class labels.

Parameters:
  • X (FDataGrid) – Training data.

  • y (array-like of shape (n_samples,)) – Class labels.

Return type:

RKHSVariableSelection

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_support(indices=False)[source]#

Get a mask, or integer index, of the features selected.

Parameters:

indices (bool) – If True, the return value will be an array of integers, rather than a boolean mask.

Returns:

An index that selects the retained features from a FDataGrid object. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
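The two return forms are interchangeable via NumPy. A standalone illustration with a hand-written mask (not tied to a fitted estimator):

```python
import numpy as np

# A boolean support mask like the one get_support() returns with the
# default indices=False; the values here are illustrative.
mask = np.array([False, True, False, True, True])

indices = np.flatnonzero(mask)         # equivalent of get_support(indices=True)
back = np.zeros(len(mask), dtype=bool)
back[indices] = True                   # recover the mask from the indices

print(indices)  # [1 3 4]
```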

Return type:

ndarray[Any, dtype[int64]]

set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
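The nested-parameter convention can be shown with standard scikit-learn components (used here only to keep the illustration self-contained):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# <component>__<parameter>: reach into the nested "clf" and "scaler" steps.
pipe.set_params(clf__C=0.1, scaler__with_mean=False)
print(pipe.get_params()["clf__C"])  # 0.1
```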

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transform(X, Y=None)[source]#

Reduce X to the selected features.

Parameters:
  • X (FDataGrid) – Data to reduce to the selected features.

  • Y – Ignored; present for API compatibility.

Return type:

ndarray[Any, dtype[float64]]

Examples using skfda.preprocessing.dim_reduction.variable_selection.RKHSVariableSelection#

Scikit-fda and scikit-learn