MaximaHunting#

class skfda.preprocessing.dim_reduction.variable_selection.MaximaHunting(dependence_measure=<function u_distance_correlation_sqr>, local_maxima_selector=None)[source]#

Maxima Hunting variable selection.

This is a filter variable selection method for problems with a target variable. It evaluates a dependence measure between each point of the function and the target variable, and keeps those points in which this dependence is a local maximum.

Selecting the local maxima serves two purposes. First, it ensures that the points that are relevant in isolation are selected, as they must maximice their dependence with the target variable. Second, the points that are relevant only because they are near a relevant point (and are thus highly correlated with it) are NOT selected, as only local maxima are selected, minimizing the redundancy of the selected variables.

For a longer explanation about the method, and comparison with other functional variable selection methods, we refer the reader to the original article [1].

Parameters:
  • dependence_measure (callable) – Dependence measure to use. By default, it uses the bias corrected squared distance correlation.

  • local_maxima_selector (callable) – Function to detect local maxima. The default is select_local_maxima() with order parameter equal to one. The original article used a similar function testing different values of order.

Examples

>>> from skfda.preprocessing.dim_reduction import variable_selection
>>> from skfda.preprocessing.dim_reduction.variable_selection.\
...     maxima_hunting import RelativeLocalMaximaSelector
>>> from skfda.datasets import make_gaussian_process
>>> import skfda
>>> import numpy as np

We create trajectories from two classes, one with zero mean and the other with a peak-like mean. Both have Brownian covariance.

>>> n_samples = 10000
>>> n_features = 100
>>>
>>> def mean_1(t):
...     return (np.abs(t - 0.25)
...             - 2 * np.abs(t - 0.5)
...             + np.abs(t - 0.75))
>>>
>>> X_0 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     random_state=0,
... )
>>> X_1 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     mean=mean_1,
...     random_state=1,
... )
>>> X = skfda.concatenate((X_0, X_1))
>>>
>>> y = np.zeros(n_samples)
>>> y [n_samples // 2:] = 1

Select the relevant points to distinguish the two classes

>>> local_maxima_selector = RelativeLocalMaximaSelector(
...     smoothing_parameter=10,
... )
>>> mh = variable_selection.MaximaHunting(
...            local_maxima_selector=local_maxima_selector,
... )
>>> _ = mh.fit(X, y)
>>> point_mask = mh.get_support()
>>> points = X.grid_points[0][point_mask]
>>> np.allclose(points, [0.5], rtol=0.1)
True

Apply the learned dimensionality reduction

>>> X_dimred = mh.transform(X)
>>> len(X.grid_points[0])
100
>>> X_dimred.shape
(10000, 1)

References

Methods

fit(X, y)

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

get_support([indices])

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X[, y])

fit(X, y)[source]#
Parameters:
Return type:

MaximaHunting

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_support(indices=False)[source]#
Parameters:

indices (bool) –

Return type:

ndarray[Any, dtype[int64]]

set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas"}, default=None) –

Configure output of transform and fit_transform.

  • ”default”: Default output format of a transformer

  • ”pandas”: DataFrame output

  • ”polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transform(X, y=None)[source]#
Parameters:
  • X (FDataGrid) –

  • y (NDArrayInt | NDArrayFloat | None) –

Return type:

NDArrayFloat

Examples using skfda.preprocessing.dim_reduction.variable_selection.MaximaHunting#

Spectrometric data: derivatives, regression, and variable selection

Spectrometric data: derivatives, regression, and variable selection