RecursiveMaximaHunting#

class skfda.preprocessing.dim_reduction.variable_selection.RecursiveMaximaHunting(*, dependence_measure=<function u_distance_correlation_sqr>, max_features=None, correction=None, redundancy_condition=None, stopping_condition=None, _get_intermediate_results=False)[source]#

Recursive Maxima Hunting variable selection.

This is a filter variable selection method for problems with a target variable. It evaluates a dependence measure between each point of the function and the target variable, selects the point that maximizes this dependence, subtracts the information of the selected point from the original functions, and repeats the process.

This method is inspired by MaximaHunting and shares similarities with it. However, because the information of each selected point is subtracted from every function at each step, this algorithm can uncover points that are not relevant by themselves but become relevant once other points have been selected. Such points would not be selected by MaximaHunting alone.

This method was originally described, for a particular special case, in [1]. Additional information about the usage of this method can be found in Recursive Maxima Hunting.
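The recursive loop described above can be sketched as follows. This is a simplified illustration, not the library's implementation: the ``dependence`` and ``correction`` callables are hypothetical stand-ins for the ``dependence_measure`` and ``correction`` parameters.

```python
import numpy as np


def rmh_sketch(X, y, dependence, correction, max_features):
    """Simplified recursive maxima hunting loop (illustrative only).

    ``X``: array of shape (n_samples, n_points) of discretized functions.
    ``dependence(x_t, y)``: scores one point against the target.
    ``correction(X, t)``: returns X with the information of point t removed.
    """
    X = np.asarray(X, dtype=float).copy()
    selected = []
    for _ in range(max_features):
        # Score every point against the target and take the maximum.
        scores = [dependence(X[:, t], y) for t in range(X.shape[1])]
        t_max = int(np.argmax(scores))
        selected.append(t_max)
        # Subtract the selected point's information before the next round.
        X = correction(X, t_max)
    return selected
```

The real estimator additionally applies a redundancy condition and a stopping condition, which this sketch omits.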

Parameters:
  • dependence_measure (_DepMeasure[NDArrayFloat, NDArrayFloat]) – Dependence measure to use. By default, it uses the bias corrected squared distance correlation.

  • max_features (Optional[int]) – Maximum number of features to select. By default there is no limit.

  • correction (Optional[Correction]) – Correction used to subtract the information of each selected point in each iteration. By default it is a UniformCorrection object.

  • redundancy_condition (Optional[RedundancyCondition]) – Condition to consider a point redundant with the selected maxima and discard it from future consideration as a maximum. By default it is a DependenceThresholdRedundancy object.

  • stopping_condition (Optional[StoppingCondition]) – Condition to stop the algorithm. By default it is an AsymptoticIndependenceTestStop object.

  • _get_intermediate_results (bool) –

Examples

>>> from skfda.preprocessing.dim_reduction import variable_selection
>>> from skfda.datasets import make_gaussian_process
>>> import skfda
>>> import numpy as np

We create trajectories from two classes, one with zero mean and the other with a peak-like mean. Both have Brownian covariance.

>>> n_samples = 1000
>>> n_features = 100
>>>
>>> def mean_1(t):
...     return (
...         np.abs(t - 0.25)
...         - 2 * np.abs(t - 0.5)
...         + np.abs(t - 0.75)
...     )
>>>
>>> X_0 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     random_state=0,
... )
>>> X_1 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     mean=mean_1,
...     random_state=1,
... )
>>> X = skfda.concatenate((X_0, X_1))
>>>
>>> y = np.zeros(n_samples)
>>> y[n_samples // 2:] = 1

Select the relevant points to distinguish the two classes

>>> rmh = variable_selection.RecursiveMaximaHunting()
>>> _ = rmh.fit(X, y)
>>> point_mask = rmh.get_support()
>>> points = X.grid_points[0][point_mask]
>>> np.allclose(points, [0.25, 0.5, 0.75], rtol=1e-1)
True

Apply the learned dimensionality reduction

>>> X_dimred = rmh.transform(X)
>>> len(X.grid_points[0])
100
>>> X_dimred.shape
(1000, 3)

References

[1] J. L. Torrecilla and A. Suárez, “Feature selection in functional data classification with recursive maxima hunting”, Advances in Neural Information Processing Systems 29 (NIPS 2016).

Methods

fit(X, y)

Recursive maxima hunting algorithm.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

get_support()

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

fit(X, y)[source]#

Recursive maxima hunting algorithm.

Parameters:
  • X (FDataGrid) –

  • y –

Return type:

RecursiveMaximaHunting

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_support(indices: typing_extensions.Literal[True]) → Sequence[Tuple[int, ...]][source]#
get_support(indices: typing_extensions.Literal[False] = False) → ndarray[Any, dtype[bool_]]
set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas"}, default=None) –

Configure output of transform and fit_transform.

  • ”default”: Default output format of a transformer

  • ”pandas”: DataFrame output

  • ”polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
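The <component>__<parameter> syntax for nested objects can be illustrated with a small scikit-learn Pipeline (the estimators here are just examples):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# <component>__<parameter>: reach into the nested steps by name.
pipe.set_params(clf__C=0.5, scale__with_mean=False)
print(pipe.get_params()["clf__C"])  # 0.5
```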

transform(X)[source]#

Parameters:

X (FDataGrid) –

Return type:

ndarray[Any, dtype[float64]]