RecursiveMaximaHunting
- class skfda.preprocessing.dim_reduction.variable_selection.RecursiveMaximaHunting(*, dependence_measure=<function u_distance_correlation_sqr>, max_features=None, correction=None, redundancy_condition=None, stopping_condition=None, _get_intermediate_results=False)
Recursive Maxima Hunting variable selection.
This is a filter variable selection method for problems with a target variable. It evaluates a dependence measure between each point of the function and the target variable, selects the point that maximizes this dependence, subtracts the information of the selected point from the original functions, and repeats the process.
This method is inspired by MaximaHunting, and shares similarities with it. However, as the information of the selected point is subtracted from each function in each step of the algorithm, this algorithm can uncover points that are not relevant by themselves but are relevant once other points are selected. Those points would not be selected by MaximaHunting alone.
This method was originally described in a special case in article [1]. Additional information about the usage of this method can be found in Recursive Maxima Hunting.
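The select, discard, subtract, repeat loop described above can be sketched in plain Python. This is a toy illustration under loud assumptions, not the skfda implementation: absolute Pearson correlation stands in for the distance correlation, a least-squares residual stands in for the Correction object, and two fixed thresholds stand in for the redundancy and stopping conditions. Every function name, threshold value, and the toy data are hypothetical.

```python
import math


def abs_corr(x, y):
    """Absolute Pearson correlation (assumed stand-in for distance correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def regress_out(col, ref):
    """Residual of `col` after a least-squares fit on `ref` (assumed toy correction)."""
    n = len(ref)
    mean_c, mean_r = sum(col) / n, sum(ref) / n
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_r == 0:
        return list(col)
    beta = sum((c - mean_c) * (r - mean_r) for c, r in zip(col, ref)) / var_r
    return [c - beta * (r - mean_r) for c, r in zip(col, ref)]


def recursive_maxima_hunting_sketch(
    X, y, max_features=None, stop_threshold=0.1, redundancy_threshold=0.9
):
    """Return the indices of the selected points, in selection order."""
    # One column of observations per candidate point (copies, X is untouched).
    columns = {j: [row[j] for row in X] for j in range(len(X[0]))}
    selected = []
    while columns and (max_features is None or len(selected) < max_features):
        # 1. Dependence between every remaining point and the target.
        relevance = {j: abs_corr(col, y) for j, col in columns.items()}
        best = max(relevance, key=relevance.get)
        # 2. Stop when even the best point carries too little information.
        if relevance[best] < stop_threshold:
            break
        selected.append(best)
        best_col = columns.pop(best)
        # 3. Discard redundant points, 4. subtract the selected point's
        #    information from the points that remain.
        columns = {
            j: regress_out(col, best_col)
            for j, col in columns.items()
            if abs_corr(col, best_col) < redundancy_threshold
        }
    return selected


# Toy data: 8 samples observed at 4 points, two balanced classes.
u = [-1, -1, -1, -1, 1, 1, 1, 1]   # class signal
a = [-1, -1, 1, 1, -1, -1, 1, 1]   # nuisance component shared by points 0-2
b = [-1, 1, -1, 1, -1, 1, -1, 1]   # small perturbation
c = [p * q for p, q in zip(a, b)]  # pure noise
X = [
    [u[i] + 0.5 * a[i], u[i] + 0.5 * a[i] + 0.1 * b[i], a[i], c[i]]
    for i in range(8)
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = recursive_maxima_hunting_sketch(X, y)
print(selected)  # [0, 2]
```

In this toy data, point 2 is uncorrelated with y on its own and is only picked up after the influence of point 0 has been subtracted. Point 1 is discarded as redundant with point 0, and point 3 is never selected because it is pure noise.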
- Parameters:
dependence_measure (_DepMeasure[NDArrayFloat, NDArrayFloat]) – Dependence measure to use. By default, it uses the bias corrected squared distance correlation.
max_features (Optional[int]) – Maximum number of features to select. By default there is no limit.
correction (Optional[Correction]) – Correction used to subtract the information of each selected point in each iteration. By default it is a UniformCorrection object.
redundancy_condition (Optional[RedundancyCondition]) – Condition to consider a point redundant with the selected maxima and discard it from future consideration as a maximum. By default it is a DependenceThresholdRedundancy object.
stopping_condition (Optional[StoppingCondition]) – Condition to stop the algorithm. By default it is an AsymptoticIndependenceTestStop object.
_get_intermediate_results (bool) –
Examples
>>> from skfda.preprocessing.dim_reduction import variable_selection
>>> from skfda.datasets import make_gaussian_process
>>> import skfda
>>> import numpy as np
We create trajectories from two classes, one with zero mean and the other with a peak-like mean. Both have Brownian covariance.
>>> n_samples = 1000
>>> n_features = 100
>>>
>>> def mean_1(t):
...     return (
...         np.abs(t - 0.25)
...         - 2 * np.abs(t - 0.5)
...         + np.abs(t - 0.75)
...     )
>>>
>>> X_0 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     random_state=0,
... )
>>> X_1 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     mean=mean_1,
...     random_state=1,
... )
>>> X = skfda.concatenate((X_0, X_1))
>>>
>>> y = np.zeros(n_samples)
>>> y[n_samples // 2:] = 1
Select the relevant points to distinguish the two classes
>>> rmh = variable_selection.RecursiveMaximaHunting()
>>> _ = rmh.fit(X, y)
>>> point_mask = rmh.get_support()
>>> points = X.grid_points[0][point_mask]
>>> np.allclose(points, [0.25, 0.5, 0.75], rtol=1e-1)
True
Apply the learned dimensionality reduction
>>> X_dimred = rmh.transform(X)
>>> len(X.grid_points[0])
100
>>> X_dimred.shape
(1000, 3)
References
Methods
fit(X, y) – Recursive maxima hunting algorithm.
fit_transform(X[, y]) – Fit to data, then transform it.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X)
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
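As a minimal sketch, assuming the usual scikit-learn convention that fit_transform(X, y) is equivalent to fit(X, y).transform(X), the relationship between the three methods looks like this. KeepColumns is a hypothetical toy selector written for illustration only; it is not part of skfda or scikit-learn.

```python
class KeepColumns:
    """Hypothetical toy selector: keeps the columns whose values vary."""

    def fit(self, X, y=None, **fit_params):
        n_cols = len(X[0])
        # A column is kept when it holds more than one distinct value.
        self.support_ = [len({row[j] for row in X}) > 1 for j in range(n_cols)]
        return self  # returning self allows method chaining

    def transform(self, X):
        return [
            [value for value, keep in zip(row, self.support_) if keep]
            for row in X
        ]

    def fit_transform(self, X, y=None, **fit_params):
        # Fit on X (and y), then return the transformed version of X.
        return self.fit(X, y, **fit_params).transform(X)


X = [[1.0, 5.0, 0.2], [1.0, 6.0, 0.4], [1.0, 7.0, 0.1]]
X_new = KeepColumns().fit_transform(X)
print(X_new)  # [[5.0, 0.2], [6.0, 0.4], [7.0, 0.1]]
```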
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- get_support(indices: typing_extensions.Literal[True]) → Sequence[Tuple[int, ...]]
- get_support(indices: typing_extensions.Literal[False] = False) → ndarray[Any, dtype[bool_]]
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) –
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
New in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
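The <component>__<parameter> form can be illustrated with a small sketch of how such keys might be routed to a nested object. ToyEstimator is a hypothetical class written for this example and is not the scikit-learn implementation; it only assumes that a key is split at the first double underscore and the remainder is forwarded to the named component.

```python
class ToyEstimator:
    """Hypothetical estimator that stores its parameters as attributes."""

    def __init__(self, **params):
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params):
        for key, value in params.items():
            # Split "component__parameter" at the first double underscore.
            name, delim, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r}")
            if delim:
                # Nested key: forward the remainder to the component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                # Plain key: update this object directly.
                setattr(self, name, value)
        return self  # returning self mirrors the documented return value


selector = ToyEstimator(max_features=None)
pipeline = ToyEstimator(selector=selector, verbose=False)

pipeline.set_params(selector__max_features=3, verbose=True)
print(selector.max_features, pipeline.verbose)  # 3 True
```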