MinimumRedundancyMaximumRelevance#

class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1)[source]#
class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1, method: Method[dtype_y_T] | Literal['MID', 'MIQ'])
class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1, dependence_measure: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64 | dtype_y_T]]], ndarray[Any, dtype[float64]]], criterion: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]] | Literal['difference', 'quotient'])
class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1, relevance_dependence_measure: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[dtype_y_T]]], ndarray[Any, dtype[float64]]], redundancy_dependence_measure: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]], criterion: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]] | Literal['difference', 'quotient'])

Minimum redundancy maximum relevance (mRMR) method.

This is a greedy version of mRMR that selects the variables iteratively. This method considers the relevance of a variable as well as its redundancy with respect to the already selected ones.

It uses a dependence measure between random variables to compute the dependence between the candidate variable and the target (for the relevance) and another to compute the dependence between two variables (for the redundancy). It combines both measurements using a criterion such as the difference or the quotient, and then selects the variable that maximizes that quantity. For example, using the quotient criterion and the same dependence function \(D\) for relevance and redundancy, the variable selected at the \(i\)-th step would be \(X(t_i)\) with

\[t_i = \underset {t}{\operatorname {arg\,max}} \frac{D(X(t), y)} {\frac{1}{i-1}\sum_{j < i} D(X(t), X(t_j))}.\]

For further discussion of the applicability of this method to functional data see [1].
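
The following standalone sketch (not part of this class, and not its actual implementation) illustrates the greedy selection described above with the quotient criterion, using plain NumPy arrays and an arbitrary scalar dependence callable; the name greedy_mrmr and its signature are purely illustrative.

>>> import numpy as np
>>>
>>> def greedy_mrmr(X, y, n_selected, dependence):
...     """Toy greedy mRMR with the quotient criterion (illustration only)."""
...     n_points = X.shape[1]
...     # Relevance: dependence of each point with the target.
...     relevance = np.array(
...         [dependence(X[:, t], y) for t in range(n_points)]
...     )
...     selected = [int(np.argmax(relevance))]
...     while len(selected) < n_selected:
...         scores = np.full(n_points, -np.inf)
...         for t in range(n_points):
...             if t in selected:
...                 continue
...             # Redundancy: mean dependence with already selected points.
...             redundancy = np.mean(
...                 [dependence(X[:, t], X[:, j]) for j in selected]
...             )
...             scores[t] = relevance[t] / redundancy
...         selected.append(int(np.argmax(scores)))
...     return selected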

Parameters:
  • n_features_to_select (int) – Number of features to select.

  • method (Method[dtype_y_T] | MethodName | None) – Predefined method to use (MID or MIQ).

  • dependence_measure (_DependenceMeasure[np.typing.NDArray[np.float_], np.typing.NDArray[np.float_ | dtype_y_T]] | None) – Dependence measure to use both for relevance and for redundancy.

  • relevance_dependence_measure (_DependenceMeasure[np.typing.NDArray[np.float_], np.typing.NDArray[dtype_y_T]] | None) – Dependence measure used to compute relevance.

  • redundancy_dependence_measure (_DependenceMeasure[np.typing.NDArray[np.float_], np.typing.NDArray[np.float_]] | None) – Dependence measure used to compute redundancy.

  • criterion (_CriterionLike | None) – Criterion used to combine relevance and redundancy. It must be a Python callable with two inputs. As the difference and the quotient are common choices, both can be specified as strings.

Examples

>>> from skfda.preprocessing.dim_reduction import variable_selection
>>> from skfda.datasets import make_gaussian_process
>>> import skfda
>>> import numpy as np
>>> import dcor

We create trajectories from two classes, one with zero mean and the other with a peak-like mean. Both have Brownian covariance.

>>> n_samples = 1000
>>> n_features = 100
>>>
>>> def mean_1(t):
...     return (
...         np.abs(t - 0.25)
...         - 2 * np.abs(t - 0.5)
...         + np.abs(t - 0.75)
...     )
>>>
>>> X_0 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     random_state=0,
... )
>>> X_1 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     mean=mean_1,
...     random_state=1,
... )
>>> X = skfda.concatenate((X_0, X_1))
>>>
>>> y = np.zeros(n_samples, dtype=np.int_)
>>> y[n_samples // 2:] = 1

Select the relevant points to distinguish the two classes. You may specify a method such as MIQ (the default) or MID.

>>> mrmr = variable_selection.MinimumRedundancyMaximumRelevance(
...     n_features_to_select=3,
...     method="MID",
... )
>>> _ = mrmr.fit(X, y)
>>> point_mask = mrmr.get_support()
>>> points = X.grid_points[0][point_mask]

Apply the learned dimensionality reduction:

>>> X_dimred = mrmr.transform(X)
>>> len(X.grid_points[0])
100
>>> X_dimred.shape
(1000, 3)

It is also possible to specify the measure of dependence used (or even different ones for relevance and redundancy) as well as the function to combine relevance and redundancy (usually the division or subtraction operations).

>>> mrmr = variable_selection.MinimumRedundancyMaximumRelevance(
...     n_features_to_select=3,
...     dependence_measure=dcor.u_distance_correlation_sqr,
...     criterion="quotient",
... )
>>> _ = mrmr.fit(X, y)

As a toy example illustrating the customizability of this method, consider the following:

>>> mrmr = variable_selection.MinimumRedundancyMaximumRelevance(
...     n_features_to_select=3,
...     relevance_dependence_measure=dcor.u_distance_covariance_sqr,
...     redundancy_dependence_measure=dcor.u_distance_correlation_sqr,
...     criterion=lambda rel, red: 0.5 * rel / red,
... )
>>> _ = mrmr.fit(X, y)

References

Methods

  • fit(X, y)

  • fit_transform(X[, y]) – Fit to data, then transform it.

  • get_metadata_routing() – Get metadata routing of this object.

  • get_params([deep]) – Get parameters for this estimator.

  • get_support([indices])

  • set_output(*[, transform]) – Set output container.

  • set_params(**params) – Set the parameters of this estimator.

  • transform(X[, y])

fit(X, y)[source]#
Parameters:
  • X (FDataGrid) – Functional training data.

  • y (ndarray) – Target values.

Return type:

SelfType
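
Since fit returns the fitted estimator itself, it can be chained directly with transform. A usage sketch, reusing the mrmr, X and y objects built in the examples above:

>>> X_reduced = mrmr.fit(X, y).transform(X)
>>> X_reduced.shape
(1000, 3)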

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)
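
A usage sketch with the objects from the examples above, showing that fit_transform is equivalent to calling fit followed by transform:

>>> X_new = mrmr.fit_transform(X, y)
>>> X_new.shape
(1000, 3)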

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict
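
For example, the constructor arguments of the estimator built in the examples above are exposed as parameters (a usage sketch):

>>> mrmr.get_params()["n_features_to_select"]
3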

get_support(indices=False)[source]#
Parameters:

indices (bool) – If True, return the integer indices of the selected variables instead of a boolean mask over the grid points.

Return type:

ndarray[Any, dtype[int64]]
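
After fitting, the selected positions can be obtained either as a mask over the grid points (as in the examples above) or as integer positions. A usage sketch:

>>> mask = mrmr.get_support()                 # mask over the grid points
>>> indices = mrmr.get_support(indices=True)  # integer positions instead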

set_output(*, transform=None)#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas"}, default=None) –

Configure output of transform and fit_transform.

  • ”default”: Default output format of a transformer

  • ”pandas”: DataFrame output

  • ”polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
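
For example, the number of selected variables can be changed before refitting (a usage sketch):

>>> mrmr = mrmr.set_params(n_features_to_select=5)
>>> mrmr.get_params()["n_features_to_select"]
5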

transform(X, y=None)[source]#
Parameters:
  • X (FDataGrid) – Functional data to transform.

  • y – Ignored.

Return type:

ndarray[Any, dtype[float64]]