MinimumRedundancyMaximumRelevance#
- class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1)[source]#
- class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1, method: Method[dtype_y_T] | Literal['MID', 'MIQ'])
- class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1, dependence_measure: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64 | dtype_y_T]]], ndarray[Any, dtype[float64]]], criterion: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]] | Literal['difference', 'quotient'])
- class skfda.preprocessing.dim_reduction.variable_selection.MinimumRedundancyMaximumRelevance(*, n_features_to_select: int = 1, relevance_dependence_measure: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[dtype_y_T]]], ndarray[Any, dtype[float64]]], redundancy_dependence_measure: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]], criterion: Callable[[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]], ndarray[Any, dtype[float64]]] | Literal['difference', 'quotient'])
Minimum redundancy maximum relevance (mRMR) method.
This is a greedy version of mRMR that selects the variables iteratively. This method considers the relevance of a variable as well as its redundancy with respect to the already selected ones.
It uses a dependence measure between random variables to compute the dependence between the candidate variable and the target (for the relevance) and another to compute the dependence between two variables (for the redundancy). It combines both measurements using a criterion such as the difference or the quotient, and then selects the variable that maximizes that quantity. For example, using the quotient criterion and the same dependence function \(D\) for relevance and redundancy, the variable selected at the \(i\)-th step would be \(X(t_i)\) with
\[t_i = \underset {t}{\operatorname {arg\,max}} \frac{D(X(t), y)} {\frac{1}{i-1}\sum_{j < i} D(X(t), X(t_j))}.\]
For further discussion of the applicability of this method to functional data, see [1].
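As an illustrative sketch of the greedy step above (using plain NumPy instead of skfda, with the absolute Pearson correlation standing in for the dependence measure \(D\) — both are assumptions made only for this example), the selection with the quotient criterion can be written as:

```python
import numpy as np

def abs_corr(a, b):
    """Toy dependence measure: |Pearson correlation| between two 1-d arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

def greedy_mrmr(X, y, n_features_to_select):
    """Greedy mRMR over the columns of X with the quotient criterion."""
    n_features = X.shape[1]
    # Relevance D(X(t), y) of every candidate point.
    relevance = np.array([abs_corr(X[:, t], y) for t in range(n_features)])
    # The first point is the one with maximum relevance.
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_features_to_select:
        candidates = [t for t in range(n_features) if t not in selected]
        scores = []
        for t in candidates:
            # Mean redundancy with the already selected points.
            redundancy = np.mean([abs_corr(X[:, t], X[:, j]) for j in selected])
            scores.append(relevance[t] / redundancy)  # quotient criterion
        selected.append(candidates[int(np.argmax(scores))])
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 3] + 0.1 * rng.normal(size=200)  # column 3 is highly relevant
print(greedy_mrmr(X, y, 3))
```

Here the noisy target is built from column 3, so that column is selected first; the remaining picks trade its relevance off against redundancy with it.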
- Parameters:
n_features_to_select (int) – Number of features to select.
method (Method[dtype_y_T] | MethodName | None) – Predefined method to use (MID or MIQ).
dependence_measure (_DependenceMeasure[np.typing.NDArray[np.float_], np.typing.NDArray[np.float_ | dtype_y_T]] | None) – Dependence measure to use both for relevance and for redundancy.
relevance_dependence_measure (_DependenceMeasure[np.typing.NDArray[np.float_], np.typing.NDArray[dtype_y_T]] | None) – Dependence measure used to compute relevance.
redundancy_dependence_measure (_DependenceMeasure[np.typing.NDArray[np.float_], np.typing.NDArray[np.float_]] | None) – Dependence measure used to compute redundancy.
criterion (_CriterionLike | None) – Criterion to combine relevance and redundancy. It must be a Python callable with two inputs. As the difference and the quotient are common choices, both can be specified as strings.
Examples
>>> from skfda.preprocessing.dim_reduction import variable_selection
>>> from skfda.datasets import make_gaussian_process
>>> import skfda
>>> import numpy as np
>>> import dcor
We create trajectories from two classes, one with zero mean and the other with a peak-like mean. Both have Brownian covariance.
>>> n_samples = 1000
>>> n_features = 100
>>>
>>> def mean_1(t):
...     return (
...         np.abs(t - 0.25)
...         - 2 * np.abs(t - 0.5)
...         + np.abs(t - 0.75)
...     )
>>>
>>> X_0 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     random_state=0,
... )
>>> X_1 = make_gaussian_process(
...     n_samples=n_samples // 2,
...     n_features=n_features,
...     mean=mean_1,
...     random_state=1,
... )
>>> X = skfda.concatenate((X_0, X_1))
>>>
>>> y = np.zeros(n_samples, dtype=np.int_)
>>> y[n_samples // 2:] = 1
Select the relevant points to distinguish the two classes. You may specify a method such as MIQ (the default) or MID.
>>> mrmr = variable_selection.MinimumRedundancyMaximumRelevance(
...     n_features_to_select=3,
...     method="MID",
... )
>>> _ = mrmr.fit(X, y)
>>> point_mask = mrmr.get_support()
>>> points = X.grid_points[0][point_mask]
Apply the learned dimensionality reduction:
>>> X_dimred = mrmr.transform(X)
>>> len(X.grid_points[0])
100
>>> X_dimred.shape
(1000, 3)
It is also possible to specify the measure of dependence used (or even different ones for relevance and redundancy) as well as the function to combine relevance and redundancy (usually the division or subtraction operations).
>>> mrmr = variable_selection.MinimumRedundancyMaximumRelevance(
...     n_features_to_select=3,
...     dependence_measure=dcor.u_distance_correlation_sqr,
...     criterion="quotient",
... )
>>> _ = mrmr.fit(X, y)
As a toy example illustrating the customizability of this method, consider the following:
>>> mrmr = variable_selection.MinimumRedundancyMaximumRelevance(
...     n_features_to_select=3,
...     relevance_dependence_measure=dcor.u_distance_covariance_sqr,
...     redundancy_dependence_measure=dcor.u_distance_correlation_sqr,
...     criterion=lambda rel, red: 0.5 * rel / red,
... )
>>> _ = mrmr.fit(X, y)
References
Methods

- fit(X, y)
- fit_transform(X[, y]) – Fit to data, then transform it.
- get_metadata_routing() – Get metadata routing of this object.
- get_params([deep]) – Get parameters for this estimator.
- get_support([indices])
- set_output(*[, transform]) – Set output container.
- set_params(**params) – Set the parameters of this estimator.
- transform(X[, y])
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray of shape (n_samples, n_features_new)
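Since this class follows the scikit-learn transformer API, fit_transform(X, y) is equivalent to calling fit and then transform. A minimal sketch with a hypothetical toy transformer (not part of skfda; TopVarianceSelector is invented for this example) that keeps the highest-variance columns:

```python
import numpy as np

class TopVarianceSelector:
    """Toy transformer: keeps the k columns of X with the largest variance."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y=None):
        # Indices of the k highest-variance columns.
        self.support_ = np.argsort(np.var(X, axis=0))[::-1][:self.k]
        return self

    def transform(self, X):
        return X[:, np.sort(self.support_)]

    def fit_transform(self, X, y=None, **fit_params):
        # Fit to X (and y, if given), then return the transformed X.
        return self.fit(X, y, **fit_params).transform(X)

X = np.array([[1.0, 0.0, 5.0],
              [2.0, 0.0, 1.0],
              [3.0, 0.0, 9.0]])
print(TopVarianceSelector(k=2).fit_transform(X).shape)  # (3, 2)
```

The constant middle column has zero variance, so the two remaining columns are kept and the output shape is (n_samples, n_features_new) = (3, 2).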
- get_metadata_routing()#
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- set_output(*, transform=None)#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) –
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
New in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that it's possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
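A brief illustration of the nested-parameter syntax with a scikit-learn Pipeline (assuming scikit-learn is installed; the step name "scale" is chosen arbitrarily for this example):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler())])
# Nested parameters are addressed as <component>__<parameter>:
pipe.set_params(scale__with_mean=False)
print(pipe.get_params()["scale__with_mean"])  # False
```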