DDGClassifier

class skfda.ml.classification.DDGClassifier(*, multivariate_classifier=None, depth_method=None)

Generalized depth-versus-depth (DD) classifier for functional data.

This classifier builds an interface around the DDGTransformer. The transformer takes a list of k depths and performs the following map [1]:

\[\begin{split}\mathcal{X} &\rightarrow \mathbb{R}^G \\ x &\rightarrow \textbf{d} = (D_1^1(x), D_1^2(x),...,D_g^k(x))\end{split}\]

where \(D_i^j(x)\) is the depth of the point \(x\) with respect to the data in the \(i\)-th group using the \(j\)-th depth of the provided list, for \(i = 1, \ldots, g\) and \(j = 1, \ldots, k\). Note that \(\mathcal{X}\) is possibly multivariate, that is, \(\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_p\).

In the G-dimensional space, the classification is performed using a multivariate classifier.
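As a rough illustration of the map above, the following pure-Python sketch builds the depth feature vector for curves sampled on a common grid. It does not use scikit-fda: `integrated_depth`, `distance_depth` and `ddg_features` are hypothetical helpers written for this example, and the two depths are simplified stand-ins for the library's depth classes.

```python
from statistics import mean


def integrated_depth(x, sample):
    """Depth in the style of Fraiman and Muniz (illustrative only).

    Averages 1 - |1/2 - F_n(x(t))| over the grid points, where F_n is the
    empirical distribution function of the reference sample at each point.
    """
    n = len(sample)
    values = []
    for t, x_t in enumerate(x):
        f_n = sum(1 for curve in sample if curve[t] <= x_t) / n
        values.append(1 - abs(0.5 - f_n))
    return mean(values)


def distance_depth(x, sample):
    """Toy depth: decreases with the distance to the sample mean curve."""
    center = [mean(column) for column in zip(*sample)]
    dist = sum((a - b) ** 2 for a, b in zip(x, center)) ** 0.5
    return 1 / (1 + dist)


def ddg_features(X, X_train, y_train, depths):
    """Map each curve to (D_1^1, D_1^2, ..., D_g^k): groups outer, depths inner."""
    classes = sorted(set(y_train))
    groups = {
        c: [x for x, label in zip(X_train, y_train) if label == c]
        for c in classes
    }
    return [
        [depth(x, groups[c]) for c in classes for depth in depths]
        for x in X
    ]


# Two groups of three curves each, observed at three grid points.
X_train = [
    [0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [0.2, 0.3, 0.4],
    [1.0, 1.1, 1.2], [1.1, 1.2, 1.3], [1.2, 1.3, 1.4],
]
y_train = [0, 0, 0, 1, 1, 1]
X_new = [[0.1, 0.2, 0.3], [1.1, 1.2, 1.3]]

# Each curve gets 2 groups * 2 depths = 4 features.
features = ddg_features(
    X_new, X_train, y_train, [integrated_depth, distance_depth],
)
```

Each row of `features` can then be passed to any multivariate classifier; a curve is typically deeper in its own group than in the others, which is what the classifier exploits.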

Parameters:
  • depth_method (Depth[Input] | Sequence[Tuple[str, Depth[Input]]] | None) – The depth class or sequence of depths to use when calculating the depth of a test sample in a class. See the documentation of the depths module for a list of available depths. By default it is ModifiedBandDepth.

  • multivariate_classifier (ClassifierMixin[NDArrayFloat, NDArrayInt] | None) – The multivariate classifier to use in the DDG-plot.

Examples

Firstly, we will import and split the Berkeley Growth Study dataset

>>> from skfda.datasets import fetch_growth
>>> from sklearn.model_selection import train_test_split
>>> dataset = fetch_growth()
>>> fd = dataset['data']
>>> y = dataset['target']
>>> X_train, X_test, y_train, y_test = train_test_split(
...     fd, y, test_size=0.25, stratify=y, random_state=0)
>>> from skfda.exploratory.depth import (
...     ModifiedBandDepth,
...     IntegratedDepth,
... )
>>> from sklearn.neighbors import KNeighborsClassifier

We will fit a DDG-classifier using KNN

>>> from skfda.ml.classification import DDGClassifier
>>> clf = DDGClassifier(
...     depth_method=ModifiedBandDepth(),
...     multivariate_classifier=KNeighborsClassifier(),
... )
>>> clf.fit(X_train, y_train)
DDGClassifier(...)

We can predict the class of new samples

>>> clf.predict(X_test)
array([ 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1,
        1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

Finally, we calculate the mean accuracy for the test data

>>> clf.score(X_test, y_test)
0.875

It is also possible to use several depth functions to increase the number of features available to the classifier

>>> clf = DDGClassifier(
...     depth_method=[
...         ("mbd", ModifiedBandDepth()),
...         ("id", IntegratedDepth()),
...     ],
...     multivariate_classifier=KNeighborsClassifier(),
... )
>>> clf.fit(X_train, y_train)
DDGClassifier(...)
>>> clf.predict(X_test)
array([ 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1,
        1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])
>>> clf.score(X_test, y_test)
0.875

See also

  • DDClassifier

  • MaximumDepthClassifier

  • _ddg_transformer

References

Methods

fit(X, y)

Fit the model using X as training data and y as target values.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict the class labels for the provided data.

score(X, y[, sample_weight])

Return accuracy on provided data and labels.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Configure whether metadata should be requested to be passed to the score method.

fit(X, y)

Fit the model using X as training data and y as target values.

Parameters:
  • X (Input) – FDataGrid with the training data.

  • y (Target) – Target values of shape (n_samples,).

Returns:

self

Return type:

DDGClassifier[Input, Target]

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)

Predict the class labels for the provided data.

Parameters:

X (Input) – FDataGrid with the test samples.

Returns:

Array of shape (n_samples,) with class labels for each data sample.

Return type:

Target

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
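The returned value can be reproduced by hand. The sketch below is a minimal pure-Python version of mean accuracy with optional sample weights; `mean_accuracy` is a hypothetical helper for illustration, not part of scikit-fda or scikit-learn, and the label vectors are made up.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose predicted label equals the true one."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = [float(t == p) for t, p in zip(y_true, y_pred)]
    return sum(w * h for w, h in zip(sample_weight, hits)) / sum(sample_weight)


# 21 correct predictions out of 24 give the 0.875 seen in the examples above.
unweighted = mean_accuracy([1] * 24, [1] * 21 + [0] * 3)

# With weights, a misclassified sample counts in proportion to its weight.
weighted = mean_accuracy([0, 1, 1], [0, 1, 0], sample_weight=[1, 1, 2])
```

Here `unweighted` is 21/24 and `weighted` is (1 + 1 + 0) / 4.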

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
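The `<component>__<parameter>` form can be seen with a plain scikit-learn Pipeline. This is a small sketch using only scikit-learn; the step names `scale` and `knn` are arbitrary choices for this example, and it does not show how DDGClassifier itself exposes nested parameters.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A nested object: the pipeline contains two named components.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Update a parameter of the "knn" component through the pipeline.
pipe.set_params(knn__n_neighbors=3)

# The nested name also appears in the deep parameter dictionary.
n_neighbors = pipe.get_params(deep=True)["knn__n_neighbors"]
```

The same naming scheme is what grid-search tools use to address parameters of inner components.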

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

Examples using skfda.ml.classification.DDGClassifier

Classification methods

Depth based classification

Scikit-fda and scikit-learn
