DTMClassifier

class skfda.ml.classification.DTMClassifier(proportiontocut, depth_method=None, metric=LpDistance(p=2, vector_norm=None))[source]

Distance to trimmed means (DTM) classification.

Test samples are assigned to the class whose trimmed mean is closest to the observation [1].
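Fitting thus reduces to computing one depth-trimmed mean per class, and prediction assigns each curve to the class whose trimmed mean is nearest under the chosen metric. This decision rule can be sketched directly with skfda's trim_mean, PairwiseMetric and l2_distance; the toy data and variable names below are purely illustrative:

>>> import numpy as np
>>> from skfda import FDataGrid
>>> from skfda.exploratory.stats import trim_mean
>>> from skfda.misc.metrics import PairwiseMetric, l2_distance
>>> rng = np.random.default_rng(0)
>>> grid = np.linspace(0, 1, 20)
>>> X0 = FDataGrid(rng.normal(0, 1, (10, 20)), grid)  # class 0 curves
>>> X1 = FDataGrid(rng.normal(3, 1, (10, 20)), grid)  # class 1 curves
>>> centroids = trim_mean(X0, 0.25).concatenate(trim_mean(X1, 0.25))
>>> X_new = FDataGrid(rng.normal(0, 1, (5, 20)), grid)  # near class 0
>>> PairwiseMetric(l2_distance)(X_new, centroids).argmin(axis=1)
array([0, 0, 0, 0, 0])

DTMClassifier wraps this rule behind the usual scikit-learn estimator API.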

Parameters:
  • proportiontocut (float) – Proportion of functions in each class to trim before computing the mean, as a value in [0, 1). There is no universally good choice, since the best value varies from dataset to dataset; it is therefore often selected by cross-validation (see the tuning sketch at the end of the Examples section).

  • depth_method (Depth[Input] | None) – The depth class used to order the data. See the documentation of the depths module for a list of available depths. By default it is ModifiedBandDepth.

  • metric (Metric[Input]) – Distance function between two functional objects. See the documentation of the metrics module for a list of available metrics. L2 distance is used by default. A construction sketch showing both this option and depth_method follows this list.
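Both options can be passed explicitly at construction time. A minimal sketch mirroring the defaults (ModifiedBandDepth lives in skfda.exploratory.depth and LpDistance in skfda.misc.metrics):

>>> from skfda.ml.classification import DTMClassifier
>>> from skfda.exploratory.depth import ModifiedBandDepth
>>> from skfda.misc.metrics import LpDistance
>>> clf = DTMClassifier(
...     proportiontocut=0.25,
...     depth_method=ModifiedBandDepth(),
...     metric=LpDistance(p=2),
... )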

Examples

Firstly, we will import and split the Berkeley Growth Study dataset

>>> from skfda.datasets import fetch_growth
>>> from sklearn.model_selection import train_test_split
>>> dataset = fetch_growth()
>>> fd = dataset['data']
>>> y = dataset['target']
>>> X_train, X_test, y_train, y_test = train_test_split(
...     fd, y, test_size=0.25, stratify=y, random_state=0)

We will fit a Distance to trimmed means classifier

>>> from skfda.ml.classification import DTMClassifier
>>> clf = DTMClassifier(proportiontocut=0.25)
>>> clf.fit(X_train, y_train)
DTMClassifier(...)

We can predict the class of new samples

>>> clf.predict(X_test)  # Predict labels for test samples
array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

Finally, we calculate the mean accuracy for the test data

>>> clf.score(X_test, y_test)
0.875
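
Since a good proportiontocut is dataset-dependent, a common option is to select it by cross-validation. A minimal sketch using scikit-learn's GridSearchCV on the training split above (the candidate grid is an illustrative choice):

>>> from sklearn.model_selection import GridSearchCV
>>> param_grid = {'proportiontocut': [0.0, 0.1, 0.25, 0.5]}
>>> grid = GridSearchCV(
...     DTMClassifier(proportiontocut=0.25), param_grid, cv=5)
>>> grid.fit(X_train, y_train)
GridSearchCV(...)

The selected value is then available in grid.best_params_.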

References

[1] Fraiman, R. and Muniz, G. (2001). Trimmed means for functional data. Test, 10, 419–440.

Methods

fit(X, y) – Fit the model using X as training data and y as target values.

get_metadata_routing() – Get metadata routing of this object.

get_params([deep]) – Get parameters for this estimator.

predict(X) – Predict the class labels for the provided data.

score(X, y[, sample_weight]) – Return the mean accuracy on the given test data and labels.

set_params(**params) – Set the parameters of this estimator.

set_score_request(*[, sample_weight]) – Request metadata passed to the score method.

fit(X, y)[source]

Fit the model using X as training data and y as target values.

Parameters:
  • X (Input) – FDataGrid with the training data, or a precomputed distance matrix of shape (n_samples, n_samples) if metric='precomputed'.

  • y (Target) – Target values of shape (n_samples,) or (n_samples, n_outputs).

Returns:

self

Return type:

NearestCentroid[Input, Target]

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict
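
For instance, continuing with the fitted classifier from the Examples section above:

>>> clf.get_params()['proportiontocut']
0.25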

predict(X)[source]

Predict the class labels for the provided data.

Parameters:

X (Input) – FDataGrid with the test samples.

Returns:

Array of shape (n_samples,) or (n_samples, n_outputs) with class labels for each data sample.

Return type:

Target

score(X, y, sample_weight=None)[source]

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the complete label set for each sample be predicted correctly.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
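
sample_weight reweights each test sample's contribution to the accuracy; with uniform weights the result matches the unweighted score from the Examples section:

>>> import numpy as np
>>> clf.score(X_test, y_test, sample_weight=np.ones(len(y_test)))
0.875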

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
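
For example, the trimming proportion can be changed on the classifier directly, or through a nested object such as a Pipeline (the step name 'clf' below is an arbitrary choice):

>>> clf.set_params(proportiontocut=0.5)
DTMClassifier(...)
>>> from sklearn.pipeline import Pipeline
>>> pipe = Pipeline([('clf', DTMClassifier(proportiontocut=0.25))])
>>> pipe.set_params(clf__proportiontocut=0.5)
Pipeline(...)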

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object
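
A minimal sketch; the call is only valid after metadata routing has been enabled globally:

>>> import sklearn
>>> sklearn.set_config(enable_metadata_routing=True)
>>> clf = DTMClassifier(proportiontocut=0.25).set_score_request(
...     sample_weight=True)
>>> sklearn.set_config(enable_metadata_routing=False)

A meta-estimator such as GridSearchCV will then route any sample_weight it receives through to this classifier's score method.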