FuzzyCMeans

class skfda.ml.clustering.FuzzyCMeans(*, n_clusters=2, init=None, metric=LpDistance(p=2, vector_norm=None), n_init=1, max_iter=100, tol=0.0001, random_state=0, fuzzifier=2)

Fuzzy c-Means clustering for functional data.

Let \(\mathbf{X} = \left\{ x_{1}, x_{2}, \dots, x_{n} \right\}\) be the dataset to be analyzed and \(\mathbf{V} = \left\{ v_{1}, v_{2}, \dots, v_{c} \right\}\) the set of cluster centers of \(\mathbf{X}\) in the \(m\)-dimensional space \(\mathbb{R}^m\), where \(n\) is the number of objects, \(m\) is the number of features, and \(c\) is the number of partitions (clusters).

FCM minimizes the following objective function:

\[J_{FCM}\left(\mathbf{X}; \mathbf{U, V}\right) = \sum_{i=1}^{c} \sum_{j=1}^{n}u_{ij}^{f}D_{ij}^2.\]

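This function differs from the classical k-means (KM) objective in that each squared error \(D_{ij}^2\) is weighted by \(u_{ij}^{f}\), where \(f > 1\) is the fuzzifier, rather than being left unweighted. In the objective function, \(\mathbf{U} = [u_{ij}] \in M_{FCM}\) is a fuzzy partition matrix computed from the dataset \(\mathbf{X}\), where \(M_{FCM}\) is the set of \(c \times n\) matrices with \(u_{ij} \in [0, 1]\) and \(\sum_{i=1}^{c} u_{ij} = 1\) for every \(j\).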

The fuzzy clustering of \(\mathbf{X}\) is represented by the membership matrix \(\mathbf{U}\). The element \(u_{ij}\) is the membership value of the j-th object in the i-th cluster, so the i-th row of \(\mathbf{U}\) holds the membership values of the \(n\) objects in the i-th cluster. \(\mathbf{V}\) is the vector of cluster prototypes (centroids): \(\mathbf{V} = \left\{ v_{1}, v_{2}, \dots, v_{c} \right\}\), with \(\mathbf{v_{i}} \in \mathbb{R}^m\).
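As a minimal NumPy sketch (not scikit-fda's implementation), the objective can be evaluated directly once the memberships and distances are known. It assumes \(\mathbf{U}\) and the distance matrix \([D_{ij}]\) are stored as \(c \times n\) arrays, following the row convention above; the helper name fcm_objective is hypothetical.

```python
import numpy as np


def fcm_objective(U, D, f=2.0):
    """Evaluate J_FCM = sum_i sum_j u_ij**f * D_ij**2.

    Hypothetical helper. Assumed layout: U and D are (c, n) arrays,
    row i holding the values of all n objects for cluster i.
    """
    U = np.asarray(U, dtype=float)
    D = np.asarray(D, dtype=float)
    if U.shape != D.shape:
        raise ValueError("U and D must both have shape (c, n)")
    # Each squared error D_ij**2 is weighted by the fuzzified membership u_ij**f.
    return float(np.sum(U**f * D**2))


# Two clusters, three objects; each column of U sums to 1.
U = np.array([[0.9, 0.5, 0.2],
              [0.1, 0.5, 0.8]])
D = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0]])
print(fcm_objective(U, D, f=2.0))  # approximately 3.9
```

With hard 0/1 memberships every weight \(u_{ij}^f\) is 0 or 1, so the value reduces to the classical KM within-cluster sum of squared distances.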

\(D_{ij}^2\) is the square of the chosen distance, which can be any \(L^p\) distance: \(D_{ij} = \lVert x_{j} - v_{i} \rVert = \left( \int_I \lvert x_{j}(t) - v_{i}(t) \rvert^p \, dt \right)^{\frac{1}{p}}\), where \(I\) is the domain on which \(\mathbf{X}\) is defined, \(1 \leqslant i \leqslant c\), and \(1 \leqslant j \leqslant n\).
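For curves sampled on a common grid, the integral in \(D_{ij}\) can be approximated numerically. The sketch below is only an illustration: it assumes scalar-valued curves on one shared grid and uses the trapezoidal rule, which may differ from how scikit-fda's LpDistance evaluates the integral; lp_distance_matrix is a hypothetical helper.

```python
import numpy as np


def lp_distance_matrix(X, V, t, p=2.0):
    """Approximate D[i, j] = (integral over I of |x_j(t) - v_i(t)|^p dt)^(1/p).

    Hypothetical helper; assumes scalar-valued curves on a shared grid t.
    X: (n, m) array, curve x_j sampled at the m grid points in t.
    V: (c, m) array, centroid v_i sampled at the same grid points.
    Returns a (c, n) array of distances.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    t = np.asarray(t, dtype=float)
    # Pointwise |x_j(t_k) - v_i(t_k)|^p, broadcast to shape (c, n, m).
    integrand = np.abs(V[:, None, :] - X[None, :, :]) ** p
    # Trapezoidal rule over the grid approximates the integral over I.
    widths = np.diff(t)
    integral = np.sum(widths * (integrand[..., 1:] + integrand[..., :-1]) / 2, axis=-1)
    return integral ** (1.0 / p)


t = np.linspace(0.0, 1.0, 101)          # grid on I = [0, 1]
X = np.vstack([t, np.ones_like(t)])     # x_1(t) = t, x_2(t) = 1
V = np.zeros((1, t.size))               # a single centroid v_1(t) = 0
D = lp_distance_matrix(X, V, t, p=2.0)  # D[0, 0] close to sqrt(1/3), D[0, 1] close to 1.0
```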

FCM is an iterative process: it stops when the maximum number of iterations is reached or when the cluster centroids no longer change. The steps involved in FCM are:

  1. The centroids of the \(c\) clusters are chosen randomly from \(\mathbf{X}\) or are passed to the function as a parameter.

  2. The membership values of the data points in each cluster are calculated with: \(u_{ij} = \left[ \sum_{k=1}^c\left( D_{ij}/D_{kj} \right)^\frac{2}{f-1} \right]^{-1}\).

  3. The cluster centroids are updated using the following formula: \(\mathbf{v_{i}} =\frac{\sum_{j=1}^{n}u_{ij}^f x_{j}}{ \sum_{j=1}^{n} u_{ij}^f}\), \(1 \leqslant i \leqslant c\).

  4. If no cluster centroid changes (up to the tolerance tol), the run stops; otherwise, return to step 2.
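The four steps can be sketched in NumPy as follows. This is a simplified illustration, not the scikit-fda implementation: it assumes scalar-valued curves sampled on one shared grid, uses a trapezoidal \(L^2\) distance, performs a single run (no n_init restarts) and measures centroid change as the largest pointwise difference; the function fuzzy_c_means and its arguments are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, t, n_clusters=2, fuzzifier=2.0, max_iter=100, tol=1e-4, seed=0):
    """Hypothetical FCM sketch for curves X (n, m) sampled on a common grid t.

    Assumes scalar-valued curves and max_iter >= 1.
    Returns U (c, n) memberships, V (c, m) centroids and the iteration count.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    rng = np.random.default_rng(seed)
    widths = np.diff(t)

    def l2_distances(V):
        # (c, n) matrix of L2 distances, integrals approximated by the trapezoidal rule.
        sq = (V[:, None, :] - X[None, :, :]) ** 2
        return np.sqrt(np.sum(widths * (sq[..., 1:] + sq[..., :-1]) / 2, axis=-1))

    # Step 1: choose c distinct curves of X at random as the initial centroids.
    V = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for n_iter in range(1, max_iter + 1):
        # Step 2: u_ij = [sum_k (D_ij / D_kj)^(2 / (f - 1))]^(-1).
        # Distances are floored at machine epsilon to avoid dividing by zero.
        D = np.fmax(l2_distances(V), np.finfo(float).eps)
        ratios = (D[:, None, :] / D[None, :, :]) ** (2.0 / (fuzzifier - 1.0))
        U = 1.0 / ratios.sum(axis=1)
        # Step 3: v_i = sum_j u_ij^f x_j / sum_j u_ij^f.
        W = U ** fuzzifier
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        # Step 4: stop once no centroid value moves by more than tol.
        converged = np.max(np.abs(V_new - V)) <= tol
        V = V_new
        if converged:
            break
    return U, V, n_iter


# Two well-separated groups of shifted sine curves.
t = np.linspace(0.0, 1.0, 51)
base = np.sin(2 * np.pi * t)
X = np.vstack([base + s for s in (0.0, 0.1, 0.2, 3.0, 3.1, 3.2)])
U, V, n_iter = fuzzy_c_means(X, t, n_clusters=2)
labels = U.argmax(axis=0)  # hard label: the cluster with the largest membership
```

Flooring distances at machine epsilon is one way to handle \(D_{kj} = 0\) in step 2; it gives a curve that coincides with a centroid a membership of almost 1 in that cluster.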

The algorithm is applied to each dimension of the image (codomain) of the FDataGrid object.

Parameters:
  • n_clusters (int) – Number of groups into which the samples are classified. Defaults to 2.

  • init (Input | None) – Initial centers of the clusters the algorithm starts with. Its data_matrix must have shape (n_clusters, fdatagrid.ncol, fdatagrid.dim_codomain). Defaults to None, in which case the centers are initialized randomly.

  • metric (Metric[Input]) – Functional data metric. Defaults to l2_distance (LpDistance(p=2)).

  • n_init (int) – Number of times the algorithm is run with different centroid seeds. The final result is the best output of the n_init consecutive runs in terms of inertia. Defaults to 1.

  • max_iter (int) – Maximum number of iterations of the clustering algorithm for a single run. Defaults to 100.

  • tol (float) – Tolerance used, in every single run of the algorithm, to compare the newly calculated centroids with the previous ones. Defaults to 0.0001.

  • random_state (RandomStateLike) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic. Defaults to 0. See Glossary.

  • fuzzifier (float) – Scalar parameter \(f\) specifying the degree of fuzziness of the algorithm; it should be greater than 1. Defaults to 2.
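To see what the fuzzifier controls, the membership formula of step 2 can be evaluated for a single object at distance 1 from \(v_1\) and distance 2 from \(v_2\). A small sketch; the helper memberships is hypothetical and only mirrors the formula, not scikit-fda's code.

```python
import numpy as np


def memberships(D, fuzzifier):
    """Hypothetical helper: u_ij = [sum_k (D_ij / D_kj)^(2/(f-1))]^(-1).

    D is assumed to be a (c, n) matrix of strictly positive distances.
    """
    D = np.asarray(D, dtype=float)
    ratios = (D[:, None, :] / D[None, :, :]) ** (2.0 / (fuzzifier - 1.0))
    return 1.0 / ratios.sum(axis=1)


# One object at distance 1 from centroid v_1 and distance 2 from centroid v_2.
D = np.array([[1.0], [2.0]])
for f in (1.1, 2.0, 5.0):
    # f = 1.1 -> nearly hard (about 1.0 and 0.0)
    # f = 2.0 -> 0.8 and 0.2
    # f = 5.0 -> about 0.586 and 0.414
    print(f, memberships(D, f).ravel())
```

As \(f\) approaches 1 the memberships approach a hard assignment; as \(f\) grows they approach the uniform value \(1/c\).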

Attributes:
  • membership_degree_ – Matrix in which each entry contains the probability of belonging to each group.

  • labels_ – Vector in which each entry contains the cluster each observation belongs to (the one with the maximum membership degree).

  • cluster_centers_ – FDataGrid whose data_matrix, of shape (n_clusters, ncol, dim_codomain), contains the centroid of each cluster.

  • inertia_ – Sum of squared distances of samples to their closest cluster center for each dimension.

  • n_iter_ – Number of iterations run by the algorithm for each dimension.
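The hard labels in labels_ follow from the membership degrees by taking, for each sample, the cluster with the largest membership. A sketch with a made-up membership matrix, assuming one row per sample and one column per cluster (this layout is an assumption for illustration, not a statement about membership_degree_):

```python
import numpy as np

# Made-up memberships; assumed layout: rows = samples, columns = clusters.
membership = np.array([[0.9, 0.1],
                       [0.4, 0.6],
                       [0.2, 0.8]])
# Each sample is assigned to the cluster with the maximum membership degree.
labels = membership.argmax(axis=1)
print(labels)  # [0 1 1]
```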

Example

>>> import skfda
>>> data_matrix = [[[1, 0.3], [2, 0.4], [3, 0.5], [4, 0.6]],
...                [[2, 0.5], [3, 0.6], [4, 0.7], [5, 0.7]],
...                [[3, 0.2], [4, 0.3], [5, 0.4], [6, 0.5]]]
>>> grid_points = [2, 4, 6, 8]
>>> fd = skfda.FDataGrid(data_matrix, grid_points)
>>> fuzzy_kmeans = skfda.ml.clustering.FuzzyCMeans(random_state=0)
>>> fuzzy_kmeans.fit(fd)
FuzzyCMeans(...)
>>> fuzzy_kmeans.cluster_centers_.data_matrix
array([[[ 2.83994301,  0.24786354],
        [ 3.83994301,  0.34786354],
        [ 4.83994301,  0.44786354],
        [ 5.83994301,  0.53191927]],
       [[ 1.25134384,  0.35023779],
        [ 2.25134384,  0.45023779],
        [ 3.25134384,  0.55023779],
        [ 4.25134384,  0.6251158 ]]])

Methods

  • fit(X[, y, sample_weight]) – Fit the model.

  • fit_predict(X[, y]) – Perform clustering on X and return cluster labels.

  • fit_transform(X[, y, sample_weight]) – Compute clustering and transform X to cluster-distance space.

  • get_metadata_routing() – Get metadata routing of this object.

  • get_params([deep]) – Get parameters for this estimator.

  • predict(X[, sample_weight]) – Predict the closest cluster each sample in X belongs to.

  • predict_proba(X[, sample_weight]) – Predict the probability of belonging to each cluster.

  • score(X[, y, sample_weight]) – Opposite of the value of X on the K-means objective.

  • set_fit_request(*[, sample_weight]) – Request metadata passed to the fit method.

  • set_output(*[, transform]) – Set output container.

  • set_params(**params) – Set the parameters of this estimator.

  • set_predict_proba_request(*[, sample_weight]) – Request metadata passed to the predict_proba method.

  • set_predict_request(*[, sample_weight]) – Request metadata passed to the predict method.

  • set_score_request(*[, sample_weight]) – Request metadata passed to the score method.

  • transform(X) – Transform X to a cluster-distance space.

fit(X, y=None, sample_weight=None)

Fit the model.

Parameters:
  • X (Input) – Object whose samples are clustered into different groups.

  • y (object | None) – present here for API consistency by convention.

  • sample_weight (None) – present here for API consistency by convention.


Returns:

Fitted model.

Return type:

SelfType

fit_predict(X, y=None)

Perform clustering on X and return cluster labels.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (Ignored) – Not used, present for API consistency by convention.

  • **kwargs (dict) –

    Arguments to be passed to fit.

    New in version 1.4.

Returns:

labels – Cluster labels.

Return type:

ndarray of shape (n_samples,), dtype=np.int64

fit_transform(X, y=None, sample_weight=None)

Compute clustering and transform X to cluster-distance space.

Parameters:
  • X (Input) – Object whose samples are classified into different groups.

  • y (object) – present here for API consistency by convention.

  • sample_weight (None) – present here for API consistency by convention.

Returns:

Distances of each sample to each cluster.

Return type:

ndarray[Any, dtype[float64]]

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X, sample_weight=None)

Predict the closest cluster each sample in X belongs to.

Parameters:
  • X (Input) – Object whose samples are classified into different groups.

  • sample_weight (None) – present here for API consistency by convention.

Returns:

Label of each sample.

Return type:

ndarray[Any, dtype[int64]]

predict_proba(X, sample_weight=None)

Predict the probability of belonging to each cluster.

Parameters:
  • X (Input) – Object whose samples are classified into different groups.

  • sample_weight (None) – present here for API consistency by convention.

Returns:

Probability of belonging to each cluster for each sample.

Return type:

ndarray[Any, dtype[float64]]

score(X, y=None, sample_weight=None)

Opposite of the value of X on the K-means objective.

Parameters:
  • X (Input) – Object whose samples are classified into different groups.

  • y (object | None) – present here for API consistency by convention.

  • sample_weight (None) – present here for API consistency by convention.

Returns:

Negative inertia_ attribute.

Return type:

float

set_fit_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.


Returns:

self – The updated object.

Return type:

object

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({"default", "pandas", "polars"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • "polars": Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_predict_proba_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the predict_proba method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in predict_proba.


Returns:

self – The updated object.

Return type:

object

set_predict_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in predict.


Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). See the User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.


Returns:

self – The updated object.

Return type:

object

transform(X)

Transform X to a cluster-distance space.

Parameters:

X (Input) – Object whose samples are classified into different groups.

Returns:

distances_to_centers – Distances of each sample to each cluster.

Return type:

ndarray[Any, dtype[float64]]

Examples using skfda.ml.clustering.FuzzyCMeans

Clustering
