FuzzyCMeans
- class skfda.ml.clustering.FuzzyCMeans(*, n_clusters=2, init=None, metric=LpDistance(p=2, vector_norm=None), n_init=1, max_iter=100, tol=0.0001, random_state=0, fuzzifier=2)
Fuzzy c-Means clustering for functional data.
Let \(\mathbf{X = \left\{ x_{1}, x_{2}, ..., x_{n}\right\}}\) be the dataset to be analyzed, and let \(\mathbf{V = \left\{ v_{1}, v_{2}, ..., v_{c}\right\}}\) be the set of cluster centers of \(\mathbf{X}\) in \(m\)-dimensional space \(\left( \mathbb{R}^m \right)\), where \(n\) is the number of objects, \(m\) is the number of features, and \(c\) is the number of partitions or clusters.
FCM minimizes the following objective function:
\[J_{FCM}\left(\mathbf{X}; \mathbf{U, V}\right) = \sum_{i=1}^{c} \sum_{j=1}^{n}u_{ij}^{f}D_{ij}^2.\]
This function differs from the classical k-means (KM) objective in that it uses weighted squared errors instead of plain squared errors, with the weights raised to the fuzzifier \(f\). In the objective function, \(\mathbf{U}\) is a fuzzy partition matrix that is computed from dataset \(\mathbf{X}\): \(\mathbf{U} = [u_{ij}] \in M_{FCM}\).
The fuzzy clustering of \(\mathbf{X}\) is represented by the membership matrix \(\mathbf{U}\). The element \(u_{ij}\) is the membership value of the j-th object in the i-th cluster, so the i-th row of \(\mathbf{U}\) holds the membership values of the \(n\) objects in the i-th cluster. \(\mathbf{V}\) is the vector of cluster prototypes (centroids): \(\mathbf{V = \left\{ v_{1}, v_{2}, ..., v_{c}\right\}}\), \(\mathbf{v_{i}} \in \mathbb{R}^m\).
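As a rough illustration of the objective function, the sketch below evaluates \(J_{FCM}\) for a small, made-up membership matrix and distance matrix. The helper name `fcm_objective` and the numeric values are assumptions for illustration only; this is not the library's implementation.

```python
def fcm_objective(memberships, distances, fuzzifier=2.0):
    """Return J = sum_i sum_j u_ij**f * D_ij**2.

    memberships[i][j] is u_ij (cluster i, object j);
    distances[i][j] is D_ij, laid out the same way.
    """
    return sum(
        (u ** fuzzifier) * (d ** 2)
        for u_row, d_row in zip(memberships, distances)
        for u, d in zip(u_row, d_row)
    )


# Two clusters (rows) and three objects (columns); illustrative values only.
U = [[0.9, 0.5, 0.2],
     [0.1, 0.5, 0.8]]
D = [[1.0, 2.0, 3.0],
     [3.0, 2.0, 1.0]]

J = fcm_objective(U, D)
```

Each column of `U` sums to 1, which is the usual constraint on a fuzzy partition matrix: the memberships of one object are spread over all clusters.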
\(D_{ij}^2\) is the square of the chosen distance measure, which can be any p-norm: \(D_{ij} =\lVert x_{ij} - v_{i} \rVert = \left( \int_I \lvert x_{ij} - v_{i}\rvert^p dx \right)^{ \frac{1}{p}}\), where \(I\) is the domain on which \(\mathbf{X}\) is defined, \(1 \leqslant i \leqslant c\), \(1 \leqslant j\leqslant n_{i}\), and \(n_{i}\) represents the number of data points in the i-th cluster.
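For curves sampled on a grid, the integral in the distance above has to be approximated numerically. A minimal sketch, assuming the trapezoidal rule and a hypothetical helper `lp_distance` (the library's own metric may use a different quadrature):

```python
def lp_distance(x, v, grid, p=2.0):
    """Approximate (integral over I of |x - v|**p)**(1/p) by the trapezoidal rule."""
    integrand = [abs(a - b) ** p for a, b in zip(x, v)]
    integral = sum(
        (integrand[k] + integrand[k + 1]) / 2.0 * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )
    return integral ** (1.0 / p)


grid = [2, 4, 6, 8]   # sample points of the domain I
x = [1, 2, 3, 4]      # one observed curve
v = [2, 3, 4, 5]      # one centroid curve
d = lp_distance(x, v, grid)
```

Here the two curves differ by 1 everywhere on an interval of length 6, so the L2 distance is the square root of 6.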
FCM is an iterative process that stops when the maximum number of iterations is reached or when the cluster centroids no longer change. The steps involved in FCM are:
1. Centroids of \(c\) clusters are chosen randomly from \(\mathbf{X}\) or are passed to the function as a parameter.
2. Membership values of the data points in each cluster are calculated with: \(u_{ij} = \left[ \sum_{k=1}^c\left( D_{ij}/D_{kj} \right)^\frac{2}{f-1} \right]^{-1}\).
3. Cluster centroids are updated using the following formula: \(\mathbf{v_{i}} =\frac{\sum_{j=1}^{n}u_{ij}^f x_{j}}{ \sum_{j=1}^{n} u_{ij}^f}\), \(1 \leqslant i \leqslant c\).
4. If no cluster centroid changes, the algorithm stops; otherwise, return to step 2.
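The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code: observations are plain vectors, the distance is Euclidean (`math.dist`), and the function name `fuzzy_c_means` and its `seed` argument are hypothetical.

```python
import math
import random


def fuzzy_c_means(data, n_clusters=2, init=None, fuzzifier=2.0,
                  max_iter=100, tol=1e-4, seed=0):
    """Plain fuzzy c-means on a list of equal-length numeric vectors."""
    # Step 1: use the given centroids, or pick random observations.
    if init is None:
        init = random.Random(seed).sample(data, n_clusters)
    centers = [list(c) for c in init]
    exponent = 2.0 / (fuzzifier - 1.0)
    memberships = []

    for _ in range(max_iter):
        # Step 2: u_ij = 1 / sum_k (D_ij / D_kj) ** (2 / (f - 1)).
        memberships = [[0.0] * len(data) for _ in range(n_clusters)]
        for j, x in enumerate(data):
            dists = [math.dist(x, c) for c in centers]
            zero = [i for i, d in enumerate(dists) if d == 0.0]
            for i in range(n_clusters):
                if zero:
                    # x coincides with a centroid: avoid dividing by zero.
                    memberships[i][j] = 1.0 / len(zero) if i in zero else 0.0
                else:
                    memberships[i][j] = 1.0 / sum(
                        (dists[i] / d_k) ** exponent for d_k in dists
                    )

        # Step 3: each centroid is the mean of the data weighted by u_ij ** f.
        new_centers = []
        for i in range(n_clusters):
            weights = [u ** fuzzifier for u in memberships[i]]
            total = sum(weights)
            new_centers.append([
                sum(w * x[k] for w, x in zip(weights, data)) / total
                for k in range(len(data[0]))
            ])

        # Step 4: stop once no centroid moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break

    return centers, memberships


curves = [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 3.1, 4.2],
          [6.0, 7.0, 8.0, 9.0], [6.1, 7.2, 8.1, 9.3]]
centers, memberships = fuzzy_c_means(curves, n_clusters=2, seed=0)
```

The returned memberships are the ones computed in the last pass of step 2, so they correspond to the centroids from just before the final update; at convergence the difference is below `tol`.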
This algorithm is applied for each dimension on the image of the FDataGrid object.
- Parameters:
n_clusters (int) – Number of groups into which the samples are classified. Defaults to 2.
init (Input | None) – Contains the initial centers of the different clusters the algorithm starts with. Its data_matrix must have shape (n_clusters, fdatagrid.ncol, fdatagrid.dim_codomain). Defaults to None, in which case the centers are initialized randomly.
metric (Metric[Input]) – functional data metric. Defaults to l2_distance.
n_init (int) – Number of times the clustering algorithm will be run with different centroid seeds. The final result is the best output of n_init consecutive runs in terms of inertia. Defaults to 1.
max_iter (int) – Maximum number of iterations of the clustering algorithm for a single run. Defaults to 100.
tol (float) – Tolerance used to compare the newly calculated centroids with the previous ones in every single run of the algorithm. Defaults to 0.0001.
random_state (RandomStateLike) – Determines random number generation for centroid initialization. Use an int to make the randomness deterministic. Defaults to 0. See Glossary.
fuzzifier (float) – Scalar parameter used to specify the degree of fuzziness in the fuzzy algorithm. Defaults to 2.
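To see how the fuzzifier shapes the result, the sketch below applies the membership formula \(u_{ij} = \left[ \sum_{k=1}^c\left( D_{ij}/D_{kj} \right)^\frac{2}{f-1} \right]^{-1}\) to one object for several values of \(f\). The helper `memberships_for_point` and the distances are illustrative assumptions.

```python
def memberships_for_point(distances, fuzzifier):
    """Membership of one object in each cluster, given its distances D_ij."""
    exponent = 2.0 / (fuzzifier - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# One object at distance 1 from the first centroid and 2 from the second.
distances = [1.0, 2.0]
for f in (1.1, 2.0, 5.0):
    u = memberships_for_point(distances, f)
    print(f, [round(value, 3) for value in u])
```

With the default fuzzifier of 2 the memberships are 0.8 and 0.2. Values of \(f\) close to 1 push them towards a hard 1/0 assignment, while larger values pull them towards equal shares.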
- Attributes:
membership_degree_ – Matrix in which each entry contains the probability of belonging to each group.
labels_ – Vector in which each entry contains the cluster each observation belongs to (the one with the maximum membership degree).
cluster_centers_ – data_matrix of shape (n_clusters, ncol, dim_codomain) containing the centroids of each cluster.
inertia_ – Sum of squared distances of samples to their closest cluster center for each dimension.
n_iter_ – number of iterations the algorithm was run for each dimension.
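The relation between the membership degrees and the labels can be sketched as follows. The values are made up, and the layout (one row per observation, one column per cluster) is an assumption made for this illustration.

```python
# Hypothetical membership degrees: rows are observations, columns are clusters.
membership_degree = [
    [0.93, 0.07],
    [0.48, 0.52],
    [0.10, 0.90],
]

# Each label is the index of the cluster with the largest membership degree.
labels = [
    max(range(len(row)), key=row.__getitem__)
    for row in membership_degree
]
```

The second observation is assigned to cluster 1 even though its membership is almost evenly split, which is the information a hard label discards.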
Example
>>> import skfda
>>> data_matrix = [[[1, 0.3], [2, 0.4], [3, 0.5], [4, 0.6]],
...                [[2, 0.5], [3, 0.6], [4, 0.7], [5, 0.7]],
...                [[3, 0.2], [4, 0.3], [5, 0.4], [6, 0.5]]]
>>> grid_points = [2, 4, 6, 8]
>>> fd = skfda.FDataGrid(data_matrix, grid_points)
>>> fuzzy_kmeans = skfda.ml.clustering.FuzzyCMeans(random_state=0)
>>> fuzzy_kmeans.fit(fd)
FuzzyCMeans(...)
>>> fuzzy_kmeans.cluster_centers_.data_matrix
array([[[ 2.83994301,  0.24786354],
        [ 3.83994301,  0.34786354],
        [ 4.83994301,  0.44786354],
        [ 5.83994301,  0.53191927]],
       [[ 1.25134384,  0.35023779],
        [ 2.25134384,  0.45023779],
        [ 3.25134384,  0.55023779],
        [ 4.25134384,  0.6251158 ]]])
Methods
fit(X[, y, sample_weight]) – Fit the model.
fit_predict(X[, y]) – Perform clustering on X and return cluster labels.
fit_transform(X[, y, sample_weight]) – Compute clustering and transform X to cluster-distance space.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
predict(X[, sample_weight]) – Predict the closest cluster each sample in X belongs to.
predict_proba(X[, sample_weight]) – Predict the probability of belonging to each cluster.
score(X[, y, sample_weight]) – Opposite of the value of X on the K-means objective.
set_fit_request(*[, sample_weight]) – Request metadata passed to the fit method.
set_output(*[, transform]) – Set output container.
set_params(**params) – Set the parameters of this estimator.
set_predict_proba_request(*[, sample_weight]) – Request metadata passed to the predict_proba method.
set_predict_request(*[, sample_weight]) – Request metadata passed to the predict method.
set_score_request(*[, sample_weight]) – Request metadata passed to the score method.
transform(X) – Transform X to a cluster-distance space.
- fit(X, y=None, sample_weight=None)
Fit the model.
- Parameters:
X (Input) – Object whose samples are clustered, that is, classified into different groups.
y (object | None) – present here for API consistency by convention.
sample_weight (None) – present here for API consistency by convention.
self (SelfType) –
- Returns:
Fitted model.
- Return type:
SelfType
- fit_predict(X, y=None)
Perform clustering on X and return cluster labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data.
y (Ignored) – Not used, present for API consistency by convention.
**kwargs (dict) – Arguments to be passed to fit.
New in version 1.4.
- Returns:
labels – Cluster labels.
- Return type:
ndarray of shape (n_samples,), dtype=np.int64
- fit_transform(X, y=None, sample_weight=None)
Compute clustering and transform X to cluster-distance space.
- Parameters:
X (Input) – Object whose samples are classified into different groups.
y (object) – present here for API consistency by convention.
sample_weight (None) – present here for API consistency by convention.
- Returns:
Distances of each sample to each cluster.
- Return type:
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- set_fit_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (FuzzyCMeans) –
- Returns:
self – The updated object.
- Return type:
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({"default", "pandas", "polars"}, default=None) – Configure output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: Transform configuration is unchanged
New in version 1.4: "polars" option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
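The `<component>__<parameter>` naming can be illustrated with a small helper that splits such a name at the first double underscore. The function `split_nested_param` and the step name `clustering` are hypothetical and only show the convention; they are not part of the estimator's API.

```python
def split_nested_param(name):
    """Split '<component>__<parameter>' at the first double underscore."""
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        # No double underscore: the name refers to the estimator itself.
        return None, component
    return component, parameter


nested = split_nested_param("clustering__n_clusters")
plain = split_nested_param("fuzzifier")
```

A name such as `clustering__n_clusters` would therefore address the `n_clusters` parameter of a pipeline step called `clustering`, while `fuzzifier` addresses the estimator directly.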
- set_predict_proba_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the predict_proba method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in predict_proba.
self (FuzzyCMeans) –
- Returns:
self – The updated object.
- Return type:
- set_predict_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in predict.
self (FuzzyCMeans) –
- Returns:
self – The updated object.
- Return type:
- set_score_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (FuzzyCMeans) –
- Returns:
self – The updated object.
- Return type: