AgglomerativeClustering

class skfda.ml.clustering.AgglomerativeClustering(n_clusters=2, *, metric=LpDistance(p=2, vector_norm=None), memory=None, connectivity=None, compute_full_tree='auto', linkage, distance_threshold=None)

Agglomerative Clustering.

Recursively merges the pair of clusters that minimally increases a given linkage distance.
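
The merging procedure can be illustrated with a naive pure-Python sketch. It is not the implementation used by this class. The function name `naive_agglomerative`, the `LINKAGES` table and the sample points are hypothetical, and the observations are plain tuples compared with the Euclidean distance rather than functional data.

```python
from itertools import combinations
from math import dist

# Hypothetical names for illustration only; not part of skfda or scikit-learn.
LINKAGES = {
    "single": min,                         # closest pair of observations
    "complete": max,                       # farthest pair of observations
    "average": lambda d: sum(d) / len(d),  # mean over all pairs
}


def naive_agglomerative(points, n_clusters, linkage="complete"):
    """Merge the two closest clusters until only n_clusters remain."""
    reduce_distances = LINKAGES[linkage]
    # Start with every observation in its own cluster (lists of indices).
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        pairwise = [dist(points[i], points[j]) for i in a for j in b]
        return reduce_distances(pairwise)

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(*pair),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))

    return sorted(clusters)


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (10.0, 0.0)]
print(naive_agglomerative(points, n_clusters=3, linkage="single"))
```

This sketch recomputes every pairwise distance at each step, so it is only suitable for a handful of observations.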

Notes

This class is an extension of sklearn.cluster.AgglomerativeClustering that accepts functional data objects and metrics. Please check also the documentation of the original class.

Parameters:

n_clusters (int | None) – The number of clusters to find. It must be None if distance_threshold is not None.
metric (MetricOrPrecomputed[MetricElementType]) – Metric used to compute the linkage. If it is skfda.misc.metrics.PRECOMPUTED or the string "precomputed", a distance matrix (instead of a similarity matrix) is needed as input for the fit method.
memory (str | joblib.Memory | None) – Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
connectivity (Connectivity[MetricElementType]) – Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.
compute_full_tree (Literal['auto'] | bool) – If False, stop the construction of the tree early at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be True if distance_threshold is not None. By default compute_full_tree is “auto”, which is equivalent to True when distance_threshold is not None or when n_clusters is less than the maximum of 100 and 0.02 * n_samples. Otherwise, “auto” is equivalent to False.
linkage (LinkageCriterionLike) –
Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pair of clusters that minimizes this criterion.
- average uses the average of the distances between all observations of the two sets.
- complete or maximum linkage uses the maximum of the distances between all observations of the two sets.
- single uses the minimum of the distances between all observations of the two sets.
distance_threshold (float | None) – The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True.
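
The rule stated for compute_full_tree="auto" can be written out as a short sketch. The helper name `resolve_compute_full_tree` is hypothetical and only restates the description above; it is not a function exposed by the library.

```python
def resolve_compute_full_tree(compute_full_tree, n_clusters, n_samples,
                              distance_threshold=None):
    """Return the effective boolean for compute_full_tree (hypothetical helper)."""
    if compute_full_tree != "auto":
        # An explicit True or False is used as given.
        return bool(compute_full_tree)
    if distance_threshold is not None:
        # A distance threshold always requires the full tree.
        return True
    # Otherwise build the full tree only when n_clusters is small.
    return n_clusters < max(100, 0.02 * n_samples)


print(resolve_compute_full_tree("auto", n_clusters=2, n_samples=50))
print(resolve_compute_full_tree("auto", n_clusters=150, n_samples=1000))
```

With 1000 samples the cutoff is max(100, 20) = 100, so asking for 150 clusters resolves to False, while asking for 2 clusters resolves to True.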

Attributes:

n_clusters_ – The number of clusters found by the algorithm. If distance_threshold=None, it will be equal to the given n_clusters.
labels_ – Cluster labels for each point.
n_leaves_ – Number of leaves in the hierarchical tree.
n_connected_components_ – The estimated number of connected components in the graph.
children_ – The children of each non-leaf node. Values less than n_samples correspond to leaves of the tree which are the original samples. A node i greater than or equal to n_samples is a non-leaf node and has children children_[i - n_samples]. Alternatively, at the i-th iteration, children_[i][0] and children_[i][1] are merged to form node n_samples + i.
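
The node numbering used by children_ can be followed with a small pure-Python sketch. The helper `leaves_under` is hypothetical, and the `children` list below is written by hand for four samples rather than produced by a fitted estimator.

```python
def leaves_under(node, children, n_samples):
    """Return the sorted original sample indices below `node`."""
    if node < n_samples:
        # Nodes below n_samples are the original samples (leaves).
        return [node]
    left, right = children[node - n_samples]
    return sorted(
        leaves_under(left, children, n_samples)
        + leaves_under(right, children, n_samples)
    )


# Assumed example: row i merges two nodes into node n_samples + i.
children = [(0, 1), (2, 3), (4, 5)]
n_samples = 4

for i, (left, right) in enumerate(children):
    node = n_samples + i
    print(node, (left, right), leaves_under(node, children, n_samples))
```

Here node 4 contains samples 0 and 1, node 5 contains samples 2 and 3, and node 6 is the root containing all four samples. The recursion is kept for readability; a very deep tree would call for an iterative traversal.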

Examples

>>> from skfda import FDataGrid
>>> from skfda.ml.clustering import AgglomerativeClustering
>>> import numpy as np
>>> data_matrix = np.array([[1, 2], [1, 4], [1, 0],
...                        [4, 2], [4, 4], [4, 0]])
>>> X = FDataGrid(data_matrix)
>>> clustering = AgglomerativeClustering(
...     linkage=AgglomerativeClustering.LinkageCriterion.COMPLETE,
... )
>>> clustering.fit(X)
AgglomerativeClustering(...)
>>> clustering.labels_.astype(np.int_)
array([0, 0, 1, 0, 0, 1])

Methods

`fit`(X[, y])
`fit_predict`(X[, y])	Perform clustering on X and return cluster labels.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y=None)

Parameters:

X (MetricElementType)
y (None)

Return type:

AgglomerativeClustering[MetricElementType]

fit_predict(X, y=None)

Perform clustering on X and return cluster labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data.
y (Ignored) – Not used, present for API consistency by convention.
**kwargs (dict) – Arguments to be passed to fit. Added in version 1.4.

Returns:

labels – Cluster labels.

Return type:

ndarray of shape (n_samples,), dtype=np.int64

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
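
The <component>__<parameter> naming convention can be illustrated with a pure-Python sketch. The helper `split_nested_params` and the component name `clusterer` are hypothetical; the sketch only shows how such keys separate at the first double underscore, not how the estimator applies them.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component (hypothetical helper)."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params({
    "n_clusters": 3,
    "clusterer__linkage": "single",
    "clusterer__metric__p": 1,
})
print(own)
print(nested)
```

Keys without a double underscore belong to the estimator itself. The remainder of a nested key (for example `metric__p`) is passed on to the named component, which can split it again.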
