.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_clustering.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_clustering.py:


Clustering
==========

This example shows how to use the clustering plot methods on the Canadian
Weather dataset. The K-Means and Fuzzy K-Means algorithms are used to compute
the results that are plotted.

.. GENERATED FROM PYTHON SOURCE LINES 9-26

.. code-block:: Python

    # Author: Amanda Hernando Bernabé
    # License: MIT

    # sphinx_gallery_thumbnail_number = 6

    import matplotlib.pyplot as plt
    import numpy as np

    from skfda import datasets
    from skfda.exploratory.visualization.clustering import (
        ClusterMembershipLinesPlot,
        ClusterMembershipPlot,
        ClusterPlot,
    )
    from skfda.ml.clustering import FuzzyCMeans, KMeans

.. GENERATED FROM PYTHON SOURCE LINES 27-31

First, the Canadian Weather dataset is downloaded from the 'fda' package in
CRAN. It contains an FDataGrid with daily temperatures and precipitation, that
is, its image is 2-dimensional. We are interested only in the daily average
temperatures, so we select the first coordinate function.

.. GENERATED FROM PYTHON SOURCE LINES 31-41

.. code-block:: Python

    X, y = datasets.fetch_weather(return_X_y=True, as_frame=True)
    fd = X.iloc[:, 0].values
    fd_temperatures = fd.coordinates[0]
    target = y.values

    # Keep only 10 arbitrarily chosen samples, selected by fixed indices, so
    # that the example provides clearer plots.
    indices_samples = np.array([1, 3, 5, 10, 14, 17, 21, 25, 27, 30])
    fd = fd_temperatures[indices_samples]

.. GENERATED FROM PYTHON SOURCE LINES 42-45

The data is plotted to show the curves we are working with. They are divided
according to the target.
In this case, the target contains the climates to which the weather stations
belong.

.. GENERATED FROM PYTHON SOURCE LINES 45-56

.. code-block:: Python

    # The colormap registry replaces plt.cm.get_cmap, which was deprecated
    # in Matplotlib 3.7.
    import matplotlib

    climates = target[indices_samples].remove_unused_categories()

    # Assign a color to each of the groups.
    colormap = matplotlib.colormaps['tab20b']
    n_climates = len(climates.categories)
    climate_colors = colormap(np.arange(n_climates) / (n_climates - 1))

    fd.plot(group=climates.codes, group_names=climates.categories,
            group_colors=climate_colors)

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_001.png
   :alt: Canadian Weather
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_001.png
   :class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 57-60

The number of clusters is set to the number of climates, in order to see the
performance of the clustering methods, and the random seed is fixed so that
the example always produces the same result.

.. GENERATED FROM PYTHON SOURCE LINES 60-64

.. code-block:: Python

    n_clusters = n_climates
    seed = 2

.. GENERATED FROM PYTHON SOURCE LINES 65-71

First, the class :class:`~skfda.ml.clustering.KMeans` is instantiated with the
desired parameters. Its :func:`~skfda.ml.clustering.KMeans.fit` method is
called, which computes several attributes, among them the index of the cluster
each sample belongs to (the labels) and the centroid of each cluster.
The labels are obtained by calling the method
:func:`~skfda.ml.clustering.KMeans.predict`.

.. GENERATED FROM PYTHON SOURCE LINES 71-76

.. code-block:: Python

    kmeans = KMeans(n_clusters=n_clusters, random_state=seed)
    kmeans.fit(fd)
    print(kmeans.predict(fd))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [0 1 0 0 0 2 2 1 0 2]

.. GENERATED FROM PYTHON SOURCE LINES 77-80

To see the information in a graphic way, the class
:class:`~skfda.exploratory.visualization.clustering.ClusterPlot` can be used.

.. GENERATED FROM PYTHON SOURCE LINES 80-89

.. code-block:: Python

    # Customization of cluster colors and labels in order to match the first
    # image of raw data.
    cluster_colors = climate_colors[np.array([0, 2, 1])]
    cluster_labels = climates.categories[np.array([0, 2, 1])]

    ClusterPlot(kmeans, fd, cluster_colors=cluster_colors,
                cluster_labels=cluster_labels).plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_002.png
   :alt: Canadian Weather
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_002.png
   :class: sphx-glr-single-img
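As a rough illustration of what the K-Means fit above computes, the following
sketch runs Lloyd's algorithm on curves sampled on a common grid, using plain
NumPy. It is not the scikit-fda implementation: the function
``kmeans_on_grid``, the synthetic curves, and the use of the Euclidean distance
between sampled values as a stand-in for the distance between functions are
all assumptions made for this illustration.

```python
import numpy as np


def kmeans_on_grid(curves, n_clusters, n_iter=100, seed=0):
    """Lloyd's algorithm for curves sampled on a common grid (hypothetical helper).

    ``curves`` has shape (n_samples, n_points).
    """
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct curves chosen at random.
    centroids = curves[rng.choice(len(curves), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every curve.
        distances = np.linalg.norm(
            curves[:, None, :] - centroids[None, :, :], axis=2,
        )
        labels = distances.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned curves
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            curves[labels == k].mean(axis=0) if np.any(labels == k)
            else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic stand-in data: two groups of noisy sinusoids at different levels.
t = np.linspace(0, 1, 50)
noise_rng = np.random.default_rng(0)
cold = -10 + 5 * np.sin(2 * np.pi * t) + noise_rng.normal(0, 0.5, (5, t.size))
warm = 10 + 5 * np.sin(2 * np.pi * t) + noise_rng.normal(0, 0.5, (5, t.size))
curves = np.vstack([cold, warm])

labels, centroids = kmeans_on_grid(curves, n_clusters=2)
print(labels)
```

The returned ``labels`` play the role of the output of ``predict`` above, and
``centroids`` the role of the cluster centroids, here as arrays of sampled
values rather than functional data objects.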
.. GENERATED FROM PYTHON SOURCE LINES 90-100

Another clustering algorithm implemented is Fuzzy K-Means, found in the class
:class:`~skfda.ml.clustering.FuzzyCMeans`. Following the above procedure, an
object of this type is instantiated with the desired parameters and then the
:func:`~skfda.ml.clustering.FuzzyCMeans.fit` method is called.
Internally, the attribute ``membership_degree_`` is calculated, which contains
``n_clusters`` elements for each sample and dimension, denoting the degree of
membership of each sample to each cluster. They are obtained by calling the
method :func:`~skfda.ml.clustering.FuzzyCMeans.predict_proba`. The centroids
of each cluster are also obtained.

.. GENERATED FROM PYTHON SOURCE LINES 100-105

.. code-block:: Python

    fuzzy_kmeans = FuzzyCMeans(n_clusters=n_clusters, random_state=seed)
    fuzzy_kmeans.fit(fd)
    print(fuzzy_kmeans.predict_proba(fd))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [[0.8721254  0.11189295 0.01598165]
     [0.4615364  0.51285956 0.02560405]
     [0.97428363 0.01882257 0.0068938 ]
     [0.91184323 0.05369029 0.03446648]
     [0.79072268 0.18411219 0.02516513]
     [0.178624   0.05881132 0.76256468]
     [0.01099498 0.00492593 0.98407909]
     [0.03156897 0.96349997 0.00493106]
     [0.8084018  0.13418057 0.05741763]
     [0.03767122 0.01891178 0.943417  ]]

.. GENERATED FROM PYTHON SOURCE LINES 106-110

To see the information in a graphic way, the class
:class:`~skfda.exploratory.visualization.clustering.ClusterPlot` can be used.
It assigns each sample to the cluster whose membership value is the greatest.

.. GENERATED FROM PYTHON SOURCE LINES 110-114

.. code-block:: Python

    ClusterPlot(fuzzy_kmeans, fd, cluster_colors=cluster_colors,
                cluster_labels=cluster_labels).plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_003.png
   :alt: Canadian Weather
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_003.png
   :class: sphx-glr-single-img
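To give an idea of how degrees of membership relate to distances, the sketch
below implements the textbook fuzzy c-means membership update and the hard
assignment by greatest membership described above. It is only an illustration
under stated assumptions: the helper ``fuzzy_memberships``, its ``fuzzifier``
argument and the distance values are hypothetical, and the code is not taken
from scikit-fda.

```python
import numpy as np


def fuzzy_memberships(distances, fuzzifier=2.0):
    """Textbook fuzzy c-means membership update (hypothetical helper).

    ``distances`` has shape (n_samples, n_clusters) and holds the distance
    from each sample to each centroid; all distances are assumed positive.
    """
    exponent = 2.0 / (fuzzifier - 1.0)
    # ratios[i, j, k] = (d_ij / d_ik) ** exponent
    ratios = (distances[:, :, None] / distances[:, None, :]) ** exponent
    # u_ij = 1 / sum_k (d_ij / d_ik) ** exponent
    return 1.0 / ratios.sum(axis=2)


# Made-up distances from three samples to two centroids.
distances = np.array([
    [1.0, 3.0],
    [2.0, 2.5],
    [4.0, 1.0],
])
memberships = fuzzy_memberships(distances)

# Hard assignment: the cluster with the greatest membership value.
hard_labels = memberships.argmax(axis=1)

print(memberships)
print(hard_labels)
```

Each row of ``memberships`` sums to one, and a sample that is closer to a
centroid receives a larger degree of membership in that cluster.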
.. GENERATED FROM PYTHON SOURCE LINES 115-121

Another plot implemented to show the results of the class
:class:`~skfda.ml.clustering.FuzzyCMeans` is
:class:`~skfda.exploratory.visualization.clustering.ClusterMembershipLinesPlot`,
which is similar to a parallel coordinates plot. It is recommended to assign a
color to each sample in order to identify it. In this example, the colors are
those of the first plot, dividing the samples by climate.

.. GENERATED FROM PYTHON SOURCE LINES 121-127

.. code-block:: Python

    colors_by_climate = colormap(climates.codes / (n_climates - 1))

    ClusterMembershipLinesPlot(fuzzy_kmeans, fd, cluster_labels=cluster_labels,
                               sample_colors=colors_by_climate).plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_004.png
   :alt: Degrees of membership of the samples to each cluster
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_004.png
   :class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 128-132

Finally, the class
:class:`~skfda.exploratory.visualization.clustering.ClusterMembershipPlot`
produces a bar plot. Each sample is represented by a bar that is filled
proportionally to its membership values, using the color of each cluster.

.. GENERATED FROM PYTHON SOURCE LINES 132-136

.. code-block:: Python

    ClusterMembershipPlot(fuzzy_kmeans, fd, cluster_colors=cluster_colors,
                          cluster_labels=cluster_labels).plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_005.png
   :alt: Degrees of membership of the samples to each cluster
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_005.png
   :class: sphx-glr-single-img
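The geometry of such a stacked bar can be sketched with a cumulative sum. The
snippet below uses the first three rows of the membership matrix printed
earlier, rounded to four decimals. The variable names are chosen for this
illustration only, and the actual drawing code in scikit-fda may be organised
differently.

```python
import numpy as np

# First three rows of the membership matrix printed earlier (rounded).
memberships = np.array([
    [0.8721, 0.1119, 0.0160],
    [0.4615, 0.5129, 0.0256],
    [0.9743, 0.0188, 0.0069],
])

# The height of the segment for cluster j is the membership value itself.
# The segment starts where the segments of clusters 0..j-1 end.
tops = memberships.cumsum(axis=1)
bottoms = tops - memberships

print(bottoms)
print(tops)
```

Since the memberships of a sample sum to one, every bar has the same total
height and only the proportions of the colored segments differ.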
.. GENERATED FROM PYTHON SOURCE LINES 137-142

The bars can be sorted according to the membership in a given cluster by
passing the index of that cluster, an integer in the interval
[0, n_clusters), as the ``sort`` argument. We can order the data using the
first cluster:

.. GENERATED FROM PYTHON SOURCE LINES 142-145

.. code-block:: Python

    ClusterMembershipPlot(fuzzy_kmeans, fd, sort=0, cluster_colors=cluster_colors,
                          cluster_labels=cluster_labels).plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_006.png
   :alt: Degrees of membership of the samples to each cluster
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_006.png
   :class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 146-147

Using the second cluster:

.. GENERATED FROM PYTHON SOURCE LINES 147-150

.. code-block:: Python

    ClusterMembershipPlot(fuzzy_kmeans, fd, sort=1, cluster_colors=cluster_colors,
                          cluster_labels=cluster_labels).plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_007.png
   :alt: Degrees of membership of the samples to each cluster
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_007.png
   :class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 151-152

And using the third cluster:

.. GENERATED FROM PYTHON SOURCE LINES 152-154

.. code-block:: Python

    ClusterMembershipPlot(fuzzy_kmeans, fd, sort=2, cluster_colors=cluster_colors,
                          cluster_labels=cluster_labels).plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_clustering_008.png
   :alt: Degrees of membership of the samples to each cluster
   :srcset: /auto_examples/images/sphx_glr_plot_clustering_008.png
   :class: sphx-glr-single-img
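The reordering behind the sorted bar plots can be imitated with
``numpy.argsort`` on one column of the membership matrix printed earlier. This
is a sketch under an assumption: the helper ``order_by_cluster`` is
hypothetical and sorts in descending order of membership, which may not be the
direction used by the plot itself.

```python
import numpy as np

# Membership matrix printed earlier by predict_proba.
memberships = np.array([
    [0.8721254, 0.11189295, 0.01598165],
    [0.4615364, 0.51285956, 0.02560405],
    [0.97428363, 0.01882257, 0.0068938],
    [0.91184323, 0.05369029, 0.03446648],
    [0.79072268, 0.18411219, 0.02516513],
    [0.178624, 0.05881132, 0.76256468],
    [0.01099498, 0.00492593, 0.98407909],
    [0.03156897, 0.96349997, 0.00493106],
    [0.8084018, 0.13418057, 0.05741763],
    [0.03767122, 0.01891178, 0.943417],
])


def order_by_cluster(memberships, cluster):
    """Sample indices sorted by decreasing membership in ``cluster``."""
    # Negating the column turns the ascending argsort into a descending one;
    # the stable sort keeps tied samples in their original order.
    return np.argsort(-memberships[:, cluster], kind="stable")


order = order_by_cluster(memberships, 0)
sorted_memberships = memberships[order]

print(order)
```

Passing ``1`` or ``2`` instead of ``0`` gives the analogous ordering for the
second and third clusters.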