.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_classification_methods.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_classification_methods.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_classification_methods.py:

Classification methods
==================================

This example compares the accuracies of different classification methods,
selecting one method of each kind present in the library. In particular,
there is one based on depths, the Maximum Depth Classifier; one based on
centroids, the Nearest Centroid Classifier; one based on the K-Nearest
Neighbors, the K-Nearest Neighbors Classifier; and finally, one based on
quadratic discriminant analysis, the Parameterized Functional QDA.

The Berkeley Growth Study dataset is used as input data.

.. GENERATED FROM PYTHON SOURCE LINES 16-35

.. code-block:: Python

    # Author: Álvaro Castillo García
    # License: MIT
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.model_selection import train_test_split

    from skfda.datasets import fetch_growth
    from skfda.exploratory.depth import ModifiedBandDepth
    from skfda.exploratory.stats.covariance import ParametricGaussianCovariance
    from skfda.misc.covariances import Gaussian
    from skfda.ml.classification import (
        KNeighborsClassifier,
        MaximumDepthClassifier,
        NearestCentroid,
        QuadraticDiscriminantAnalysis,
    )

.. GENERATED FROM PYTHON SOURCE LINES 36-41

The Berkeley Growth Study data contains the heights of 39 boys and 54
girls from age 1 to 18, together with the ages at which the measurements
were taken. Males are assigned the numeric code 0 and females the code 1.
In our comparison of the different methods, we will try to predict the
sex of a person from their growth curve.

.. GENERATED FROM PYTHON SOURCE LINES 41-47

.. code-block:: Python

    X, y = fetch_growth(return_X_y=True, as_frame=True)
    X = X.iloc[:, 0].values
    categories = y.values.categories
    y = y.values.codes

.. GENERATED FROM PYTHON SOURCE LINES 48-51

As in many machine learning tasks, we split the dataset into a training
set and a test set. The plot below shows the training growth curves, which
will be used to fit the models; the predictions are therefore data-driven.

.. GENERATED FROM PYTHON SOURCE LINES 51-62

.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=0.3,
        stratify=y,
        random_state=0,
    )

    # Plot samples grouped by sex
    X_train.plot(group=y_train, group_names=categories).show()

.. image-sg:: /auto_examples/images/sphx_glr_plot_classification_methods_001.png
   :alt: Berkeley Growth Study
   :srcset: /auto_examples/images/sphx_glr_plot_classification_methods_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 63-65

Below are the growth curves of the individuals that we would like to
classify. Some of them are male and some female.

.. GENERATED FROM PYTHON SOURCE LINES 65-68

.. code-block:: Python

    X_test.plot().show()

.. image-sg:: /auto_examples/images/sphx_glr_plot_classification_methods_002.png
   :alt: Berkeley Growth Study
   :srcset: /auto_examples/images/sphx_glr_plot_classification_methods_002.png
   :class: sphx-glr-single-img

..
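Both subsets come from a stratified split, so they should preserve the sex
proportions of the full dataset. As a quick sanity check (this step is not
part of the original example), the class counts can be inspected with NumPy:

.. code-block:: Python

    import numpy as np

    # Counts per class code (0: male, 1: female). With stratify=y the
    # train and test proportions should closely match the full dataset.
    print(np.bincount(y_train))
    print(np.bincount(y_test))
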
 GENERATED FROM PYTHON SOURCE LINES 69-74

As said above, we are going to compare four different methods:
:class:`~skfda.ml.classification.MaximumDepthClassifier`,
:class:`~skfda.ml.classification.KNeighborsClassifier`,
:class:`~skfda.ml.classification.NearestCentroid` and
:class:`~skfda.ml.classification.QuadraticDiscriminantAnalysis`.

.. GENERATED FROM PYTHON SOURCE LINES 77-79

The first method we are going to use is the Maximum Depth Classifier.
As depth method we will consider the Modified Band Depth.

.. GENERATED FROM PYTHON SOURCE LINES 79-89

.. code-block:: Python

    depth = MaximumDepthClassifier(depth_method=ModifiedBandDepth())
    depth.fit(X_train, y_train)
    depth_pred = depth.predict(X_test)
    print(depth_pred)
    print('The score of Maximum Depth Classifier is {0:2.2%}'.format(
        depth.score(X_test, y_test),
    ))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0 1 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 1 1 1 1 0 0 0 1 0 1 1]
    The score of Maximum Depth Classifier is 82.14%

.. GENERATED FROM PYTHON SOURCE LINES 90-91

The second method to consider is the K-Nearest Neighbors Classifier.

.. GENERATED FROM PYTHON SOURCE LINES 91-100

.. code-block:: Python

    knn = KNeighborsClassifier()
    knn.fit(X_train, y_train)
    knn_pred = knn.predict(X_test)
    print(knn_pred)
    print('The score of KNN is {0:2.2%}'.format(knn.score(X_test, y_test)))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 1 1 1 0 1 0 0 0 1 1]
    The score of KNN is 96.43%

.. GENERATED FROM PYTHON SOURCE LINES 101-102

The third method we are going to use is the Nearest Centroid Classifier.

.. GENERATED FROM PYTHON SOURCE LINES 102-112

.. code-block:: Python

    centroid = NearestCentroid()
    centroid.fit(X_train, y_train)
    centroid_pred = centroid.predict(X_test)
    print(centroid_pred)
    print('The score of Nearest Centroid Classifier is {0:2.2%}'.format(
        centroid.score(X_test, y_test),
    ))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0 1 0 0 1 1 0 1 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1]
    The score of Nearest Centroid Classifier is 85.71%

.. GENERATED FROM PYTHON SOURCE LINES 113-121

The fourth method considered is a functional quadratic discriminant, where
the covariance is assumed to have a parametric form specified by a kernel,
or covariance function. We have selected a Gaussian kernel with initial
hyperparameters variance=6 and length_scale=1. The choice of initial
parameters barely affects the results, as the algorithm optimizes them
automatically. A small regularizer value of 0.05 has been chosen.

.. GENERATED FROM PYTHON SOURCE LINES 121-136

.. code-block:: Python

    qda = QuadraticDiscriminantAnalysis(
        ParametricGaussianCovariance(
            Gaussian(variance=6, length_scale=1),
        ),
        regularizer=0.05,
    )
    qda.fit(X_train, y_train)
    qda_pred = qda.predict(X_test)
    print(qda_pred)
    print('The score of functional QDA is {0:2.2%}'.format(
        qda.score(X_test, y_test),
    ))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0 1 0 0 1 1 0 1 0 1 0 0 1 1 1 0 1 1 1 1 1 0 1 0 1 0 1 1]
    The score of functional QDA is 96.43%

.. GENERATED FROM PYTHON SOURCE LINES 137-146

As can be seen, the classifier with the lowest score is the Maximum Depth
Classifier: it obtains 82.14% accuracy on the test set. KNN and functional
QDA are the best classifiers for this problem, each with an accuracy of
96.43%. The Nearest Centroid Classifier falls in between, with a still
respectable test accuracy of 85.71%.
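Accuracy condenses performance into a single number. To see which sex is
misclassified more often, one could also inspect a confusion matrix. Below
is a minimal sketch (not part of the original example) using scikit-learn,
which accepts the integer label arrays computed above:

.. code-block:: Python

    from sklearn.metrics import confusion_matrix

    # Rows are true classes (male, female), columns are predictions;
    # off-diagonal entries count misclassified test curves.
    print(confusion_matrix(y_test, depth_pred))
    print(confusion_matrix(y_test, knn_pred))
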
It can be concluded that all the classifiers work well for this problem,
as they all achieve accuracies above 80%, but the most robust ones are
KNN and functional QDA.

.. GENERATED FROM PYTHON SOURCE LINES 146-175

.. code-block:: Python

    accuracies = pd.DataFrame({
        'Classification methods': [
            'Maximum Depth Classifier',
            'K-Nearest-Neighbors',
            'Nearest Centroid Classifier',
            'Functional QDA',
        ],
        'Accuracy': [
            '{0:2.2%}'.format(
                depth.score(X_test, y_test),
            ),
            '{0:2.2%}'.format(
                knn.score(X_test, y_test),
            ),
            '{0:2.2%}'.format(
                centroid.score(X_test, y_test),
            ),
            '{0:2.2%}'.format(
                qda.score(X_test, y_test),
            ),
        ],
    })

    accuracies
===========================  ========
Classification methods       Accuracy
===========================  ========
Maximum Depth Classifier     82.14%
K-Nearest-Neighbors          96.43%
Nearest Centroid Classifier  85.71%
Functional QDA               96.43%
===========================  ========
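The KNN classifier above was fitted with its default number of neighbors.
As a sketch of how this hyperparameter could be tuned (this step is not
part of the original example; it assumes the classifier is scikit-learn
compatible, as scikit-fda estimators are designed to be):

.. code-block:: Python

    from sklearn.model_selection import GridSearchCV

    # Search odd neighborhood sizes with 5-fold cross-validation.
    grid = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={'n_neighbors': [3, 5, 7, 9, 11]},
        cv=5,
    )
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.best_score_)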


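Note also that the conclusion above rests on a single train/test split. A
less split-dependent comparison could use cross-validation; the sketch
below (not part of the original example) assumes, as above, scikit-learn
compatibility of the estimators:

.. code-block:: Python

    from sklearn.model_selection import cross_val_score

    # Average 5-fold cross-validated accuracy over the whole dataset.
    for name, clf in (
        ('Maximum Depth', MaximumDepthClassifier(depth_method=ModifiedBandDepth())),
        ('KNN', KNeighborsClassifier()),
        ('Nearest Centroid', NearestCentroid()),
    ):
        scores = cross_val_score(clf, X, y, cv=5)
        print(name, scores.mean())
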
.. GENERATED FROM PYTHON SOURCE LINES 176-179

The figure below shows the classification results on the test set for the
four methods considered. It can be seen that all of them classify the
curves similarly.

.. GENERATED FROM PYTHON SOURCE LINES 179-197

.. code-block:: Python

    fig, axs = plt.subplots(2, 2)
    plt.subplots_adjust(hspace=0.45, bottom=0.06)

    X_test.plot(group=centroid_pred, group_names=categories, axes=axs[0][1])
    axs[0][1].set_title('Nearest Centroid Classifier', loc='left')

    X_test.plot(group=depth_pred, group_names=categories, axes=axs[0][0])
    axs[0][0].set_title('Maximum Depth Classifier', loc='left')

    X_test.plot(group=knn_pred, group_names=categories, axes=axs[1][0])
    axs[1][0].set_title('KNN', loc='left')

    X_test.plot(group=qda_pred, group_names=categories, axes=axs[1][1])
    axs[1][1].set_title('Functional QDA', loc='left')

    plt.show()

.. image-sg:: /auto_examples/images/sphx_glr_plot_classification_methods_003.png
   :alt: Berkeley Growth Study, Maximum Depth Classifier, Nearest Centroid Classifier, KNN, Functional QDA
   :srcset: /auto_examples/images/sphx_glr_plot_classification_methods_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.847 seconds)

.. _sphx_glr_download_auto_examples_plot_classification_methods.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/plot_classification_methods.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_classification_methods.ipynb <plot_classification_methods.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_classification_methods.py <plot_classification_methods.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_