
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_tutorial/plot_skfda_sklearn.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_tutorial_plot_skfda_sklearn.py:


Scikit-fda and scikit-learn
===========================

In this section, we will explain how scikit-fda interacts with the popular
machine learning package scikit-learn. We will briefly introduce the main
concepts of scikit-learn and show how scikit-fda reuses the same concepts,
extending them to the :term:`functional data analysis` field.

.. Disable isort
    isort:skip_file

.. GENERATED FROM PYTHON SOURCE LINES 14-18

.. code-block:: Python


    # Author: Carlos Ramos Carreño
    # License: MIT


.. GENERATED FROM PYTHON SOURCE LINES 19-44

A brief summary of scikit-learn architecture
--------------------------------------------

The library `scikit-learn `_ is probably the most well-known Python package
for machine learning. This package focuses on machine learning using
multivariate data, which should be stored in a numpy
:class:`~numpy.ndarray` in order to process it. However, this library has
defined a particular architecture that can be followed in order to provide
new tools that work in situations not even imagined by the original authors,
while remaining compatible with the tools already provided in scikit-learn.

In scikit-fda, the same architecture is applied in order to work with
functional data observations. As a result, scikit-fda tools are largely
compatible with scikit-learn tools, and it is possible to reuse objects such
as :class:`pipelines ` or even hyperparameter selection methods such as
:class:`grid search cross-validation ` in the functional data setting.

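
This kind of reuse is possible because generic tools rely only on method
conventions, not on the type of the data. The following is a minimal sketch in
plain Python, not actual scikit-learn or scikit-fda code: the names
``MajorityClassifier`` and ``holdout_score`` are hypothetical and exist only
for illustration.

.. code-block:: Python

    class MajorityClassifier:
        """Toy estimator (hypothetical): predicts the most common label."""

        def fit(self, X, y):
            # Learned parameters end with an underscore by convention.
            labels = list(y)
            self.majority_ = max(set(labels), key=labels.count)
            return self

        def predict(self, X):
            # Every observation receives the label learned in ``fit``.
            return [self.majority_ for _ in X]

        def score(self, X, y):
            # Fraction of observations whose prediction matches the target.
            hits = sum(p == t for p, t in zip(self.predict(X), y))
            return hits / len(y)


    def holdout_score(estimator, X, y, n_train):
        """Generic tool (hypothetical): fit on the first samples, score the rest."""
        estimator.fit(X[:n_train], y[:n_train])
        return estimator.score(X[n_train:], y[n_train:])


    # The observations are arbitrary Python objects, not rows of an ndarray.
    X = ["curve_a", "curve_b", "curve_c", "curve_d", "curve_e", "curve_f"]
    y = [1, 1, 0, 1, 0, 1]
    result = holdout_score(MajorityClassifier(), X, y, n_train=4)

Because ``holdout_score`` only slices the data and calls ``fit`` and
``score``, it works with any sliceable container of observations. The same
property is what lets scikit-learn utilities operate on functional data
objects.
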
We will briefly introduce the main concepts in scikit-learn, and explain how
the tools in scikit-fda relate to them. This is not intended as a full
explanation of the scikit-learn architecture, and the reader is encouraged to
look at the `scikit-learn tutorials `_ in order to achieve a deeper
understanding of it.

.. GENERATED FROM PYTHON SOURCE LINES 46-62

The Estimator object
^^^^^^^^^^^^^^^^^^^^

A central concept in scikit-learn (and scikit-fda) is what is called an
estimator. An estimator in this context is an object that can learn from the
data. Thus, classification, regression and clustering methods, as well as
transformations with parameters learned from the training data, are
particular kinds of estimators. Estimators can also be instantiated with
parameters, which can be tuned to the data using hyperparameter selection
methods.

Estimator objects have a ``fit`` method, which receives the training data and
(if necessary) the training targets. This method uses the training data in
order to learn some parameters of a model. When the learned parameters are
part of the user-facing API, then by convention they are attributes of the
estimator ending with the ``_`` character.

.. GENERATED FROM PYTHON SOURCE LINES 64-75

As a concrete example of this, consider a nearest centroid classifier for
functional data. The object :class:`~skfda.ml.classification.NearestCentroid`
is a classifier, and thus an estimator. As part of the training process the
centroids of the classes are computed and made available as the learned
parameter ``centroids_``.

.. note::
    The function :func:`~sklearn.model_selection.train_test_split` is one of
    the functions originally from scikit-learn that can be directly reused in
    scikit-fda.

.. GENERATED FROM PYTHON SOURCE LINES 75-89

.. code-block:: Python


    import skfda
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt

    X, y = skfda.datasets.fetch_growth(return_X_y=True)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    classifier = skfda.ml.classification.NearestCentroid()
    classifier.fit(X_train, y_train)

    # The learned class centroids are exposed as ``centroids_``
    classifier.centroids_.plot()
    plt.show()



.. image-sg:: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_001.png
   :alt: Berkeley Growth Study
   :srcset: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_001.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 90-102

Transformers
^^^^^^^^^^^^

:term:`Transformers ` are estimators which can convert data to a new form.
Examples of them are preprocessing methods, such as smoothing, registration
and dimensionality reduction methods. They always implement
``fit_transform`` for fitting and transforming the data in one step. The
transformers may be :term:`sklearn:inductive`, which means that they can
transform new data using the learned parameters. In that case they implement
the ``transform`` method to transform new data. If the transformation is
reversible, they usually also implement ``inverse_transform``.

.. GENERATED FROM PYTHON SOURCE LINES 104-110

As an example consider the kernel smoother
:class:`~skfda.preprocessing.smoothing.KernelSmoother`, used here with a
Nadaraya-Watson hat matrix
(:class:`~skfda.misc.hat_matrix.NadarayaWatsonHatMatrix`). Smoothing methods
attempt to remove noise from the data by leveraging its continuous nature.
As these methods discard information from the original data, they are usually
not reversible.

.. GENERATED FROM PYTHON SOURCE LINES 110-126

.. code-block:: Python


    import skfda.preprocessing.smoothing as ks
    from skfda.misc.hat_matrix import NadarayaWatsonHatMatrix

    X, y = skfda.datasets.fetch_phoneme(return_X_y=True)

    # Keep the first 5 functions
    X = X[:5]

    X.plot()

    smoother = ks.KernelSmoother(kernel_estimator=NadarayaWatsonHatMatrix())
    X_smooth = smoother.fit_transform(X)

    X_smooth.plot()
    plt.show()



.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_002.png
         :alt: Phoneme
         :srcset: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_002.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_003.png
         :alt: Phoneme
         :srcset: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_003.png
         :class: sphx-glr-multi-img



.. GENERATED FROM PYTHON SOURCE LINES 127-147

Predictors (classifiers, regressors, clusterers...)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:term:`Predictors ` in scikit-learn are estimators that can assign a certain
target to a particular observation. This includes supervised methods such as
classifiers (for which the target will be a class label), or regressors (for
which the target is a real value, a vector, or, in functional data analysis,
even a function!) and also unsupervised methods such as clusterers or outlier
detection methods.

Predictors should implement the ``fit_predict`` method for fitting the
estimators and predicting the targets in one step and/or the ``predict``
method for predicting the targets of possibly previously unseen data. Usually
:term:`sklearn:transductive` estimators implement only the former, while
:term:`sklearn:inductive` estimators implement the latter (or both).

Predictors can have additional non-mandatory methods, such as
``predict_proba`` for obtaining the probability of a particular prediction or
``score`` for evaluating the results of the prediction.

.. GENERATED FROM PYTHON SOURCE LINES 149-153

As an example, we can look at the :class:`~skfda.ml.clustering.KMeans`
clustering method for functional data. This method will try to separate the
data into different clusters according to the distance between observations.

.. GENERATED FROM PYTHON SOURCE LINES 153-165

.. code-block:: Python


    X, y = skfda.datasets.fetch_weather(return_X_y=True)

    # Use only the first value (temperature)
    X = X.coordinates[0]

    clusterer = skfda.ml.clustering.KMeans(n_clusters=3)
    y_pred = clusterer.fit_predict(X)

    X.plot(group=y_pred)
    plt.show()



.. image-sg:: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_004.png
   :alt: Canadian Weather
   :srcset: /auto_tutorial/images/sphx_glr_plot_skfda_sklearn_004.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 166-172

Metaestimators
^^^^^^^^^^^^^^

In scikit-learn jargon, a :term:`sklearn:metaestimator` is an estimator that
takes other estimators as parameters. There are several reasons for doing
that, which are explained in the following sections.

.. GENERATED FROM PYTHON SOURCE LINES 174-196

Composition metaestimators
++++++++++++++++++++++++++

It is very common in machine learning to apply one or more preprocessing
steps one after the other, before applying a final predictor. For this
purpose scikit-learn offers the :class:`~sklearn.pipeline.Pipeline`, which
joins the steps together and uses the same estimator API for performing all
steps in order (this is usually referred to as the composite pattern in
software engineering). The :class:`~sklearn.pipeline.Pipeline` estimator can
be used with the functional data estimators available in scikit-fda.
Moreover, as transformers such as dimensionality reduction methods can
convert functional data to multivariate data usable by scikit-learn methods,
it is possible to mix methods from scikit-fda and scikit-learn in the same
pipeline.

.. warning::
    In addition, scikit-learn offers estimators that can join several
    transformations as new features of the same dataset
    (:class:`~sklearn.pipeline.FeatureUnion`) or that can apply different
    transformers to different columns of the data
    (:class:`~sklearn.compose.ColumnTransformer`). These transformers are not
    yet usable with functional data.

.. GENERATED FROM PYTHON SOURCE LINES 198-202

As an example, we can construct a pipeline that registers the data using
shift registration, then applies a variable selection method to transform
each observation to a 3D vector and then uses an SVM classifier to classify
the data.

.. GENERATED FROM PYTHON SOURCE LINES 202-221

.. code-block:: Python


    from skfda.preprocessing.dim_reduction import variable_selection as vs
    from skfda.preprocessing.registration import LeastSquaresShiftRegistration
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    X, y = skfda.datasets.fetch_growth(return_X_y=True)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    pipeline = Pipeline([
        ("registration", LeastSquaresShiftRegistration()),
        ("dim_reduction", vs.RKHSVariableSelection(n_features_to_select=3)),
        ("classifier", SVC()),
    ])

    pipeline.fit(X_train, y_train)
    pipeline.score(X_test, y_test)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    1.0



.. GENERATED FROM PYTHON SOURCE LINES 222-238

Hyperparameter optimizers
+++++++++++++++++++++++++

Some of the parameters used for the creation of an estimator need to be tuned
to each particular dataset in order to improve the prediction accuracy and
generalization. There are several techniques to do that already available in
scikit-learn, such as grid search cross-validation
(:class:`~sklearn.model_selection.GridSearchCV`) or randomized search
(:class:`~sklearn.model_selection.RandomizedSearchCV`). As these
hyperparameter optimizers only need to split the data and call ``score`` on
the predictor, they can be directly used with the methods in scikit-fda.

.. note::
    In addition one could use any optimizer that understands the scikit-learn
    API, such as those in `scikit-optimize `_.

.. GENERATED FROM PYTHON SOURCE LINES 240-243

As an example, we will use :class:`~sklearn.model_selection.GridSearchCV` to
select the number of neighbors used in a
:class:`~skfda.ml.classification.KNeighborsClassifier`.

.. GENERATED FROM PYTHON SOURCE LINES 243-263

.. code-block:: Python


    from sklearn.model_selection import GridSearchCV

    X, y = skfda.datasets.fetch_growth(return_X_y=True)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    classifier = skfda.ml.classification.KNeighborsClassifier()

    # Try the odd numbers of neighbors 1, 3, 5, 7 and 9
    grid_search = GridSearchCV(
        estimator=classifier,
        param_grid={"n_neighbors": range(1, 10, 2)},
    )

    grid_search.fit(X_train, y_train)
    n_neighbors = grid_search.best_estimator_.n_neighbors
    score = grid_search.score(X_test, y_test)

    print(n_neighbors, score)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    3 0.9583333333333334



.. GENERATED FROM PYTHON SOURCE LINES 264-279

Ensemble methods
++++++++++++++++

The ensemble methods :class:`~sklearn.ensemble.VotingClassifier` and
:class:`~sklearn.ensemble.VotingRegressor` in scikit-learn use several
different estimators in order to predict the targets. As this is done by
evaluating the passed estimators as black boxes, these predictors can also be
combined with scikit-fda predictors.

.. warning::
    Other ensemble methods, such as
    :class:`~sklearn.ensemble.BaggingClassifier` or
    :class:`~sklearn.ensemble.AdaBoostClassifier`, cannot yet be used with
    functional data unless it has been transformed to a multivariate dataset.

.. GENERATED FROM PYTHON SOURCE LINES 281-284

As an example we will use a voting classifier that combines three
classifiers: a k-nearest neighbors classifier, a nearest centroid classifier
and a maximum depth classifier.

.. GENERATED FROM PYTHON SOURCE LINES 284-304

.. code-block:: Python


    from sklearn.ensemble import VotingClassifier

    X, y = skfda.datasets.fetch_growth(return_X_y=True)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    knn = skfda.ml.classification.KNeighborsClassifier()
    nearest_centroid = skfda.ml.classification.NearestCentroid()
    mdc = skfda.ml.classification.MaximumDepthClassifier()

    voting = VotingClassifier([
        ("knn", knn),
        ("nearest_centroid", nearest_centroid),
        ("mdc", mdc),
    ])

    voting.fit(X_train, y_train)
    voting.score(X_test, y_test)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.75



.. GENERATED FROM PYTHON SOURCE LINES 305-313

Multiclass and multioutput classification utilities
+++++++++++++++++++++++++++++++++++++++++++++++++++

The scikit-learn library also offers additional utilities that can convert a
binary classifier into a multiclass classifier (such as
:class:`~sklearn.multiclass.OneVsRestClassifier`) or extend a single output
classifier or regressor to also accept multioutput (vector-valued) targets.

.. GENERATED FROM PYTHON SOURCE LINES 315-322

In this example we want to use as a classifier the combination of a
dimensionality reduction method
(:class:`~skfda.preprocessing.dim_reduction.variable_selection.RKHSVariableSelection`)
and an SVM classifier (:class:`~sklearn.svm.SVC`). As that particular
dimensionality reduction method is only suitable for binary classification
problems, we use :class:`~sklearn.multiclass.OneVsRestClassifier` to classify
a multiclass dataset.

.. GENERATED FROM PYTHON SOURCE LINES 322-339

.. code-block:: Python


    from sklearn.multiclass import OneVsRestClassifier

    X, y = skfda.datasets.fetch_phoneme(return_X_y=True)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    pipeline = Pipeline([
        ("dim_reduction", vs.RKHSVariableSelection(n_features_to_select=3)),
        ("classifier", SVC()),
    ])

    # Fit one copy of the binary pipeline per class
    multiclass = OneVsRestClassifier(pipeline)

    multiclass.fit(X_train, y_train)
    multiclass.score(X_test, y_test)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.9140070921985816



.. GENERATED FROM PYTHON SOURCE LINES 340-358

Other scikit-learn utilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to the aforementioned objects, there are plenty of objects in
scikit-learn that can be applied directly to functional data. We have already
seen in the examples the function
:func:`~sklearn.model_selection.train_test_split`. Other objects and
functions such as :class:`~sklearn.model_selection.KFold` can be directly
applied to functional data in order to split it into folds. Scorers for
classification or regression, such as
:func:`~sklearn.metrics.accuracy_score`, can be directly applied to
functional data problems.

Moreover, there are plenty of libraries that aim to extend scikit-learn in
several directions (take a look at the `list of related projects `_). You
will probably see that a lot of the functionality can be applied to
scikit-fda, as it uses the same API as scikit-learn.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 6.341 seconds)


.. _sphx_glr_download_auto_tutorial_plot_skfda_sklearn.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=tutorial/plot_skfda_sklearn.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_skfda_sklearn.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_skfda_sklearn.py `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_