.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_fpca_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_fpca_regression.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_fpca_regression.py:


Functional Principal Component Analysis Regression
==================================================

This example explores the use of functional principal component analysis
(FPCA) in regression problems.

.. GENERATED FROM PYTHON SOURCE LINES 9-19

.. code-block:: Python

    # Author: David del Val
    # License: MIT

    import matplotlib.pyplot as plt
    from sklearn.model_selection import GridSearchCV, train_test_split

    import skfda
    from skfda.ml.regression import FPCARegression

.. GENERATED FROM PYTHON SOURCE LINES 20-24

In this example, we will demonstrate the use of the FPCA regression method
with the :func:`tecator <skfda.datasets.fetch_tecator>` dataset. This data
set contains 215 samples. Each sample comprises a spectrum of absorbances
and the contents of water, fat and protein.

.. GENERATED FROM PYTHON SOURCE LINES 24-29

.. code-block:: Python

    X, y = skfda.datasets.fetch_tecator(return_X_y=True, as_frame=True)
    X = X.iloc[:, 0].values
    y = y["fat"].values

.. GENERATED FROM PYTHON SOURCE LINES 30-34

Our goal will be to estimate the fat percentage from the spectrum. However,
in order to better understand the data, we will first plot all the spectra
curves. The color of each curve reflects the amount of fat, from the
lightest shade (least fat) to the darkest (most fat).

.. GENERATED FROM PYTHON SOURCE LINES 34-38

.. code-block:: Python

    X.plot(gradient_criteria=y, legend=True, colormap="Greens")
    plt.show()
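As a side note, coloring curves by a criterion essentially amounts to normalizing the criterion values to :math:`[0, 1]` and passing them through a colormap. A minimal sketch of that idea with plain Matplotlib and NumPy follows; the curves and fat values here are synthetic stand-ins, not the tecator spectra:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for the sketch

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)

# Synthetic stand-ins: 10 smooth curves sampled at 100 points, each
# paired with a scalar "fat" value used only for coloring.
grid = np.linspace(0, 1, 100)
fat = rng.uniform(0, 50, size=10)
curves = (
    np.sin(2 * np.pi * grid)[None, :] * fat[:, None] / 50
    + rng.normal(0, 0.05, size=(10, 100))
)

# Map each criterion value to a color in the chosen colormap.
norm = Normalize(vmin=fat.min(), vmax=fat.max())
colors = plt.cm.Greens(norm(fat))

fig, ax = plt.subplots()
for curve, color in zip(curves, colors):
    ax.plot(grid, curve, color=color)
```

Each row of ``colors`` is an RGBA tuple, so curves with more "fat" are drawn in a darker green, mirroring the gradient used in the plot of the real spectra.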
.. image-sg:: /auto_examples/images/sphx_glr_plot_fpca_regression_001.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/images/sphx_glr_plot_fpca_regression_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 39-42

In order to evaluate the performance of the model, we will split the data
into train and test sets. The former will contain 80% of the samples and
the latter the remaining 20%.

.. GENERATED FROM PYTHON SOURCE LINES 42-49

.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=0.2,
        random_state=1,
    )

.. GENERATED FROM PYTHON SOURCE LINES 50-53

Since FPCA regression provides good results with a small number of
components, we will start by using only 5 components. After training the
model, we can check its performance on the test set.

.. GENERATED FROM PYTHON SOURCE LINES 53-59

.. code-block:: Python

    reg = FPCARegression(n_components=5)
    reg.fit(X_train, y_train)
    test_score = reg.score(X_test, y_test)

    print(f"Score with 5 components: {test_score:.4f}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Score with 5 components: 0.9062

.. GENERATED FROM PYTHON SOURCE LINES 60-67

We have obtained a fairly good result considering that the model uses only
5 components. That is to say, the dimensionality of the problem has been
reduced from 100 (each spectrum has 100 points) to 5. However, we may be
able to improve the model by using more components. To find the best number
of components, we will use cross-validation, testing values from 1 to 99.

.. GENERATED FROM PYTHON SOURCE LINES 67-79

.. code-block:: Python

    param_grid = {"n_components": range(1, 100)}
    reg = FPCARegression()

    # Perform grid search with cross-validation
    gscv = GridSearchCV(reg, param_grid, cv=5)
    gscv.fit(X_train, y_train)

    print("Best params:", gscv.best_params_)
    print(f"Best cross-validation score: {gscv.best_score_:.4f}")

.. rst-class:: sphx-glr-script-out
.. code-block:: none

    Best params: {'n_components': 28}
    Best cross-validation score: 0.9652

.. GENERATED FROM PYTHON SOURCE LINES 80-88

The best cross-validation performance is obtained with 28 components, which
still provides a substantial reduction in dimensionality. Note, however,
that the score improves very slowly as the number of components grows. This
phenomenon can be seen in the following plot, and confirms that FPCA
already provides a good approximation of the data with a small number of
components.

.. GENERATED FROM PYTHON SOURCE LINES 88-103

.. code-block:: Python

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(
        param_grid["n_components"],
        gscv.cv_results_["mean_test_score"],
        linestyle="dashed",
        marker="o",
    )
    ax.set_xticks(range(0, 110, 10))
    ax.set_xlabel("Number of Components")
    ax.set_ylabel("Cross-validation score")
    ax.set_ylim((0.5, 1))
    fig.show()

.. image-sg:: /auto_examples/images/sphx_glr_plot_fpca_regression_002.png
   :alt: plot fpca regression
   :srcset: /auto_examples/images/sphx_glr_plot_fpca_regression_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 104-109

To conclude, we calculate the score of the model on the test set after
training it on the whole train set, using a number of components close to
the cross-validation optimum. Moreover, we can check that the score barely
changes when we use a somewhat smaller number of components.

.. GENERATED FROM PYTHON SOURCE LINES 109-119

.. code-block:: Python

    reg = FPCARegression(n_components=30)
    reg.fit(X_train, y_train)
    test_score = reg.score(X_test, y_test)
    print(f"Score with 30 components: {test_score:.4f}")

    reg = FPCARegression(n_components=15)
    reg.fit(X_train, y_train)
    test_score = reg.score(X_test, y_test)
    print(f"Score with 15 components: {test_score:.4f}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Score with 30 components: 0.9667
    Score with 15 components: 0.9584

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 44.066 seconds)
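Conceptually, FPCA regression projects each curve onto its leading principal components and fits an ordinary linear regression on the resulting scores. The sketch below reproduces that idea with scikit-learn's ``PCA`` and ``LinearRegression`` on plain discretized arrays; the data is synthetic, and ``FPCARegression`` additionally accounts for the functional structure of the domain, so this is only an approximation of what the estimator does:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic "curves": 200 samples discretized at 100 points, generated
# from 3 latent smooth components; the response is linear in the latents.
grid = np.linspace(0, 1, 100)
basis = np.stack([np.sin((k + 1) * np.pi * grid) for k in range(3)])
latent = rng.normal(size=(200, 3))
X = latent @ basis + rng.normal(scale=0.01, size=(200, 100))
y = latent @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=200)

# The core of FPCA regression: PCA scores followed by linear regression.
model = make_pipeline(PCA(n_components=3), LinearRegression())
model.fit(X[:150], y[:150])
r2 = model.score(X[150:], y[150:])
print(f"R^2 on held-out samples: {r2:.3f}")
```

Because the response truly depends on only three latent components, three PCA scores recover almost all of the signal, which mirrors why a few functional principal components suffice for the tecator spectra.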
.. _sphx_glr_download_auto_examples_plot_fpca_regression.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: binder-badge

            .. image:: images/binder_badge_logo.svg
                :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/plot_fpca_regression.py
                :alt: Launch binder
                :width: 150 px

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_fpca_regression.ipynb <plot_fpca_regression.ipynb>`

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_fpca_regression.py <plot_fpca_regression.py>`

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_