
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_tecator_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_tecator_regression.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_tecator_regression.py:


Spectrometric data: derivatives, regression, and variable selection
===================================================================

Shows the use of derivatives, functional regression and
variable selection for functional data.

.. GENERATED FROM PYTHON SOURCE LINES 8-28

.. code-block:: Python


    # License: MIT

    # sphinx_gallery_thumbnail_number = 4

    import matplotlib.pyplot as plt
    import sklearn.linear_model
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.tree import DecisionTreeRegressor, plot_tree

    from skfda.datasets import fetch_tecator
    from skfda.ml.regression import LinearRegression
    from skfda.preprocessing.dim_reduction.variable_selection.maxima_hunting import (
        MaximaHunting,
        RelativeLocalMaximaSelector,
    )
    from skfda.representation.basis import BSplineBasis


.. GENERATED FROM PYTHON SOURCE LINES 29-39

This example uses the Tecator dataset\
:footcite:`borggaard+thodberg_1992_optimal`
to illustrate functional regression and functional variable selection.
The dataset contains the absorbance spectra of several pieces of finely
chopped meat, together with the percentage of water, fat and protein in
each piece.

This is one of the examples presented at the ICTAI conference\
:footcite:p:`ramos-carreno++_2022_scikitfda`.

.. GENERATED FROM PYTHON SOURCE LINES 41-43

We first load the Tecator data, keeping only the fat content target,
and plot it.

.. GENERATED FROM PYTHON SOURCE LINES 43-49

.. code-block:: Python

    X, y = fetch_tecator(return_X_y=True)
    y = y[:, 0]  # keep only the first target column (fat content)
    X.plot(gradient_criteria=y)
    plt.show()



.. image-sg:: /auto_examples/images/sphx_glr_plot_tecator_regression_001.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/images/sphx_glr_plot_tecator_regression_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 50-53

For spectrometric data, the relevant information of the curves can often
be found in the derivatives\ :footcite:`ferraty+vieu_2006_computational`.
Thus, we numerically compute the second derivative and plot it.

.. GENERATED FROM PYTHON SOURCE LINES 53-57

.. code-block:: Python

    X_der = X.derivative(order=2)
    X_der.plot(gradient_criteria=y)
    plt.show()



.. image-sg:: /auto_examples/images/sphx_glr_plot_tecator_regression_002.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/images/sphx_glr_plot_tecator_regression_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 58-62

We first apply a simple linear regression model to obtain a baseline for
our regression predictions.
To fit a functional linear regression, we first convert the data to a
basis expansion.

.. GENERATED FROM PYTHON SOURCE LINES 62-67

.. code-block:: Python

    basis = BSplineBasis(
        n_basis=10,
    )
    X_der_basis = X_der.to_basis(basis)


.. GENERATED FROM PYTHON SOURCE LINES 68-70

We split the data into training and test sets, and compute the regression
score (the coefficient of determination, R²) of the linear regression model.

.. GENERATED FROM PYTHON SOURCE LINES 70-82

.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(
        X_der_basis,
        y,
        random_state=0,
    )

    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    score = r2_score(y_test, y_pred)
    print(score)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.9505439228770038


.. GENERATED FROM PYTHON SOURCE LINES 83-91

We will now take a different approach.
The plot of the derivatives suggests that most of the information needed
for regression is concentrated at a few particular "impact" points.
Thus, we now apply a functional variable selection method to detect those
points and use them with a multivariate regressor.
The variable selection method that we employ here is maxima hunting\
:footcite:`berrendero++_2016_variable`, a filter method that computes a
relevance score for each point of the curve and selects the local maxima
of that score. Here we keep at most two of them.

.. GENERATED FROM PYTHON SOURCE LINES 91-98

.. code-block:: Python

    var_sel = MaximaHunting(
        local_maxima_selector=RelativeLocalMaximaSelector(max_points=2),
    )
    X_mv = var_sel.fit_transform(X_der, y)
    print(var_sel.indexes_)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [41 89]


.. GENERATED FROM PYTHON SOURCE LINES 99-100

We can visualize the relevance function and the selected points.

.. GENERATED FROM PYTHON SOURCE LINES 100-105

.. code-block:: Python

    var_sel.dependence_.plot()
    for p in var_sel.indexes_:
        plt.axvline(X_der.grid_points[0][p], color="black")
    plt.show()



.. image-sg:: /auto_examples/images/sphx_glr_plot_tecator_regression_003.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/images/sphx_glr_plot_tecator_regression_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 106-107

We can also visualize the selected points on the curves.

.. GENERATED FROM PYTHON SOURCE LINES 107-112

.. code-block:: Python

    X_der.plot(gradient_criteria=y)
    for p in var_sel.indexes_:
        plt.axvline(X_der.grid_points[0][p], color="black")
    plt.show()



.. image-sg:: /auto_examples/images/sphx_glr_plot_tecator_regression_004.png
   :alt: Spectrometric curves
   :srcset: /auto_examples/images/sphx_glr_plot_tecator_regression_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 113-115

We split the data again (using the same seed), but this time without the
basis expansion.

.. GENERATED FROM PYTHON SOURCE LINES 115-121

.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(
        X_der,
        y,
        random_state=0,
    )


.. GENERATED FROM PYTHON SOURCE LINES 122-124

We now build a pipeline that combines the variable selection with a
multivariate linear regression method, for comparison with the baseline.

.. GENERATED FROM PYTHON SOURCE LINES 124-133

.. code-block:: Python

    pipeline = Pipeline([
        ("variable_selection", var_sel),
        ("classifier", sklearn.linear_model.LinearRegression()),
    ])
    pipeline.fit(X_train, y_train)
    y_predicted = pipeline.predict(X_test)
    score = r2_score(y_test, y_predicted)
    print(score)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.8959181661493842


.. GENERATED FROM PYTHON SOURCE LINES 134-136

We can use a tree regressor instead to improve both the score and the
interpretability.

.. GENERATED FROM PYTHON SOURCE LINES 136-145

.. code-block:: Python

    pipeline = Pipeline([
        ("variable_selection", var_sel),
        ("classifier", DecisionTreeRegressor(max_depth=3)),
    ])
    pipeline.fit(X_train, y_train)
    y_predicted = pipeline.predict(X_test)
    score = r2_score(y_test, y_predicted)
    print(score)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.9417361615246084


.. GENERATED FROM PYTHON SOURCE LINES 146-147

We can plot the fitted tree to explain every prediction.

.. GENERATED FROM PYTHON SOURCE LINES 147-151

.. code-block:: Python

    fig, ax = plt.subplots(figsize=(10, 10))
    plot_tree(pipeline.named_steps["classifier"], precision=6, filled=True, ax=ax)
    plt.show()



.. image-sg:: /auto_examples/images/sphx_glr_plot_tecator_regression_005.png
   :alt: plot tecator regression
   :srcset: /auto_examples/images/sphx_glr_plot_tecator_regression_005.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 152-156

References
----------

.. footbibliography::


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.488 seconds)


.. _sphx_glr_download_auto_examples_plot_tecator_regression.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/plot_tecator_regression.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_tecator_regression.ipynb <plot_tecator_regression.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_tecator_regression.py <plot_tecator_regression.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    Gallery generated by Sphinx-Gallery