.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_neighbors_scalar_regression.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_neighbors_scalar_regression.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_neighbors_scalar_regression.py:

Neighbors Scalar Regression
===========================

Shows the usage of the nearest neighbors regressor with scalar response.

.. GENERATED FROM PYTHON SOURCE LINES 7-20

.. code-block:: Python

    # Author: Pablo Marcos Manchón
    # License: MIT

    # sphinx_gallery_thumbnail_number = 3

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.model_selection import GridSearchCV, train_test_split

    import skfda
    from skfda.ml.regression import KNeighborsRegressor

.. GENERATED FROM PYTHON SOURCE LINES 21-36

In this example, we show the usage of the nearest neighbors regressors with
scalar response. A k-nn version,
:class:`~skfda.ml.regression.KNeighborsRegressor`, is available, as well as
a radius-based one, :class:`~skfda.ml.regression.RadiusNeighborsRegressor`.

First, we will fetch a dataset to show the basic usage.

The Canadian weather dataset contains the daily temperature and
precipitation at 35 different locations in Canada, averaged over the years
1960 to 1994.

The following figures show the temperature and precipitation curves.

.. GENERATED FROM PYTHON SOURCE LINES 37-45

.. code-block:: Python

    data = skfda.datasets.fetch_weather()
    fd = data['data']

    # Split dataset into temperature and precipitation curves
    X, y_func = fd.coordinates

.. GENERATED FROM PYTHON SOURCE LINES 46-47

Temperatures

.. GENERATED FROM PYTHON SOURCE LINES 47-50

.. code-block:: Python

    X.plot()
.. image-sg:: /auto_examples/images/sphx_glr_plot_neighbors_scalar_regression_001.png
   :alt: Canadian Weather
   :srcset: /auto_examples/images/sphx_glr_plot_neighbors_scalar_regression_001.png
   :class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 51-52

Precipitation

.. GENERATED FROM PYTHON SOURCE LINES 52-55

.. code-block:: Python

    y_func.plot()

.. image-sg:: /auto_examples/images/sphx_glr_plot_neighbors_scalar_regression_002.png
   :alt: Canadian Weather
   :srcset: /auto_examples/images/sphx_glr_plot_neighbors_scalar_regression_002.png
   :class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 56-59

We will try to predict the total log precipitation, i.e.,
:math:`logPrecTot_i = \log \sum_{t=0}^{365} prec_i(t)`, using the
temperature curves.

.. GENERATED FROM PYTHON SOURCE LINES 60-68

.. code-block:: Python

    # Sum directly from the data matrix
    prec = y_func.data_matrix.sum(axis=1)[:, 0]
    log_prec = np.log(prec)

    print(log_prec)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [7.30033776 7.28276118 7.29600641 7.14084916 7.0914925  7.02811278
     6.6861106  6.79860983 6.83668883 7.09721794 7.01148446 6.84673058
     6.81640724 6.66262171 6.86484778 6.5572044  6.23284087 6.10724558
     6.01322604 5.91647157 6.0078299  5.89357605 6.14246742 5.99271377
     5.60543435 7.0519422  6.74711693 6.41165405 7.86010789 5.60469852
     5.79209856 5.59136005 6.02707297 5.56106617 4.9698133 ]

.. GENERATED FROM PYTHON SOURCE LINES 69-72

As in the nearest neighbors classifier examples, we will split the dataset
into two partitions, for training and test, using the sklearn function
:func:`~sklearn.model_selection.train_test_split`.

.. GENERATED FROM PYTHON SOURCE LINES 73-80

.. code-block:: Python

    X_train, X_test, y_train, y_test = train_test_split(
        X,
        log_prec,
        random_state=7,
    )

.. GENERATED FROM PYTHON SOURCE LINES 81-90

First, we will make a prediction using the default number of neighbors (5)
and the :math:`\mathbb{L}^2` distance, weighting each neighbor by the
inverse of its distance.

We can fit the :class:`~skfda.ml.regression.KNeighborsRegressor` in the
same way as the sklearn estimators. This estimator is an extension of the
sklearn :class:`~sklearn.neighbors.KNeighborsRegressor`, but it accepts a
:class:`~skfda.representation.grid.FDataGrid` as input instead of an array
with multivariate data.

.. GENERATED FROM PYTHON SOURCE LINES 91-95

.. code-block:: Python

    knn = KNeighborsRegressor(weights='distance')
    knn.fit(X_train, y_train)

.. raw:: html
    KNeighborsRegressor(weights='distance')


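To make the distance-weighted prediction more concrete, here is a minimal,
self-contained numpy sketch of what a k-nn regressor with
``weights='distance'`` computes on discretized curves. The toy data, the
helper name ``knn_predict_curve``, and the amplitude response are all
invented for illustration; the real estimator works on ``FDataGrid``
objects and handles edge cases (ties, metric choice) more carefully.

```python
import numpy as np

def knn_predict_curve(X_train, y_train, x_new, k=5):
    # Distance-weighted k-nn prediction for one discretized curve.
    # Rows of X_train are curves sampled on a common grid, so the L2
    # distance between curves reduces (up to a constant factor) to the
    # Euclidean distance between rows.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    idx = np.argsort(dists)[:k]        # indices of the k nearest curves
    d = dists[idx]
    if np.any(d == 0):                 # exact match: return its response
        return float(y_train[idx[d == 0]].mean())
    w = 1.0 / d                        # inverse-distance weights
    return float(np.sum(w * y_train[idx]) / np.sum(w))

# Toy curves a * sin(2*pi*t) whose scalar response is the amplitude a.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)
amplitudes = rng.uniform(1, 3, size=20)
curves = amplitudes[:, None] * np.sin(2 * np.pi * grid)

new_curve = 2.0 * np.sin(2 * np.pi * grid)
prediction = knn_predict_curve(curves, amplitudes, new_curve)
print(prediction)  # near 2.0: the nearest curves have similar amplitudes
```

Because the weights decay with distance, curves far from the query
contribute almost nothing to the prediction.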
.. GENERATED FROM PYTHON SOURCE LINES 96-98

We can predict values for the test partition using
:meth:`~skfda.ml.regression.KNeighborsRegressor.predict`.

.. GENERATED FROM PYTHON SOURCE LINES 99-104

.. code-block:: Python

    pred = knn.predict(X_test)
    print(pred)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [7.11225785 5.99768933 7.05559273 6.88718564 6.78535172 5.97132028
     6.56125279 6.47991884 6.92965595]

.. GENERATED FROM PYTHON SOURCE LINES 105-107

The following figure compares the actual precipitation values with the
predicted ones.

.. GENERATED FROM PYTHON SOURCE LINES 108-118

.. code-block:: Python

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.scatter(y_test, pred)
    ax.plot(y_test, y_test)
    ax.set_xlabel("Total log precipitation")
    ax.set_ylabel("Prediction")

.. image-sg:: /auto_examples/images/sphx_glr_plot_neighbors_scalar_regression_003.png
   :alt: plot neighbors scalar regression
   :srcset: /auto_examples/images/sphx_glr_plot_neighbors_scalar_regression_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Text(42.597222222222214, 0.5, 'Prediction')

.. GENERATED FROM PYTHON SOURCE LINES 119-126

We can quantify how much of the variability is explained by the model with
the coefficient of determination :math:`R^2` of the prediction, using
:meth:`~skfda.ml.regression.KNeighborsRegressor.score` for that.

The coefficient :math:`R^2` is defined as :math:`(1 - u/v)`, where
:math:`u` is the residual sum of squares
:math:`\sum_i (y_i - y_{pred_i})^2` and :math:`v` is the total sum of
squares :math:`\sum_i (y_i - \bar y)^2`.

.. GENERATED FROM PYTHON SOURCE LINES 127-133

.. code-block:: Python

    score = knn.score(X_test, y_test)
    print(score)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.9244558571515601
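The definition of :math:`R^2` above can be checked by hand. The arrays
below are made-up toy values, not the weather data; the computation mirrors
what ``score`` returns (the same formula as sklearn's ``r2_score``).

```python
import numpy as np

# Toy observed responses and predictions, invented for illustration.
y_true = np.array([3.0, 1.0, 4.0, 1.5, 5.0])
y_pred = np.array([2.8, 1.2, 3.9, 1.7, 4.6])

u = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
v = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - u / v
print(r2)  # close to 1: the predictions track the observations well
```

A model that always predicted the mean of ``y_true`` would get
:math:`R^2 = 0`, and a worse model can score negative.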
.. GENERATED FROM PYTHON SOURCE LINES 134-144

In this case, we obtain a really good approximation with this naive
approach, although, due to the small number of samples, the results depend
on how the partition was made. For that reason, the explained variation
above is inflated. We will perform cross-validation to test our model more
robustly.

We can also perform a grid search, using
:class:`~sklearn.model_selection.GridSearchCV`, to determine the optimal
number of neighbors and the best way to weight their votes.

.. GENERATED FROM PYTHON SOURCE LINES 145-161

.. code-block:: Python

    param_grid = {
        'n_neighbors': range(1, 12, 2),
        'weights': ['uniform', 'distance'],
    }

    knn = KNeighborsRegressor()
    gscv = GridSearchCV(
        knn,
        param_grid,
        cv=5,
    )
    gscv.fit(X, log_prec)

.. raw:: html
    GridSearchCV(cv=5, estimator=KNeighborsRegressor(),
                 param_grid={'n_neighbors': range(1, 12, 2),
                             'weights': ['uniform', 'distance']})


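The grid search scores each candidate configuration by cross-validation and
keeps the best one. As a rough illustration of that selection loop, here is
a numpy-only sketch that picks the number of neighbors by leave-one-out
cross-validation on synthetic scalar data. All names and data here are
invented; ``GridSearchCV`` uses k-fold splitting and :math:`R^2` scoring
rather than this simplified mean-squared-error criterion.

```python
import numpy as np

def knn_predict(X, y, x_new, k):
    # Plain (uniform-weight) k-nn prediction on vector inputs.
    d = np.linalg.norm(X - x_new, axis=1)
    return float(y[np.argsort(d)[:k]].mean())

def loo_mse(X, y, k):
    # Leave-one-out cross-validation error for a given k: predict each
    # sample from all the others and average the squared errors.
    n = len(y)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        pred = knn_predict(X[mask], y[mask], X[i], k)
        errs.append((y[i] - pred) ** 2)
    return float(np.mean(errs))

# Noisy quadratic toy data standing in for the functional dataset.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(40, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=40)

# Same candidate grid as in the example above: odd k from 1 to 11.
cv_error = {k: loo_mse(X, y, k) for k in range(1, 12, 2)}
best_k = min(cv_error, key=cv_error.get)
print(best_k, cv_error[best_k])
```

Small ``k`` overfits the noise while large ``k`` oversmooths the quadratic,
so the cross-validation error picks out an intermediate value.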
.. GENERATED FROM PYTHON SOURCE LINES 162-163

We obtain that 3 is the optimal number of neighbors, with distance-based
weighting.

.. GENERATED FROM PYTHON SOURCE LINES 164-169

.. code-block:: Python

    print("Best params", gscv.best_params_)
    print("Best score", gscv.best_score_)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Best params {'n_neighbors': 3, 'weights': 'distance'}
    Best score -2.5211096524610666

.. GENERATED FROM PYTHON SOURCE LINES 170-178

More detailed information about the Canadian weather dataset can be found
in the following references.

* Ramsay, James O., and Silverman, Bernard W. (2006). Functional Data
  Analysis, 2nd ed., Springer, New York.

* Ramsay, James O., and Silverman, Bernard W. (2002). Applied Functional
  Data Analysis, Springer, New York.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.785 seconds)

.. _sphx_glr_download_auto_examples_plot_neighbors_scalar_regression.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/GAA-UAM/scikit-fda/develop?filepath=examples/plot_neighbors_scalar_regression.py
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_neighbors_scalar_regression.ipynb <plot_neighbors_scalar_regression.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_neighbors_scalar_regression.py <plot_neighbors_scalar_regression.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_