HistoricalLinearRegression#

class skfda.ml.regression.HistoricalLinearRegression(*, n_intervals, fit_intercept=True, lag=inf)[source]#

Historical functional linear regression.

This is a linear regression method where the covariate and the response are both functions from $\mathbb{R}$ to $\mathbb{R}$ with the same domain. To predict the value of the response function at a point $t$, only the values of the covariate at points $s < t$ are used. It is thus a “historical” model in the sense that, if the domain represents time, only data from the past, or historical data, is used to predict a given point[1].

The model assumed by this method is:

\[y_i(t) = \alpha(t) + \int_{s_0(t)}^t x_i(s) \beta(s, t) \, ds\]

where $s_0(t) = \max(0, t - \delta)$ and $\delta$ is a predefined time lag that can be specified so that points far in the past do not affect the predicted value.
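
As a rough illustration of the model above, the following sketch evaluates its right-hand side on a discrete grid with the trapezoidal rule. The helper `historical_response`, the function `constant_beta` and all argument names are hypothetical and are not part of scikit-fda. The sketch assumes that the domain starts at the first grid point and simply drops grid points older than the lag, which is only exact when $\delta$ is a multiple of the grid spacing.

```python
import numpy as np


def historical_response(x, beta, alpha, grid, lag=np.inf):
    """Evaluate alpha(t) + int_{s0(t)}^{t} x(s) beta(s, t) ds on a grid.

    Hypothetical helper for illustration only (not a scikit-fda function).
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    grid = np.asarray(grid, dtype=float)
    y = np.empty_like(grid)
    for j, t in enumerate(grid):
        # s0(t) = max(0, t - lag), with the domain start playing the role of 0.
        s0 = max(grid[0], t - lag)
        window = (grid >= s0) & (grid <= t)
        s = grid[window]
        integrand = x[window] * beta(s, t)
        # Trapezoidal rule; a single-point window gives an integral of 0.
        integral = 0.5 * np.sum((integrand[1:] + integrand[:-1]) * np.diff(s))
        y[j] = alpha[j] + integral
    return y


def constant_beta(s, t):
    # Assumed coefficient surface: beta(s, t) = 1 everywhere.
    return np.ones_like(s)


grid = np.arange(6.0)
x = np.array([5.0, 0.0, 3.0, 3.0, 7.0, 9.0])
alpha = np.array([2.0, 0.0, 0.0, 4.0, 5.0, 5.0])

# Whole past used (lag = inf) versus only the last unit of time (lag = 1).
y_full = historical_response(x, constant_beta, alpha, grid)
y_lagged = historical_response(x, constant_beta, alpha, grid, lag=1.0)
```

With no lag, `y_full` is the cumulative trapezoidal integral of `x` plus `alpha`, i.e. `[2, 2.5, 4, 11, 17, 25]`. With `lag=1.0`, each value only integrates over the last grid interval, giving `[2, 2.5, 1.5, 7, 10, 13]`.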

Parameters:

n_intervals (int) – Number of intervals used to create the basis of the coefficients. This will be a bidimensional FiniteElement basis, and this parameter indirectly specifies the number of elements of that basis, and thus the granularity.
fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
lag (float) – The maximum time lag at which points in the past can still influence the prediction (the $\delta$ of the model above). With the default, inf, the whole past is used.
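
When fit_intercept is False, the data is expected to be centered. A minimal sketch of centering discretized curves with NumPy follows, assuming each row of a hypothetical array `curves` holds one sample evaluated on a common grid. It illustrates what "centered" means here and is not necessarily how the library handles it internally.

```python
import numpy as np

# Hypothetical sample: 3 curves evaluated at 4 common grid points.
curves = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 5.0, 6.0],
    [3.0, 5.0, 1.0, 2.0],
])

# Pointwise mean function across samples (one value per grid point).
mean_curve = curves.mean(axis=0)

# Centered curves: their sample mean function is identically zero.
centered = curves - mean_curve
```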

Attributes:

basis_coef_ – The fitted coefficient function as a FDataBasis.
coef_ – The fitted coefficient function as a FDataGrid.
intercept_ – Independent term in the linear model. Set to the constant function 0 if fit_intercept = False.

Examples

The following example tests a case that conforms to this model.

>>> from skfda import FDataGrid
>>> from skfda.ml.regression import HistoricalLinearRegression
>>> import numpy as np
>>> import scipy.integrate

>>> random_state = np.random.RandomState(0)
>>> data_matrix = random_state.choice(10, size=(8, 6)).astype(float)
>>> data_matrix
array([[ 5., 0., 3., 3., 7., 9.],
       [ 3., 5., 2., 4., 7., 6.],
       [ 8., 8., 1., 6., 7., 7.],
       [ 8., 1., 5., 9., 8., 9.],
       [ 4., 3., 0., 3., 5., 0.],
       [ 2., 3., 8., 1., 3., 3.],
       [ 3., 7., 0., 1., 9., 9.],
       [ 0., 4., 7., 3., 2., 7.]])
>>> intercept = random_state.choice(10, size=(1, 6)).astype(float)
>>> intercept
array([[ 2., 0., 0., 4., 5., 5.]])
>>> y_data = scipy.integrate.cumulative_trapezoid(
...              data_matrix,
...              initial=0,
...              axis=1,
...          ) + intercept
>>> y_data
array([[  2. ,   2.5,   4. ,  11. ,  17. ,  25. ],
       [  2. ,   4. ,   7.5,  14.5,  21. ,  27.5],
       [  2. ,   8. ,  12.5,  20. ,  27.5,  34.5],
       [  2. ,   4.5,   7.5,  18.5,  28. ,  36.5],
       [  2. ,   3.5,   5. ,  10.5,  15.5,  18. ],
       [  2. ,   2.5,   8. ,  16.5,  19.5,  22.5],
       [  2. ,   5. ,   8.5,  13. ,  19. ,  28. ],
       [  2. ,   2. ,   7.5,  16.5,  20. ,  24.5]])
>>> X = FDataGrid(data_matrix)
>>> y = FDataGrid(y_data)
>>> hist = HistoricalLinearRegression(n_intervals=8)
>>> _ = hist.fit(X, y)
>>> hist.predict(X).data_matrix[..., 0].round(1)
array([[  2. ,   2.5,   4. ,  11. ,  17. ,  25. ],
       [  2. ,   4. ,   7.5,  14.5,  21. ,  27.5],
       [  2. ,   8. ,  12.5,  20. ,  27.5,  34.5],
       [  2. ,   4.5,   7.5,  18.5,  28. ,  36.5],
       [  2. ,   3.5,   5. ,  10.5,  15.5,  18. ],
       [  2. ,   2.5,   8. ,  16.5,  19.5,  22.5],
       [  2. ,   5. ,   8.5,  13. ,  19. ,  28. ],
       [  2. ,   2. ,   7.5,  16.5,  20. ,  24.5]])
>>> abs(hist.intercept_.data_matrix[..., 0].round())
array([[ 2.,  0.,  0.,  4.,  5.,  5.]])

References

Methods

`fit`(X, y)
`fit_predict`(X, y)
`get_metadata_routing`() – Get metadata routing of this object.
`get_params`([deep]) – Get parameters for this estimator.
`predict`(X)
`score`(X, y[, sample_weight]) – Return coefficient of determination on test data.
`set_params`(**params) – Set the parameters of this estimator.
`set_score_request`(*[, sample_weight]) – Configure whether metadata should be requested to be passed to the `score` method.

fit(X, y)[source]#

Parameters:

X (FDataGrid)
y (FDataGrid)

Return type:

HistoricalLinearRegression

fit_predict(X, y)[source]#

Parameters:

X (FDataGrid)
y (FDataGrid)

Return type:

FDataGrid

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)#

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)[source]#

Parameters:

X (FDataGrid)

Return type:

FDataGrid

score(X, y, sample_weight=None)[source]#

Return coefficient of determination on test data.

The coefficient of determination, $R^2$, is defined as $1 - \frac{u}{v}$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.
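
The definition above can be checked by hand with a few lines of NumPy. This is a sketch for scalar targets only; the function name `r2` is hypothetical, and how scores are aggregated over the evaluation points of functional responses is not covered here.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - u / v, for 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])

good = r2(y_true, [1.1, 1.9, 3.2, 3.8])            # close predictions
baseline = r2(y_true, np.full(4, y_true.mean()))   # always predicts the mean
bad = r2(y_true, [4.0, 3.0, 2.0, 1.0])             # worse than the mean
```

Here `good` is 0.98, `baseline` is 0.0 and `bad` is -3.0, matching the three cases described above: a near-perfect fit, the constant model, and a model worse than the constant one.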

Parameters:

X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – $R^2$ of self.predict(X) w.r.t. y.

Return type:

float

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
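
A short sketch of the <component>__<parameter> naming follows. It uses scikit-learn's Pipeline with StandardScaler and Ridge as stand-in estimators chosen only for illustration; the step names "scale" and "reg" are arbitrary.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in estimators chosen for illustration; step names are arbitrary.
pipe = Pipeline([("scale", StandardScaler()), ("reg", Ridge(alpha=1.0))])

# <component>__<parameter>: update `alpha` of the step named "reg".
pipe.set_params(reg__alpha=0.5)

# The nested value is visible both through get_params and on the step itself.
alpha_after = pipe.get_params()["reg__alpha"]
```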

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (HistoricalLinearRegression)

Returns:

self – The updated object.

Return type:

object
