HistoricalLinearRegression
- class skfda.ml.regression.HistoricalLinearRegression(*, n_intervals, fit_intercept=True, lag=inf)
Historical functional linear regression.
This is a linear regression method where the covariate and the response are both functions from \(\mathbb{R}\) to \(\mathbb{R}\) with the same domain. To predict the value of the response function at point \(t\), only the information of the covariate at points \(s < t\) is used. It is thus a “historical” model in the sense that, if the domain represents time, only data from the past (historical data) is used to predict a given point[1].
The model assumed by this method is:
\[y_i(t) = \alpha(t) + \int_{s_0(t)}^t x_i(s) \beta(s, t) \, ds\]
where \(s_0(t) = \max(0, t - \delta)\) and \(\delta\) is a predefined time lag that can be specified so that points far in the past do not affect the predicted value.
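The model can be made concrete with a small numerical sketch. The block below approximates the integral with the trapezoidal rule on a grid. It is not how scikit-fda fits or evaluates the model: the helper name `historical_prediction`, the unit-spaced grid, and the constant \(\beta(s, t) = 1\) are assumptions made for illustration. The covariate and intercept values are the first rows of `data_matrix` and `intercept` from the Examples section.

```python
import numpy as np


def historical_prediction(x, beta, alpha, grid, lag=np.inf):
    """Approximate alpha(t) + int_{s0(t)}^{t} x(s) beta(s, t) ds on a grid.

    Hypothetical helper, not part of scikit-fda.
    x:     covariate values x(s_j), shape (n_points,)
    beta:  coefficient values, beta[j, k] = beta(s_j, t_k)
    alpha: intercept values alpha(t_k), shape (n_points,)
    grid:  increasing grid points starting at 0, shape (n_points,)
    lag:   the time lag delta of the model
    """
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    grid = np.asarray(grid, dtype=float)
    y = np.array(alpha, dtype=float)  # copy, so alpha is not modified
    for k, t in enumerate(grid):
        s0 = max(0.0, t - lag)  # lower integration limit s_0(t)
        # Simplification: only whole grid intervals inside [s0, t] are used.
        mask = (grid >= s0) & (grid <= t)
        s = grid[mask]
        integrand = x[mask] * beta[mask, k]
        # Trapezoidal rule over the selected grid points.
        y[k] += np.sum((integrand[1:] + integrand[:-1]) * np.diff(s)) / 2.0
    return y


grid = np.arange(6.0)  # assumed unit-spaced grid
x = np.array([5.0, 0.0, 3.0, 3.0, 7.0, 9.0])
alpha = np.array([2.0, 0.0, 0.0, 4.0, 5.0, 5.0])
beta = np.ones((6, 6))  # assumed constant coefficient function

full_history = historical_prediction(x, beta, alpha, grid)
short_memory = historical_prediction(x, beta, alpha, grid, lag=1.0)
```

With the default infinite lag, `full_history` is `[2, 2.5, 4, 11, 17, 25]`, which matches the first row of `y_data` in the Examples section. With `lag=1.0` only the last unit interval contributes at each point, giving `[2, 2.5, 1.5, 7, 10, 13]`.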
- Parameters:
n_intervals (int) – Number of intervals used to create the basis of the coefficients. This will be a bidimensional FiniteElement basis, and this parameter indirectly specifies the number of elements of that basis, and thus the granularity.
fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
lag (float) – The maximum time lag at which points in the past can still influence the prediction.
- Attributes:
basis_coef_ – The fitted coefficient function as a FDataBasis.
coef_ – The fitted coefficient function as a FDataGrid.
intercept_ – Independent term in the linear model. Set to the constant function 0 if fit_intercept = False.
Examples
The following example tests a case that conforms to this model.
>>> from skfda import FDataGrid
>>> from skfda.ml.regression import HistoricalLinearRegression
>>> import numpy as np
>>> import scipy.integrate

>>> random_state = np.random.RandomState(0)
>>> data_matrix = random_state.choice(10, size=(8, 6)).astype(float)
>>> data_matrix
array([[ 5., 0., 3., 3., 7., 9.],
       [ 3., 5., 2., 4., 7., 6.],
       [ 8., 8., 1., 6., 7., 7.],
       [ 8., 1., 5., 9., 8., 9.],
       [ 4., 3., 0., 3., 5., 0.],
       [ 2., 3., 8., 1., 3., 3.],
       [ 3., 7., 0., 1., 9., 9.],
       [ 0., 4., 7., 3., 2., 7.]])
>>> intercept = random_state.choice(10, size=(1, 6)).astype(float)
>>> intercept
array([[ 2., 0., 0., 4., 5., 5.]])
>>> y_data = scipy.integrate.cumulative_trapezoid(
...     data_matrix,
...     initial=0,
...     axis=1,
... ) + intercept
>>> y_data
array([[ 2. , 2.5, 4. , 11. , 17. , 25. ],
       [ 2. , 4. , 7.5, 14.5, 21. , 27.5],
       [ 2. , 8. , 12.5, 20. , 27.5, 34.5],
       [ 2. , 4.5, 7.5, 18.5, 28. , 36.5],
       [ 2. , 3.5, 5. , 10.5, 15.5, 18. ],
       [ 2. , 2.5, 8. , 16.5, 19.5, 22.5],
       [ 2. , 5. , 8.5, 13. , 19. , 28. ],
       [ 2. , 2. , 7.5, 16.5, 20. , 24.5]])
>>> X = FDataGrid(data_matrix)
>>> y = FDataGrid(y_data)
>>> hist = HistoricalLinearRegression(n_intervals=8)
>>> _ = hist.fit(X, y)
>>> hist.predict(X).data_matrix[..., 0].round(1)
array([[ 2. , 2.5, 4. , 11. , 17. , 25. ],
       [ 2. , 4. , 7.5, 14.5, 21. , 27.5],
       [ 2. , 8. , 12.5, 20. , 27.5, 34.5],
       [ 2. , 4.5, 7.5, 18.5, 28. , 36.5],
       [ 2. , 3.5, 5. , 10.5, 15.5, 18. ],
       [ 2. , 2.5, 8. , 16.5, 19.5, 22.5],
       [ 2. , 5. , 8.5, 13. , 19. , 28. ],
       [ 2. , 2. , 7.5, 16.5, 20. , 24.5]])
>>> abs(hist.intercept_.data_matrix[..., 0].round())
array([[ 2., 0., 0., 4., 5., 5.]])
References
Methods
fit(X, y)
fit_predict(X, y)
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
predict(X)
score(X, y[, sample_weight]) – Return coefficient of determination on test data.
set_params(**params) – Set the parameters of this estimator.
set_score_request(*[, sample_weight]) – Configure whether metadata should be requested to be passed to the score method.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- score(X, y, sample_weight=None)
Return coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
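The \(R^2\) definition above can be checked with a few lines of NumPy. This is a minimal sketch of the formula as written for plain arrays; the helper name `r2_sketch` is hypothetical, sample weights are ignored, and it does not show how the score is evaluated for functional responses.

```python
import numpy as np


def r2_sketch(y_true, y_pred):
    """Compute 1 - u / v exactly as in the definition above (unweighted)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2_sketch(y_true, y_true)                               # best case
constant = r2_sketch(y_true, np.full_like(y_true, y_true.mean())) # mean predictor
reversed_pred = r2_sketch(y_true, y_true[::-1])                   # worse than the mean
```

Here `perfect` is 1.0, `constant` is 0.0, and `reversed_pred` is -3.0, which illustrates that the score has no lower bound.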
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
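The `<component>__<parameter>` convention can be illustrated with a nested object. The sketch below uses scikit-learn's `make_pipeline`, `StandardScaler`, and `Ridge` as stand-ins, chosen only because they are small and widely available; they are not part of this class, and the same naming rule is assumed to apply to any estimator placed in a pipeline.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# make_pipeline names each step after its lowercased class name,
# so the steps here are "standardscaler" and "ridge".
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Update a parameter of the nested Ridge step through the pipeline.
returned = pipe.set_params(ridge__alpha=0.5)

# set_params returns the estimator itself, so calls can be chained.
nested_alpha = pipe.named_steps["ridge"].alpha
listed_alpha = pipe.get_params()["ridge__alpha"]
```

After the call, both `nested_alpha` and `listed_alpha` are 0.5, and `returned` is the same object as `pipe`.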
- set_score_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (HistoricalLinearRegression)
- Returns:
self – The updated object.
- Return type: