Scikit-fda and scikit-learn

In this section, we will explain how scikit-fda interacts with the popular machine learning package scikit-learn. We will briefly introduce the main concepts of scikit-learn and explain how scikit-fda reuses them, extending them to the functional data analysis setting.

# Author: Carlos Ramos Carreño
# License: MIT

A brief summary of the scikit-learn architecture

The library scikit-learn is probably the best-known Python package for machine learning. This package focuses on machine learning with multivariate data, which should be stored in a numpy ndarray in order to be processed. However, the library defines a particular architecture that can be followed in order to provide new tools that work in situations not even imagined by the original authors, while remaining compatible with the tools already provided in scikit-learn.

In scikit-fda, the same architecture is applied in order to work with functional data observations. As a result, scikit-fda tools are largely compatible with scikit-learn tools, and it is possible to reuse objects such as pipelines or even hyperparameter selection methods such as grid search cross-validation in the functional data setting.

We will briefly introduce the main concepts of scikit-learn and explain how the tools in scikit-fda relate to them. This is not intended as a full explanation of the scikit-learn architecture, and the reader is encouraged to look at the scikit-learn tutorials to achieve a deeper understanding of it.

The Estimator object

A central concept in scikit-learn (and scikit-fda) is the estimator. An estimator in this context is an object that can learn from data. Thus, classification, regression and clustering methods, as well as transformations with parameters learned from the training data, are particular kinds of estimators. Estimators can also be instantiated with parameters, which can be tuned to the data using hyperparameter selection methods.

Estimator objects have a fit method, which receives the training data and (if necessary) the training targets. This method uses the training data to learn the parameters of a model. When the learned parameters are part of the user-facing API, by convention they are attributes of the estimator whose names end with the _ character.

As a concrete example of this, consider a nearest centroid classifier for functional data. The object NearestCentroid is a classifier, and thus an estimator. As part of the training process the centroids of the classes are computed and available as the learned parameter centroids_.

Note

The function train_test_split() is one of the functions originally from scikit-learn that can be directly reused in scikit-fda.

import skfda
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the Berkeley Growth Study dataset as functional observations
X, y = skfda.datasets.fetch_growth(return_X_y=True)

# Hold out part of the data to evaluate the classifier later
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the classifier and plot the learned per-class centroids
classifier = skfda.ml.classification.NearestCentroid()
classifier.fit(X_train, y_train)
classifier.centroids_.plot()
plt.show()
[Figure: Berkeley Growth Study]
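As the classifier is inductive, we can also evaluate it on the held-out data. A minimal sketch using the standard score method:

# Accuracy of the fitted classifier on the test set
print(classifier.score(X_test, y_test))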

Transformers

Transformers are estimators that can convert data to a new form. Examples are preprocessing methods such as smoothing, registration and dimensionality reduction. They always implement fit_transform, which fits and transforms the data in one step. Transformers may also be inductive, meaning that they can transform new data using the learned parameters; in that case they implement the transform method. If the transformation is reversible, they usually also implement inverse_transform.

As an example, consider the kernel smoothing method skfda.preprocessing.smoothing.KernelSmoother, used here with a NadarayaWatsonHatMatrix kernel estimator. Smoothing methods attempt to remove noise from the data, leveraging its continuous nature. As these methods discard information from the original data, they are usually not reversible.

import skfda.preprocessing.smoothing as ks
from skfda.misc.hat_matrix import NadarayaWatsonHatMatrix

# Load the phoneme dataset and keep only the first 5 functions
X, y = skfda.datasets.fetch_phoneme(return_X_y=True)
X = X[:5]

# Plot the raw (noisy) curves
X.plot()

# Fit the smoother and transform the data in one step
smoother = ks.KernelSmoother(kernel_estimator=NadarayaWatsonHatMatrix())
X_smooth = smoother.fit_transform(X)

# Plot the smoothed curves
X_smooth.plot()
plt.show()
[Figures: Phoneme (raw) and Phoneme (smoothed)]
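Since the fitted smoother is inductive, it can also smooth curves that were not seen during fitting, as long as they are defined over the same grid. A minimal sketch (the choice of the next five phoneme curves is arbitrary):

# Apply the already fitted smoother to new curves on the same grid
X_new = skfda.datasets.fetch_phoneme(return_X_y=True)[0][5:10]
X_new_smooth = smoother.transform(X_new)
X_new_smooth.plot()
plt.show()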

Predictors (classifiers, regressors, clusterers…)

Predictors in scikit-learn are estimators that can assign a certain target to a particular observation. This includes supervised methods such as classifiers (for which the target is a class label) and regressors (for which the target is a real value, a vector, or, in functional data analysis, even a function!), as well as unsupervised methods such as clusterers or outlier detection methods.

Predictors should implement the fit_predict method, for fitting the estimator and predicting the targets in one step, and/or the predict method, for predicting the targets of previously unseen data. Usually transductive estimators implement only the former, while inductive estimators implement the latter (or both).

Predictors can have additional non-mandatory methods, such as predict_proba, for obtaining the probability of a particular prediction, or score, for evaluating the results of the prediction.
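For instance, with a k-nearest neighbors classifier (an inductive predictor) one can obtain predictions, class probabilities and an accuracy score. This is a minimal sketch that reuses the growth train/test split from the first example:

# An inductive classifier offers predict, predict_proba and score
knn = skfda.ml.classification.KNeighborsClassifier()
knn.fit(X_train, y_train)

print(knn.predict(X_test))        # predicted class labels
print(knn.predict_proba(X_test))  # per-class probabilities
print(knn.score(X_test, y_test))  # mean accuracy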

As an example, we can look at the KMeans clustering method for functional data. This method will try to separate the data into different clusters according to the distance between observations.

X, y = skfda.datasets.fetch_weather(return_X_y=True)

# Use only the first value (temperature)
X = X.coordinates[0]

clusterer = skfda.ml.clustering.KMeans(n_clusters=3)
y_pred = clusterer.fit_predict(X)

X.plot(group=y_pred)
plt.show()
[Figure: Canadian Weather]

Metaestimators

In scikit-learn jargon, a metaestimator is an estimator that takes other estimators as parameters. There are several reasons for doing that, which we explain below.

Composition metaestimators

It is very common in machine learning to apply one or more preprocessing steps one after the other before applying a final predictor. For this purpose scikit-learn offers the Pipeline, which joins the steps together and exposes the same estimator API, performing all the steps in order (this is usually referred to as the composite pattern in software engineering). The Pipeline estimator can be used with the functional data estimators available in scikit-fda. Moreover, as transformers such as dimensionality reduction methods can convert functional data to multivariate data usable by scikit-learn methods, it is possible to mix methods from scikit-fda and scikit-learn in the same pipeline.

Warning

Scikit-learn also offers estimators that can join several transformations as new features of the same dataset (FeatureUnion) or that can apply different transformers to different columns of the data (ColumnTransformer). These transformers are not yet usable with functional data.

As an example, we can construct a pipeline that registers the data using shift registration, then applies a variable selection method to transform each observation into a 3D vector, and finally uses an SVM classifier to classify the data.

from skfda.preprocessing.dim_reduction import variable_selection as vs
from skfda.preprocessing.registration import LeastSquaresShiftRegistration
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = skfda.datasets.fetch_growth(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("registration", LeastSquaresShiftRegistration()),
    ("dim_reduction", vs.RKHSVariableSelection(n_features_to_select=3)),
    ("classifier", SVC()),
])

pipeline.fit(X_train, y_train)
pipeline.score(X_test, y_test)
1.0

Hyperparameter optimizers

Some of the parameters used for the creation of an estimator need to be tuned for each particular dataset in order to improve prediction accuracy and generalization. Several techniques for this are already available in scikit-learn, such as grid search cross-validation (GridSearchCV) or randomized search (RandomizedSearchCV). As these hyperparameter optimizers only need to split the data and call score on the predictor, they can be used directly with the methods in scikit-fda.

Note

In addition, one could use any optimizer that understands the scikit-learn API, such as those in scikit-optimize.

As an example, we will use GridSearchCV to select the number of neighbors used in a KNeighborsClassifier.

from sklearn.model_selection import GridSearchCV

X, y = skfda.datasets.fetch_growth(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifier = skfda.ml.classification.KNeighborsClassifier()

grid_search = GridSearchCV(
    estimator=classifier,
    param_grid={"n_neighbors": range(1, 10, 2)},
)

grid_search.fit(X_train, y_train)
n_neighbors = grid_search.best_estimator_.n_neighbors
score = grid_search.score(X_test, y_test)

print(n_neighbors, score)
3 0.9583333333333334
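The note above applies, for instance, to the Bayesian optimizer in scikit-optimize. A minimal sketch of the same search with it, assuming scikit-optimize is installed and that its BayesSearchCV handles functional data indexing as GridSearchCV does:

from skopt import BayesSearchCV

# Bayesian search over the same integer hyperparameter range
bayes_search = BayesSearchCV(
    estimator=skfda.ml.classification.KNeighborsClassifier(),
    search_spaces={"n_neighbors": (1, 9)},
    n_iter=10,
    random_state=0,
)
bayes_search.fit(X_train, y_train)
print(bayes_search.score(X_test, y_test))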

Ensemble methods

The ensemble methods VotingClassifier and VotingRegressor in scikit-learn use several different estimators in order to predict the targets. As this is done by evaluating the passed estimators as black boxes, these predictors can also be combined with scikit-fda predictors.

Warning

Other ensemble methods, such as BaggingClassifier or AdaBoostClassifier, cannot yet be used with functional data unless the data has been transformed to a multivariate dataset.

As an example, we will use a voting classifier that combines a k-nearest neighbors classifier, a nearest centroid classifier and a maximum depth classifier.
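A minimal sketch of such a voting classifier, assuming the growth train/test split from above and using MaximumDepthClassifier from skfda.ml.classification, could look as follows:

from sklearn.ensemble import VotingClassifier

# Combine three functional classifiers by majority vote
voting = VotingClassifier([
    ("knn", skfda.ml.classification.KNeighborsClassifier()),
    ("centroid", skfda.ml.classification.NearestCentroid()),
    ("depth", skfda.ml.classification.MaximumDepthClassifier()),
])

voting.fit(X_train, y_train)
voting.score(X_test, y_test)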

0.75

Multiclass and multioutput classification utilities

The scikit-learn library also offers additional utilities that can convert a binary classifier into a multiclass classifier (such as OneVsRestClassifier) or extend a single-output classifier or regressor to also accept multioutput (vector-valued) targets.

In this example we want to use as a classifier the combination of a dimensionality reduction method (RKHSVariableSelection) and an SVM classifier (SVC). As that particular dimensionality reduction method is only suitable for binary classification problems, we use OneVsRestClassifier to apply it to a multiclass dataset.

from sklearn.multiclass import OneVsRestClassifier

X, y = skfda.datasets.fetch_phoneme(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("dim_reduction", vs.RKHSVariableSelection(n_features_to_select=3)),
    ("classifier", SVC()),
])

multiclass = OneVsRestClassifier(pipeline)

multiclass.fit(X_train, y_train)
multiclass.score(X_test, y_test)
0.9140070921985816

Other scikit-learn utilities

In addition to the aforementioned objects, there are plenty of objects in scikit-learn that can be applied directly to functional data. We have already seen in the examples the function train_test_split(). Other objects and functions, such as KFold, can be applied directly to functional data in order to split it into folds. Scorers for classification or regression, such as accuracy_score(), can also be applied directly to functional data problems.
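As a brief sketch of this (the choice of five folds and of the nearest centroid classifier below is arbitrary):

from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = skfda.datasets.fetch_growth(return_X_y=True)

# KFold yields integer indices, which FDataGrid objects accept directly
for train_index, test_index in KFold(n_splits=5).split(X):
    classifier = skfda.ml.classification.NearestCentroid()
    classifier.fit(X[train_index], y[train_index])
    y_pred = classifier.predict(X[test_index])
    print(accuracy_score(y[test_index], y_pred))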

Moreover, there are plenty of libraries that aim to extend scikit-learn in several directions (take a look at the list of related projects). You will probably see that a lot of their functionality can be applied to scikit-fda, as it uses the same API as scikit-learn.
