Basis representation#

In this section, we will introduce the basis representation of functional data. This is a very useful representation for functions that belong (or can be reasonably projected) to the space spanned by a finite set of basis functions.

# Author: Carlos Ramos Carreño
# License: MIT
#
# sphinx_gallery_thumbnail_number = 7

Functions and vector spaces#

Functions, which are the objects of study of FDA, can be added and multiplied by scalars, and these operations verify the necessary properties to consider these functions as vectors in a vector space.

The FDataGrid objects that are used to represent functional observations in scikit-fda also support these operations.

In order to show the vector operations, we create two FDatagrids with two functions each, \(\mathbf{X}_1 = \{x_{1i}: \mathbb{R} \to \mathbb{R}\}, i=1,2\) and \(\mathbf{X}_2 = \{x_{2i}: \mathbb{R} \to \mathbb{R}\}, i=1,2\), and plot them.

import numpy as np
import skfda
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

t = np.linspace(0, 1, 100)

fd = skfda.FDataGrid(
    data_matrix=[
        np.sin(6 * t),  # First function
        2 * t,          # Second function
    ],
    grid_points=t,
)

fd.plot(axes=axes[0])
axes[0].set_title(r"$\mathbf{X}_1$")

fd2 = skfda.FDataGrid(
    data_matrix=[
        3 * t**2,              # First function
        np.log(t + 0.1),       # Second function
    ],
    grid_points=t,
)

fd2.plot(axes=axes[1])
axes[1].set_title(r"$\mathbf{X}_2$")

plt.show()
$\mathbf{X}_1$, $\mathbf{X}_2$

Functions can be multiplied by an scalar. This only changes the scale of the functions, but not their shape. Note that all the functions in the dataset are affected.

It is also possible to add two functions together. If you do that with two FDataGrid objects with the same length, the corresponding functions will be added.

fig, axes = plt.subplots(1, 2, figsize=(8, 3))

scalar_mul = 3 * fd

scalar_mul.plot(axes=axes[0])
axes[0].set_title(r"$3 \mathbf{X}_1$")

fd_sum = fd + fd2

fd_sum.plot(axes=axes[1])
axes[1].set_title(r"$\mathbf{X}_1 + \mathbf{X}_2$")

plt.show()
$3 \mathbf{X}_1$, $\mathbf{X}_1 + \mathbf{X}_2$

Infinite (Schauder) basis#

Some functional topological vector spaces admit a Schauder basis. This is a sequence of functions \(\Phi = \{\phi_i\}_{i=1}^{\infty}\) so that for every function \(x\) in the space exists a sequence of scalars \(\{a_i\}_{i=1}^{\infty}\) such that

\[x(t) = \sum_{i=1}^{\infty} a_i \phi_i(t)\]

where the convergence of this series is with respect to the vector space topology.

If you know that your functions of interest belong to one of these vector spaces, it may be interesting to express your functions in a basis. As computers have limited memory and computation resources, it is not possible to obtain the infinite basis expansion. Instead, one typically truncates the expansion to a few basis functions, which are enough to approximate your observations with a certain degree of accuracy. This truncation also has the effect of smoothing the data, as less important variations, such as noise, are eliminated in the process. Moreover, as basis are truncated, the vector space generated by the truncated set of basis functions is different to the original space, and also different between different basis families. Thus, the choice of basis matters, even if originally they would have generated the same space.

In scikit-fda, functions expressed as a basis expansion can be represented using the class FDataBasis. The main attributes of objects of this class are basis, an object representing a basis family of functions, and coefficients, a matrix with the scalar coefficients of the functions in the basis.

As an example, we can create the following function, which is expressed in a truncated monomial basis (and thus it is a polynomial):

\[x(t) = 3 + 2t - 4t^2 + t^3\]
basis = skfda.representation.basis.MonomialBasis(
    n_basis=4,
    domain_range=(-10, 10),
)

fd_basis = skfda.FDataBasis(
    basis=basis,
    coefficients=[
        [3, 2, -4, 1],  # First (and unique) observation
    ],
)

fd_basis.plot()
plt.show()
plot basis representation

Conversion between FDataGrid and FDataBasis#

It is possible to convert between functions in discretized form (class FDataGrid) and basis expansion form ( class FDataBasis). In order to convert FDataGrid objects to a basis representation you will need to call the method to_basis, passing the desired basis as an argument. The functions will then be projected to the functional basis, solving a least squares problem in order to find the optimal coefficients of the expansion. In order to convert a FDataBasis to a discretized representation you should call the method to_grid. This method evaluates the functions in a grid that can be supplied as an argument in order to obtain the values of the discretized representation.

We now can see how the number of basis functions affect the basis expansion representation of a few observations taken from a real-world dataset. You can see that as more basis functions are used, the basis representation provides a better representation of the real data.

max_basis = 9

X, y = skfda.datasets.fetch_phoneme(return_X_y=True)

# Select only the first 5 samples
X = X[:5]

X.plot()

fig, axes = plt.subplots(nrows=3, ncols=3)

for n_basis in range(1, max_basis + 1):
    basis = skfda.representation.basis.MonomialBasis(n_basis=n_basis)
    X_basis = X.to_basis(basis)

    ax = axes.ravel()[n_basis - 1]
    fig = X_basis.plot(axes=ax)
    ax.set_title(f"{n_basis} basis functions")

fig.tight_layout()
plt.show()
  • Phoneme
  • Phoneme, 1 basis functions, 2 basis functions, 3 basis functions, 4 basis functions, 5 basis functions, 6 basis functions, 7 basis functions, 8 basis functions, 9 basis functions

List of available basis functions#

In this section we will provide a list of the available basis in scikit-fda. As explained before, the basis family is important when the basis expansion is truncated (which always happens in order to represent it in a computer). Thus, it is recommended to take a look at the available basis in order to pick one that provides the best representation of the original data.

First we will load a dataset to test the basis representations.

X, y = skfda.datasets.fetch_phoneme(return_X_y=True)

# Select only the first 5 samples
X = X[:5]

X.plot()
plt.show()
Phoneme

Monomial basis#

The monomial basis (class Monomial) is probably one of the simpler and more well-known basis of functions. Often Taylor and McLaurin series are explained in the very first courses of Science and Engineering degrees, and students are familiar with polynomials since much before. Thus, the monomial basis is useful for teaching purposes (and that is why we have used it in the examples). It is also very useful for testing purposes, as it easy to manually derive the expected results of operations involving this basis.

As a basis for functional data analysis, however, it has several issues that usually make preferrable to use other basis instead. First, the usual basis \(\{1, t, t^2, t^3, \ldots\}\) is not orthogonal under the standard inner product in \(L^2\), that is \(\langle x_1, x_2 \rangle = \int_{\mathcal{T}} x_1(t) x_2(t) dt\). This inhibits some performance optimizations that are available for operations that require inner products. It is possible to find an orthogonal basis of polynomials, but it will not be as easy to understand, losing many of its advantages. Another problems with this basis are the necessity of a large number of basis functions to express local features, the bad behaviour at the extremes of the function and the fact that the derivatives of the basis expansion are not good approximations of the derivatives of the original data, as high order polynomials tend to have very large oscillations.

Here we show the first five elements of the monomial basis.

plot basis representation

We now show how the previous observations are represented using the first five elements of this basis.

X_basis = X.to_basis(basis)
X_basis.plot()
plt.show()
Phoneme

Fourier basis#

Probably the second most well known series expansion for staticians, engineers, physicists and mathematicians is the Fourier series. The Fourier basis (class Fourier) consist on a constant term plus sines and cosines of varying frequency, all of them normalized to unit (\(L^2\)) norm. This basis is a good choice for periodic functions (as a function expressed in this basis has the same value at the beginning and at the end of its domain interval if it has the same lenght as the period \(\omega\). Moreover, in this case the functions are orthonormal (that is why the basis used are normalized).

This basis is specially indicated for functions without strong local features and with almost the same order of curvature everywhere, as otherwise the expansion require again a large number of basis to represent those details.

Here we show the first five elements of a Fourier basis.

plot basis representation

We now show how the previous observations are represented using the first five elements of this basis.

X_basis = X.to_basis(basis)
X_basis.plot()
plt.show()
Phoneme

B-spline basis#

Splines are a family of functions that has taken importance with the advent of the modern computers, and nowadays are well known for a lot of engineers and designers. Esentially, they are piecewise polynomials that join smoothly at the separation points (usually called knots). Thus, both polynomials and piecewise linear functions are included in this family. Given a set of knots, a B-spline basis (class BSpline) of a given order can be used to express every spline of the same order that uses the same knots.

This basis is a very powerful basis, as the knots can be adjusted to be able to express local features, and it is even possible to create points where the functions are not necessarily smooth or continuous by placing several knots together. Also the elements of the basis have the compact support property, which allows more efficient computations. Thus, this basis is indicated for non-periodic functions or functions with local features or with different orders of curvature along their domain.

Here we show the first five elements of a B-spline basis.

plot basis representation

We now show how the previous observations are represented using the first five elements of this basis.

X_basis = X.to_basis(basis)
X_basis.plot()
plt.show()
Phoneme

Constant basis#

Sometimes it is useful to consider the basis whose only function is the constant one. In particular, using this basis we can view scalar values as functional observations, which can be used to combine multivariate and functional data in the same model.

Tensor product basis#

The previously explained bases are useful for data that comes in the form of curves, that is, functions \(\{f_i: \mathbb{R} \to \mathbb{R}\}_{i=1}^N\). However, scikit-fda allows also the representation of surfaces or functions in higher dimensions. In this case it is even more useful to be able to represent them using basis expansions, as the number of parameters in the discretized representation grows as the product of the grid points in each dimension of the domain.

The tensor product basis (class Tensor) allows the construction of basis for these higher dimensional functions as tensor products of \(\mathbb{R} \to \mathbb{R}\) basis.

As an example, we can import the digits datasets of scikit-learn, which are surfaces, and convert it to a basis expansion. Note that we use different basis for the different continuous parameters of the function in order to show how it works, although it probably makes no sense in this particular example.

from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
X = X.reshape(-1, 8, 8)

fd = skfda.FDataGrid(X)

basis = skfda.representation.basis.TensorBasis([
    skfda.representation.basis.FourierBasis(  # X axis
        n_basis=5,
        domain_range=fd.domain_range[0],
    ),
    skfda.representation.basis.BSplineBasis(  # Y axis
        n_basis=6,
        domain_range=fd.domain_range[1],
    ),
])

fd_basis = fd.to_basis(basis)

# We only plot the first function
fd_basis[0].plot()
plt.show()
plot basis representation

Finite element basis#

A finite element basis (class FiniteElement) is a basis used in the finite element method (FEM). In order to instantiate a basis, it is necessary to pass a set of vertices and a set of simplices, or cells, that join them, conforming a grid. The basis elements are then functions that are one at exactly one of these vertices and zero in the rest of them.

The advantage of this basis for higher dimensional functions is that one can have more control of the basis, placing more vertices in regions with interesting behaviour, such as local features and less elsewhere.

Here we show an example where the digits dataset of scikit-learn is expressed in the finite element basis. First we create the vertices and simplices that we will use and we plot them.

vertices = np.array([
    (0, 0),
    (0, 1),
    (1, 0),
    (1, 1),
    (0.25, 0.5),
    (0.5, 0.25),
    (0.5, 0.75),
    (0.75, 0.5),
    (0.5, 0.5),
])

cells = np.array([
    (0, 1, 4),
    (0, 2, 5),
    (1, 3, 6),
    (2, 3, 7),
    (0, 4, 5),
    (1, 4, 6),
    (2, 5, 7),
    (3, 6, 7),
    (4, 5, 8),
    (4, 6, 8),
    (5, 7, 8),
    (6, 7, 8),
])

plt.triplot(vertices[:, 0], vertices[:, 1], cells)
plt.show()
plot basis representation

We now represent the digits dataset in this basis.

basis = skfda.representation.basis.FiniteElementBasis(
    vertices=vertices,
    cells=cells,
)

fd_basis = fd.to_basis(basis)

# We only plot the first function
fd_basis[0].plot()
plt.show()
plot basis representation

Vector-valued basis#

With the aforementioned bases, one could express \(\mathbb{R}^p \to \mathbb{R}\) functions. In order to express vector valued functions as a basis expansion, one just need to express each coordinate function as a basis expansion and multiply it by the corresponding unitary vector in the coordinate direction, adding finally all of them together.

The vector-valued basis (VectorValued) allows the representation of vector-valued functions doing just that.

As an example, consider the Canadian Weather dataset, including both temperature and precipitation data as coordinate functions, and plotted below.

X, y = skfda.datasets.fetch_weather(return_X_y=True)

X.plot()
plt.show()
Canadian Weather

We will express this dataset as a basis expansion. Temperatures are now expressed in a Fourier basis, while we express precipitations as B-splines.

basis = skfda.representation.basis.VectorValuedBasis([
    skfda.representation.basis.FourierBasis(  # First coordinate function
        n_basis=5,
        domain_range=X.domain_range,
    ),
    skfda.representation.basis.BSplineBasis(  # Second coordinate function
        n_basis=10,
        domain_range=X.domain_range,
    ),
])

X_basis = X.to_basis(basis)
X_basis.plot()
plt.show()
Canadian Weather

Total running time of the script: (0 minutes 7.884 seconds)

Gallery generated by Sphinx-Gallery