cosine_similarity

skfda.misc.cosine_similarity(arg1, arg2)

Return the cosine similarity.

Calculates the cosine similarity between matching samples of two vector-like objects, such as NumPy arrays, FDataGrid objects or FDataBasis objects.

For two samples x and y the cosine similarity is defined as:

\[\cos \text{sim}(x, y) = \frac{\langle x, y \rangle}{ \sqrt{\langle x, x \rangle \langle y, y \rangle}}\]

where \(\langle {}\cdot{}, {}\cdot{} \rangle\) is the inner product.

The two arguments must have the same number of samples, or one of them must contain a single sample, which is then broadcast against every sample of the other.
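For finite-dimensional vectors, where the inner product is the ordinary dot product, the definition above can be sketched in plain NumPy. This is only an illustration and not the library's implementation: the helper name `cosine_similarity_sketch` is hypothetical, and NumPy broadcasting stands in for the single-sample rule.

```python
import numpy as np


def cosine_similarity_sketch(a, b):
    """Cosine similarity of matching rows, using the dot product.

    Hypothetical helper for illustration only; it is not part of skfda.
    """
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    # Row-wise inner products; a single-row argument broadcasts
    # against every row of the other one.
    inner_ab = np.sum(a * b, axis=-1)
    inner_aa = np.sum(a * a, axis=-1)
    inner_bb = np.sum(b * b, axis=-1)
    return inner_ab / np.sqrt(inner_aa * inner_bb)


# Two samples against two samples: one value per matching pair.
pairwise = cosine_similarity_sketch([[1, 2, 3], [2, 3, 4]],
                                    [[4, 5, 6], [1, 1, 1]])
# Two samples against one sample: the single sample is reused.
broadcast = cosine_similarity_sketch([[1, 2, 3], [2, 3, 4]], [1, 1, 1])
```

Here `pairwise` holds one similarity per matching pair of rows, while `broadcast` compares both rows of the first argument with the same single vector.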

Parameters:
  • arg1 (Vector) – First sample or set of samples.

  • arg2 (Vector) – Second sample or set of samples.

Returns:

Vector with the cosine similarity of each pair of samples.

Return type:

ndarray[Any, dtype[float64]]

Examples

This function can compute the multivariate cosine similarity.

>>> import numpy as np
>>> from skfda.misc import cosine_similarity
>>>
>>> array1 = np.array([1, 2, 3])
>>> array2 = np.array([4, 5, 6])
>>> cosine_similarity(array1, array2)
0.9746318461970762

If the arrays contain more than one sample, the similarity is computed for each matching pair:

>>> array1 = np.array([[1, 2, 3], [2, 3, 4]])
>>> array2 = np.array([[4, 5, 6], [1, 1, 1]])
>>> cosine_similarity(array1, array2)
array([ 0.97463185,  0.96490128])

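The cosine similarity of \(f(x) = x\) and the constant function \(g(x) = 1\), both defined over the interval [0, 1], is their inner product divided by the product of their norms. The inner product is 0.5, the area of the triangle delimited by the lines y = 0, x = 1 and y = x. The norm of g is 1 and the norm of f is \(\sqrt{1/3}\), so the result is 0.5 multiplied by \(\sqrt{3}\).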

>>> import skfda
>>>
>>> x = np.linspace(0, 1, 1000)
>>>
>>> fd1 = skfda.FDataGrid(x, x)
>>> fd2 = skfda.FDataGrid(np.ones(len(x)), x)
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254])
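The value above can be checked by hand with a small numerical sketch. It assumes the inner product is the usual \(L^2\) inner product on [0, 1] and approximates it with a composite trapezoidal rule; the helper `trapezoid_inner` is hypothetical, and the integration scheme used inside the library may differ.

```python
import numpy as np


def trapezoid_inner(f_values, g_values, grid):
    """Approximate the L2 inner product of two sampled functions.

    Hypothetical helper: composite trapezoidal rule on a shared grid.
    """
    product = f_values * g_values
    widths = np.diff(grid)
    return np.sum(widths * (product[:-1] + product[1:]) / 2)


x = np.linspace(0, 1, 1000)
f = x                  # f(x) = x
g = np.ones_like(x)    # constant function 1

inner_fg = trapezoid_inner(f, g, x)   # close to 1/2
norm_f_sq = trapezoid_inner(f, f, x)  # close to 1/3
norm_g_sq = trapezoid_inner(g, g, x)  # close to 1

similarity = inner_fg / np.sqrt(norm_f_sq * norm_g_sq)
```

Up to discretization error, `similarity` agrees with \(\sqrt{3}/2\), the value reported by the doctest.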

If the FDataGrid objects contain more than one sample, the similarity is computed for each matching pair:

>>> fd1 = skfda.FDataGrid([x, np.ones(len(x))], x)
>>> fd2 = skfda.FDataGrid([np.ones(len(x)), x], x)
>>> cosine_similarity(fd1, fd2).round(2)
array([ 0.87,  0.87])

If one argument contains only one sample, it is broadcast against every sample of the other:

>>> fd1 = skfda.FDataGrid([x, np.ones(len(x))], x)
>>> fd2 = skfda.FDataGrid([np.ones(len(x))], x)
>>> cosine_similarity(fd1, fd2).round(2)
array([ 0.87,  1.  ])

It also works with basis objects:

>>> basis = skfda.representation.basis.MonomialBasis(n_basis=3)
>>>
>>> fd1 = skfda.FDataBasis(basis, [0, 1, 0])
>>> fd2 = skfda.FDataBasis(basis, [1, 0, 0])
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254])

A single sample in the second argument is broadcast against several samples in the first:

>>> basis = skfda.representation.basis.MonomialBasis(n_basis=3)
>>>
>>> fd1 = skfda.FDataBasis(basis, [[0, 1, 0], [0, 0, 1]])
>>> fd2 = skfda.FDataBasis(basis, [1, 0, 0])
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254 ,  0.74535599])

With the same number of samples in both arguments, the similarity is computed pair by pair:

>>> basis = skfda.representation.basis.MonomialBasis(n_basis=3)
>>>
>>> fd1 = skfda.FDataBasis(basis, [[0, 1, 0], [0, 0, 1]])
>>> fd2 = skfda.FDataBasis(basis, [[1, 0, 0], [0, 1, 0]])
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254 ,  0.96824584])
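The basis results can be reproduced from the coefficients alone. The sketch below assumes the monomials 1, x, x² are defined on [0, 1] (which is consistent with the outputs above) and that the inner product is the \(L^2\) one, so the Gram matrix has entries 1 / (i + j + 1). The helper `basis_cosine` is hypothetical and is not how the library necessarily computes the result.

```python
import numpy as np

n_basis = 3
# Gram matrix of 1, x, x^2 on [0, 1]: entry (i, j) is the integral of
# x^(i + j) over [0, 1], which equals 1 / (i + j + 1).
i, j = np.indices((n_basis, n_basis))
gram = 1.0 / (i + j + 1)


def basis_cosine(coef_a, coef_b):
    """Cosine similarity from coefficient vectors (hypothetical helper)."""
    coef_a = np.asarray(coef_a, dtype=float)
    coef_b = np.asarray(coef_b, dtype=float)
    inner_ab = coef_a @ gram @ coef_b
    inner_aa = coef_a @ gram @ coef_a
    inner_bb = coef_b @ gram @ coef_b
    return inner_ab / np.sqrt(inner_aa * inner_bb)


sim_x_const = basis_cosine([0, 1, 0], [1, 0, 0])   # x against 1
sim_x2_const = basis_cosine([0, 0, 1], [1, 0, 0])  # x^2 against 1
sim_x2_x = basis_cosine([0, 0, 1], [0, 1, 0])      # x^2 against x
```

Under these assumptions the three values are \(\sqrt{3}/2\), \(\sqrt{5}/3\) and \(\sqrt{15}/4\), matching the doctest outputs.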