cosine_similarity
- skfda.misc.cosine_similarity(arg1, arg2)
Return the cosine similarity.
Calculates the cosine similarity between matching samples of two functional data objects (FDataGrid or FDataBasis) or of two arrays of vectors.
For two samples \(x\) and \(y\), the cosine similarity is defined as:
\[\cos \text{sim}(x, y) = \frac{\langle x, y \rangle}{ \sqrt{\langle x, x \rangle \langle y, y \rangle}}\]
where \(\langle {}\cdot{}, {}\cdot{} \rangle\) is the inner product.
The two arguments must have the same number of samples, or one of them must contain a single sample, which is then broadcast against every sample of the other.
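As a rough illustration of the definition and the broadcasting rule, the sketch below reimplements the computation for plain vectors with NumPy, using the Euclidean inner product. The helper `cosine_similarity_rows` is hypothetical and is not part of scikit-fda; it only mirrors the behavior shown in the multivariate examples.

```python
import numpy as np


def cosine_similarity_rows(a, b):
    """Row-wise cosine similarity of two arrays of vectors (hypothetical helper).

    Uses the Euclidean inner product. A 1-D array counts as a single sample,
    and a single sample is broadcast against every row of the other argument.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Enforce the rule: same number of samples, or one side has exactly one
    n_a = 1 if a.ndim == 1 else a.shape[0]
    n_b = 1 if b.ndim == 1 else b.shape[0]
    if n_a != n_b and 1 not in (n_a, n_b):
        raise ValueError(f"cannot pair {n_a} samples with {n_b} samples")

    # <x, y>, <x, x> and <y, y> along the last axis
    xy = np.sum(a * b, axis=-1)
    xx = np.sum(a * a, axis=-1)
    yy = np.sum(b * b, axis=-1)
    return xy / np.sqrt(xx * yy)


# Single pair of vectors: approximately 0.97463
print(cosine_similarity_rows([1, 2, 3], [4, 5, 6]))
# Two samples on each side, compared row by row
print(cosine_similarity_rows([[1, 2, 3], [2, 3, 4]], [[4, 5, 6], [1, 1, 1]]))
# One sample on the right, broadcast against both rows on the left
print(cosine_similarity_rows([[1, 2, 3], [2, 3, 4]], [[1, 1, 1]]))
```

Relying on NumPy broadcasting keeps the single-sample case free of any explicit copying; the explicit check only produces a clearer error message than the one NumPy would raise on its own.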
- Parameters:
arg1 (Vector) – First sample or samples.
arg2 (Vector) – Second sample or samples.
- Returns:
Vector with the cosine similarity of each pair of samples.
- Return type:
Examples
The function also computes the cosine similarity of multivariate vectors given as NumPy arrays.
>>> import numpy as np
>>> from skfda.misc import cosine_similarity
>>>
>>> array1 = np.array([1, 2, 3])
>>> array2 = np.array([4, 5, 6])
>>> cosine_similarity(array1, array2)
0.9746318461970762
If the arrays contain more than one sample (one per row), the similarity is computed for each pair of matching rows:
>>> array1 = np.array([[1, 2, 3], [2, 3, 4]])
>>> array2 = np.array([[4, 5, 6], [1, 1, 1]])
>>> cosine_similarity(array1, array2)
array([ 0.97463185, 0.96490128])
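The cosine similarity of \(f(x) = x\) and the constant function \(g(x) = 1\), both defined over the interval \([0, 1]\), is \(\sqrt{3}/2\). The inner product \(\langle f, g \rangle = \int_0^1 x \, dx\) is the area of the triangle delimited by the lines \(y = 0\), \(x = 1\) and \(y = x\), that is 0.5; since \(\langle f, f \rangle = 1/3\) and \(\langle g, g \rangle = 1\), dividing by \(\sqrt{\langle f, f \rangle \langle g, g \rangle} = 1/\sqrt{3}\) multiplies it by \(\sqrt{3}\).
>>> import skfda
>>>
>>> x = np.linspace(0, 1, 1000)
>>>
>>> fd1 = skfda.FDataGrid(x, x)
>>> fd2 = skfda.FDataGrid(np.ones(len(x)), x)
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254])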
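For FDataGrid inputs the inner product is an integral over the domain, so it has to be approximated from the sampled values. The sketch below assumes the trapezoidal rule on the grid points; this is an illustrative choice, and scikit-fda's own quadrature may differ, so the results match the values in these examples only approximately. `grid_inner_product` and `grid_cosine_similarity` are hypothetical helpers, not scikit-fda functions.

```python
import numpy as np


def grid_inner_product(f, g, t):
    """Approximate the L2 inner product of sampled curves (trapezoidal rule).

    f and g hold one curve per row, or a single curve as a 1-D array;
    rows are paired using NumPy broadcasting.
    """
    prod = np.asarray(f, dtype=float) * np.asarray(g, dtype=float)
    dt = np.diff(np.asarray(t, dtype=float))
    # Trapezoidal rule along the last (grid) axis
    return np.sum((prod[..., 1:] + prod[..., :-1]) * dt / 2, axis=-1)


def grid_cosine_similarity(f, g, t):
    """Cosine similarity built from the approximate inner products."""
    fg = grid_inner_product(f, g, t)
    ff = grid_inner_product(f, f, t)
    gg = grid_inner_product(g, g, t)
    return fg / np.sqrt(ff * gg)


t = np.linspace(0, 1, 1000)
ones = np.ones_like(t)

# f(t) = t against g(t) = 1: close to sqrt(3) / 2
print(grid_cosine_similarity(t, ones, t))
# Two samples on each side, compared pairwise
print(grid_cosine_similarity(np.stack([t, ones]), np.stack([ones, t]), t))
# A single sample on the right is broadcast against both samples on the left
print(grid_cosine_similarity(np.stack([t, ones]), ones[np.newaxis, :], t))
```

The trapezoidal rule integrates \(x\) exactly but slightly overestimates \(\int_0^1 x^2 \, dx\), which is why the first result is close to, but not exactly, \(\sqrt{3}/2\).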
If the FDataGrid objects contain more than one sample, each sample of the first is compared with the matching sample of the second:
>>> fd1 = skfda.FDataGrid([x, np.ones(len(x))], x)
>>> fd2 = skfda.FDataGrid([np.ones(len(x)), x], x)
>>> cosine_similarity(fd1, fd2).round(2)
array([ 0.87, 0.87])
If one argument contains only one sample, it is broadcast against every sample of the other:
>>> fd1 = skfda.FDataGrid([x, np.ones(len(x))], x)
>>> fd2 = skfda.FDataGrid([np.ones(len(x))], x)
>>> cosine_similarity(fd1, fd2).round(2)
array([ 0.87, 1. ])
It also works with FDataBasis objects:
>>> basis = skfda.representation.basis.MonomialBasis(n_basis=3)
>>>
>>> fd1 = skfda.FDataBasis(basis, [0, 1, 0])
>>> fd2 = skfda.FDataBasis(basis, [1, 0, 0])
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254])

>>> basis = skfda.representation.basis.MonomialBasis(n_basis=3)
>>>
>>> fd1 = skfda.FDataBasis(basis, [[0, 1, 0], [0, 0, 1]])
>>> fd2 = skfda.FDataBasis(basis, [1, 0, 0])
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254 , 0.74535599])

>>> basis = skfda.representation.basis.MonomialBasis(n_basis=3)
>>>
>>> fd1 = skfda.FDataBasis(basis, [[0, 1, 0], [0, 0, 1]])
>>> fd2 = skfda.FDataBasis(basis, [[1, 0, 0], [0, 1, 0]])
>>> cosine_similarity(fd1, fd2)
array([ 0.8660254 , 0.96824584])
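For FDataBasis inputs the inner product can be computed exactly from the coefficients through the Gram matrix of the basis, \(G_{ij} = \langle \phi_i, \phi_j \rangle\), so that \(\langle f, g \rangle = c_f^\top G c_g\). Assuming the monomial basis \(1, t, t^2\) on \([0, 1]\) (the domain consistent with the values in the FDataBasis examples), \(G_{ij} = \int_0^1 t^{i+j} \, dt = 1/(i+j+1)\). The helpers below are hypothetical and only sketch the idea; they are not scikit-fda's implementation.

```python
import numpy as np


def monomial_gram(n_basis):
    """Gram matrix of 1, t, ..., t**(n_basis - 1) in L2[0, 1] (hypothetical helper).

    G[i, j] = integral of t**(i + j) over [0, 1] = 1 / (i + j + 1).
    """
    i = np.arange(n_basis)
    return 1.0 / (i[:, np.newaxis] + i[np.newaxis, :] + 1)


def basis_cosine_similarity(c1, c2, gram):
    """Row-wise cosine similarity of functions given by basis coefficients.

    Each row of c1 / c2 holds the coefficients of one function; a single
    row is broadcast against every row of the other argument.
    """
    c1 = np.atleast_2d(np.asarray(c1, dtype=float))
    c2 = np.atleast_2d(np.asarray(c2, dtype=float))
    # <f, g> = c1^T G c2 for each pair of rows (G is symmetric)
    inner_12 = np.sum((c1 @ gram) * c2, axis=-1)
    inner_11 = np.sum((c1 @ gram) * c1, axis=-1)
    inner_22 = np.sum((c2 @ gram) * c2, axis=-1)
    return inner_12 / np.sqrt(inner_11 * inner_22)


gram = monomial_gram(3)
# t against 1
print(basis_cosine_similarity([0, 1, 0], [1, 0, 0], gram))
# t and t**2 against 1 (broadcast)
print(basis_cosine_similarity([[0, 1, 0], [0, 0, 1]], [1, 0, 0], gram))
# t against 1, and t**2 against t
print(basis_cosine_similarity([[0, 1, 0], [0, 0, 1]], [[1, 0, 0], [0, 1, 0]], gram))
```

Because the Gram entries are exact here, no quadrature error is involved; for example, \(t^2\) against \(t\) gives \((1/4)/\sqrt{(1/5)(1/3)} = \sqrt{15}/4 \approx 0.968\).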