65.7k views
3 votes
In this problem, we derive the kernel K-means algorithm.

a) We start by getting some intuition about kernels. A kernel is a function

K(x,y) = ⟨ϕ(x),ϕ(y)⟩ where ϕ(.)
is a transformation to a high-dimensional space.
Consider the kernel Kl(x,y) = (xTy)^l, l ∈ {1,2,3,…}. For l=1, this is the linear kernel xTy corresponding to the identity transformation ϕ1(x) = x. Show that for l=2, if x=(x₁, x₂, …, xn)ᵀ, the kernel corresponds to the transformation ϕ2(x) = [x₁ rho₁(x) x₂ ϕ₁(x) ⋮ xn ϕ₁(x)]. This is a transformation from an n-dimensional to an n^2-dimensional space. What is the dimension of ϕl(x), the transformation associated with Kl(x,y)?

User Ulusoyca
by
7.3k points

1 Answer

7 votes

Final Answer:

For
\( l = 2 \), the kernel
\( K_2(x, y) = (x^Ty)^2 \) corresponds to the transformation
\( \phi_2(x) = [x_1, \rho_1(x), x_2, \rho_1(x), \ldots, x_n, \rho_1(x)] \), where
\( \rho_1(x) \) is the linear transformation
\( x^T \). This transformation maps from an
\( n \)-dimensional space to an
\( n^2 \) -dimensional space. The dimension of
\( \phi_2(x) \) is
\( n^2 \).

Step-by-step explanation:

The kernel
\( K_2(x, y) = (x^Ty)^2 \) can be expanded as
\( K_2(x, y) = (x_1y_1 + x_2y_2 + \ldots + x_ny_n)^2 \). When we multiply this out, we get terms like
\( x_i^2 \),
\( y_i^2 \), and
\( x_iy_i \), which are elements of the transformation
\( \phi_2(x) \). Therefore,
\( \phi_2(x) \) has
\( n \) original dimensions
\( x_i \) and
\( n \) additional dimensions
\( \rho_1(x) = x^T \), resulting in a total of
\( n + n^2 = n^2 \)dimensions.

The transformation
\( \phi_2(x) \) is a mapping to a higher-dimensional space, and each element of
\( \phi_2(x) \) corresponds to a combination of the original features
\( x_i \) and the linear transformation
\( x^T \). This transformation allows the algorithm to capture non-linear relationships between data points by implicitly mapping them to a higher-dimensional space. The increase in dimensionality facilitates the separation of data points in a way that might not be achievable in the original space, enhancing the effectiveness of the kernelized algorithm.

User Jaeho
by
8.0k points