Explanation:
Part 1:
We know that $y(x_n, \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd}$ and $x'_n = x_n + \epsilon_n$.
So,
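$y(x'_n, \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d (x_{nd} + \epsilon_{nd})$
$= \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd} + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$
The sum splits by linearity, and the first two terms are exactly $y(x_n, \theta)$, so:
$y(x'_n, \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$
Therefore, we have shown that $y(x'_n, \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$.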
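The Part 1 identity can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy; the dimension, the parameter values, and the helper name `y` are illustrative choices, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4                                    # input dimension (illustrative)
theta0 = 0.5                             # bias parameter theta_0 (illustrative)
theta = np.array([1.0, -2.0, 0.5, 3.0])  # weights theta_1..theta_D (illustrative)

x_n = rng.normal(size=D)                 # one clean input x_n
eps_n = rng.normal(scale=0.1, size=D)    # one noise vector epsilon_n

def y(x, theta0, theta):
    """Linear model y(x, theta) = theta_0 + sum_d theta_d * x_d."""
    return theta0 + theta @ x

lhs = y(x_n + eps_n, theta0, theta)           # y(x'_n, theta)
rhs = y(x_n, theta0, theta) + theta @ eps_n   # y(x_n, theta) + sum_d theta_d * eps_nd

print(np.isclose(lhs, rhs))
```

The two sides agree up to floating-point rounding for any choice of `x_n`, `eps_n`, and `theta`, because the identity only uses the linearity of the model.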
Part 2:
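The sum-of-squares error function for the noisy sample set $x'$ is given by:
$E_{D'}(\theta) = \frac{1}{2} \sum_{n=1}^{N} [y(x'_n, \theta) - y_n]^2$
Using the expression for $y(x'_n, \theta)$ derived in Part 1, we have:
$E_{D'}(\theta) = \frac{1}{2} \sum_{n=1}^{N} \left[y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd} - y_n\right]^2$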
Expanding the square term and taking the expectation with respect to the noise ϵ, we get:
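$\mathbb{E}[E_{D'}(\theta)] = \mathbb{E}\left[\frac{1}{2} \sum_{n=1}^{N} \left[(y(x_n, \theta) - y_n)^2 + 2 (y(x_n, \theta) - y_n) \sum_{d=1}^{D} \theta_d \epsilon_{nd} + \left(\sum_{d=1}^{D} \theta_d \epsilon_{nd}\right)^2\right]\right]$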
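Now, since $\epsilon$ is zero-mean Gaussian noise with variance $\sigma^2$, independent across input dimensions, we have:
$\mathbb{E}[\epsilon_{nd}] = 0$
$\mathbb{E}[\epsilon_{nd}^2] = \sigma^2$
$\mathbb{E}[\epsilon_{nd} \epsilon_{nd'}] = 0$ for $d \neq d'$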
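These moment properties can be illustrated by sampling. The sketch below assumes NumPy and assumes the noise components are independent across dimensions (so cross-products average to zero); the values of `sigma`, `D`, and the sample count `S` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.3    # noise standard deviation (illustrative)
D = 3          # input dimension (illustrative)
S = 200_000    # number of noise draws used for the estimates

# Each row is one draw of the D-dimensional noise vector epsilon_n.
eps = rng.normal(loc=0.0, scale=sigma, size=(S, D))

mean_hat = eps.mean(axis=0)          # estimates E[eps_d], should be near 0
second_moment_hat = eps.T @ eps / S  # estimates E[eps_d * eps_d'], near sigma^2 * I

print(mean_hat)
print(second_moment_hat)
```

The estimated second-moment matrix is close to $\sigma^2 I$: the diagonal entries approximate $\sigma^2$ and the off-diagonal entries approximate zero.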
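The weights $\theta$, inputs $x_n$ and targets $y_n$ are deterministic, so the expectation acts only on the noise terms:
$\mathbb{E}[E_{D'}(\theta)] = \frac{1}{2} \sum_{n=1}^{N} \left[(y(x_n, \theta) - y_n)^2 + 2 (y(x_n, \theta) - y_n) \sum_{d=1}^{D} \theta_d \mathbb{E}[\epsilon_{nd}] + \sum_{d=1}^{D} \sum_{d'=1}^{D} \theta_d \theta_{d'} \mathbb{E}[\epsilon_{nd} \epsilon_{nd'}]\right]$
$= \frac{1}{2} \sum_{n=1}^{N} (y(x_n, \theta) - y_n)^2 + \frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$
The middle term vanishes because $\mathbb{E}[\epsilon_{nd}] = 0$. In the double sum, the cross-products with $d \neq d'$ are zero, so only the $d = d'$ terms survive, each contributing $\sigma^2 \theta_d^2$; summing over the $N$ samples gives the factor $N$. Equivalently, the noise covariance is the $D \times D$ matrix $\sigma^2 I$, and the last term is $\frac{N \sigma^2}{2} \theta^T \theta$ with $\theta = (\theta_1, \dots, \theta_D)^T$.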
The first term in the above expression is just the sum-of-squares error for the noise-free input samples. The second term is the weight-decay regularization term, which is proportional to the squared L2 norm of the weights $\theta_1, \dots, \theta_D$, with the bias parameter $\theta_0$ omitted.
Therefore, we have shown that:
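$\mathbb{E}[E_{D'}(\theta)] = E_D(\theta) + \frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$
where $E_D(\theta) = \frac{1}{2} \sum_{n=1}^{N} (y(x_n, \theta) - y_n)^2$ is the noise-free sum-of-squares error and the second term is the weight-decay regularization term.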
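The Part 2 result, that the expected noisy error equals the noise-free error plus $\frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$, can be checked with a Monte Carlo estimate. This is a minimal sketch, assuming NumPy and independent Gaussian noise on every input component; the data, parameter values, and the helper name `sum_of_squares` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

N, D = 50, 3                        # sample count and input dimension (illustrative)
sigma = 0.2                         # noise standard deviation (illustrative)
S = 20_000                          # number of Monte Carlo noise draws

theta0 = 0.3                        # bias theta_0 (illustrative)
theta = np.array([1.0, -2.0, 0.5])  # weights theta_1..theta_D (illustrative)
X = rng.normal(size=(N, D))         # clean inputs x_n, one per row
t = rng.normal(size=N)              # targets y_n (synthetic)

def sum_of_squares(X, t, theta0, theta):
    """Sum-of-squares error 1/2 * sum_n (y(x_n, theta) - y_n)^2."""
    residual = theta0 + X @ theta - t
    return 0.5 * np.sum(residual**2)

clean_error = sum_of_squares(X, t, theta0, theta)
penalty = 0.5 * N * sigma**2 * np.sum(theta**2)   # weight-decay term, bias excluded

# Draw S independent noise sets, each of shape (N, D), and average the noisy error.
eps = rng.normal(loc=0.0, scale=sigma, size=(S, N, D))
residuals = theta0 + (X + eps) @ theta - t        # shape (S, N)
noisy_errors = 0.5 * np.sum(residuals**2, axis=1) # one noisy error per draw
mc_estimate = noisy_errors.mean()

print(mc_estimate, clean_error + penalty)
```

The Monte Carlo average should approach `clean_error + penalty` as `S` grows, while the bias `theta0` never enters the penalty.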