Consider a linear model of the form:

$$y(x_n, \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd}$$

where $x_n = (x_{n1}, \dots, x_{nD})$ and the weights are $\theta = (\theta_0, \dots, \theta_D)$. Given the $D$-dimensional input sample set $x = \{x_1, \dots, x_N\}$ with corresponding target values $y = \{y_1, \dots, y_N\}$, the sum-of-squares error function is:

$$E_D(\theta) = \frac{1}{2} \sum_{n=1}^{N} \{y(x_n, \theta) - y_n\}^2$$

Now, suppose that Gaussian noise $\epsilon_n$ with zero mean and variance $\sigma^2$ is added independently to each input sample $x_n$ to generate a new sample set $x' = \{x_1 + \epsilon_1, \dots, x_N + \epsilon_N\}$. For each sample $x_n$, $x_n' = (x_{n1} + \epsilon_{n1}, \dots, x_{nD} + \epsilon_{nD})$, where $\epsilon_{nd}$ is independent across both the $n$ and $d$ indices.

(3pts) Show that $y(x_n', \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$.

Assume the sum-of-squares error function of the noisy sample set $x'$ is $E_D'(\theta)$. Prove that the expectation of $E_D'(\theta)$ is equivalent to the sum-of-squares error $E_D(\theta)$ for noise-free input samples plus a weight-decay regularization term (i.e., an $L_2$ norm), in which the bias parameter $\theta_0$ is omitted from the regularizer. In other words, show that

$$\mathbb{E}[E_D'(\theta)] = E_D(\theta) + z$$
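For concreteness, here is a minimal NumPy sketch of the model and error function defined above (the toy data, dimensions, and helper names are illustrative assumptions, not part of the original question):

```python
import numpy as np

# Illustrative toy setup: N samples of dimension D, weights (theta_0, ..., theta_D).
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))        # inputs x_n, one row per sample
theta = rng.normal(size=D + 1)     # theta[0] is the bias theta_0
t = rng.normal(size=N)             # targets y_n

def y_model(X, theta):
    """Linear model: y(x_n, theta) = theta_0 + sum_d theta_d * x_nd."""
    return theta[0] + X @ theta[1:]

def sum_of_squares_error(X, t, theta):
    """E_D(theta) = (1/2) * sum_n (y(x_n, theta) - y_n)^2."""
    return 0.5 * np.sum((y_model(X, theta) - t) ** 2)

print(sum_of_squares_error(X, t, theta))
```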

1 Answer


Explanation:

Part 1:

We know that $y(x_n, \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd}$ and $x_n' = x_n + \epsilon_n$.

So,

$$y(x_n', \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d (x_{nd} + \epsilon_{nd})$$

$$= \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd} + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$$

The summation splits by linearity, and the first two terms are exactly $y(x_n, \theta)$, so:

$$y(x_n', \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$$

Therefore, we have shown that $y(x_n', \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$.
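A quick numerical check of this identity (a self-contained sketch; the toy data and noise scale are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, sigma = 100, 3, 0.5
X = rng.normal(size=(N, D))                 # clean inputs x_n
theta = rng.normal(size=D + 1)              # weights (theta_0, ..., theta_D)
eps = rng.normal(scale=sigma, size=(N, D))  # noise eps_nd

def y_model(X, theta):
    # y(x_n, theta) = theta_0 + sum_d theta_d * x_nd
    return theta[0] + X @ theta[1:]

lhs = y_model(X + eps, theta)              # y(x_n', theta) on noisy inputs
rhs = y_model(X, theta) + eps @ theta[1:]  # y(x_n, theta) + sum_d theta_d * eps_nd
print(np.allclose(lhs, rhs))               # True
```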

Part 2:

The sum-of-squares error function for the noisy sample set $x'$ is given by:

$$E_D'(\theta) = \frac{1}{2} \sum_{n=1}^{N} \left[ y(x_n', \theta) - y_n \right]^2$$

Using the expression for $y(x_n', \theta)$ derived in Part 1, we have:

$$E_D'(\theta) = \frac{1}{2} \sum_{n=1}^{N} \left[ y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd} - y_n \right]^2$$

Expanding the square and taking the expectation with respect to the noise $\epsilon$, we get:

$$\mathbb{E}[E_D'(\theta)] = \mathbb{E}\left[ \frac{1}{2} \sum_{n=1}^{N} \left[ (y(x_n, \theta) - y_n)^2 + 2\,(y(x_n, \theta) - y_n) \sum_{d=1}^{D} \theta_d \epsilon_{nd} + \Big( \sum_{d=1}^{D} \theta_d \epsilon_{nd} \Big)^2 \right] \right]$$

Now, since $\epsilon_{nd}$ is zero-mean Gaussian noise with variance $\sigma^2$, independent across both $n$ and $d$, we have:

$$\mathbb{E}[\epsilon_{nd}] = 0, \qquad \mathbb{E}[\epsilon_{nd}^2] = \sigma^2, \qquad \mathbb{E}[\epsilon_{nd}\,\epsilon_{nd'}] = 0 \;\text{ for } d \neq d'$$

Using these properties, we can simplify the above expression:

$$\mathbb{E}[E_D'(\theta)] = \frac{1}{2} \sum_{n=1}^{N} \left[ (y(x_n, \theta) - y_n)^2 + 2\,(y(x_n, \theta) - y_n) \sum_{d=1}^{D} \theta_d\, \mathbb{E}[\epsilon_{nd}] + \sum_{d=1}^{D} \sum_{d'=1}^{D} \theta_d \theta_{d'}\, \mathbb{E}[\epsilon_{nd}\,\epsilon_{nd'}] \right]$$

The middle term vanishes because $\mathbb{E}[\epsilon_{nd}] = 0$. In the double sum, the cross terms ($d \neq d'$) vanish by independence, and each diagonal term contributes $\theta_d^2 \sigma^2$. Hence:

$$\mathbb{E}[E_D'(\theta)] = \frac{1}{2} \sum_{n=1}^{N} (y(x_n, \theta) - y_n)^2 + \frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$$

The first term is exactly the sum-of-squares error $E_D(\theta)$ for the noise-free input samples (it does not depend on $\epsilon$, so the expectation drops). The second term, $\frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$, is a weight-decay regularization term proportional to the squared $L_2$ norm of the weights $\theta$, with the bias parameter $\theta_0$ omitted.

Therefore, we have shown that:

$$\mathbb{E}[E_D'(\theta)] = E_D(\theta) + z, \qquad z = \frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$$

where $z$ is the weight-decay regularization term.
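As a sanity check, the result can be verified numerically by averaging $E_D'(\theta)$ over many independent noise draws and comparing against $E_D(\theta)$ plus the regularizer (a Monte Carlo sketch; the toy data, weights, and $\sigma$ are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, sigma = 50, 3, 0.3
X = rng.normal(size=(N, D))
theta = rng.normal(size=D + 1)
t = rng.normal(size=N)

def E_D(X, t, theta):
    # E_D(theta) = (1/2) * sum_n (y(x_n, theta) - y_n)^2
    y = theta[0] + X @ theta[1:]
    return 0.5 * np.sum((y - t) ** 2)

# Monte Carlo estimate of E[E_D'(theta)] over independent noise draws.
trials = 20_000
estimate = np.mean([E_D(X + rng.normal(scale=sigma, size=(N, D)), t, theta)
                    for _ in range(trials)])

# Predicted value: E_D(theta) + (N * sigma^2 / 2) * sum_d theta_d^2 (bias excluded).
predicted = E_D(X, t, theta) + 0.5 * N * sigma**2 * np.sum(theta[1:] ** 2)
print(estimate, predicted)  # the two numbers should closely agree
```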

answered by Lucas Declercq