Consider a linear model of the form:

$$y(x_n, \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd}$$

where $x_n = (x_{n1}, \dots, x_{nD})$ and the weights are $\theta = (\theta_0, \dots, \theta_D)$. Given the $D$-dimensional input sample set $x = \{x_1, \dots, x_N\}$ with corresponding target values $y = \{y_1, \dots, y_N\}$, the sum-of-squares error function is:

$$E_D(\theta) = \frac{1}{2} \sum_{n=1}^{N} \{y(x_n, \theta) - y_n\}^2$$

Now, suppose that Gaussian noise $\epsilon_n$ with zero mean and variance $\sigma^2$ is added independently to each input sample $x_n$ to generate a new sample set $x' = \{x_1 + \epsilon_1, \dots, x_N + \epsilon_N\}$. For each sample $x_n$, $x_n' = (x_{n1} + \epsilon_{n1}, \dots, x_{nD} + \epsilon_{nD})$, where $\epsilon_{nd}$ is independent across both the $n$ and $d$ indices.

(3pts) Show that $y(x_n', \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$.

Let the sum-of-squares error function of the noisy sample set $x'$ be $E_D'(\theta)$. Prove that the expectation of $E_D'(\theta)$ is equivalent to the sum-of-squares error $E_D(\theta)$ for noise-free input samples with the addition of a weight-decay regularization term (i.e., an $L_2$ penalty), in which the bias parameter $\theta_0$ is omitted from the regularizer. In other words, show that

$$\mathbb{E}[E_D'(\theta)] = E_D(\theta) + z$$

where $z$ is the regularization term.

1 Answer

Explanation:

Part 1:

We know that $y(x_n, \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd}$ and $x_n' = x_n + \epsilon_n$.

So,

$y(x_n', \theta) = \theta_0 + \sum_{d=1}^{D} \theta_d (x_{nd} + \epsilon_{nd})$

$= \theta_0 + \sum_{d=1}^{D} \theta_d x_{nd} + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$

The first two terms on the right-hand side are exactly $y(x_n, \theta)$, so:

$y(x_n', \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$

Therefore, we have shown that $y(x_n', \theta) = y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd}$.
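
As a quick numerical sanity check of this identity, here is a minimal sketch assuming small random data (the array sizes, seed, and variable names are illustrative choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3
X = rng.normal(size=(N, D))            # noise-free inputs x_n as rows
eps = rng.normal(size=(N, D))          # per-feature noise eps_{nd}
theta0, theta = 0.5, rng.normal(size=D)

y_clean = theta0 + X @ theta           # y(x_n, theta) for all n
y_noisy = theta0 + (X + eps) @ theta   # y(x_n', theta) for all n

# Part 1 identity: y(x_n', theta) = y(x_n, theta) + sum_d theta_d * eps_{nd}
assert np.allclose(y_noisy, y_clean + eps @ theta)
```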

Part 2:

The sum-of-squares error function for the noisy sample set $x'$ is given by:

$E_D'(\theta) = \frac{1}{2} \sum_{n=1}^{N} \{y(x_n', \theta) - y_n\}^2$

Using the expression for $y(x_n', \theta)$ derived in Part 1, we have:

$E_D'(\theta) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \theta) + \sum_{d=1}^{D} \theta_d \epsilon_{nd} - y_n \right\}^2$

Expanding the square and taking the expectation with respect to the noise $\epsilon$, we get:

$\mathbb{E}[E_D'(\theta)] = \mathbb{E}\left[ \frac{1}{2} \sum_{n=1}^{N} \left[ (y(x_n, \theta) - y_n)^2 + 2(y(x_n, \theta) - y_n) \sum_{d=1}^{D} \theta_d \epsilon_{nd} + \left( \sum_{d=1}^{D} \theta_d \epsilon_{nd} \right)^2 \right] \right]$

Now, since $\epsilon_{nd}$ is zero-mean Gaussian noise with variance $\sigma^2$, independent across both $n$ and $d$, we have:

$\mathbb{E}[\epsilon_{nd}] = 0$

$\mathbb{E}[\epsilon_{nd}^2] = \sigma^2$
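
Since the noise is also independent across the $d$ index, the second moments satisfy $\mathbb{E}[\epsilon_{nd}\,\epsilon_{nd'}] = \sigma^2 \delta_{dd'}$, which gives the one-line worked step used below:

```latex
\mathbb{E}\Big[\Big(\sum_{d=1}^{D}\theta_d\,\epsilon_{nd}\Big)^2\Big]
  = \sum_{d=1}^{D}\sum_{d'=1}^{D}\theta_d\,\theta_{d'}\,
    \mathbb{E}[\epsilon_{nd}\,\epsilon_{nd'}]
  = \sigma^2 \sum_{d=1}^{D}\theta_d^2
```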

Using these properties, the cross term vanishes, since its expectation contains the factor $\mathbb{E}[\epsilon_{nd}] = 0$, and the squared noise term reduces to $\sigma^2 \sum_{d=1}^{D} \theta_d^2$ as shown above. The first term contains no noise, so the expectation leaves it unchanged, and summing the constant noise contribution over $n$ introduces a factor of $N$:

$\mathbb{E}[E_D'(\theta)] = \frac{1}{2} \sum_{n=1}^{N} (y(x_n, \theta) - y_n)^2 + \frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$

The first term in the above expression is just the sum-of-squares error $E_D(\theta)$ for the noise-free input samples. The second term is a weight-decay regularization term, proportional to the squared $L_2$ norm of the weights $\theta$; the bias parameter $\theta_0$ is omitted because the noise multiplies only $\theta_1, \dots, \theta_D$.

Therefore, we have shown that:

$\mathbb{E}[E_D'(\theta)] = E_D(\theta) + z$

where $z = \frac{N \sigma^2}{2} \sum_{d=1}^{D} \theta_d^2$ is the weight-decay regularization term, with the bias $\theta_0$ omitted.
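
The result can also be checked by Monte Carlo simulation. The following is a sketch under assumed random data and an assumed noise scale (none of the numbers come from the question); the averaged noisy error should approach $E_D(\theta) + \frac{N\sigma^2}{2}\sum_d \theta_d^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, sigma = 50, 4, 0.3
X = rng.normal(size=(N, D))                   # noise-free inputs
theta0, theta = 0.7, rng.normal(size=D)       # fixed weights
y = theta0 + X @ theta + rng.normal(size=N)   # arbitrary targets

def sse(X_in):
    """Sum-of-squares error 0.5 * sum_n (y(x_n, theta) - y_n)^2."""
    resid = theta0 + X_in @ theta - y
    return 0.5 * np.sum(resid ** 2)

E_D = sse(X)
# Average E_D'(theta) over many independent draws of the input noise.
noisy = [sse(X + sigma * rng.normal(size=(N, D))) for _ in range(100_000)]
lhs = np.mean(noisy)
rhs = E_D + 0.5 * N * sigma**2 * np.sum(theta**2)
print(lhs, rhs)  # the two values agree up to Monte Carlo error
```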

answered by Lucas Declercq (7.9k points)