117k views
0 votes
Assume we are training a linear model with stochastic gradient descent, using the L2 (squared-error) loss on a per-example basis (updating after each example).

We are using a learning rate of 2.
Currently w₁ = 3.
Given that the next training example is (<-7, 6>, -9) and h_w(<-7, 6>) = -4,
what will w₁ be after updating?

1 Answer

5 votes

After applying the SGD update, the value of w₁ will be 143.

Gradient calculation: With the per-example L2 loss L = (h_w(x) - y)², the gradient of the loss w.r.t. w₁ for the given example is:

∂L/∂w₁ = 2 * (h_w(<-7, 6>) - y) * x₁ = 2 * (-4 - (-9)) * (-7) = 2 * 5 * (-7) = -70

Parameter update: We then apply the SGD update rule with the learning rate λ = 2:

w₁_new = w₁_old - λ * ∂L/∂w₁ = 3 - 2 * (-70) = 3 + 140 = 143

Note that the question specifies the L2 loss, not L2 regularization, so no regularization term appears in the update. (If your course defines the loss with a ½ factor, L = ½(h_w(x) - y)², then the gradient is -35 and the update gives w₁ = 73 instead.)

Therefore, after one SGD step on this example, w₁ = 143.
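The arithmetic can be checked numerically. Below is a minimal sketch of one SGD step, assuming the loss L = (h_w(x) - y)² with no ½ factor; the helper `sgd_step` is illustrative, not from any particular library:

```python
# One SGD step for a linear model with per-example L2 (squared-error) loss.
# Assumes L = (h_w(x) - y)^2, so dL/dw_i = 2 * (h_w(x) - y) * x_i.

def sgd_step(w, x, h_wx, y, lr):
    """Return the updated weight vector after one SGD step: w_i - lr * dL/dw_i."""
    return [w_i - lr * 2 * (h_wx - y) * x_i for w_i, x_i in zip(w, x)]

# Values from the question: x = <-7, 6>, y = -9, h_w(x) = -4, learning rate 2.
# Only w1 = 3 is given in the question, so w2 here is a placeholder (0).
w_new = sgd_step([3, 0], x=[-7, 6], h_wx=-4, y=-9, lr=2)
print(w_new[0])  # 3 - 2 * (-70) = 143
```

Running this prints 143 for w₁, matching the hand calculation above.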

answered by User Bei (7.9k points)