117k views
0 votes
Assume we are training a linear model with stochastic gradient descent, using the L2 (squared-error) loss on a per-example basis (updating after each example).

We are using a learning rate of 2.
Currently w₁ = 3.
Given that the next training example is (<-7, 6>, -9) and h_w(<-7, 6>) = -4,
what will w₁ be after updating?

1 Answer

5 votes

After applying the SGD update, the value of w₁ will be 143.

Gradient calculation: With the per-example L2 loss L = (h_w(x) - y)², the gradient of the loss w.r.t. w₁ for the given example is:

∂L/∂w₁ = 2 * (h_w(<-7, 6>) - y) * x₁ = 2 * (-4 - (-9)) * (-7) = 2 * 5 * (-7) = -70

Parameter update: We then apply the SGD update rule with the learning rate λ = 2:

w₁_new = w₁_old - λ * ∂L/∂w₁ = 3 - 2 * (-70) = 3 + 140 = 143

Note that the question specifies the L2 loss, not L2 regularization, so no regularization term appears in the update. (If your course defines the loss with a ½ factor, L = ½(h_w(x) - y)², then the gradient is -35 and the update gives w₁ = 73 instead.)

Therefore, after one SGD step on this example, w₁ = 143.
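The arithmetic can be checked numerically. Below is a minimal sketch of one SGD step, assuming the loss L = (h_w(x) - y)² with no ½ factor; the helper `sgd_step` is illustrative, not from any particular library:

```python
# One SGD step for a linear model with per-example L2 (squared-error) loss.
# Assumes L = (h_w(x) - y)^2, so dL/dw_i = 2 * (h_w(x) - y) * x_i.

def sgd_step(w, x, h_wx, y, lr):
    """Return the updated weight vector after one SGD step: w_i - lr * dL/dw_i."""
    return [w_i - lr * 2 * (h_wx - y) * x_i for w_i, x_i in zip(w, x)]

# Values from the question: x = <-7, 6>, y = -9, h_w(x) = -4, learning rate 2.
# Only w1 = 3 is given in the question, so w2 here is a placeholder (0).
w_new = sgd_step([3, 0], x=[-7, 6], h_wx=-4, y=-9, lr=2)
print(w_new[0])  # 3 - 2 * (-70) = 143
```

Running this prints 143 for w₁, matching the hand calculation above.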

answered by User Bei (7.9k points)