Assume we have K different classes in a multi-class Softmax Regression model. The posterior probability is $\hat{p}_k = \sigma(s(x))_k = \frac{\exp(s_k(x))}{\sum_{j=1}^{K}\exp(s_j(x))}$ for $k = 1, 2, \ldots, K$, where $s_k(x) = \theta_k^T x$, the input $x$ is an $n$-dimensional vector, and $K$ is the total number of classes.

1) To learn this Softmax Regression model, how many parameters do we need to estimate? What are these parameters?

2) Consider the cross-entropy cost function $J(\theta)$ over $m$ training samples $\{(x^{(i)}, y^{(i)})\}_{i=1,\ldots,m}$ below. Derive the gradient of $J(\theta)$ with respect to $\theta_k$.

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$

where $y_k^{(i)} = 1$ if the $i$-th instance belongs to class $k$, and $0$ otherwise.

1 Answer


Final answer:

In a multi-class Softmax Regression model, we need to estimate $K(n+1)$ parameters, where $K$ is the number of classes and $n$ is the dimension of the input vector: one weight vector of $n$ coefficients plus one bias term per class. The gradient of the cross-entropy cost function with respect to $\theta_k$ is $\nabla_{\theta_k} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{p}_k^{(i)} - y_k^{(i)}\right) x^{(i)}$.

Step-by-step explanation:

To learn the Softmax Regression model, we need to estimate $K(n+1)$ parameters, where $K$ is the total number of classes and $n$ is the dimension of the input vector. The parameters are the $K$ vectors $\theta_1, \ldots, \theta_K$: each $\theta_k$ consists of $n$ weights and one bias term (equivalently, the bias can be absorbed into $\theta_k$ by appending a constant 1 to $x$).
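As a concrete illustration (a minimal NumPy sketch; the array names here are my own, not from the question), the parameters can be stored as a single $K \times (n+1)$ matrix once a constant 1 is appended to every input to absorb the bias:

```python
import numpy as np

K, n = 3, 4                    # e.g. 3 classes, 4 input features
Theta = np.zeros((K, n + 1))   # one row per class: n weights + 1 bias

print(Theta.size)              # 15 = K*(n+1) parameters to estimate
```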

To derive the gradient of $J(\theta)$ with respect to $\theta_k$, we can start by differentiating $\log(\hat{p}_k)$ with respect to the scores. For the softmax, $\frac{\partial \hat{p}_k}{\partial s_j} = \hat{p}_k(\delta_{kj} - \hat{p}_j)$, so $\frac{\partial \log \hat{p}_k}{\partial s_j} = \delta_{kj} - \hat{p}_j$, where $\delta_{kj} = 1$ if $k = j$ and $0$ otherwise. Since $s_k(x) = \theta_k^T x$ gives $\frac{\partial s_k}{\partial \theta_k} = x$, applying the chain rule to $J(\theta)$ and using $\sum_k y_k^{(i)} = 1$ yields

$\nabla_{\theta_k} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{p}_k^{(i)} - y_k^{(i)}\right) x^{(i)}$,

where $\hat{p}_k^{(i)}$ is the predicted probability of class $k$ for the $i$-th instance, $y_k^{(i)}$ is the actual label (1 or 0) for class $k$, and $x^{(i)}$ is the input vector for the $i$-th instance.
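To make the result concrete, here is a minimal NumPy sketch of this gradient computation (the function and variable names are illustrative assumptions, not part of the original answer); it folds the bias into each $\theta_k$ by appending a 1 to every input:

```python
import numpy as np

def softmax(S):
    # Row-wise softmax with the usual max-shift for numerical stability.
    S = S - S.max(axis=1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def gradient(Theta, X, Y):
    """Gradient of the cross-entropy cost J(theta) w.r.t. all theta_k.

    Theta : (K, n+1) parameter matrix (bias folded in)
    X     : (m, n+1) inputs with a trailing column of ones
    Y     : (m, K)   one-hot labels y_k^(i)
    Returns a (K, n+1) matrix whose k-th row is the gradient w.r.t. theta_k.
    """
    m = X.shape[0]
    P = softmax(X @ Theta.T)      # p_hat_k^(i) for all i and k, shape (m, K)
    return (P - Y).T @ X / m      # (1/m) * sum_i (p_hat^(i) - y^(i)) x^(i)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
m, n, K = 5, 4, 3
X = np.hstack([rng.normal(size=(m, n)), np.ones((m, 1))])  # append bias column
Y = np.eye(K)[rng.integers(K, size=m)]                     # one-hot labels
Theta = rng.normal(size=(K, n + 1))
print(gradient(Theta, X, Y).shape)                         # (3, 5)
```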

answered by Lukeaus (7.4k points)