Assume we have K different classes in a multi-class Softmax Regression model. The posterior probability is $\hat{p}_k = \sigma(s(x))_k = \frac{\exp(s_k(x))}{\sum_{j=1}^{K}\exp(s_j(x))}$ for $k = 1, 2, \ldots, K$, where $s_k(x) = \theta_k^T x$, the input $x$ is an $n$-dimensional vector, and $K$ is the total number of classes.

1) To learn this Softmax Regression model, how many parameters do we need to estimate? What are these parameters?

2) Consider the cross-entropy cost function $J(\theta)$ over $m$ training samples $\{(x^{(i)}, y^{(i)})\}_{i=1,\ldots,m}$ below. Derive the gradient of $J(\theta)$ with respect to $\theta_k$.

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$

where $y_k^{(i)} = 1$ if the $i$-th instance belongs to class $k$, and $0$ otherwise.

1 Answer


Final answer:

In a multi-class Softmax Regression model, we need to estimate $K(n+1)$ parameters, where $K$ is the number of classes and $n$ is the dimension of the input vector: one weight vector of $n$ coefficients plus one bias term per class. The gradient of the cross-entropy cost function with respect to $\theta_k$ is $\nabla_{\theta_k} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{p}_k^{(i)} - y_k^{(i)}\right) x^{(i)}$.

Step-by-step explanation:

To learn the Softmax Regression model, we need to estimate $K(n+1)$ parameters, where $K$ is the total number of classes and $n$ is the dimension of the input vector. The parameters are the $K$ vectors $\theta_1, \ldots, \theta_K$: each $\theta_k$ consists of $n$ weights and one bias term (equivalently, the bias can be absorbed into $\theta_k$ by appending a constant 1 to $x$).
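As a concrete illustration (a minimal NumPy sketch; the array names here are my own, not from the question), the parameters can be stored as a single $K \times (n+1)$ matrix once a constant 1 is appended to every input to absorb the bias:

```python
import numpy as np

K, n = 3, 4                    # e.g. 3 classes, 4 input features
Theta = np.zeros((K, n + 1))   # one row per class: n weights + 1 bias

print(Theta.size)              # 15 = K*(n+1) parameters to estimate
```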

To derive the gradient of $J(\theta)$ with respect to $\theta_k$, we can start by differentiating $\log(\hat{p}_k)$ with respect to the scores. For the softmax, $\frac{\partial \hat{p}_k}{\partial s_j} = \hat{p}_k(\delta_{kj} - \hat{p}_j)$, so $\frac{\partial \log \hat{p}_k}{\partial s_j} = \delta_{kj} - \hat{p}_j$, where $\delta_{kj} = 1$ if $k = j$ and $0$ otherwise. Since $s_k(x) = \theta_k^T x$ gives $\frac{\partial s_k}{\partial \theta_k} = x$, applying the chain rule to $J(\theta)$ and using $\sum_k y_k^{(i)} = 1$ yields

$\nabla_{\theta_k} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{p}_k^{(i)} - y_k^{(i)}\right) x^{(i)}$,

where $\hat{p}_k^{(i)}$ is the predicted probability of class $k$ for the $i$-th instance, $y_k^{(i)}$ is the actual label (1 or 0) for class $k$, and $x^{(i)}$ is the input vector for the $i$-th instance.
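To make the result concrete, here is a minimal NumPy sketch of this gradient computation (the function and variable names are illustrative assumptions, not part of the original answer); it folds the bias into each $\theta_k$ by appending a 1 to every input:

```python
import numpy as np

def softmax(S):
    # Row-wise softmax with the usual max-shift for numerical stability.
    S = S - S.max(axis=1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def gradient(Theta, X, Y):
    """Gradient of the cross-entropy cost J(theta) w.r.t. all theta_k.

    Theta : (K, n+1) parameter matrix (bias folded in)
    X     : (m, n+1) inputs with a trailing column of ones
    Y     : (m, K)   one-hot labels y_k^(i)
    Returns a (K, n+1) matrix whose k-th row is the gradient w.r.t. theta_k.
    """
    m = X.shape[0]
    P = softmax(X @ Theta.T)      # p_hat_k^(i) for all i and k, shape (m, K)
    return (P - Y).T @ X / m      # (1/m) * sum_i (p_hat^(i) - y^(i)) x^(i)

# Tiny usage example with random data.
rng = np.random.default_rng(0)
m, n, K = 5, 4, 3
X = np.hstack([rng.normal(size=(m, n)), np.ones((m, 1))])  # append bias column
Y = np.eye(K)[rng.integers(K, size=m)]                     # one-hot labels
Theta = rng.normal(size=(K, n + 1))
print(gradient(Theta, X, Y).shape)                         # (3, 5)
```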

answered by Lukeaus (7.4k points)