Suppose you have a dataset with m = 50 examples and n = 200000 features for each example. You want to use multivariate linear regression to fit the parameters \theta to our data. Should you prefer gradient descent or the normal equation?

1 Answer


Answer: Gradient descent.

Explanation:

Let X be the m × n matrix whose rows are the feature vectors of our 50 examples, and let y be the vector of the 50 target values. The normal equation gives the value of \theta as follows:


\theta = (X^T X)^{-1} X^T y
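For reference, here is a minimal NumPy sketch of the normal equation on a small, well-posed problem. The sizes (m = 100, n = 3), the random data, and the name true_theta are illustrative assumptions, not part of the question; the point is only to show the formula in action when there are more examples than features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): more examples than features.
m, n = 100, 3
X = rng.normal(size=(m, n))              # one row per example
true_theta = np.array([2.0, -1.0, 0.5])  # hypothetical parameters
y = X @ true_theta                       # noise-free targets

# Normal equation: solve (X^T X) theta = X^T y.
# Solving the linear system avoids forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)

print(theta)
```

With noise-free targets and m > n, the recovered theta should match true_theta up to floating-point error.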

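The matrix X^T X, however, is n × n with rank at most m, so it is never invertible when m < n (and may fail to be invertible when m = n). Here m = 50 and n = 200000, so X^T X is singular and we would have to use the pseudoinverse instead. With this many features that is computationally expensive: X^T X has n^2 = 4 × 10^10 entries, and inverting an n × n matrix takes on the order of n^3 operations. A gradient descent step, by contrast, costs only O(mn). So gradient descent should be preferred.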
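The rank argument can be checked numerically on a scaled-down version of the problem. The sketch below assumes m = 5 and n = 20 as stand-ins for 50 and 200000, uses random data, and picks a step size from the largest singular value of X; the function name gradient_descent and all of these values are illustrative choices, not something given in the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scaled-down stand-in (assumed) for m = 50, n = 200000.
m, n = 5, 20
X = rng.normal(size=(m, n))   # one row per example
y = rng.normal(size=m)

# X^T X is n x n, but its rank cannot exceed m, so it is singular when m < n.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)

def gradient_descent(X, y, lr, steps):
    """Batch gradient descent on the cost (1 / 2m) * ||X theta - y||^2."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)  # costs O(m * n) per step
        theta -= lr * grad
    return theta

# Step size 1 / L, where L is the largest eigenvalue of X^T X / m.
lr = m / np.linalg.norm(X, 2) ** 2
theta = gradient_descent(X, y, lr, steps=5000)
residual = np.linalg.norm(X @ theta - y)

print(rank, residual)
```

Gradient descent never forms the n × n matrix at all; each step only needs two matrix-vector products with X.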

by User Gareth Oakley (6.4k points)