Suppose you have a dataset with $m = 50$ examples and $n = 200000$ features for each example. You want to use multivariate linear regression to fit the parameters $\theta$ to our data. Should you prefer gradient descent or the normal equation?



Answer: Gradient descent.

Explanation:

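Consider $X$ to be the design matrix whose rows are our 50 examples (one row per example, one column per feature, plus a column of ones for the intercept term), so $X$ is $m \times (n+1)$. The normal equation gives us the values of $\theta$ in the following way:

$$\theta = (X^T X)^{-1} X^T y$$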
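As a rough illustration, here is a minimal pure-Python sketch of the normal equation on a made-up toy dataset (4 examples, 1 feature plus the intercept column), generated from $y = 1 + 2x$ so the expected $\theta$ is known in advance. Instead of forming the inverse explicitly, it solves $(X^T X)\theta = X^T y$ by Gaussian elimination, which gives the same $\theta$ whenever $X^T X$ is invertible. The helper names (`transpose`, `matmul`, `solve`, `normal_equation`) are hypothetical and not from any library.

```python
# Minimal normal-equation sketch in pure Python (no NumPy), for a tiny
# made-up dataset where m > n, so X^T X is invertible.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(M[i][k]))
        if abs(M[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular")
        M[k], M[pivot] = M[pivot], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed by solving (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [row[0] for row in matmul(Xt, [[yi] for yi in y])]
    return solve(XtX, Xty)

# Each row is one example: [1 (intercept), x1]; the data follow y = 1 + 2*x1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = normal_equation(X, y)
print(theta)  # approximately [1.0, 2.0]
```

The elimination step works on the $(n+1) \times (n+1)$ matrix $X^T X$ and costs on the order of $n^3$ operations, which is harmless here but is exactly what becomes impractical when $n = 200000$.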

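The matrix $X^T X$, however, is $(n+1) \times (n+1)$ (here $200001 \times 200001$) and has rank at most $m$, so it is not invertible when $m \leq n$; with only 50 examples it is certainly singular. So we must use the pseudoinverse to solve the problem. For this many features, computing the pseudoinverse of an $(n+1) \times (n+1)$ matrix is prohibitively expensive: the cost grows roughly as $O(n^3)$, and merely storing $X^T X$ in double precision takes about $200001^2 \times 8 \approx 3.2 \times 10^{11}$ bytes (around 320 GB). One iteration of gradient descent, by contrast, costs only $O(mn)$, about $50 \times 200000 = 10^7$ operations. So gradient descent should be preferred.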
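To make the rank argument concrete, here is a minimal pure-Python sketch on a made-up toy dataset with $m = 2$ examples and 3 features plus the intercept (so $X$ is $2 \times 4$), standing in for the real $50 \times 200000$ case. It checks that the $4 \times 4$ matrix $X^T X$ has rank at most $m$, then runs batch gradient descent, which never forms $X^T X$. The learning rate `alpha=0.1`, the iteration count, and all helper names are assumptions chosen for this toy data, not values from the original answer.

```python
# Sketch: when m <= n, X^T X is singular, but gradient descent still works.
# Toy data (made up): m = 2 examples, intercept + 3 features, so X is 2 x 4.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def rank(A, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return r

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent on J(theta) = (1/2m) * ||X theta - y||^2."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # Residuals X theta - y: O(m * n) work per iteration.
        r = [sum(x * t for x, t in zip(row, theta)) - yi for row, yi in zip(X, y)]
        # Gradient (1/m) X^T r: another O(m * n); no n x n matrix is formed.
        grad = [sum(X[i][j] * r[i] for i in range(m)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

X = [[1.0, 1.0, 2.0, 0.0],
     [1.0, 0.0, 1.0, 3.0]]
y = [3.0, 5.0]

XtX = matmul(transpose(X), X)
print("X^T X is", len(XtX), "x", len(XtX[0]), "with rank", rank(XtX))

theta = gradient_descent(X, y)
residuals = [sum(x * t for x, t in zip(row, theta)) - yi for row, yi in zip(X, y)]
print("theta:", theta)
print("training residuals:", residuals)
```

Here $X^T X$ comes out $4 \times 4$ with rank 2, so the normal equation cannot be applied directly, while gradient descent drives the training residuals to essentially zero. Because it starts from $\theta = 0$, the iterates stay in the row space of $X$ and converge to the same minimum-norm solution the pseudoinverse would give.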

User Gareth Oakley (8.1k points)