Which of the following option(s) is/are true about training neural networks? (Choose all that apply.)

A. For multi-layer neural networks, stochastic gradient descent (SGD) is guaranteed to reach the global optimum
B. For multi-layer neural networks, stochastic gradient descent (SGD) is not guaranteed to reach a global optimum
C. Larger models tend to be harder to learn because their units need to be adjusted so that each one of them can individually solve the task
D. Larger models tend to be easier to learn because their units need to be adjusted so that they are, collectively, sufficient to solve the task
E. Initialization plays no or very little role in finding a good solution during training of neural networks

by User Ess (9.2k points)

1 Answer

7 votes

Final answer:

Options B and D are true: SGD is not guaranteed to reach a global optimum on the non-convex loss surface of a multi-layer network, and larger models tend to be easier to learn because their units only need to be collectively sufficient to solve the task. Options A, C, and E are incorrect; in particular, initialization plays an important role in training.

Step-by-step explanation:

A student has asked which option(s) is/are true about training neural networks. The correct options among those provided are:

  • B. For multi-layer neural networks, stochastic gradient descent (SGD) is not guaranteed to reach a global optimum. The loss surface of a multi-layer network is non-convex, so SGD can settle in a local minimum or a saddle region rather than the global minimum, and different runs can end at different solutions (see the sketch after this list).
  • D. Larger models tend to be easier to learn because their units need to be adjusted so that they are, collectively, sufficient to solve the task. In a larger, over-parameterized network no single unit has to solve the problem on its own; each unit only has to contribute a small part, and the bigger parameter space typically contains many good solutions that gradient descent can reach.
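
To make the point about SGD concrete, here is a minimal, hypothetical sketch (not part of the original question): noisy gradient descent on a toy non-convex function. The starting point alone decides which minimum a run ends in, which is why SGD carries no global-optimum guarantee and why initialization matters.

```python
import numpy as np

def loss(w):
    # Non-convex toy loss: global minimum near w ≈ -1.3, a shallower
    # local minimum near w ≈ 1.1, separated by a barrier near w ≈ 0.2.
    return w**4 - 3 * w**2 + w

def grad(w):
    return 4 * w**3 - 6 * w + 1

def sgd(w0, lr=0.01, steps=500, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w = w0
    for _ in range(steps):
        # Plain gradient step plus a little noise standing in for the
        # randomness of mini-batch sampling.
        w -= lr * (grad(w) + noise * rng.standard_normal())
    return w

for w0 in (-2.0, 0.5, 2.0):
    w_final = sgd(w0)
    print(f"start w0={w0:+.1f} -> final w={w_final:+.3f}, loss={loss(w_final):.3f}")
```

With these particular settings, a run started at w0 = -2.0 settles in the deeper minimum, while runs started at 0.5 or 2.0 end up in the shallower one.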

Options A, C, and E are incorrect with respect to the training of neural networks:

  • A. Incorrect because SGD does not guarantee reaching the global optimum; the optimization problem for multi-layer networks is non-convex.
  • C. Incorrect because it gets the reason backwards: in a larger model the units do not each need to solve the task individually, they only need to be collectively sufficient, which is exactly why larger models are often easier to fit (a small sketch comparing a narrow and a wide network follows this list).
  • E. Incorrect because initialization plays a vital role in training neural networks. Proper initialization (for example, small random weights) breaks the symmetry between units, speeds up convergence, and helps SGD avoid poor local minima, whereas an all-zero initialization would keep all units identical throughout training.
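
As a rough illustration of C versus D, here is another hypothetical sketch (not from the original answer): a hand-rolled two-layer tanh network trained with plain gradient descent on a toy regression problem, once with 2 hidden units and once with 64. The wider network's units only need to be collectively sufficient, and in this toy setting it typically reaches a much lower training loss.

```python
import numpy as np

# Toy 1-D regression data: two periods of a sine wave.
x = np.linspace(-3, 3, 64).reshape(-1, 1)
y = np.sin(2 * x)

def train(hidden, steps=5000, lr=0.01, seed=1):
    rng = np.random.default_rng(seed)
    # Small random initialization; an all-zero init would never break
    # the symmetry between hidden units (see option E above).
    W1 = rng.normal(0.0, 0.5, (1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    n = len(x)
    for _ in range(steps):
        h = np.tanh(x @ W1 + b1)          # hidden activations
        pred = h @ W2 + b2                # network output
        err = pred - y                    # gradient of 0.5*MSE w.r.t. pred (times n)
        dW2 = h.T @ err / n
        db2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1 - h**2)    # backprop through tanh
        dW1 = x.T @ dh / n
        db1 = dh.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    final_pred = np.tanh(x @ W1 + b1) @ W2 + b2
    return float(np.mean((final_pred - y) ** 2))

for hidden in (2, 64):
    print(f"hidden units = {hidden:3d} -> training MSE = {train(hidden):.4f}")
```

Nothing about the wider run is tuned specially; the same learning rate and step count simply find a better collective fit because the task is spread across more units.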

by User Rfmodulator (7.5k points)