Answer:
First-order method: Gradient Descent
Gradient descent is a first-order optimization algorithm that iteratively updates the parameters of a model by moving in the direction of the negative gradient of the loss function.
The algorithm starts with an initial set of parameters, and at each iteration it computes the gradient of the loss function with respect to the parameters and takes a step in the opposite direction of the gradient, scaled by a learning rate. This process is repeated until the parameters converge to a (local) minimum of the loss function.
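A minimal sketch of this update rule is shown below. It assumes a generic differentiable loss; the names grad_fn, theta0, and lr are illustrative, and the quadratic loss in the example is just a toy.

```python
import numpy as np

def gradient_descent(grad_fn, theta0, lr=0.1, n_iters=100):
    """Minimize a loss by repeatedly stepping against its gradient.

    grad_fn: function returning the gradient of the loss at theta.
    theta0:  initial parameter vector.
    lr:      step size (learning rate).
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        theta = theta - lr * grad_fn(theta)  # step opposite the gradient
    return theta

# Toy example: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
print(gradient_descent(lambda t: 2 * t, theta0=[3.0, -4.0]))
```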
Accelerated first-order method: Nesterov Accelerated Gradient Descent (NAG)
Nesterov Accelerated Gradient Descent (NAG) is an accelerated first-order optimization algorithm that improves upon vanilla gradient descent by adding a momentum term.
The algorithm starts with an initial set of parameters and maintains a velocity (momentum) vector. At each iteration it evaluates the gradient of the loss function at a look-ahead point (the current parameters plus the momentum step), updates the velocity with this gradient, and then moves the parameters along the velocity.
The momentum term smooths out the oscillations that can occur in plain gradient descent, and the look-ahead gradient is what distinguishes NAG from classical momentum and lets it converge faster to a minimum of the loss function.
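A minimal sketch of the NAG update, under the same assumptions as above (grad_fn, momentum, and the toy quadratic loss are illustrative choices, not part of any particular library):

```python
import numpy as np

def nesterov_gd(grad_fn, theta0, lr=0.1, momentum=0.9, n_iters=100):
    """Nesterov accelerated gradient: evaluate the gradient at a
    look-ahead point, then combine it with the accumulated velocity."""
    theta = np.asarray(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(n_iters):
        lookahead = theta + momentum * velocity            # peek ahead along the momentum
        velocity = momentum * velocity - lr * grad_fn(lookahead)
        theta = theta + velocity
    return theta

# Same toy quadratic loss as before, gradient 2*theta.
print(nesterov_gd(lambda t: 2 * t, theta0=[3.0, -4.0]))
```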
Second-order method: Newton-Raphson method
Newton-Raphson method is a second-order optimization algorithm that improves upon the first-order methods by taking into account the curvature of the loss function.
The algorithm starts with an initial set of parameters. At each iteration it computes the gradient of the loss function as well as the Hessian matrix (the matrix of second-order partial derivatives of the loss with respect to the parameters), and it updates the parameters by subtracting the inverse of the Hessian multiplied by the gradient (the Newton step).
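A minimal sketch of the Newton step, again with illustrative names (grad_fn, hess_fn) and a toy quadratic loss; on a quadratic, Newton's method reaches the minimizer in a single step:

```python
import numpy as np

def newton_raphson(grad_fn, hess_fn, theta0, n_iters=20):
    """Newton's method: scale the gradient step by the inverse Hessian,
    so the update accounts for the local curvature of the loss."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        g = grad_fn(theta)                      # first derivatives
        H = hess_fn(theta)                      # second derivatives (Hessian)
        theta = theta - np.linalg.solve(H, g)   # Newton step: -H^{-1} g via a linear solve
    return theta

# Toy example: quadratic loss f(theta) = theta^T A theta with A symmetric positive definite.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
grad = lambda t: 2 * A @ t
hess = lambda t: 2 * A
print(newton_raphson(grad, hess, theta0=[3.0, -4.0]))  # reaches the origin in one step
```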