Answer:
Increase the learning rate after each mini-batch by multiplying it by a constant factor slightly greater than 1, so that it grows exponentially over the run.
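
As a rough illustration of that multiplicative update, here is a minimal sketch in plain Python. The function name `exponential_lr_schedule` and the parameters `lr_start`, `lr_end`, and `num_batches` are hypothetical and chosen only for this example; the answer itself does not specify a start value, an end value, or the size of the constant. The sketch assumes you pick the factor so that the rate climbs from `lr_start` to `lr_end` over a fixed number of mini-batches.

```python
def exponential_lr_schedule(lr_start, lr_end, num_batches):
    """Return (factor, rates) for a multiplicatively increasing learning rate.

    All names here are illustrative assumptions, not part of any library.
    The rate for mini-batch i is lr_start * factor**i.
    """
    if num_batches < 2:
        raise ValueError("num_batches must be at least 2")
    if not 0 < lr_start < lr_end:
        raise ValueError("need 0 < lr_start < lr_end")

    # Constant multiplier applied after every mini-batch. It is greater
    # than 1 because lr_end > lr_start, so the rate always increases.
    factor = (lr_end / lr_start) ** (1.0 / (num_batches - 1))

    rates = []
    lr = lr_start
    for _ in range(num_batches):
        rates.append(lr)  # rate used for the current mini-batch
        lr *= factor      # multiply by the constant for the next one
    return factor, rates


if __name__ == "__main__":
    factor, rates = exponential_lr_schedule(1e-5, 1.0, 100)
    print(f"factor per mini-batch: {factor:.4f}")
    print(f"first rate: {rates[0]:.2e}, last rate: {rates[-1]:.2e}")
```

In a real training loop you would set the optimizer's learning rate to the next value in `rates` after each mini-batch instead of collecting the values in a list.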