Answer:
Increase the learning rate after each mini-batch by multiplying it by a constant factor slightly greater than 1, so that it grows exponentially from one mini-batch to the next.