Answer:
Increase the learning rate after each mini-batch by multiplying it by a constant factor slightly greater than 1, so that it grows exponentially over the course of training.
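A minimal sketch of that schedule in plain Python. It assumes you want to sweep from a chosen starting rate to a chosen final rate over a fixed number of mini-batches. The function name `exponential_lr_schedule` and its parameters `lr_start`, `lr_end` and `num_batches` are hypothetical, used only for illustration. The constant factor `q` is derived from those endpoints rather than picked by hand.

```python
def exponential_lr_schedule(lr_start, lr_end, num_batches):
    """Return one learning rate per mini-batch plus the growth factor q.

    Hypothetical helper: each rate is the previous one multiplied by a
    constant factor q > 1, chosen so the sequence runs from lr_start
    to lr_end in exactly num_batches steps.
    """
    if num_batches < 2:
        raise ValueError("num_batches must be at least 2")
    if not 0 < lr_start < lr_end:
        raise ValueError("need 0 < lr_start < lr_end")
    # Geometric growth: lr_start * q ** (num_batches - 1) == lr_end
    q = (lr_end / lr_start) ** (1.0 / (num_batches - 1))
    lrs = []
    lr = lr_start
    for _ in range(num_batches):
        lrs.append(lr)
        lr *= q  # multiply by the constant factor after each mini-batch
    return lrs, q


lrs, q = exponential_lr_schedule(1e-5, 1.0, 6)
print(f"factor q = {q:.4f}")
for step, lr in enumerate(lrs):
    print(step, f"{lr:.2e}")
```

In a real training loop you would set the optimizer's learning rate to `lrs[step]` before processing mini-batch `step`. Deriving `q` from the endpoints keeps the sweep length fixed, whereas hard-coding `q` leaves the final rate dependent on how many batches you run.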