Answer:
Increase the learning rate after each mini-batch by multiplying it by a constant slightly greater than 1, so that it grows exponentially from one mini-batch to the next.
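
As a rough illustration, the update could be sketched in plain Python as below. The function name `lr_schedule`, the starting rate of 1e-5, and the factor of 1.1 are all assumptions chosen for the example, not values given in the answer.

```python
def lr_schedule(initial_lr, factor, num_batches):
    """Return the learning rate used for each mini-batch.

    Hypothetical helper: the rate starts at `initial_lr` and is
    multiplied by `factor` after every mini-batch.
    """
    rates = []
    lr = initial_lr
    for _ in range(num_batches):
        rates.append(lr)  # rate used for the current mini-batch
        lr *= factor      # multiplicative increase for the next one
    return rates


# Illustrative values only (assumed, not from the answer).
schedule = lr_schedule(initial_lr=1e-5, factor=1.1, num_batches=5)
for step, lr in enumerate(schedule):
    print(f"mini-batch {step}: lr = {lr:.3e}")
```

In a real training loop the same multiplication would be applied to the optimizer's learning rate after each mini-batch, typically while recording the loss at each step.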