Gradient descent can end in a local minimum,
whereas simulated annealing allows some
“uphill” moves, like running down a ski slope.
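
The contrast above can be illustrated with a minimal sketch. The objective `f`, the starting point, the step size, and the cooling schedule are all illustrative assumptions rather than values from the source, and the Metropolis acceptance rule used here is one common way to realise the "uphill" moves, not necessarily the one the authors have in mind.

```python
import math
import random


def f(x):
    """Toy objective (assumed for illustration): local minimum near
    x = 1.13 and a deeper, global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Analytic derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent: follows the negative gradient at every step."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


def acceptance_probability(delta, temperature):
    """Metropolis rule: always accept improvements (delta <= 0) and
    accept an uphill move with probability exp(-delta / temperature)."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def simulated_annealing(x0, t0=2.0, cooling=0.999, steps=5000,
                        step_size=0.5, seed=0):
    """Random-walk search that sometimes accepts uphill moves.
    Returns the best point visited. All defaults are hypothetical."""
    rng = random.Random(seed)
    x, best = x0, x0
    temperature = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        delta = f(candidate) - f(x)
        if rng.random() < acceptance_probability(delta, temperature):
            x = candidate
            if f(x) < f(best):
                best = x
        temperature *= cooling  # geometric cooling: fewer uphill moves over time
    return best


if __name__ == "__main__":
    start = 2.0
    x_gd = gradient_descent(start)
    x_sa = simulated_annealing(start)
    print(f"gradient descent:    x = {x_gd:.3f}, f(x) = {f(x_gd):.3f}")
    print(f"simulated annealing: x = {x_sa:.3f}, f(x) = {f(x_sa):.3f}")
```

Started at x = 2, gradient descent settles in the nearer, shallower basin of this toy function. Whether the annealing run reaches the deeper basin depends on the seed and the cooling schedule, so the sketch makes no claim about its exact result.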
C. Ma et al.
Optimization methods