Posts

Stochastic gradient descent Intuition

This is part 2 of my blog series on optimizers. Previously I explained Gradient Descent, which is the basic building block for the other optimizers; if you want a refresher, check out my previous article. The link is at the bottom of this article. SGD: It is a similar type of optimizer to Gradient Descent, but the main difference is that each update uses only one data point at a time. Let me explain a little more. When we use Gradient Descent, every update takes all the data points into account: for each of the epochs we assigned, we compute the loss over the whole dataset. In SGD, each update takes a single randomly chosen data point and computes the loss for that point only. The main advantage of SGD over Gradient Descent is that each update takes less time and is computationally cheaper. Because SGD uses only one data point per update, the path toward the global minimum becomes noisy. To reduce this kind of noise we use the concept of Momentum. SGD with Momentum: As I mentioned above
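To make the difference described in this excerpt concrete, here is a minimal sketch that fits a single weight `w` with both update rules. The toy data (`y = 3x`), the learning rate, and the function names are assumptions chosen for illustration; they are not taken from the original post.

```python
import random

# Toy data for y = 3x (hypothetical example, not from the post).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]

def grad_single(w, x, y):
    # Gradient of the squared error (w*x - y)**2 with respect to w.
    return 2.0 * (w * x - y) * x

def gradient_descent(w, lr, epochs):
    # One update per epoch, using the gradient averaged over ALL points.
    for _ in range(epochs):
        g = sum(grad_single(w, x, y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * g
    return w

def sgd(w, lr, steps, seed=0):
    # One update per step, using the gradient of ONE randomly chosen point.
    rng = random.Random(seed)
    for _ in range(steps):
        i = rng.randrange(len(xs))
        w -= lr * grad_single(w, xs[i], ys[i])
    return w

w_gd = gradient_descent(0.0, lr=0.02, epochs=200)
w_sgd = sgd(0.0, lr=0.02, steps=1000)
```

Both runs should end up close to `w = 3`. Each SGD step touches a single point, so it is cheaper than a full Gradient Descent step, but the size of the step changes depending on which point was drawn.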

GRADIENT DESCENT

GRADIENT DESCENT: It is an optimization technique that helps us find the values of the parameters (coefficients) of a function (f) that minimize the cost function (cost). The word Gradient means slope and Descent means downward: we move down the slope of the cost function to find the optimal values of the coefficients. It is best used when the parameters cannot be calculated analytically (that is, by linear algebra) and must be searched for by an optimization algorithm. Intuition of Gradient Descent: Let's consider a bowl in which you eat cereal or store fruit. The bowl is a plot of the cost function (f). A random position on the surface of the bowl is the cost of the current values of the coefficients (cost). The bottom of the bowl is the cost of the best set of coefficients, the minimum of the function, which is also known as the "Global Minimum". Our main goal is to find the global minimum on that curve, which will give us the best values of the coefficients. Basically there are two types of minima:
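The bowl intuition in this excerpt can be sketched in a few lines. The cost function `(w - 2)**2`, the starting point, and the learning rate below are assumptions picked for illustration, not values from the original post.

```python
def cost(w):
    # Bowl-shaped cost with its minimum at w = 2 (illustrative choice).
    return (w - 2.0) ** 2

def slope(w):
    # Derivative of the cost: the "gradient" at position w.
    return 2.0 * (w - 2.0)

def descend(w, lr=0.1, steps=100):
    # Repeatedly step downhill, against the slope.
    for _ in range(steps):
        w -= lr * slope(w)
    return w

# Start at a random-looking position on the side of the bowl.
w_best = descend(w=-5.0)
```

Starting from `w = -5`, every step moves `w` toward the bottom of the bowl at `w = 2`, where the slope is zero and the cost is smallest.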