
6.12. Gradient Descent in 1-Dimension#

%%html
<div align="center">
  <iframe width="600" height="400" 
          src="https://www.youtube.com/embed/5jQpe4nO68M?si=xOCUrnh3GXva0WAc" 
          frameborder="0" 
          allowfullscreen>
  </iframe>
</div>

Suppose we have a function of one variable, \( f(x) \), and we wish to find its minimum. The gradient descent algorithm iterates from an initial value \( x_0 \), updating according to the following equation

(6.116)#\[\begin{equation} x_{n+1} = x_n - \eta \frac{df(x_n)}{dx} \end{equation}\]

Here, \(x_n\) is the parameter value at step \( n \), \( \eta\) is the learning rate (also called the step size), and \(\frac{df}{dx} \) is the derivative of the function. The learning rate controls how far each update moves: a larger \( \eta \) produces larger steps, while a smaller \(\eta \) produces smaller ones.
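To make the effect of \( \eta \) concrete, the short sketch below applies a single update step with two different learning rates. The quadratic \( g(x) = (x - 2)^2 \) and its derivative are illustrative stand-ins, not the example function studied later in this section.

# A minimal sketch (not part of the example below): one gradient descent update
# on the illustrative quadratic g(x) = (x - 2)**2, whose minimum is at x = 2.
g  = lambda x: (x - 2.0)**2
dg = lambda x: 2.0*(x - 2.0)

x_n = 5.0                            # current parameter value
for eta in (0.1, 0.5):
    x_next = x_n - eta*dg(x_n)       # a larger eta takes a larger step toward the minimum
    print("eta =", eta, ": x moves from", x_n, "to", x_next)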

The gradient descent algorithm continues until one of the following conditions is met:

  1. The maximum number of user-defined iterations has been reached, which may indicate that convergence to a minimum has not been achieved.

  2. The absolute value of the gradient is less than or equal to a user-defined tolerance, indicating that the algorithm has found an approximate gradient of zero.

For gradient descent in higher dimensions, the second condition is modified: the absolute value of the gradient is replaced by its norm, and the algorithm terminates when the norm of the gradient is less than or equal to a user-defined tolerance.
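As a small sketch of that higher-dimensional stopping test, the check below compares the Euclidean norm of a gradient vector to the tolerance; the gradient values and tolerance are made up for illustration.

import numpy as np

grad = np.array([3e-6, -4e-6])   # example gradient vector (made-up values)
tol = 1e-5                       # user-defined tolerance

# Terminate when the norm of the gradient is at or below the tolerance.
if np.linalg.norm(grad) <= tol:
    print("stop: gradient norm", np.linalg.norm(grad), "is within tolerance")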


1D Gradient Descent Pseudocode

Initialize:
    x = initial value (random or user choice)
    eta = user-defined learning rate
    max_iterations = user-defined maximum number of iterations
    tol = user-defined tolerance

For i from 1 to max_iterations:
    gradient = derivative of f evaluated at x, i.e. f'(x)
    If |gradient| <= tol:
        Break
    x = x - eta * gradient

Return x as the approximate minimum

Below is gradient descent code that minimizes the function \(f(x) = x^4 - 6(x+1)^2 - 3(x-1)^2\).

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

f = lambda x : x**4 - 6*(x + 1)**2 - 3*(x - 1)**2   # function to minimize
df = lambda x : 4*x**3 - 12*(x + 1) - 6*(x - 1)     # its derivative

x = np.linspace(-3.5, 3.5, 100)   # grid of x values for plotting
ax = plt.subplot()
ax.set_xlabel(r"$x$", size=20) 
ax.set_ylabel(r"$f(x)$", size=20) 
ax.plot(x, f(x)) 
ax.grid()
Figure: plot of \(f(x)\) over the interval \([-3.5, 3.5]\).
x_0 = 3.0      # initial value 
eta = 0.01     # learning rate        
tol = 1e-5     # tolerance for convergence criterion 
n_iter = 200   # maximum number of iterations 

x_n = x_0 
for i in range(n_iter): 

    dx = df(x_n)                 # derivative at the current point

    if abs(dx) <= tol:           # stop when the gradient is approximately zero
        print("converged at i=", i) 
        break

    x_n = x_n - eta*dx           # gradient descent update
x_n
converged at i= 25
2.2716334764347486
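For reuse, the loop above can be wrapped in a function. The sketch below reuses `df` and `x_0` from the cells above; the name `gradient_descent_1d` and the returned convergence flag are illustrative choices, not part of the original code.

def gradient_descent_1d(df, x0, eta=0.01, tol=1e-5, n_iter=200):
    """Minimize a 1-D function given its derivative df, starting from x0.

    Returns the final x and a flag for whether |df(x)| <= tol was reached.
    """
    x_n = x0
    for i in range(n_iter):
        dx = df(x_n)
        if abs(dx) <= tol:        # condition 2: gradient is approximately zero
            return x_n, True
        x_n = x_n - eta*dx        # gradient descent update
    return x_n, False             # condition 1: iteration cap reached

gradient_descent_1d(df, x_0)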

The plot below marks the starting and final points of the gradient descent run with red dots. The vertical line marks the final \(x_n\) value.

ax = plt.subplot()
ax.set_xlabel(r"$x$", size=20) 
ax.set_ylabel(r"$f(x)$", size=20) 

ax.plot(x, f(x)) 
ax.scatter([x_0, x_n], [f(x_0), f(x_n)], color="red", label=r"initial and final values of $x_n$") 
ax.axvline(x_n, color="black")   # vertical line at the final x_n
ax.grid()
ax.legend(loc=3) 
Figure: \(f(x)\) with the initial and final values of \(x_n\) marked in red and a vertical line at the final \(x_n\).

Question: Is this example function convex or non-convex?

Exercise: Run the code with the initial value x_0 set to -3, -0.5, and -0.1. Explain the results.
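One possible way to set up this exercise, reusing `df`, `eta`, `tol`, and `n_iter` from the cells above, is to loop over the suggested starting points; this is a sketch of the setup only, not the explanation the exercise asks for.

for x_start in (-3.0, -0.5, -0.1):
    x_n = x_start
    for i in range(n_iter):
        dx = df(x_n)
        if abs(dx) <= tol:       # same stopping rule as the code above
            break
        x_n = x_n - eta*dx
    print("x_0 =", x_start, "-> x_n =", x_n, "after", i + 1, "iterations")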