6.12. Gradient Descent in 1-Dimension
%%html
<div align="center">
<iframe width="600" height="400"
src="https://www.youtube.com/embed/5jQpe4nO68M?si=xOCUrnh3GXva0WAc"
frameborder="0"
allowfullscreen>
</iframe>
</div>
Suppose we have a function of one variable, \( f(x) \), and we wish to find its minimum. The gradient descent algorithm iterates from an initial value \( x_0 \), updating according to the following equation:
\[
x_{n+1} = x_n - \eta \, \frac{df}{dx}(x_n)
\]
Here, \(x_n\) is the parameter value at step \( n \), \( \eta \) is the step size (also known as the learning rate), and \(\frac{df}{dx}\) is the derivative of the function evaluated at \(x_n\). A larger \( \eta \) results in larger steps, while a smaller \( \eta \) results in smaller steps.
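As a quick illustration with a function not used elsewhere in this section, take \(f(x) = x^2\), so \(\frac{df}{dx} = 2x\). Starting at \(x_0 = 2\) with \(\eta = 0.1\), the first two updates are \(x_1 = 2 - 0.1 \cdot 4 = 1.6\) and \(x_2 = 1.6 - 0.1 \cdot 3.2 = 1.28\); each step moves closer to the minimum at \(x = 0\).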
The gradient descent algorithm continues until one of the following conditions is met:
The maximum number of user-defined iterations has been reached, which may indicate that convergence to a minimum has not been achieved.
The absolute value of the gradient is less than or equal to a user-defined tolerance, indicating that the algorithm has found a point where the gradient is approximately zero.
For gradient descent in higher dimensions, the second condition is modified: the absolute value of the gradient is replaced by its norm, and the algorithm terminates when the norm of the gradient is less than or equal to a user-defined tolerance.
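A minimal sketch of that higher-dimensional stopping test, assuming the gradient is stored in a NumPy array named grad (the example values here are made up for illustration):
import numpy as np

grad = np.array([3e-6, -4e-6])   # hypothetical gradient vector at the current point
tol = 1e-5                       # user-defined tolerance

# In 1D we test abs(gradient) <= tol; in higher dimensions the absolute
# value is replaced by the norm of the gradient vector.
if np.linalg.norm(grad) <= tol:
    print("stop: gradient norm is within tolerance")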
1D Gradient Descent Pseudocode
Initialize: x = random initial value or user choice
eta = user defined learning rate
max_iterations = user defined maximum number of iterations
tol = user defined tolerance
For i from 1 to max_iterations:
gradient = f'(x), the derivative of f evaluated at x
x_new = x - eta * gradient
If |gradient| <= tol:
Break
x = x_new
Return x as the approximate minimum
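A minimal Python sketch of this pseudocode; the function name gradient_descent_1d and its arguments are illustrative, not part of the code used below:
def gradient_descent_1d(df, x0, eta, max_iterations, tol):
    # df: derivative of the objective function f
    # x0: initial value, eta: learning rate, tol: convergence tolerance
    x = x0
    for i in range(max_iterations):
        gradient = df(x)          # derivative of f evaluated at x
        if abs(gradient) <= tol:  # approximate stationary point found
            break
        x = x - eta * gradient    # gradient descent update
    return x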
Below is gradient descent code to minimize the function \(x^4 - 6(x+1)^2 - 3(x-1)^2\).
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

f = lambda x: x**4 - 6*(x + 1)**2 - 3*(x - 1)**2   # objective function
df = lambda x: 4*x**3 - 12*(x + 1) - 6*(x - 1)     # its derivative f'(x)
x = np.linspace(-3.5, 3.5, 100)
ax=plt.subplot()
ax.set_xlabel(r"$x$", size=20)
ax.set_ylabel(r"$f(x)$", size=20)
ax.plot(x, f(x))
ax.grid()
x_0 = 3.0      # initial value
eta = 0.01     # learning rate
tol = 1e-5     # tolerance for the convergence criterion
n_iter = 200   # maximum number of iterations

x_n = x_0
for i in range(n_iter):
    dx = df(x_n)           # gradient at the current point
    if abs(dx) <= tol:     # stop when the gradient is approximately zero
        print("converged at i=", i)
        break
    x_n = x_n - eta*dx     # gradient descent update
x_n
converged at i= 25
2.2716334764347486
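As an optional sanity check (not part of the original code), the second derivative of this function is \(f''(x) = 12x^2 - 18\); evaluating it at the returned point and finding a positive value confirms a local minimum:
d2f = lambda x: 12*x**2 - 18   # second derivative of f
d2f(x_n) > 0                   # x_n from the loop above; True indicates a local minimum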
The plot below marks the starting and ending points of the gradient descent run with red dots. The vertical line marks the final \(x_n\) value.
ax=plt.subplot()
ax.set_xlabel(r"$x$", size=20)
ax.set_ylabel(r"$f(x)$", size=20)
ax.plot(x, f(x))
ax.scatter([x_0, x_n], [f(x_0), f(x_n)], color="red", label=r"initial and final values of $x_n$")
ax.axvline(x_n, color="black")
ax.grid()
ax.legend(loc=3)
Question: Is this example function convex or not convex?
Exercise: Run the code with the initial values \(x_0\) = -3, -0.5, and -0.1. Explain the results.
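If it helps to organize the runs, the sketch below simply repeats the loop above for each suggested starting value (the loop and print formatting are illustrative, not part of the original code); interpreting the different endpoints is the point of the exercise.
for x_0 in [-3.0, -0.5, -0.1]:
    x_n = x_0
    for i in range(n_iter):
        dx = df(x_n)
        if abs(dx) <= tol:
            break
        x_n = x_n - eta*dx
    print(f"x_0 = {x_0:5.2f}  ->  x_n = {x_n:.6f}")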