Calculus of Variations Demystified

How To Solve The Shortest Path Problem

Fundamentals
Published

October 18, 2020

Calculus of Variations Demystified: How To Solve The Shortest Path Problem

1 How to Derive the Euler-Lagrange Equation

We use the calculus of variations to optimize functionals.

You read it right: functionals, not functions.

But what are functionals? What does a functional look like?

Moreover, there is a thing called the Euler-Lagrange equation.

What is it? How is it useful?

How do we derive such an equation?

You are in the right place if you have any of the above questions.

I’ll demystify it for you.

2 The Shortest Path Problem

Suppose we want to find the shortest path from point A to point B.

If there is no obstacle, the answer should be a straight line from A to B.

But for now, let’s forget that we know the answer.

Instead, let’s frame the problem into an optimization problem and mathematically show that the straight line is the answer.

Doing so lets us understand what functionals and variations are and how to derive the Euler-Lagrange equation to solve the optimization problem.


Before finding the shortest path, we need to be able to calculate the length of a path from A to B.

As the famous Chinese proverb says, “A journey of a thousand miles begins with a single step”.

So, let’s calculate the length of a small passage and integrate it from A to B.

As shown in the above diagram, a small passage can be approximated by a straight line.

dS=dx2+dy2

So, the total length of a path can be calculated by the following integration:

S=ABdS=ABdx2+dy2

Since A and B are fixed points, let’s define them as A=(x1,y1) and B=(x2,y2).

S=ABdx2+dy2=(x1,y1)(x2,y2)dx2+dy2=x1x21+(dydx)2dx

In the last step, y1 and y2 are dropped as they are determined by x1 and x2.

So, we are integrating the square root from x1 until x2.

Now we know how to calculate the length of a path from A to B.

We will find the y=f(x) that minimizes the path length S.

3 What is a functional?

We know that the y=f(x) is a straight line between A and B, but let’s still forget that for now.

Instead, we consider many potential solution lines from A to B.

The blue line is y=blue_line(x). The red line is y=red_line(x). And so on.

But having a function name for each line is too tedious, so we call them collectively y=y(x). We think of y=y(x) as any line between A and B, which could be the blue or red lines (or else). We can calculate the length of any line (given a particular y(x)) using the formula S.

In other words, S takes an instance of y(x) and returns the length of the line. So, S maps a function y(x) to a value. We call S a functional, which is a function of functions. A functional is written with square brackets like S[y], which means the functional S takes a function y and returns a value.

If we talk about functions of functions like this general, there could be too many kinds of functionals, many of which may not be that useful. Since functional optimization originates from Physics, which deals with positions and velocities, the following general form is very common, and they have many applications (not only in Physics but also in machine learning).

J[y]=x1x2L(x,y(x),y(x))dx

J[y] is a functional that takes a function y.

  • J integrates L from x1 until x2.
  • L is a function of x, y, and y.
  • y is the derivative of y in terms of x. y=dydx

The above may look too abstract. Let’s apply it to our shortest path problem. For the shortest path problem, L is defined as follows:

L(x,y(x),y(x))=1+(dydx)2

We integrate L between x1 and x2 to get the path length.

S[y]=x1x2L(x,y,y)dx=x1x21+(dydx)2dx

In the shortest path problem, L does not include x and y but only y.

That is ok.

The point is that if we can solve the optimization problem for the general form of functionals, we can solve many problems, including the shortest path problem.

4 Functional variations

Before attempting to solve the general form of functionals, let’s think about how to solve the shortest path problem. How about we generate many random functions and evaluate the path length to find the shortest path?

We can map each path to a path length value using the functional S. The question is how to know whether we find the shortest path.

We can take an alternative approach since we know the shortest path exists. We start with the shortest path and slightly modify it to see what kind of condition is required for the shortest path.

Let’s call the shortest path function f(x) that minimizes S. Here, we write f(x) instead of y(x). The function f(x) is the solution function to our problem, and y(x) is a candidate function that may or may not solve our problem, like the blue and red lines. When we give y(x) or f(x) to S, we write like S[y] and S[f] respectively.

  • S[y] gives the path length following y(x) from A to B.
  • S[f] gives the path length following f(x) from A to B which is the shortest path length.

If we add an arbitrary function η(x) (eta of x) to f(x), the following relationship is asserted:

S[f]S[f+ϵη]

The term ϵη is called a variation of the function f, which is denoted as follows:

δf=ϵη

ϵ (epsilon) is a small number, so the variation is also small. We analyze how the functional changes when we move ϵ towards zero.

Let’s apply a variation to the shortest path to make it more concrete. In the below diagram,

  • The blue straight line is the shortest path y=f(x).
  • The purple line is an example of a variation ϵη added to f.

y=f(x)+ϵη(x)

S[f+ϵη] gives the length of the path given by the function f+ϵη. Since S[f] is the shortest path length, the relationship S[f]S[f+ϵη] is true for any variation. When ϵ goes to zero, the purple line becomes the same as the blue line as the variation vanishes.

Also, note that η(x) must be zero at the point A and B since any variation must include the point A and B. (Remember we are looking for the shortest line from A to B). So, η(x1)=0 and η(x2)=0.

Why do we think about the variation?

The reason is that the variation makes the functional optimization into a function optimization, which we know how to solve.

If we look at S[f+ϵη] very closely, we can see that it depends only on ϵ.

  • f(x) is the function that gives the minimum path line, which is fixed.
  • η(x) is an arbitrary function that is fixed for a particular variation.

Therefore, the only variable in S[f+ϵη] is ϵ. In other words, S[f+ϵη] is a function of ϵ. So, let’s write it as S(ϵ).

S[f+δf]=S[f+ϵη]=S(ϵ)

S(ϵ) is the function of ϵ that returns the path length for the function y that is the solution function f plus a variation ϵη. We just need to solve the minimization problem for the function S(ϵ). Conveniently, we also know that S becomes the minimum when ϵ goes to zero.

So, the derivative of S with ϵ should be zero when ϵ=0.

S(0)=dSdϵ|ϵ=0=0

Here, the apostrophe () is for the derivative of S by ϵ.

In case it’s not clear, an apostrophe is used when we calculate the derivative of a function by the only independent variable. Since ϵ is the only independent variable of S in this context, the apostrophe here means the derivative of S by ϵ.

Let’s go back to the general form to reiterate all the points we’ve discussed so far and derive the Euler-Lagrange equation to solve the optimization problem.

5 The Euler-Lagrange equation

The general form of functionals we are dealing with is as follows:

J[y]=x1x2L(x,y(x),y(x))dx

  • We say the function f minimizes J. As such, J[f] is the minimum value of the functional J.
  • We define y=f+ϵη to mean we add a variation to the solution function.
  • Any y should satisfy the boundary conditions:
    • η(x1)=0
    • η(x2)=0

Since J[f] is the minimum (constant) and η is an arbitrary function of x which is fixed for a particular variation, we can define J[y] as a function of ϵ:

J[y]=J[f+δf]=J[f+ϵη]=Φ(ϵ)

We’ve successfully converted the general form of functionals into a function of ϵ. This function of ϵ is at minimum when ϵ=0. In other words, the first derivative of it becomes zero at ϵ=0.

Φ(0)=dΦdϵ|ϵ=0=0

So far, we reinforced the same idea from the shortest path problem for the general form of functionals. Now, we are going to solve it for the general form of functionals to derive the Euler-Lagrange equation.

By substituting J[y] into the above equation:

Φ(0)=dΦdϵ|ϵ=0=dJ[y]dϵ|ϵ=0=x1x2dLdϵ|ϵ=0dx=0

Let’s calculate the total derivative of L(x,y,y) by ϵ.

dLdϵ=Lxdxdϵ+Lydydϵ+Lydydϵ

The first term is zero since x has no dependency on ϵ.

dLdϵ=Lydydϵ+Lydydϵ

  • y=f+ϵη and y=f+ϵη.
  • f, f, η, and η have no dependency on ϵ.

So, the total derivative of L by ϵ is as follows:

dLdϵ=Lydydϵ+Lydydϵ=Lyη+Lyη

When ϵ=0, y=f and y=f:

dLdϵ|ϵ=0=Lfη+Lfη

Substituting this into the equation Φ(0)=0:

Φ(0)=x1x2dLdϵ|ϵ=0dx=x1x2(Lfη+Lfη)dx=x1x2Lfηdx+x1x2Lfηdx

In the second term, we have η. As η is an arbitrary function, we cannot calculate η. Let’s eliminate η by applying integration by parts on the second term.

x1x2Lfηdx=Lfη|x1x2x1x2(ddxLf)ηdx

The first term is zero due to the boundary conditions: η(x1)=0 and η(x2)=0.

x1x2Lfηdx=x1x2(ddxLf)ηdx

Therefore,

Φ(0)=x1x2Lfηdxx1x2(ddxLf)ηdx=x1x2(LfddxLf)ηdx

As η(x) is an arbitrary function, the term in the brackets must be zero to satisfy the equation Φ(0)=0.

LfddxLf=0

This is called the Euler-Lagrange equation, which is the condition that must be satisfied for the optimal function f of the general form of functionals.

6 Solving the shortest path problem

Let’s solve the shortest path problem using the Euler-Lagrange equation.

L(x,y(x),y(x))=1+(dydx)2

As mentioned before, y is the derivative of y in terms of x.

y=dydx

Now, we say y=f(x) is the function that minimizes the path length from A to B.

So, we think of f instead of y in L.

L(x,f(x),f(x))=1+(dfdx)2=1+f2

This f must satisfy the Euler-Lagrange equation.

Let’s solve the Euler-Lagrange equation for the shortest path problem.

LfddxLf=0

Since f does not appear in L, the first term is zero.

Lf=0

As such, the second term is also zero.

ddxLf=ddx1+f2f=ddxf1+f2=0

We integrate the above by x:

f1+f2=C

C is a constant value. This means f is also a constant value. It can be shown by taking the square of both sides and re-arrange the equation:

f21+f2=C2f2=C21C2

Let f=a and we get f(x)=ax+b which is a straight line.

Voilà!

If we put the boundary conditions f(x1)=y1 and f(x2)=y2, we get the values for a and b.

y=y2y1x2x1(xx1)+y1

7 Last but not least

The essential idea of the calculus of variations is to make a functional into a function of ϵ by adding the variation ϵη to the optimal function f so that the problem of functional optimization becomes a function optimization problem.

We derived the Euler-Lagrange equation for the general form of functionals, which must be satisfied for solution functions. Mathematically speaking, the Euler-Lagrange equation is not a sufficient condition for the functional minimum or maximum. It just tells us the solution function makes the functional stationary.

However, in many problems, we know the solution would give the functional minimum or maximum. In the shortest path problem, the maximum is unlimited, so we know the solution to the Euler-Lagrange equation gives the minimum function.

In case we want to check if the solution function is for the maximum or minimum, we need to calculate the second derivative of Φ(ϵ) by ϵ for ϵ=0 where y=f+ϵη.

Let’s do this for the shortest path problem:

S(ϵ)=x1x21+y2dx where y=f+ϵη

The first derivative of S(ϵ) with ϵ:

dS(ϵ)dϵ=x1x2ddϵ1+y2dx=x1x2ηy1+y2dx

The second derivative of S(ϵ) with ϵ:

d2S(ϵ)dϵ2=x1x2(η21+y2η2y2(1+y2)32)dx=x1x2η21+y2(1y21+y2)dx=x1x2η21+y2(11+y2)dx

This is a positive value, so we know the Euler-Lagrange equation for the shortest path problem gives the minimum function. In theory, we should check the value of the second derivative when ϵ=0 because we are talking about the local minimum/maximum around the range expressed by ϵ.

When ϵ=0, y=f:

d2S(ϵ)dϵ2|ϵ=0=x1x2η21+f2(11+f2)dx>0

The solution function (the straight line) given by the Euler-Lagrange equation for the shortest path problem gives the minimum of the functional S.

Lastly, I assumed all the derivatives are possible in this article. For example, y(x) and η(x) must be differentiable by x.

I hope you have a clearer idea about the calculus of variations now.

8 References