Lagrangian Mechanics and the Mass/Energy Relation

A while ago I wrote about where Einstein’s famous equation comes from. With the benefit of some advanced math and the Lagrangian formulation of physics, it’s actually even easier to find this equation. Though in this case easy means you have to learn Lagrangian mechanics first, so take that with a grain of salt.

Much of this discussion comes from Lagrangian Mechanics for the Non-Physicist, which I found to be excellent and highly recommend.

Prerequisites

Let’s do a brief overview of Lagrangian mechanics and Noether’s theorem before we get to Einstein’s famous equation.

Lagrangian Formulation of Physics

The Lagrangian formulation of physics is what the aliens use in Story of Your Life (and the movie Arrival based on it). Instead of Newtonian calculations that predict what a particle will do at any given time, Lagrangian calculations optimize the entire trajectory of a particle at once based on the physics of the problem.

In order to optimize the trajectory, you need a quantity that summarizes that trajectory. That quantity is called the Action (S), and it’s generally the integral of some function over the entire trajectory. We call the function inside the integral the Lagrangian. The action has units of Joule-seconds, so the Lagrangian has units of Joules (energy).

S = \int_{t_0}^{t_f} L dt

What’s the Lagrangian that you put inside the Action integral? That depends on what formulation of physics you’re using. For classical mechanics like what Newton was solving, the Lagrangian is just kinetic energy minus potential energy, and represents a balancing of a tendency to move (kinetic) and a tendency to stay put (potential). Other theories of physics, like quantum mechanics or relativistic theories, use different Lagrangians (as we’ll see later).

The Lagrangian (in non-field theories like mechanics) is a function of coordinates, the derivatives of those coordinates, and time. That means if we’re using coordinates x, y, and z then the Lagrangian would be a function L(x,y,z,\dot{x},\dot{y},\dot{z},t).

Once you have the Action and the Lagrangian, you can do some complicated math to figure out what the optimal trajectory is. This math turns out to be the same for a lot of different theories. Instead of re-deriving it each time they need it, physicists derived it once and gave it the fancy name of the Euler-Lagrange equation. This equation is used to find the trajectory that optimizes the Action for a specified Lagrangian.

\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i} - \frac{\partial L}{\partial q_i} = 0

There’s a different Euler-Lagrange equation for each coordinate (q_i), so in a problem with coordinates x, y, and z we would have q_1=x, q_2=y, and q_3=z. By solving the Euler-Lagrange equation for each coordinate, you can get the equations of motion as functions of time for a particle with the given Lagrangian.
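To make the "optimize the whole trajectory at once" idea concrete, here is a minimal numerical sketch in Python. It is not from the original derivation: the function names, the choice of a ball thrown straight up under gravity, and the sine-shaped perturbation are all assumptions made for illustration. It approximates the action as a sum over small time steps and compares the Newtonian path against a nearby path with the same endpoints.

```python
import math

def action(path, dt, m=1.0, g=9.81):
    """Approximate S = integral of L dt for positions sampled every dt.

    Uses the classical Lagrangian L = kinetic - potential = 0.5*m*v**2 - m*g*x.
    """
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        v = (x1 - x0) / dt           # velocity on this segment
        x_mid = 0.5 * (x0 + x1)      # position at the segment midpoint
        total += (0.5 * m * v**2 - m * g * x_mid) * dt
    return total

def newtonian_path(n, t_final, g=9.81):
    """Ball that leaves x=0 at t=0 and falls back to x=0 at t_final."""
    dt = t_final / n
    v0 = 0.5 * g * t_final           # launch speed that returns it at t_final
    path = [v0 * (i * dt) - 0.5 * g * (i * dt) ** 2 for i in range(n + 1)]
    return path, dt

def perturbed(path, amplitude):
    """Add a bump that vanishes at both endpoints, so start and end stay fixed."""
    n = len(path) - 1
    return [x + amplitude * math.sin(math.pi * i / n) for i, x in enumerate(path)]

path, dt = newtonian_path(n=1000, t_final=2.0)
print("action of Newtonian path:", action(path, dt))
print("action of perturbed path:", action(perturbed(path, 0.1), dt))
```

For this particular Lagrangian, any bump added to the Newtonian path raises the action, whichever direction the bump points. The Euler-Lagrange equation finds that special path without having to try alternatives one by one.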

Noether’s Theorem

The Lagrangian isn’t just useful for finding equations of motion. It also serves as an input to other mathematical machinery. One of the most useful mathematical machines in modern physics is Noether’s theorem.

There are a lot of wonderful details to Noether’s theorem, but for our purposes we’ll skip to the end and present it in the form most useful for us.

Any continuous invariance (symmetry) of a system implies a conserved quantity. An invariance of the form \delta L = \frac{dF}{dt} has a conserved quantity of the form Q = \sum_i p_i \delta q_i - F, where p_i = \frac{\partial L}{\partial \dot{q}_i} is the momentum associated with coordinate q_i. In this case, \delta q_i is an infinitesimal change to coordinate q_i, which may be something like a spatial dimension x or y.

One way to use Noether’s theorem is to find the invariance equation (by guessing or via experiment), and then derive the conserved quantity from it. That’s what we’ll do for relativity.

The Relativistic Lagrangian

In order to use Noether’s theorem with Einstein’s relativity, we need the relativistic Lagrangian itself. Since the action is just the integral of the Lagrangian, we’ll start by finding the action. In Special Relativity, this is the length of a particle’s “world line”.

Why is the action in special relativity the length of a particle’s world line? The length of the world line is invariant: all observers, no matter how they’re moving, will measure the same world line length. We also find that maximizing a particle’s world line length gives results that match experiment. Since we maximize world line length, we will add a negative sign to the equation so that we end up minimizing the action with the Euler-Lagrange equation.

The length of a line in Euclidean space can be found by summing individual segments of it: ds^2 = dx^2 + dy^2 + dz^2. When we move to a relativistic world, we need to get a bit more complicated.

Relativistic physics is done in a Minkowski space, not a Euclidean space. We need to account for time and the speed of light. And time and distance don’t have the same sign. The equivalent formulation for a relativistic world line length is ds^2 = c^2dt^2 - dx^2 - dy^2 - dz^2.

Note that the little segment of time is multiplied by the speed of light (so the units come out right). Each segment of distance is negative, rather than positive. These are postulates Einstein made, and are implied pretty strongly by the outcome of the Michelson-Morley experiment.

If that’s the length of an infinitesimal segment of a world line, the total length of the world line is the integral of those segments along the curve. This gives us

S_{temp} = \int\sqrt{c^2dt^2 - dx^2 - dy^2 - dz^2}

Unfortunately, this doesn’t have the right units. We learned above that the action should have units of Joule-seconds (same as \hbar). Since a Joule is kg m^2/s^2, a Joule-second is kg m^2/s. Our S_{temp} above has units of meters, so we’re going to have to multiply it by something with units of kg m/s to fix this.

If we’re talking about the world line length of a single particle, there’s only one place we can get a unit of kg: the mass of the particle itself.

The most natural place to get a unit of m/s is to use a speed: the speed of light.

To fit the textbooks, we’ll throw a minus sign in there too. That lets us define the action of a relativistic particle to be:

S = -mc\int\sqrt{c^2dt^2 - dx^2 - dy^2 - dz^2}

For now let’s constrain our problem to just be in one dimension (x) and use S = -mc \int\sqrt{c^2dt^2-dx^2}.

We can find the Lagrangian from an action just by looking inside the integral. We saw above that S=\int Ldt. We’ll need to rearrange our action a bit to make it a total integral over time instead of over each individual coordinate. Let’s try just pulling a cdt out of the whole thing.

S = -mc \int\sqrt{c^2dt^2-dx^2}=\int -mc\sqrt{1-\frac{dx^2}{c^2dt^2}}cdt=\int -mc^2 \sqrt{1-\frac{\dot{x}^2}{c^2}}dt

This finally gives us our Lagrangian, L=-mc^2 \sqrt{1-\frac{\dot{x}^2}{c^2}}.
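As a quick sanity check on this Lagrangian (a sketch added for illustration, not a step the rest of the derivation depends on), we can feed L=-mc^2 \sqrt{1-\frac{\dot{x}^2}{c^2}} to the Euler-Lagrange equation from earlier. L has no explicit dependence on x, so the equation reduces to the statement that the relativistic momentum is constant, which is what we’d expect for a free particle:

```latex
\frac{\partial L}{\partial \dot{x}} = \frac{m\dot{x}}{\sqrt{1-\frac{\dot{x}^2}{c^2}}},
\qquad
\frac{\partial L}{\partial x} = 0
\quad\Rightarrow\quad
\frac{d}{dt}\left(\frac{m\dot{x}}{\sqrt{1-\frac{\dot{x}^2}{c^2}}}\right) = 0
```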

Whence E=mc^2

Once we have all of that, we can look at a relativistic particle. We say that this particle is invariant under time translations.

What does it mean to be invariant to time translations?

It means that if we do the same experiment at 3pm or at 7pm, we’ll get the same answer. The experiment has been “translated” (moved) in time, but it hasn’t varied in outcome.

A time translation means that we shift t by a small amount \delta t, so every time in the problem gets shifted: t \rightarrow t + \delta t.

Now we just need to find the quantity \delta L, which will give us the F that is used to calculate our conserved quantity.

\delta L = \frac{\partial L}{\partial x}\delta x + \frac{\partial L}{\partial \dot{x}}\delta \dot{x} + \frac{\partial L}{\partial t}\delta t

This is just the chain rule applied to L(x, \dot{x}, t). Since L is independent of x and t, the first and last terms are both 0.

To find the middle term, we need to know what \delta \dot{x} is. The velocity may be a function of time (\dot{x}(t)), so we can calculate \dot{x}(t + \delta t) = \dot{x} + \ddot{x} \delta t. That means the change in our velocity due to the shift in time is \ddot{x}\delta t, so that gives us the following:

\delta L = \frac{\partial L}{\partial \dot{x}}\ddot{x}\delta t

We know we need to formulate this into something that looks like \frac{dF}{dt}, so we need to get rid of the partial derivatives. To do that, we’ll try finding a derivative of L (since that’s about the only thing in our equation we have to go on):

\frac{dL}{dt} = \frac{\partial L}{\partial x} \dot{x} + \frac{\partial L}{\partial \dot{x}} \ddot{x} + \frac{\partial L}{\partial t}

But remember the first and third terms are 0 here as well. Then we can combine these two equations to get:

\delta L = \frac{dL}{dt}\delta t

We now know F=L\delta t, so we can finally use Noether’s theorem. In this case, our only coordinate is x, so we have:

Q = p_x \delta x - L\delta t

We’ll find \delta x like we did \delta \dot{x}, x(t + \delta t) = x + \dot{x}\delta t, which means \delta x = \dot{x}\delta t. We also know that p_x = \frac{\partial L}{\partial \dot{x}}. We then have:

Q = \frac{\partial L}{\partial \dot{x}} \delta x - L\delta t = \frac{m\dot{x}}{\sqrt{1-\frac{\dot{x}^2}{c^2}}}\dot{x}\delta t + mc^2 \sqrt{1-\frac{\dot{x}^2}{c^2}}\delta t = \frac{mc^2}{\sqrt{1-\frac{\dot{x}^2}{c^2}}}\delta t

We normally call the term 1/\sqrt{1-\frac{\dot{x}^2}{c^2}} the Lorentz factor \gamma.

Einstein defined this conserved value to be the energy Q=E, so we now have

Q = E = \gamma mc^2

And if the velocity of the particle is small, then \gamma is just about 1 and we have

E=mc^2
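The algebra that collapses p_x \dot{x} - L into \gamma mc^2 is easy to get wrong by a sign or a square, so here is a small numerical cross-check in Python. It is only a sketch: the function names, the unit mass, and the finite-difference step are assumptions chosen for illustration. The momentum is estimated directly from the Lagrangian with a finite difference rather than from the closed-form expression, so the comparison is independent of the algebra above.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lagrangian(v, m=1.0):
    """Relativistic free-particle Lagrangian L = -m c^2 sqrt(1 - v^2/c^2)."""
    return -m * C**2 * math.sqrt(1.0 - v**2 / C**2)

def momentum(v, m=1.0, h=1.0e3):
    """p = dL/dv, estimated with a central finite difference of step h (m/s)."""
    return (lagrangian(v + h, m) - lagrangian(v - h, m)) / (2.0 * h)

def noether_energy(v, m=1.0):
    """Conserved quantity per unit time shift: Q / delta_t = p*v - L."""
    return momentum(v, m) * v - lagrangian(v, m)

def gamma(v):
    """Lorentz factor 1 / sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - v**2 / C**2)

# Compare the Noether quantity with gamma * m * c^2 at a few speeds (m = 1 kg).
for fraction in (0.1, 0.5, 0.9):
    v = fraction * C
    print(fraction, noether_energy(v), gamma(v) * C**2)
```

The two printed columns should agree to many digits at each speed, with the small remaining difference coming from the finite-difference estimate of the momentum.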

Going Further

We squinted enough and said that \gamma at low speeds was 1, but technically speaking you would usually want to do a Taylor series expansion on \gamma and then keep the first few terms. That would let us find the Newtonian equation for kinetic energy as well: E \approx mc^2 + m\dot{x}^2/2.
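For reference, here is that expansion written out, using the standard binomial series for (1-u)^{-1/2} with u = \dot{x}^2/c^2. The first term is the rest energy, the second is the Newtonian kinetic energy, and the third is the leading relativistic correction, which is tiny whenever \dot{x} is much smaller than c:

```latex
\gamma = \left(1-\frac{\dot{x}^2}{c^2}\right)^{-1/2}
       = 1 + \frac{1}{2}\frac{\dot{x}^2}{c^2} + \frac{3}{8}\frac{\dot{x}^4}{c^4} + \dots

E = \gamma mc^2 = mc^2 + \frac{1}{2}m\dot{x}^2 + \frac{3}{8}\frac{m\dot{x}^4}{c^2} + \dots
```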