Why did Einstein think mass was energy?

A kid eating toast while walking on train tracks, contemplating the true nature of physics

The equivalency of mass and energy turns out to follow naturally from Einstein’s special relativity. The book Quantum Field Theory As Simply As Possible has a pretty good description of this. Here’s my take on it.

In order to understand the mass/energy equivalency, we first need to know what relativity really means. Then we learn what Einstein’s special version of relativity meant, and how it immediately implies that distances and time can change with reference-frame velocities. That in turn leads to the question of how energy and momentum change with reference-frame velocities, which implies the mass/energy equivalence.

Galilean Relativity

Imagine two ice hockey players: one goalie sitting in place and one winger skating towards the goalie at constant velocity V_w (as measured by the goalie). Both players see a puck between them, gliding along with some velocity V_p (again measured by the goalie).

Since the goalie knows V_w and V_p, the goalie can predict exactly what velocity the puck would be moving at if measured by the winger. It would be (V_p - V_w). This is called Galilean relativity, and it’s been known for hundreds of years. The speed is relative to the observer’s reference frame.

With Galilean relativity, time is measured the same way for each player. So are distances. It’s only speeds that could vary.
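
The Galilean rule is simple enough to write as a one-line function. This is only a sketch: the function name and the sample speeds are made up for illustration, and all velocities are taken along a single axis as measured by the goalie.

```python
def galilean_velocity(v_object, v_observer):
    """Velocity of an object as seen by a moving observer.

    Both inputs are 1D velocities measured in the goalie's frame.
    """
    return v_object - v_observer


# Made-up numbers, in metres per second
v_w = 8.0    # winger's velocity, as measured by the goalie
v_p = 20.0   # puck's velocity, as measured by the goalie

puck_seen_by_winger = galilean_velocity(v_p, v_w)
print(puck_seen_by_winger)  # V_p - V_w
```

If the winger skates at exactly the puck's velocity, the function returns 0: the puck looks stationary to the winger.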

Since spatial distances are measured to be the same by each observer, we can say that the Pythagorean relationship holds for space. A triangle drawn on the ice will have h^2 = x^2 + y^2 + z^2. Of course, since it’s drawn on the ice, the z length would be 0, but you can just imagine a 3D right pyramid on the ice, like maybe a newfangled puck shape.

A constant speed of light

Maxwell came along a couple hundred years after Galileo and put together a lot of hints from Faraday and others. His equations (or our current form of them, which he probably wouldn’t recognize) are:

\nabla \cdot E = \frac{\rho}{\epsilon_0}
\nabla \cdot B = 0
\nabla \times E = - \frac{\partial B}{\partial t}
\nabla \times B = \mu_0 J + \mu_0 \epsilon_0 \frac{\partial E}{\partial t}

These equations describe the electric field (E) and the magnetic field (B). The third one shows how a changing magnetic field will cause an electric field. The fourth shows how a changing electric field will cause a magnetic field.

Since neither E nor B loses energy in a vacuum, the two fields keep regenerating each other as they propagate off into the distance. This is what a light wave is.

This Quora question has the best answer I’ve seen to deriving the speed of light in vacuum from these equations, but you have to know a bit of vector calculus. The gist is that, in a vacuum \rho and J are both 0. That simplifies the above 4 equations and lets you find a standard form wave equation for the electric field, from which you can just read off the wave speed.
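
For readers who want the gist without following the link, here is a sketch of the standard textbook route from the four equations to the wave speed. It is not copied from the Quora answer; the only extra ingredient is the vector identity for the curl of a curl.

```latex
% In a vacuum, \rho = 0 and J = 0, so the first and fourth equations become
\nabla \cdot E = 0
\nabla \times B = \mu_0 \epsilon_0 \frac{\partial E}{\partial t}

% Take the curl of the third equation and substitute the line above
\nabla \times (\nabla \times E) = - \frac{\partial}{\partial t} (\nabla \times B) = - \mu_0 \epsilon_0 \frac{\partial^2 E}{\partial t^2}

% Vector identity, using \nabla \cdot E = 0
\nabla \times (\nabla \times E) = \nabla (\nabla \cdot E) - \nabla^2 E = - \nabla^2 E

% Combine the two results
\nabla^2 E = \mu_0 \epsilon_0 \frac{\partial^2 E}{\partial t^2}

% Compare with the standard wave equation \nabla^2 f = \frac{1}{v^2} \frac{\partial^2 f}{\partial t^2}
c = \frac{1}{\sqrt{\mu_0 \epsilon_0}} \approx 3.00 \times 10^8 \text{ m/s}
```

Nothing in the last line refers to the speed of whoever is doing the measuring, which is the point the next paragraph picks up.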

QFT As Simply As Possible makes a big deal about the speed of light derived from these equations being independent of any observer’s speed. This seems to have been the first hint that the speed of light is the same for every observer. For ordinary waves, the speed that falls out of the equations is the speed relative to the wave’s medium (the equivalent of water for ocean waves), and an observer moving through that medium measures a different speed. Sound has a fixed speed in air, but we can go faster than sound because we can move relative to the air.

Most physicists of the day thought light waves were carried on luminiferous ether in just the same way that sound waves are carried on air. If that were true, the ether might move relative to us observers. A moving ether would mean that light’s speed was not constant. Michelson and Morley got together in Ohio and showed that light didn’t travel faster in the direction of motion of the Earth than it did perpendicular to it, implying that the ether didn’t move relative to the Earth. This was another huge clue that light had a constant velocity regardless of the observer’s velocity.

Special Relativity

Einstein came along about 18 years after the Michelson-Morley experiment and took the idea of a constant speed of light seriously. If you accept (or simply assume for the sake of argument) that light has a constant speed for all observers, then that does away with temporal synchrony.

Light beams, trains, and simultaneity

Here’s a little thought experiment about temporal synchrony.

My twins are sleeping in beds separated by a distance D. I’m located at a point right between the beds, D/2 from each. As soon as the kids wake up, they shoot a laser at me to let me know I should start their breakfast toast. Only if they both need toast at the same time, meaning both their lasers arrive at my location at the same instant, will I actually stop reading physics books and head to the kitchen.

My wife is on a train moving relative to our kids at speed v. If you’re wondering why there’s a train in our house, you don’t understand children. Or physics nerds.

At some point, both lasers arrive at my location at the same time and I go make toast. Both kids awoke at the same instant.

But what did my wife see? Both kids are gliding by her at speed v (since from her perspective she might as well be stationary and the kids and I are moving). Assume my wife’s train is moving from kid H to kid F (with me in the middle moving along at the same speed as both kids). Then she sees me moving towards the light from H, and away from the light from F. Remember that both light beams are travelling at the same speed (by assumption in the Einsteinian relative universe). This means that, from my wife’s perspective, the photons from H travel a smaller distance than the photons from F. The light from F has farther to go, so it must have left earlier. In other words, for the lights to arrive at the same time, F must have awoken first.

The idea of two things happening at the same time, simultaneity, depends on the observer’s relative motion!

Lorentz Transforms

Lorentz transforms are the equations that tell us how positions and times in one reference frame look to someone in another reference frame. Assuming that the second frame is moving only along the x-axis (to keep things simple), and taking v to be the velocity of the first (unprimed) frame as measured from the second (primed) one, the transformations are:

t' = \gamma (t + \frac{vx}{c^2})
x' = \gamma (x + vt)
y' = y
z' = z

In this case, the variable \gamma is called the Lorentz factor, and it’s given by \gamma = (\sqrt{1-\frac{v^2}{c^2}})^{-1}. The Lorentz factor tells us how much we have to care about special relativity. When the speed v is small compared to c, the Lorentz factor is close to 1 and we can assume Galilean relativity to make things easy on ourselves.
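
To make the transforms concrete, here is a small sketch that applies them to the twins' wake-up events. The bed separation and train speed are made-up numbers (the train is absurdly fast so the effect is visible). I am also assuming that the plus signs in the equations mean v is the velocity of the unprimed frame as seen from the primed one, so the house, which slides backwards past the train, gets a negative v.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_factor(v):
    """Lorentz factor gamma for a relative speed v (m/s)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def lorentz_transform(t, x, v):
    """Map an event (t, x) to (t', x') using the plus-sign form above."""
    g = lorentz_factor(v)
    return g * (t + v * x / C**2), g * (x + v * t)


D = 10.0      # distance between the beds in metres (made up)
u = 0.5 * C   # train speed, pointing from H to F (made up)

# House frame: both twins wake at t = 0, H at x = -D/2 and F at x = +D/2.
# Seen from the train, the house moves at -u.
t_h, x_h = lorentz_transform(0.0, -D / 2, -u)
t_f, x_f = lorentz_transform(0.0, +D / 2, -u)

print(lorentz_factor(30.0))  # a skater's speed: gamma is 1 to many decimals
print(t_h, t_f)              # t_h > t_f: the wake-up times differ on the train
```

The two events share t = 0 in the house frame but get different t' values in the train frame, so the kids did not wake simultaneously as far as the train is concerned.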

One of the most important implications of special relativity is that time is not fixed. That means we have to adapt our Pythagorean equality from the Galilean case. In special relativity, there’s still a triangle equality that every observer agrees on, but it incorporates the time dimension! A triangle drawn in spacetime will have h^2 = x^2 + y^2 + z^2 - c^2t^2.

Yes, everyone finds it weird that time is treated like negative space.
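
A quick numerical check makes the minus sign a little less mysterious. The sketch below transforms a made-up event into a frame moving at a made-up speed (one space dimension only, so y = z = 0) and compares x^2 - c^2t^2 before and after. The helper names are mine, not from the book.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_transform(t, x, v):
    """Map an event (t, x) to (t', x') using the plus-sign form above."""
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return g * (t + v * x / C**2), g * (x + v * t)


def interval_squared(t, x):
    """The spacetime 'hypotenuse' h^2 = x^2 - c^2 t^2 with y = z = 0."""
    return x**2 - (C * t) ** 2


t, x = 2.0e-8, 9.0   # made-up event: 20 nanoseconds, 9 metres
v = 0.6 * C          # made-up relative speed

t2, x2 = lorentz_transform(t, x, v)

# t and x each change, but the combination x^2 - c^2 t^2 does not
print(t, x, interval_squared(t, x))
print(t2, x2, interval_squared(t2, x2))
```

With a plus sign in front of c^2t^2 the two printed values would not match, which is a decent practical argument for the minus.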

Momentum in special relativity

The Lorentz transforms let an observer calculate lengths and times in another reference frame, assuming they know how it moves relative to them.

Once you can do that, you might ask if there are similar transforms for energy and momentum. Remember that the kinetic energy of a moving object is KE = \frac{1}{2} mv^2. The momentum is just p=mv. If we know these values in one reference frame, can we calculate them in another?

The answer is yes, and the Lorentz transforms for momentum and energy are very similar to those for time and space.

E' = \gamma (E + vp)
p' = \gamma (p + \frac{vE}{c^2})

Let’s go back to our two hockey players on the ice. The goalie measures the velocity of the puck, v, along with its energy and momentum. The winger sees a different puck velocity, so the winger should see an energy and momentum that correspond to that velocity instead.

Imagine now that the winger is skating along at the same velocity as the puck. Then Galileo would say that the winger measures a kinetic energy of KE = 0 and momentum of p = 0.

Let’s assume the winger is skating at a speed that’s small relative to c, so \gamma \approx 1. If the winger uses the Lorentz transforms to see what the goalie measures for energy and momentum, the winger calculates:

Energy measured by goalie (as incorrectly predicted by winger):
E' = \gamma (E + vp) \approx 1*(0+0v) = 0
Momentum measured by goalie (as incorrectly predicted by winger):
p' = \gamma (p + \frac{Ev}{c^2}) \approx 1*(0+\frac{0v}{c^2}) = 0

But the goalie actually sees the puck moving at velocity v and measures:

Energy actually measured by goalie = \frac{1}{2}mv^2 \neq 0
Momentum actually measured by goalie = mv \neq 0

This is no good. We want our transforms to all work nicely in every case. The way we fix this is by positing a rest energy. The puck at rest needs to have energy that transforms to what the goalie measures, so we can just back it out from the Lorentz transform and what we know the goalie sees.

Assume there exists a rest energy, E = mc^2

Then the winger, who measures the puck at rest and so assigns it only this rest energy, makes some different predictions.

Energy measured by goalie (as predicted by winger):
E' = \gamma (E + vp) \approx 1*(mc^2+v*0) = mc^2
Momentum measured by goalie (as predicted by winger):
p' = \gamma (p + \frac{Ev}{c^2}) \approx 1*(0+\frac{mc^2v}{c^2}) = mv

Note that we approximated \gamma as 1. The kinetic energy \frac{1}{2}mv^2 that the goalie measures is so small relative to the rest energy mc^2 that it washes out of the energy transformation at this level of approximation.
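
If we keep one more term in the expansion of \gamma instead of setting it to exactly 1, the kinetic energy comes back. This is a sketch using the usual binomial approximation for small v/c, which is my addition rather than part of the argument above.

```latex
% Binomial expansion of the Lorentz factor for v much smaller than c
\gamma = (1 - \frac{v^2}{c^2})^{-1/2} \approx 1 + \frac{v^2}{2c^2}

% Energy measured by goalie, starting from the winger's E = mc^2 and p = 0
E' = \gamma (mc^2 + v \cdot 0) \approx (1 + \frac{v^2}{2c^2}) mc^2 = mc^2 + \frac{1}{2}mv^2

% Momentum measured by goalie; the correction to mv is smaller by a factor of v^2/c^2
p' = \gamma (0 + \frac{mc^2 v}{c^2}) = \gamma mv \approx mv
```

So the goalie's energy is the rest energy plus the familiar \frac{1}{2}mv^2, and the momentum is the familiar mv.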

So we see that the idea of a rest energy is needed in order for reference frame transformations in special relativity to correctly predict the energy and momentum that would be measured in other frames.