A mediocre history of Bayes’ rule

The Theory that Would Not Die starts out strong. This history of Bayesian thought by Sharon Bertsch McGrayne was recommended to me after I finished reading a biography of Claude Shannon, and I was pretty excited to read about how Bayesian thought developed before and after Shannon.

The first few chapters were a great introduction to the Reverend Thomas Bayes, to Pierre-Simon Laplace, and to some of the controversies of the early 1800s. Going into this book, I thought that I understood the origin of Bayes’ rule, and just had to learn how it became popular in recent years. My preconception was exactly backwards.

The myth of Bayes’ rule is that Bayes himself created it to address a question from David Hume about the existence of god. Hume claimed that our experience of the world says there can’t be miracles. Since we’ve never seen a miracle, if someone reports one we should always believe they’re mistaken (or lying). Bayes supposedly created his formula to prescribe exactly how much hearing about a miracle should increase our belief in god.

It turns out that there’s very little evidence that Bayes was thinking about that when he created his rule. He wrote a single paper about his rule, in the form of a probabilistic thought experiment involving tossing balls onto a table. That paper was published after Bayes died, along with some religious interpretations of it written by the friend who found it among Bayes’ effects.

At this point, everyone forgot about it. Pierre-Simon Laplace, facing some very complicated data analysis problems in astronomy and biology, re-invented something very similar to Bayes’ rule a few decades later. It was Laplace, and not Bayes, who really popularized the idea of “the probability of causes” for the first time. He used it extensively for many of the problems that he faced, and only learned about Bayes’ paper by chance. Apparently Bayes’ prior probabilities (always equal odds) were new to Laplace, and Laplace then incorporated that idea into his own formulation.
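
Since the book never shows it, here is my own minimal sketch of what “the probability of causes” with equal prior odds looks like in practice. The function name `posterior`, the two candidate causes, and the likelihood numbers are all made up for illustration; none of this comes from McGrayne, Bayes, or Laplace.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' rule: P(cause | data) is proportional to P(data | cause) * P(cause)."""
    # Weight each cause's prior by how well it explains the observation.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Normalize so the posterior probabilities sum to 1.
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical setup: two candidate causes, given equal prior odds.
priors = [Fraction(1, 2), Fraction(1, 2)]
# Assumed probability of seeing the observation under each cause.
likelihoods = [Fraction(3, 4), Fraction(1, 4)]

print(posterior(priors, likelihoods))  # [Fraction(3, 4), Fraction(1, 4)]
```

With equal priors, the posterior just mirrors how well each cause explains the data. Feeding the result back in as the prior for the next observation is the “updating” part.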

After Laplace died, few people used his methods. There seems to have been some form of smear campaign against Laplace, with people actively avoiding methods he’d created. It wasn’t until the world wars that people started using the probability of causes again.

From WWI up to the present day, the military seems to have been a great user of Bayes’ rule. While statisticians and other academics were debating the merits of Bayes’ equal prior and finding it groundless, the militaries of the world were using Bayesian updating in everything from aiming guns to cracking codes. The military used it because it worked; the academics rejected it because they couldn’t see how it could make sense.

There were some notable academics who embraced Bayes’ rule around this time, especially Turing and Shannon. The book gives a pretty good overview of what Turing accomplished. I was left even more impressed with Turing after this, and even more upset at his treatment by Britain after the war. The book unfortunately didn’t really go into detail on Shannon’s use of Bayes’ rule.

During WWII, breaking German and Japanese codes was crucial to the war effort. The British didn’t really understand that cryptography had advanced since the 1800s, and had to be given the solution to early Enigma ciphers, which Polish mathematicians had managed to crack before the war began. Britain then hired Turing to expand on the Poles’ work, and he basically created modern computing in order to do it. A large part of his work was generating likely priors for different messages, then using Bayesian updating to determine the rest of the message. After the war, all of his work was classified and Turing was sworn to secrecy. He was later hounded into suicide by the British government.

After this period, a lot of arguing happened. That’s my best summary of the rest of the book. Many of the post-WWII chapters were just chronicles of the arguments between people I’d never heard of before.

Each chapter was nominally about some use of Bayes’ rule in history. For the period after WWII, these chapters were arranged roughly chronologically so you could see how Bayes’ rule was rejected and embraced in various times and places. That overall structuring makes sense, so it’s unfortunate that the book didn’t work at all.

The book really played up the personalities of the people involved, and ignored the actual math to a large extent. McGrayne avoided in-depth discussions of the math. There was maybe a handful of equations in the entire book. For a history of math, that’s pretty unhelpful. I was hoping to understand how the theory was developed, and how the new pieces that made Bayes’ rule workable in the modern world actually worked. Instead I got to read pages and pages about how various people were all jerks to each other.

The emphasis on what Bayes’ rule was used on, and the people who used it, also caused the book to feel very chaotic. The chapters were nominally in historical order, but the later chapters especially jumped around a lot in the careers of various people. I ended up reading about someone in one chapter for three pages, forgetting about him for 50 pages, and then seeing him again in another chapter with the assumption that I would still remember why he was important. I went into this book looking for a high-level overview of a theory’s development, and instead got the nitty-gritty back and forth between dozens of people over the course of decades. For me, this level of detail obscured the higher-level points I was reading for.

I would have loved to see more discussion of the math. More simplified examples of what the people were actually working on, and how they were trying to do it. More equations in my math book, in other words.

I was also pretty surprised by what the book focused on in later chapters. It discussed Kalman filters only tangentially, in spite of the fact that they are an incredibly useful and common application of Bayes’ rule. I was waiting for the Kalman filter chapter with bated breath, and it never came. Instead I just got one sentence about how Kalman himself claimed that his filter wasn’t Bayesian at all.

This is such a missed opportunity! The book spends pages and pages talking about how hard it was for Bayesians to get recognized, and the little arguments between Bayesians and frequentists. Then it can’t spend a single page talking about the Kalman filter or why Kalman didn’t think it was Bayesian? The Kalman filter led to the development of a huge family of Bayesian filters and models that are used throughout aviation today. Instead we get an entire chapter about some rando who got sent to Europe to look for a submarine. That’s cool and all, but what about the entire space program?

The lack of discussion of Kalman filters, the mathematical terms thrown around without helping the reader understand them, the lengthy digressions into tiny spats between statisticians: all of this makes me question how the topics the book focused on were chosen. I’m left wondering whether I actually got a useful history of Bayesian thought, or just the tidbits that this author was particularly interested in.

This was disappointing, as the first few chapters were so good. My recommendation: read through to the end of WWII, then find a different book.