When I first started learning math I focused a lot on formal logic and proofs. I had a lot of fun deriving things using induction, proof by contradiction, and simple direct proofs. It’s been a long time since I’ve done much of that, but I find myself thinking a lot about methods of proving things as I learn more about signal processing and statistics.
I spent some time recently studying hypothesis testing and signal detection theory for a classification problem I’m working on at school. What really surprised me was how similar they both were to proof by contradiction. The main ideas in hypothesis testing are:
- figuring out what you want to show (called H1, the alternative hypothesis), and
- showing that the opposite (called H0, the null hypothesis) is unlikely.
This is where the infamous p-value comes from. If you want to show that eating spinach gives people Popeye arms, you start by assuming that it doesn’t. This is called the null hypothesis and is denoted by H0. After you do a lot of measurements on people who have eaten spinach, you figure out how likely their huge Popeye arms would be under the assumption that nothing at all has happened. That probability is your p-value, and if it’s very low then you’ve got your “proof by contradiction”. If those Popeye arms would be very unlikely under the null hypothesis, that’s good evidence that something interesting is going on with those cans of spinach.
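To make that concrete, here’s a minimal sketch of one common way to get a p-value: a permutation test. The arm-strength numbers are completely made up for illustration; the logic is that if the null hypothesis is true, the “spinach” and “control” labels are arbitrary, so we can shuffle them and see how often chance alone produces a difference as big as the one we observed.

```python
import random

random.seed(0)

# Hypothetical arm-strength measurements (made-up numbers).
spinach = [31, 29, 35, 33, 30, 34, 32, 36]
control = [27, 30, 26, 28, 29, 25, 31, 28]

observed_diff = sum(spinach) / len(spinach) - sum(control) / len(control)

# Permutation test: under H0 the group labels don't matter, so we
# shuffle them many times and count how often a difference at least
# as large as the observed one shows up by chance alone.
pooled = spinach + control
n = len(spinach)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:n]) / n - sum(pooled[n:]) / n
    if diff >= observed_diff:
        extreme += 1

# The p-value: the fraction of shuffles at least as extreme as the data.
p_value = extreme / trials
print(f"observed difference: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

A low p-value here means “shuffled labels almost never produce a gap this big”, which is exactly the “the null hypothesis is unlikely to have produced this” step of the argument.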
And because you’re doing statistics, it doesn’t actually prove anything. All it shows is that your data would be surprising if spinach had no effect — which is not the same as showing how likely it is that spinach works. It’s a subtle point, and it has led to a lot of mistaken or misleading scientific papers over the past few decades. That’s one of the reasons a lot of people are calling for different methods of testing hypotheses (such as Bayesian methods [pdf]).
To my mind, Bayesian methods correspond more to a direct proof. That may make them easier to understand and get right, but it doesn’t mean that hypothesis testing’s p-values are useless. There’s room in science for all kinds of methods, just as there’s room in math for all kinds of proofs. The key is to know your tools and understand their limitations.
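Here’s a rough sketch of what I mean by “direct”. Recasting the spinach question as a made-up counting problem — suppose 15 of 20 spinach eaters developed Popeye arms — a Bayesian approach computes the probability of each candidate effect size directly, instead of asking how surprising the data would be under the null. The flat prior and the numbers are assumptions for illustration, not anything from a real study.

```python
# Candidate values for the true rate of Popeye arms after spinach.
rates = [i / 100 for i in range(1, 100)]

# Flat prior over the candidate rates (an assumption, not a given).
prior = 1 / len(rates)

# Binomial likelihood of 15 successes in 20 trials for each rate
# (the binomial coefficient is constant and cancels when normalizing).
likelihood = [r**15 * (1 - r)**5 for r in rates]

# Posterior: normalize prior * likelihood over all candidates.
total = sum(prior * lk for lk in likelihood)
posterior = [prior * lk / total for lk in likelihood]

# Now we can make the direct statement a p-value can't: the
# probability that spinach beats a 50% baseline rate, given the data.
p_effect = sum(p for r, p in zip(rates, posterior) if r > 0.5)
print(f"P(rate > 50% | data) = {p_effect:.3f}")
```

The output is a statement about the hypothesis itself (“given the data, the rate probably exceeds 50%”), which is why comparing competing hypotheses is natural here in a way it isn’t with p-values.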
And right off the bat we can see one of the main limitations of hypothesis testing using p-values. Since you’re doing something akin to “proof by contradiction”, you can’t compare different options very easily. You can say things like “Popeye arms are likely to be caused by eating spinach with p-value .02” or “Popeye arms are likely to be caused by excessive masturbation with p-value .03”, but you can’t compare those two hypotheses. One may be more likely to be true than the other, but you can’t easily tell just using p-values. Since you’re only comparing individual hypotheses to the null hypothesis, you don’t know how the hypotheses relate to each other.
That said, hypothesis testing and p-values can be a strong technique when used on the right problem, just like proof by contradiction.