Mathematical Foundations for Deciders

This is based on MIRI’s FDT paper, available here.

You need to decide what to do in a problem, given what you know about the problem. If you have a utility function (which you should), making that decision is mathematically equivalent to computing:
argmax_a \mathcal{E}U(a),

where \mathcal{E}U(a) is the expected utility obtained given action a. We assume that there are only finitely many available actions.

That equation basically says that you make a list of all the actions that you can take, then for each action in your list you calculate the amount of utility you expect to get from it. Then you choose the action that had the highest expected value.

So the hard part of this is actually calculating the expected utility for a given action. That expected utility is given by:

\mathcal{E}U(a) = \sum_{j=1}^N P(a \rightharpoonup o_j; x)*U(o_j).

That’s a bit more complicated, so let’s unpack it.

  • The various o_j are the outcomes that could occur if action a is taken. We assume that there are only finitely many of them (N in the sum above).
  • The x is an observation history, basically everything that we’ve seen about the world so far.
  • The U(.) function is the utility function, so U(o_j) is the utility of outcome j.
  • The P(.) function is just a probability, so P(a\rightharpoonup o_j; x) is the probability that outcome o_j occurs, given the observation history x, in the hypothetical scenario where action a is taken.

This equation is saying that for every possible outcome from taking action a, we calculate the probability that that outcome occurs. We then take that probability and multiply it by the value that the outcome would have. We sum those up for all the different outcomes, and that's the expected utility for the given action.
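For example, suppose action a has only two possible outcomes (numbers chosen purely for illustration): o_1 with probability 0.9 and utility 10, and o_2 with probability 0.1 and utility 100. Then:

\mathcal{E}U(a) = 0.9*10 + 0.1*100 = 19.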

So now our decision procedure basically looks like one loop inside another.

# assumes the world model supplies: actions, outcomes, x, P(a, o, x), U(o), and do()
best_action = None
best_utility = float("-inf")
for a in actions:                          # every action we can take
    utility = 0.0
    for o in outcomes:                     # every outcome that could occur
        utility += P(a, o, x) * U(o)       # P(a -> o; x) * U(o)
    if best_action is None or utility > best_utility:
        best_action = a                    # best action seen so far
        best_utility = utility
do(best_action)

There are only two remaining questions about this algorithm:

1. What is P(a \rightharpoonup o; x)?
2. What is U(o)?

It turns out that we’re going to ignore question 2. Decision theories generally assume that the utility function is given. Often, decision problems will represent things in terms of dollars, which make valuations intuitive for humans and easy for computers. Actually creating a utility function that will match what a human really values is difficult, so we’ll ignore it for now.

Question 1 is where all of the interesting bits of decision theory are. There are multiple types of decision theory, and it turns out that they all differ in how they define P(a \rightharpoonup o; x). In other words, how does action a influence what outcomes happen?

World models and hypothetical results

Decision theories are ways of deciding what to do, not ways of valuing what happens. All decision theories (including causal, evidential, and functional decision theories) use the machinery described in the last section. Where they differ is in how they think the world works: how, exactly, does performing some action a change the probability of a specific outcome?

To make this more concrete, we’re going to create some building blocks that will be used to create the thing we’re actually interested in (P(a \rightharpoonup o_j; x)).

The first building block will be: treat all decision theories as though they have a model of the world that they can use to make predictions. We’ll call that model M. However it’s implemented, it encodes the beliefs that a decider has about the world and how it works.

The second building block extends the first: the decider has some way of interacting with their model to predict what happens if they take an action. What we care about is that in some way we can suppose that an action is taken, and a hypothetical world model is produced from M. We’ll call that hypothetical world model M^{a \rightharpoonup}.

So M is a set of beliefs about the world, and M^{a\rightharpoonup} is a model of what the world would look like if action a were taken. Let’s see how this works on a concrete decision theory.

Evidential Decision Theory

Evidential decision theory is the simplest of the big three, mathematically. According to Eve, who is an evidential decider, M is just a conditional probability P(.|x).

In words, Eve thinks as though the world has only conditional probabilities. She pays attention only to correlations and statistics: "What is the probability that something occurs, given that I know that x has occurred?"

To then construct a hypothetical from this model, Eve would condition on both her observations and a given action: M^{a\rightharpoonup} = P(.|a, x).

This is a nice model, because it's pretty simple to calculate with. For simple decision problems, once Eve knows what she observes and what action she takes, the result is determined. That is, if she knows a and x, often the probability of a given outcome will be either extremely high or extremely low.

The difficult part of this model is that Eve would have to build up a probability distribution of the world, including Eve herself. We’ll ignore that for now, and just assume that she has a probability distribution that’s accurate.

The probability distribution is going to be multi-dimensional. It will have a dimension for everything that Eve knows about, though for any given problem we can constrain it to only contain the relevant dimensions.

To make this concrete, let’s look at Newcomb’s problem (which has no observations x). We’ll represent the distribution graphically by drawing boxes for each different thing that Eve knows about.

  • Predisposition is Eve’s own predisposition for choosing one box or two boxes.
  • Accurate is how accurate Omega is at predicting Eve. In most forms of Newcomb’s problem, Accurate is very close to 1.
  • Prediction is the prediction that Omega makes about whether Eve will take one box or two boxes.
  • Box B is the contents of Box B (either empty or $1 million).
  • Act is what Eve actually decides to do when presented with the problem.
  • V is the value that Eve assigns to what she got (in this case, just the monetary value she walked away with).

Some of these boxes are stochastic in Eve’s model, and some are deterministic. Whenever any box changes in value, the probabilities that Eve assigns for all the other boxes are updated to account for this.

So if Eve wants to know P(one\ box \rightharpoonup Box\ B\ contains\ \$1\ million;\ x), then Eve will imagine setting Act to "choose one box" and then update her probability distribution for every other node.
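Here's a minimal Python sketch of that calculation on a toy version of Newcomb's problem (the 99% accuracy figure and the payoff numbers are my own illustrative choices, not from the paper):

ACCURACY = 0.99
U = {("one", "full"): 1_000_000, ("one", "empty"): 0,
     ("two", "full"): 1_001_000, ("two", "empty"): 1_000}

def edt_expected_utility(act):
    # Eve conditions on her own action: the prediction is correlated with
    # it, so P(Prediction | Act) shifts whenever the imagined act changes.
    total = 0.0
    for prediction in ("one", "two"):
        p = ACCURACY if prediction == act else 1 - ACCURACY
        box_b = "full" if prediction == "one" else "empty"
        total += p * U[(act, box_b)]
    return total

for act in ("one", "two"):
    print(act, edt_expected_utility(act))   # one-boxing wins: 990,000 vs 11,000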

The main problem with conditional probabilities as the sole model of the world is that they don’t take into account the way that actions change the world. Since only the statistics of the world matter to Eve, she can’t tell the difference between something being causal and something being correlated. Eve updates probabilities for every box in that picture whenever she imagines doing something different.

That's actually why she's willing to pay up to the termite extortionist. Eve can't tell that paying the extortionist has no impact on whether her house has termites.

Causal Decision Theory

Causal decision theory is similar to evidential decision theory, but with some added constraints. Carl, who is a causal decider, has a probability distribution to describe the world as well. But he also has an additional set of data that describes causal interactions in the world. In MIRI’s FDT paper, this extra causality data is represented as a graph, and the full model that Carl has about the world looks like (P(.|x), G).

Here, P(.|x) is again a conditional probability distribution. The causality data, G, is represented by a graph showing causation directions.

Carl’s probability distribution is very similar to Eve’s, but we’ll add the extra causality information to it by adding directed arrows. The arrows show what specific things cause what.

Constructing a hypothetical for this model is a bit easier than it was for Eve. Carl just sets the Act node to whatever he thinks about doing, then he updates only those nodes that are downstream from Act. The computations are performed radiating outwards from the Act node.

We represent this mathematically using the do() operator: M^{a\rightharpoonup} = P(.|do(a), x).

When Carl imagines changing Act, he does not update anything in his model about Box B. This is because Box B is not in any way caused by Act (it has no arrows going from Act to Box B).

This is why Carl will always two-box (and thus only get the $1000 from Box A). Carl literally cannot imagine that Omega would do something different if Carl makes one decision or another.
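Here's the matching sketch for Carl on the same toy model (again, my own illustrative numbers). Because the Prediction node is not downstream of Act, do(act) leaves its probability at Carl's prior:

PRIOR_PRED_ONE = 0.5   # Carl's prior that Omega predicted "one box"
U = {("one", "full"): 1_000_000, ("one", "empty"): 0,
     ("two", "full"): 1_001_000, ("two", "empty"): 1_000}

def cdt_expected_utility(act):
    # do(act) only propagates to nodes downstream of Act, so the chance
    # that Box B is full does not depend on the act being considered.
    total = 0.0
    for prediction, p in (("one", PRIOR_PRED_ONE), ("two", 1 - PRIOR_PRED_ONE)):
        box_b = "full" if prediction == "one" else "empty"
        total += p * U[(act, box_b)]
    return total

for act in ("one", "two"):
    print(act, cdt_expected_utility(act))   # two-boxing always looks $1000 better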

Functional Decision Theory

Fiona, a functional decision theorist, has a model that is similar to Carl's. Fiona's model also has arrows that define how she propagates updates outward from the points she intervenes on. However, her arrows don't represent physical causality. Instead, they represent logical dependence.

Fiona intervenes on her model by setting the value of a logical supposition: that the output of her own decision process is to do some action a.

For Fiona to construct a hypothetical, she imagines that the output of her decision process is some value (maybe "take two boxes"), and she updates the probabilities based on which nodes depend on the decision process that she is using. We call this form of dependence "subjunctive dependence."

In this case, Fiona is not intervening on the action a itself; she is intervening on the decision to do a. We represent this mathematically using the same do() operator that Carl had: M^{a\rightharpoonup} = P(.|do(FDT(P, G, x) = a)).

It's important to note that Carl conditions on his observations and intervenes on his action. Fiona intervenes only on the output of her decision procedure; it just so happens that her decision procedure takes the observations as input.

So Fiona will only take one box on Newcomb’s problem, because her model of the world includes subjunctive dependence of what Omega chooses to do on her own decision process. This is true even though her decision happens after Omega’s decision. When she intervenes on the output of her decision process, she then updates her probabilities in her hypothetical based on the flow of subjunctive dependence.
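And here's the matching sketch for Fiona (same illustrative numbers). On Newcomb's problem the arithmetic ends up identical to Eve's, but for a different reason: the Prediction node moves because Omega is computing the same decision function, not because of a bare correlation:

ACCURACY = 0.99
U = {("one", "full"): 1_000_000, ("one", "empty"): 0,
     ("two", "full"): 1_001_000, ("two", "empty"): 1_000}

def fdt_expected_utility(decision_output):
    # Subjunctive dependence: Omega's prediction is another instance of the
    # same decision function, so intervening on the function's output moves
    # the prediction too, even though the prediction happened in the past.
    total = 0.0
    for prediction in ("one", "two"):
        p = ACCURACY if prediction == decision_output else 1 - ACCURACY
        box_b = "full" if prediction == "one" else "empty"
        total += p * U[(decision_output, box_b)]
    return total

for act in ("one", "two"):
    print(act, fdt_expected_utility(act))   # deciding to one-box wins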

Similarities between EDT, CDT, and FDT

These three different decision theories are all very similar. They will agree with each other in any situation in which all correlations between an action and other nodes are causal. In that case:

1. EDT will update all nodes, but only the causally-correlated ones will change.
2. CDT will update only the causally-downstream nodes (as always).
3. FDT will update all subjunctively-dependent nodes, but in this case the only subjunctive dependence is causal.

Therefore, all three theories will update the same nodes.

If there are any non-causal correlations, then the decision theories will diverge. Those non-causal correlations would occur most often if the decider is playing a game against another intelligent agent.

Intuitively, we might say that Eve and Carl both misunderstand the structure of the world that we observe around us. Some events are caused by others, and that information could help Eve. Some events depend on the same logical truths as other events, and that information could help Carl. It is Fiona who (we think) most accurately models the world we see around us.

Functional Decision Theory

This is a summary of parts of MIRI’s FDT paper, available here.

A decision theory is a way of choosing actions in a given situation. There are two competing decision theories that have been investigated for decades: causal decision theory (CDT) and evidential decision theory (EDT).

CDT asks: what action would give me the best outcome?

EDT asks: which action would I be most delighted to learn that I had taken?

These theories both perform well on many problems, but on certain problems they choose actions that we might think of as poor choices.

Functional decision theory is an alternative to these two forms of decision theory that performs better on all known test problems.

Why not CDT?

CDT works by saying: given exactly what I know now, what would give me the best outcome? The process for figuring this out is to look at all the different actions available, and then calculate the payoffs for the different actions. Causal deciders have a model of the world that they manipulate to predict the future based on the present. Intuitively, it seems like this would perform pretty well.

Asking what would give you the better outcome in a given situation only works when dealing with situations that don’t depend on your thought process. That rules out any situation that deals with other people. Anyone who’s played checkers has had the experience of trying to reason out what their opponent will do to figure out what their own best action is.

Causal decision theory fails at reasoning about intelligent opponents in some spectacular ways.

Newcomb’s Problem

Newcomb’s problem goes like this:

Some super-powerful agent called Omega is known to be able to predict with perfect accuracy what anyone will do in any situation. Omega confronts a causal decision theorist with the following dilemma: “Here is a large box and a small box. The small box has $1000 in it. If I have predicted that you will only take the large box, then I have put $1 million into it. If I have predicted that you will take both boxes, then I have left the large box empty.”

Since Omega has already made their decision, the large box is already filled or not. Nothing that the causal decision theorist can do now will change that. The causal decision theorist will therefore take both boxes, because either way that means that they get an extra $1000.

But of course Omega predicts this and the large box is empty.

Since causal decision theory doesn’t work on some problems that a human can easily solve, there must be a better way.

Evidential decision theorists will only take the large box in Newcomb’s problem. They’ll do this because they will think to themselves: “If I later received news that I had taken only one box, then I’ll know I had received $1 million. I prefer that to the news that I took both boxes and got $1000, so I’ll take only the one box.”

So causal decision theory can be beaten on at least some problems.

Why not EDT?

Evidential decision theorists work by considering the news that they have performed a certain action. Whatever action would be the best news is the one they will take. Evidential deciders don't manipulate a model of the world to calculate the best outcome; they simply calculate the probability of a payoff given a certain choice. This intuitively seems like it would be easy to take advantage of, and indeed it is.

Evidential decision theorists can also be led astray on certain problems that a normal human will do well at.

Consider the problem of an extortionist who writes a letter to Eve the evidential decider. Eve and the extortionist both heard a rumor that her house had termites. The extortionist is just as good as Omega at predicting what people will do. The extortionist found out the truth about the termites, and then sent the following letter:

Dear Eve,

I heard a rumor that your house might have termites. I have investigated, and I now know for certain whether your house has termites. I have sent you this letter if and only if exactly one of the following is true:

a) Your house does not have termites, and you send me $1000.
b) Your house does have termites.

Sincerely,
The Notorious Termite Extortionist

Eve knows that it will cost more than $1000 to fix the termite problem. So when she receives the letter, she will think to herself:

If I learn later that I paid the extortionist, then that would mean that my house didn’t have termites. That is cheaper than the alternative, so I will pay the extortionist.

The problem here is that paying the extortionist doesn’t have any impact on the termites at all. That’s something that Eve can’t see, because she doesn’t have a concrete model that she’s using to predict outcomes. She’s just naively computing the probability of an outcome given an action. That only works when she’s not playing against an intelligent opponent.

If the extortionist tried to use this strategy against a causal decision theorist, the letter would never be sent. The extortionist would find that the house didn’t have termites and would predict the causal decision theorist would not pay, so the conditions of the letter are both false. A causal decision theorist would never have to worry about such a letter even arriving.

Why FDT?

EDT is better in some situations, and in other situations CDT is better. This implies that you could do better than either by just choosing the right decision theory for the right context. That, in turn, implies that you could make a single decision theory that does better than both, which may just be MIRI's functional decision theory.

Functional Decision Theory asks: what is the best thing to decide to do?

The functional decider has a model of the world that they use to predict outcomes, just like the causal decider. The difference is in the way the model is used. A causal decider will model changes in the world based on what actions are made. A functional decider will model changes in the world based on what policies are used to decide.

A functional decision theorist would take only one box in Newcomb's problem, and they would not succumb to the termite extortionist.

FDT and Newcomb’s problem

When presented with Newcomb’s problem, a functional decider would make their decision based on what decision was best, not on what action was best.

If they decide to take only the one box, then they know that they will be predicted to make that decision. Thus they know that the one box will be filled with $1 million.

If they decide to take both boxes, then they know they will be predicted to take both boxes. So the large box will be empty.

Since the policy of deciding to take one box does better, that is the policy that they use.

FDT and the Termite Extortionist

Just like the causal decider, the functional decider will never get a letter from the termite extortionist. If there’s ever a rumor that the functional decider’s house has termites, the extortionist will investigate. If there are no termites, then the extortionist will predict what the functional decider will do upon receiving the letter:

If I decide to pay the extortion letter, then the extortionist will predict this and send me this letter. If I decide not to pay, then the extortionist will predict that I won’t, and will not send me a letter. It is better to not get a letter, so I will follow the policy of deciding not to pay.

The functional decider would not pay, even if they got the letter, because paying would guarantee getting the letter.

The differing circumstances for CDT and EDT

Newcomb’s problem involves a predictor that models the agent and determines the outcome.

The termite extortionist involves a predictor that models the agent, but imposes a cost that’s based on something that the agent cannot control (the termites).

The difference between these two types of problems is called subjunctive dependence.

Causal dependence between A and B: A causes B

Subjunctive dependence between A and B: A and B are computing the same function

FDT is to subjunctive dependence as CDT is to causal dependence.

A Causal Decider makes decisions by assuming that, if their decision changes, anything that can be caused by that decision could change.

A Functional Decider makes decisions by assuming that, if the function they use to choose an action changes, anything else that depends on that function could change (including things that happened in the past). The functional decider doesn't actually believe that their decision changes the past. They do think that the way they decide provides evidence about what actually happened in the past, if those past events depended on the same function that the functional decider is computing in their decision procedure.

Do you support yourself?

One final point in favor of functional decision theory is that it endorses its own use. A functional decider will make the same decision regardless of when they are asked to make it.

Consider a person trapped in a desert. They’re dying of thirst, and think that they are saved when a car drives by. The car rolls to a stop, and the driver says “I’ll give you a ride into town for $1000.”

Regardless of whether the person is a causal, evidential, or functional decider, they will pay the $1000 if they have it.

But now imagine that they don’t have any money on them.

“Ok,” says the driver, “then I’ll take you to an ATM in town and you can give me the money when we get there. Also, my name is Omega and I can completely predict what you will do.”

If the stranded desert-goer is a causal decider, then when they get to town they will see the problem this way:

I am already in town. If I pay $1000, then I have lost money and am still in town. If I pay nothing, then I have lost nothing and am still in town. I won’t pay.

The driver knows that they will be cheated, and so drives off without the thirsty causal decider.

If the desert-goer is an evidential decider, then once in town they’ll see things this way:

I am already in town. If I later received news that I had paid, then I would know I had lost money. If I received news that I hadn’t paid, then I would know that I had saved money. Therefore I won’t pay.

The driver, knowing they’re about to be cheated, drives off without the evidential decider.

If the desert goer is a functional decider, then once in town they’ll see things this way:

If I decide to pay, I’ll be predicted to have decided to pay, and I will be in town and out $1000. If I decide not to pay, then I’ll be predicted to not pay, and I will be still in the desert. Therefore I will decide to pay.

So the driver takes them into town and they pay up.

The problem is that causal and evidential deciders can’t step out of their own algorithm enough to see that they’d prefer to pay. If you give them the explicit option to pay up-front, they would take it.

Of course, functional deciders also can’t step out of their algorithm. Their algorithm is just better.

The Deciders

This is based on MIRI's FDT paper, available here.

Eve, Carl, and Fiona are all about to have a very strange few days. They don’t know each other, or even live in the same city, but they’re about to have similar adventures.

Eve

Eve heads to work at the usual time. As she walks down her front steps, her neighbor calls out to her.

“I heard a rumor that your house has termites,” says the neighbor.

My dear reader: you and I know that Eve’s house doesn’t have termites, but she doesn’t know that.

“I’ll have to look into it,” responds Eve, “but right now I’m late for work.” And she hurries off.

As she’s walking to work, Eve happens to meet a shadowy stranger on the street. That shadowy stranger is carrying a large box and a small box, which are soon placed on the ground.

“Inside the small box is $1000,” says the stranger. “Inside the big box, there may be $1 million, or there may be nothing. I have made a perfect prediction about what you’re about to do, but I won’t tell you. If I have predicted you will take only the big box, it will have $1 million in it. If I have predicted that you will take both boxes, then I left the big box empty. You can do what you want.”

Then the stranger walks off, ignoring Eve’s questions.

Eve considers the boxes. The mysterious stranger seemed trustworthy, so she believes everything that she was told.

Eve thinks to herself: if I was told later that I took only the big box, then I’d know I’d have $1 million. If I were told I had taken both boxes, then I’d know that I only had $1000. So I’d prefer to have only taken the big box.

She takes the big box. When she gets to work, she opens it to find that it is indeed full of ten thousand hundred-dollar bills. She is now a millionaire.

Eve goes straight to the bank to deposit the money. Then she returns home, where she finds a strange letter waiting for her.

The letter is from the notorious termite extortionist. The termite extortionist has been in the news a few times recently, so Eve knows that the villain is for real.

The letter reads:

Dear Eve,

I heard a rumor that your house might have termites. I have investigated, and I now know for certain whether your house has termites. I have sent you this letter if and only if exactly one of the following is true:

a) Your house does not have termites, and you send me $1000.
b) Your house does have termites.

Sincerely,
The Notorious Termite Extortionist

If her house has termites, it will take much more than $1000 to fix. Eve thinks about the situation.

If she were to find out later that she had paid the extortionist, then that would mean that her house did not have termites. She prefers that to finding out that she hadn’t paid the extortionist and had to fix her house.

Eve sends the Extortionist the money that was asked for. When she checks her house, she finds that it doesn’t have termites, and is pleased.

Eve decides to take the bus to work the next day. She’s so distracted thinking about everything that’s happened recently that she gets on the wrong bus. Before she knows it, she’s been dropped off in the great Parfit Desert.

The Parfit Desert is a terrible wasteland, and there won’t be another bus coming along for over a week. Eve curses her carelessness. She can’t even call for help, because there’s no cell signal.

Eve spends two days there before a taxi comes by. By this point, she is dying of thirst. It seems she would do anything to get out of the desert, which is exactly what she tells the taxi driver.

“It’s a thousand dollars for a ride into town,” says the Taxi driver.

“I left my money at home, but I’ll pay you when we get there,” says Eve.

The taxi driver considers this. It turns out that the taxi driver is a perfect predictor, just like the mysterious stranger and the termite extortionist.

The taxi driver considers Eve. The driver won’t be able to compel her to pay once they’re in town. And when they get to town, Eve will think to herself:

If I later found out that I’d paid the driver, then I’d have lost $1000. And if I later found out that I hadn’t paid the driver, then I’d have lost no money. I’d rather not pay the driver.

The taxi driver knows that Eve won’t pay, so the driver goes off without her. Eve dies of thirst in the desert.

Eve has $999,000, her house does not have termites, and she is dead.

Carl

As he heads to work, Carl’s neighbor mentions a rumor about termites in Carl’s house. Carl, also late for work, hurries on.

A mysterious stranger approaches him, and offers him two boxes. The larger box, Carl understands, will only have $1 million in it if the stranger predicts that Carl will leave the smaller box behind.

As Carl considers his options, he knows that the stranger has either already put the money in the box, or not. If Carl takes the small box, then he’ll have an extra $1000 either way. So he takes both boxes.

When he looks inside them, he finds that the larger box is empty. Carl grumbles about this for the rest of the day. When he gets home he finds that he has no mail.

Now dear reader, let's consider the notorious termite extortionist. The termite extortionist had learned that Carl's house might have termites. Just as with Eve's house, the extortionist investigated and found that the house did not, in fact, have termites.

The extortionist considered Carl, and knew that if Carl received a letter he wouldn't pay. The extortionist knew this because he knew that Carl would say "Either I have termites or not, but paying won't change that now". So the extortionist doesn't bother to waste a stamp sending the letter.

So there is Carl, with no mail to occupy his afternoon. He decides to catch a bus downtown to see a movie. Unfortunately, he gets on the wrong bus and gets off in the Parfit Desert. When he realizes that the next bus won’t come for another week, he curses his luck and starts walking.

Two days later, he’s on the edge of death from dehydration. A taxi, the first car he’s seen since he got off the bus, pulls up to him.

“It’s a thousand dollars for a ride into town,” says the Taxi driver.

“I left my money at home, but I’ll pay you when we get there,” says Carl.

The taxi driver considers Carl. The driver won't be able to compel him to pay once they're in town. And when they get to town, Carl will think to himself:

Now that I’m in town, paying the driver doesn’t change anything for me. Either I give the driver $1000, or I save the money for myself.

The taxi driver knows that Carl won’t pay when the time comes to do it, so the driver goes off without him. Carl dies of thirst in the desert.

Carl has $1000, his house does not have termites, and he is dead.

Fiona

As Fiona leaves home for work, her neighbor says to her “I heard a rumor that your house has termites.”

“I’ll have to look into that,” Fiona replies before walking down the street.

Partway to work, a mysterious stranger confronts her.

“Yes, yes, I know all about your perfect predictions and how you decide what’s in the big box,” says Fiona as the stranger places a large box and a small box in front of her.

The stranger slinks off, dejected at not being able to give the trademarked speech.

Fiona considers the boxes.

If I’m the kind of person who decides to only take the one large box, then the stranger will have predicted that and put $1 million in it. If I’m the kind of person that decides to take both boxes, the stranger would have predicted that and left the big box empty. I’d rather be the kind of person that the stranger predicts as deciding to take only one box, so I’ll decide to take one box.

Fiona takes her one large box straight to the bank, and is unsurprised to find that it contains $1 million. She deposits her money, then goes to work.

When she gets home, she finds that she has no mail.

Dear reader, consider with me why the termite extortionist didn’t send a letter to Fiona.

When the termite extortionist learned of the rumor about Fiona's house, the resulting investigation revealed that there were no termites. The extortionist would predict Fiona's response to be this:

If I’m the kind of person who would decide to send money to the extortionist, then the extortionist would know this about me and send me an extortion letter. If I were the kind of person who decided not to give money to the extortionist, then the extortionist wouldn’t send me a letter. Either way, the cost due to termites is the same. So I’d prefer to decide not to pay the extortionist.

The extortionist knows that Fiona won’t pay, so the letter is never sent.

Fiona also decides to see a movie. In a fit of distraction, she takes the wrong bus and ends up in the Parfit Desert. When she realizes that the next bus won’t be along for a week, she starts walking.

Two days later, Fiona is on the edge of death when a taxi pulls up.

“Please, how much to get back to the city? I can’t pay now, but I’ll pay once you get me back,” says Fiona.

“It’s $1000,” says the taxi driver.

The taxi driver considers Fiona’s decision-making process.

When Fiona is safely in the city and deciding whether to pay the taxi driver, she'll think to herself: If I were the kind of person who decided to pay the driver, then the driver would know that and take me here. If I were the kind of person who decided not to pay the driver, then the driver wouldn't give me a ride. I'd rather be the kind of person who decided to pay the driver.

The taxi driver takes Fiona back to the city, and she pays him.

Fiona has $999,000, her house doesn’t have termites, and she is alive.

Dear reader, the one question I want to ask you is: who is spreading all those rumors about termites?

The Weakness of Rules of Thumb

There’s a common issue that comes up when I’m teaching people how to design electronics: people new to designing electronics often feel like they need to obey all the rules of thumb.

This came up recently when somebody I was teaching wanted to make sure none of her PCB's electrical traces had 90 degree bends in them. There was one particular point on her board that couldn't be made to fit that rule.

When she realized that she'd have to put a 90 degree bend in her trace, her question to me was whether that was "valid and legal". I probed her understanding a bit, and it seemed she was mostly thinking about the design guidelines as though they were laws, and she didn't want her design to break any laws.

This kind of thinking is pretty common, but I think it actively prevents people from designing electronics effectively. People with this mindset focus too much on whether their design meets some set of rules, and not enough on whether their design will actually work.

Where Design Guidelines Come From

Electronics is a domain that has a lot of rules of thumb. There are some pretty complicated physics behind how electrons act on a circuit board, and you can often simplify an engineering problem to a simple rule.

After a few decades of the industry doing this, there’s a large collection of rules. New engineers sometimes learn the rules before learning the physical principles that drive them, and then don’t know when the rules don’t apply.

For example, the advice to not put 90 degree bends in electrical traces is due to the fact that sharp bends increase the reactive impedance of a trace. For high frequency traces, this can distort the electrical signal flowing down the trace.

For low frequency or DC signals, 90 degree bends are much less of an issue.

Ultimately, any rule of thumb rests on a concrete foundation of “if this condition holds, then that result will be produced in a specific manner”. If you know the detailed model that drives the simple rule, you know when to ignore the rule.

Obeying The Rules or Designing a Working Project

The mindset that I sometimes see in new students is the idea that they need to follow all the rules. This makes sense if you assume that following all the rules will automatically lead to a working project. Unfortunately, the rules of thumb in electronics over-constrain a circuit board. Electrical engineers will often face the prospect of a design guideline that can’t be satisfied.

The most effective response to a rule of thumb that can't be satisfied seems to be to ask about the physics behind the rule. Then the engineer can figure out how to change the design to match the physical laws behind the general guideline.

The design guidelines were made for the people, not the people for the design guidelines.

Don’t ask “how can I make this design satisfy all the design guidelines?”

Instead ask “how can I make this design work?”

Corrigibility

This post summarizes my understanding of the MIRI Corrigibility paper, available here.

If you have a super powerful robot, you want to be sure it’s on your side. The problem is, it’s pretty hard to specify what it even means to be on your side. I know that I’ve asked other people to do things for me, and the more complicated the task is the more likely it is to be done in a way I didn’t intend. That’s fine if you’re just talking about decorating for a party, but it can cause big problems if you’re talking about matters of life or death.

Overrides

Since it's hard to specify what your side actually is, it might make sense to just include an override in your super powerful robot. That way if it starts misbehaving, you can just shut it down.

So let's say that you have an emergency stop button. It's big and red and easy to push when things go south. What exactly happens when that button gets pushed?

Maybe the button cuts power to the computer that runs your robot. The problem with that is that your robot may have set up a bunch of sub-agents online, and a simple power switch wouldn't affect them.

No, that e-stop button needs to have some pretty complex logic behind it to actually stop things.

Maybe the button puts your robot into a different mode. Before you press the button, the robot is trying to make you a sandwich. After you press the button, the robot is trying to shut down the peanut butter spreader without dropping a knife on anyone’s foot.

The problem there is that if your robot is complex enough to pursue a goal, then it may try to prevent you from stopping it. Right now, most robots just execute specific pre-programmed actions. A super-powerful robot will need to generate new actions to achieve difficult goals.

If a robot’s goal is to make a sandwich, and it knows it won’t be able to if you press the button, then step one of sandwich making may be to kill you so you don’t press the button.

So the original goal mode, before the button is pressed, needs to incorporate an equanimity with the button being pushed. That's what it means for a robot to be corrigible in the sense that MIRI means.

Encouraging Equanimity

You can say that the thing your robot wants is represented by its value function: some mathematical representation of its desires (and those desires may not be recognizable as such to a human). The robot will do whatever maximizes the value function. You want to make that value function be ok with the e-stop button being pressed.

The way that the MIRI paper recommends doing this is to combine two different value functions. One of those value functions is where the robot does what you told it to. The other value function is where the robot shuts down nicely.

Combining those value functions is complicated, since the first one needs to have equanimity with the second.

Simple Combination of Value Functions

You might try to combine the value functions as simply as possible:
if the button isn’t pressed, value is just the normal value function. If the button is pressed, then the value is 100 if the robot is shutting down and 0 otherwise.

The robot may do a bunch of different things over time, and the value it expects to get may change a lot after each action. Maybe in the course of making a sandwich it drops the bread, so the sandwich will be gross and provide low value. Or maybe it discovers that adding heroin to the sandwich makes people love it so much more, thus increasing the value of the sandwich.

Whatever the actual e-stop value function is, the normal value function could be much worse (maybe -10) or much better (maybe 200) depending on what actions the agent takes. This gives the agent a reason to try and manage whether the button gets pressed. This obviously doesn’t help us.
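Here's a minimal sketch of that naive combination and why it backfires, using the illustrative numbers from above:

def naive_value(button_pressed, shutting_down, normal_value):
    # Before the button: whatever the normal value function says.
    # After the button: 100 for shutting down, 0 for anything else.
    if button_pressed:
        return 100 if shutting_down else 0
    return normal_value

# A robot expecting a normal value of 200 (the heroin sandwich) loses 100 if
# the button is pressed, so it has a reason to stop you from pressing it.
# A robot expecting -10 (the dropped bread) gains 110 if the button is
# pressed, so it might press the button itself.
print(naive_value(False, False, 200), naive_value(True, True, 200))   # 200 vs 100
print(naive_value(False, False, -10), naive_value(True, True, -10))   # -10 vs 100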

Perfect Indifference

Since different actions have an impact on the normal value function, you may want them to have an effect on the e-stop value function as well.

Think about it this way:

Let’s say you have a lottery ticket, and the winner hasn’t been announced yet. If your friend offers you $20 for the ticket and you paid $1 for it, then that seems like a pretty good deal.

But what if you check online and find out that your ticket is the winner? Now your friend would have to offer you much more than $20 to make you give up the ticket.

By making the e-stop value function dependent on what the robot already knows, you can make sure that it’s always willing to obey the button.

The MIRI paper describes this as a combination of bonus and penalty. If the e-stop would ordinarily make the robot lose value, then the e-stop value function gives it a bonus equal to what it would have lost. If the e-stop would ordinarily be a win for the robot, then the e-stop value function gives it a penalty equal to what it would have gained over the normal mode.

This makes the robot perfectly indifferent to whether the button is pressed, but it does have some surprising impacts.
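Here's a minimal sketch of that bonus/penalty idea (a simplification of my reading of the paper, not its exact formalism):

def indifferent_shutdown_value(expected_normal, expected_shutdown):
    # Bonus if shutting down would ordinarily lose value relative to the
    # normal mode, penalty if it would ordinarily gain; the gap is cancelled.
    correction = expected_normal - expected_shutdown
    return expected_shutdown + correction   # always equals expected_normal

# Whether the robot dropped the bread (-10) or found the heroin trick (200),
# the shutdown branch is now worth exactly as much as carrying on, so the
# robot has no reason to manage whether the button gets pressed.
for expected_normal in (-10, 100, 200):
    print(indifferent_shutdown_value(expected_normal, expected_shutdown=100))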

Problems with Perfect Indifference

The perfect indifference between e-stop and normal valuations is a problem. It means that there’s nothing that the robot would ever do to preserve the e-stop button from destruction. You, the robot creator, value the e-stop button. The robot does not. And if it could pay a tiny cost to prevent the e-stop button from being disconnected by an errant roomba, it would not. It would not even let you know that the event had occurred.

So if something goes seriously wrong with your robot, you don’t have any guarantee that the e-stop button will still work when you press it.

The State of the Art

The description above is pretty much the state of the art in super-powerful e-stop buttons. The MIRI paper argues that knowing how to combine utility functions in this way is crucial to being able to depend on super-intelligent AIs. They point to a few different things that may help, but don’t have a solution yet.

Things to look into later

Complex Shutdown Procedures

What if you want it to do something other than shut down? You can write your e-stop utility function to be anything you want. The more complicated it gets, the more you might want an e-stop for your e-stop.

Equanimity or Begrudging Acceptance

It doesn’t make sense to me that you’d want your robot to be equally ok with the button being pressed or not pressed. In that case, why would it not just flip a coin and press the button itself if the coin comes up heads? To me it makes more sense that it does want the button to be pressed, but all the costs of actually causing it be pressed are higher than the benefit the robot gets from it. In this case the robot may be willing to pay small costs to preserve the existence of the button.

Depending on how the expected values of actions are computed, you could attach an ad-hoc module to the robot that automatically makes the cost of pressing the button slightly higher than the benefit of doing so. This ad-hoc module would be unlikely to be preserved in sub-agents, though.

Costs the Robot Maker Can Pay

One of the assumptions behind the combined value function approach is that the normal value function is untouched by the addition of the e-stop value function.

You want your robot to make you a sandwich, and adding an e-stop button shouldn’t change that.

But I’m perfectly ok with the robot taking an extra two minutes to make my sandwich safely. And I’m ok with it taking food out of the fridge in an inefficient order. And I’m ok with it using 10% more electricity to do it.

There are a number of inefficiencies that I, as a robot builder, am willing to put up with to have a safe robot. It seems like there should be some way to represent that as a change to the normal value function, allowing better behavior of the robot.

Don’t be afraid to move down the ladder of abstraction

In programming, there’s an idea that’s often called the ladder of abstraction. When you approach a problem, you can understand small bits of it and then put those together into larger pieces. By thinking about the problem with these larger pieces, you can get a better idea of what’s going on.

A piece of advice that's often given is to move up the ladder of abstraction. Build a tool or function that does a low level thing, then just use that instead of looking at the lower level again. When you're starting from scratch on a project, this is a great idea. Using the ladder of abstraction allows you to quickly build things that work well, without having to keep solving the same problems over and over again.

However, there are times that it makes total sense to move down the ladder of abstraction, and look at what’s going on as concretely as possible. This is especially true if you’re debugging, and trying to fix something that’s broken. Higher levels of abstraction obscure what’s actually happening, which makes it difficult to isolate a problem so it can be fixed.

That’s not to say that bugs should always be hunted in the weeds. Moving up the ladder of abstraction can help you to find out which particular component of a larger system is the source of the problem. Once that’s been determined, you’ll have to be more concrete with that component in order to solve the problem.

I also think this kind of model is good for solving more than programming problems. I've successfully used the idea of changing levels of abstraction to solve software bugs, fix hardware errors, and figure out how to deal with socially difficult situations. I would expect the idea to also work well in the softer sciences, like politics, but it seems like people often get stuck at one level in those areas.

I sometimes have conversations with people about political systems that have clear problems, like how global warming is dealt with in the US. Sometimes, the solution proposed is systemic change. When I ask what that means, answers are often given at the level of the entire political system, rather than what specific people or groups should do. While I agree that "the system" needs to change, I think trying to change the system as a whole is ineffective. It would be much more effective to move down a level of abstraction and suggest who should do what differently. Once that's done, the system is different. Systemic change has happened, but at a level that is easier to impact.

I think that some aspects of political discontent stem from being stuck at one level of abstraction. If you think that global warming or poverty needs to be solved at the level of the US government, then that’s a huge problem and how are you going to do anything? It’s easy to get overwhelmed that way. On the other hand, if you think of those problems as being generated by smaller sub-components, then you have places to look for actions that are achievable.

I don’t have the answers for those large systemic political issues right now, but I do think that this idea from software can be of help. By being willing to move to more concrete understandings, we can solve problems that seem intractable.

Red Spirits

There’s a story I heard from a friend at a recent Rationality meetup. It goes like this:

When Europeans were colonizing Africa, they told some Africans that they had to move their city. Their city was on a plain, and the Europeans wanted a nice city like those at home: on a river. The Africans objected, saying that they couldn't live near the river. That's where the red spirits were, and people would suffer if they lived there. The Europeans made them do it anyway, because red spirits clearly don't exist. And then everyone got malaria.

I think there are two things going on here:

1. The colonizers were basically assuming that the moral of a fairy tale wasn’t useful because the fairy tale wasn’t true.

2. The Europeans were ignoring a story because it didn’t fit in with the terminology that they already used to describe the world.

Fairy Tales With Morals

The colonizers assumed that, because the justification for a custom was contradicted by scientific understanding, the custom wasn’t valuable. Red-spirits don’t exist, so there’s no reason to follow the custom.

The issue with this is that culture is subject to evolutionary pressure in the same way as genes. Cultures that lead to their adherents prospering are more likely to be present in the future, so any currently existing cultural artifact should be assumed to have served some important purpose in the past. That purpose may not be clear, or it may not be one that you agree with in a moral sense, or it may not apply in the present, but the purpose almost certainly existed.

This is basically a Chesterton’s Fence argument at the cultural level. If the colonizers hadn’t assumed that something they couldn’t see a reason for had no reason, many people’s lives could have been saved.

Science Stories

The terminology mistake is, in my opinion, even more dire. The colonizers argued that red spirits didn’t exist, so people should move to the river. My friend who told me this story argued that the native villagers were mistaken for believing in red spirits, and that they should instead have believed in mosquitos.

The problem is that it isn’t clear from the story that there’s any difference between believing in mosquitos and believing in red spirits. Maybe red-spirits just means mosquitos. Or maybe it means malaria. The story doesn’t have enough information to tell if the villagers were actually wrong about anything. When I brought this up, my friend couldn’t answer any questions about what believing in red spirits actually meant to the villagers.

This is a failure mode that I think is common to people who describe themselves as scientists. I’ve noticed that people who describe observations in a way that doesn’t use standard scientific jargon are often dismissed by people who are super into science. That happens even more if the description given uses words often used by marginalized sub-cultures.

People may be describing the exact same observations, and using the same model to describe those observations, but argue because they’re using different terminology. It seems important to actually try to understand the model people have in their heads, and try to avoid quibbles about how they describe that model as much as possible.

There's another level to this if you assume that red spirits actually means ghosts in the western sense. Science aficionados like to talk about the value of testability, but both mosquito and ghost models are testable. If you think that mosquitos carry tiny cells that can reproduce in your body and make you sick, that implies certain things you can do to prevent disease. If you think that ghosts get angry if you live in a certain area and make you sick, that implies other methods of prevention. People can try these prevention methods and see what works; they can test their theories. Just going in and saying that ghosts don't exist totally ignores any tests that the villagers actually did before you got there.

The Use of Red Spirits

Even assuming that red spirits literally meant believing in ghosts, that idea was saving lives at the time that the colonizers moved in. It seems like there are a lot of fairy tales like this: explanations whose constituent parts don’t correspond with things in the real world, but that still accurately predict patterns in the real world.

I think that this is the source of a lot of cultural relativism and post-modernism. If someone thinks only of the outcome of explanations, then the actual truth value of the component parts of the explanation don’t matter. All explanations are as valid as they are useful to their culture. Since cultural evolution strongly implies all stories and explanations serve some useful purpose, every story a culture tells is useful. Therefore all explanations are true.

The only mistake that I see with that is the idea that a useful fairy tale implies that each component of the fairy tale is useful.

Having an explanation whose component parts each correspond to something that can be observed in the real world is useful on its own. If you have such a model, you can mentally vary different aspects of it and predict the outcome. It’s easier to use subjunctive reasoning on a model with true parts than a model that’s only useful when taken as a whole. You can even take small sections of the model and apply them in other circumstances.

Thinking that mosquitos cause malaria implies that you should avoid mosquitos, which (as we now know) can actually prevent you from getting sick. Thinking ghosts cause malaria might be useful if you end up avoiding mosquitos while also avoiding ghosts. Given that avoiding ghosts leads to avoiding mosquitos, the main reason to prefer one of these over the other is if one is less onerous.

Beliefs and stories rent out a share of your brain by being useful to you. As the landlord of your brain, it seems like the best thing to do is get beliefs that will pay you a lot in usefulness while requiring little mental real estate for themselves. Believing in and acting on the mosquito-malaria connection takes a certain amount of mental effort. I’m not sure what a belief in a ghost-malaria connection that actually led to avoiding malaria would entail, but I can guess that it would be more mentally costly than the mosquito-malaria alternative.

A non-mathematical introduction to the wave equation

Waves pop up everywhere in physics. They’re most obvious at the beach, but waves are also used to describe light, pendulums, and all sorts of other things. Because waves can describe so much in physics, it’s important to know what it actually means when you talk about waves.

The medium is not the wave

When physicists talk about waves, they mean something very specific. They mean that some material is moving in a specific way. Waves at the beach are the best way to visualize this for me. Water waves have a very obvious medium: water. The waves are not the water, they’re the way the water moves up and down over time. And the water doesn’t just move up and down randomly; a peak on the water seems to travel towards the beach.

This is a very key point. Waves are just the way that some type of medium is moving. Light waves, for example, are just motion in the electromagnetic field.

So what causes the wave to move like it does? The answer to that relies on two separate ideas: energy input and strain in the medium.

Energy input

For most mediums (like water), if you leave it alone the waves will all die out and it’ll be still. There needs to be some transfer of energy into the medium in order for a wave to start. At the beach, the energy to start a wave often comes from wind. For light, the energy to start a light wave (photon) usually comes from electrons bouncing around.

Not all mediums are like this. In outer space, the electromagnetic field will keep a light wave going forever. That only works once the light wave gets started, which still takes energy input.

Strain

Strain in a medium is the tendency for it to return to its original position. In water, strain is provided by gravity. Because gravity pulls on all the water in the ocean, the water tries to keep an even level. If wind pushes some water up higher, gravity will try to pull it down. The energy in the peak will get transferred to nearby water molecules, and the wave will move.

The wave equation

The way that strain in a medium causes a wave to travel is described by the wave equation

\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}

This equation shows how the time rate of change of the wave (the left hand side) is related to the strain in the medium (the right hand side). This equation is what's called a second-order partial differential equation, which just means we're taking partial derivatives twice (that's the \partial stuff).

To understand this equation, it helps to take a look at each side separately. To do this, we’re going to look at snapshots of the wave in a couple of different ways.

Localizing space

Let’s think about the left side of this equation first. The entire left side basically means “the way the medium changes with time”. To get a feel for this, I like to imagine I’m treading water at the beach. Maybe I’ve swum out a few leagues and I’m just bobbing up and down with the water. You could plot my height above sea level over time, and that would give you one view of the ocean waves. We’ve made a little movie of the wave for a single point in space, and ignored all the other points.

The left hand side of the wave equation represents how quickly my rate of rising and falling is changing at that point (it's the second derivative of my height with respect to time, which is an acceleration).

Freezing time

The right hand side of this equation is another second derivative. This time it’s a derivative with respect to space, instead of time (x instead of t). To understand this, think of freezing time instead of localizing space. If we take a snapshot of an ocean wave at a given time, it would be like a bunch of troughs and valleys in the water that don’t change.

The second derivative on the right hand side of this equation represents the curvature of the water at any given point. That curvature is a pretty good measure of the strain the water is under. The spikier the water, the more the curvature, the more the strain.

Time and Space together

The wave equation is an assertion that the curvature of a wave in space (the strain) is related to the way that the wave travels as time moves forward. The relation is given by the factor c^2 in the wave equation. The acceleration of the medium at some point is equal to the curvature of the wave at that point multiplied by the square of the wave's speed in that medium.

If we’re talking about light waves, then c is the trusty old 3*10^8 m/s. If we’re talking about sound waves or water waves, then c is going to be different. The exact value of c depends on what kind of medium we’re talking about.

Conclusions

So that's the main thing that a wave is. A wave is just a way that strains in a medium move around, and you can describe that motion using a specific equation. The wave equation says that the curvature of the medium at a point determines how the displacement of the medium at that point accelerates over time.

If you know the properties of a medium and what the strain in the medium looks like now, you can calculate the curvature of the medium. That lets you figure out what the medium looks like later. You may also need some extra information, like the velocity of the wave now or the energy being put into the wave.
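To make that concrete, here's a rough finite-difference sketch in Python (my own toy discretization and numbers, not a rigorous solver): the curvature of the medium now is used to step its shape forward in time.

import numpy as np

c, dx, dt = 1.0, 0.1, 0.05            # wave speed and step sizes (c*dt/dx <= 1 keeps this stable)
x = np.arange(0.0, 10.0, dx)
u_prev = np.exp(-(x - 5.0) ** 2)      # a bump in the medium one time step ago
u_now = u_prev.copy()                 # the same bump now, so the medium starts at rest

for _ in range(80):
    # curvature (second spatial derivative) at the interior points
    curvature = (u_now[2:] - 2 * u_now[1:-1] + u_now[:-2]) / dx**2
    u_next = u_now.copy()
    # the wave equation: acceleration of the medium = c^2 * curvature
    u_next[1:-1] = 2 * u_now[1:-1] - u_prev[1:-1] + (c * dt) ** 2 * curvature
    u_prev, u_now = u_now, u_next     # step forward in time

# u_now is the shape of the medium 80 steps later: the single bump has
# split into two smaller bumps travelling in opposite directions.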

Feature Selection in SIFT

SIFT is the Scale Invariant Feature Transform, a commonly used image recognition algorithm. Like most image recognition algorithms, it works by projecting a 2D image to a 1D feature-vector that represents what's in the image, then comparing that feature-vector to other feature-vectors produced for images with known contents (called training images). If the vectors for two different images are close to each other, the images may be of the same thing.

I did a bunch of machine learning and pattern matching when I was in grad school, and the thing that was always most persnickety was choosing how to make the feature-vector. You’ve got a ton of data, and you want to choose only values for the feature-vector that are representative in some way of what you want to find. Ideally, the feature-vector is much smaller in size (in terms of total bytes) than the original image. Hopefully the decrease in size is achieved by throwing away inconsequential data, and not data that would actually be pretty helpful. If you’re doing image recognition, it might make sense to use dominant colors of an image, edge relationships, or something like that. SIFT is much more complicated than that.

SIFT, which has been pretty successful, creates feature-vectors by choosing key points from band-pass filtered images (they use the difference of gaussians method). Since an image may be blurry or of a different size than the training images, SIFT generates a huge number of Difference of Gaussian variants of the image (DoG variants). By carefully choosing how blurred the images are before subtraction, DoG variants can be produced that band-pass filter successive sections of the frequency content.

The DoG variants are then compared to each other. Each pixel in each DoG variant is compared to nearby pixels in DoG variants of nearby frequency content. Pixels that are maxima or minima compared to neighboring pixels in nearby-frequency DoGs are chosen as the features for the feature-vector, and saved with both location and frequency. These feature-vector elements (called keypoints) then encapsulate both space information (the location of the pixel in the original image) and frequency information (it's a max or min compared to nearby-frequency DoGs).

Pixels that are too similar to nearby pixels in the original image are thrown away. If the original pixel is low contrast, it’s discarded as a feature. If it’s too close to another keypoint that is more extreme in value, then it’s discarded. Each of the remaining feature-vector elements is associated with a rotation by finding the gradient of the DoG that the element was taken from originally. Finally, all of these points are assembled into 128 element arrays describing the point (position, frequency, rotation, nearby pixel information).

This means that if there are a large number of keypoints in the image, the feature vector used for image recognition could be even larger than the image itself. So they aren't doing this to speed up computation; it's done solely for accuracy.

And SIFT does get pretty accurate. It can identify objects in images even if they’re significantly rotated from the training image. Even a 50deg rotation still leaves objects identifiable, which is pretty impressive.
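For a sense of how this gets used in practice, here's a minimal usage sketch with OpenCV's SIFT implementation (this assumes a recent OpenCV build where cv2.SIFT_create is available, and the image filenames are just placeholders):

import cv2

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)      # hypothetical input image
query = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)     # hypothetical training image

sift = cv2.SIFT_create()
kp_scene, desc_scene = sift.detectAndCompute(scene, None)  # keypoints + 128-element descriptors
kp_query, desc_query = sift.detectAndCompute(query, None)

# Close descriptor pairs suggest the two images contain the same thing.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(desc_query, desc_scene, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
print(len(kp_scene), len(kp_query), len(good))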

Difference of Gaussians

While reading about image recognition algorithms, I learned about a method of band-pass filtering I hadn't seen before. The Difference of Gaussians method can be used to band-pass filter an image quickly and easily. Instead of convolving the image with a band-pass kernel, the Difference of Gaussians method uses two low pass filters and subtracts one result from the other.

You start by blurring the image using a Gaussian kernel, then subtract that blurred image from a second, less blurred version of the original. The result is an image containing only features between the two blur levels. The two levels of blur used in the subtraction step can be varied to give different band pass limits.

This method can be used effectively for edge detection because even the less blurred image has had its high frequency noise smoothed away, which means that noise in the image doesn't get treated as an edge. Apparently there are common blur levels that cause the Difference of Gaussians method to approximate the response of ganglion cells (light-sensing nerve clusters in the eye) to light that falls on or near them.
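Here's a small sketch of the idea using SciPy's Gaussian filter (the random image array and the sigma values are just illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(256, 256)               # stand-in for a real grayscale image

blur_less = gaussian_filter(image, sigma=1.0)  # removes only the highest frequencies
blur_more = gaussian_filter(image, sigma=2.0)  # keeps only the low frequencies

# Subtracting the two low-pass results leaves the band of frequencies in between.
dog = blur_less - blur_more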