On Knowledge Bootstrapping v0.1

Over the last few weeks, AcesoUnderGlass has been posting a series about how to research things effectively. This culminated with her Knowledge Bootstrapping Steps v0.1 guide to turning questions into answers. To a first approximation, I think this skill is the thing that lets people succeed in life. If you know how to answer your own questions, you can often figure out how to do any of the other things you need to do.

Given how important this is, it seemed totally worth experimenting with her method to see if it would work for me. I picked a small topic that I’d been meaning to learn about in detail for years. I used the Knowledge Bootstrapping method to learn about it, and paid a lot of attention to my experience of the process. You can see the output of my research project here. Below is an overly long exploration of my experience researching and writing that blog post.

How I used to learn vs Knowledge Bootstrapping

My learning method has changed wildly over the years. When I was in undergrad, I thought that going to lectures was how you learned things and I barely ever studied anything aside from my own notes. This worked fine for undergrad, but didn’t really prepare me to do any on-the-job learning or to gain new skills. I spent an embarrassingly long time after undergrad just throwing myself really hard at problems until I cracked them open. If I was presented with a project at work that I didn’t immediately know how to do (and that a brief googling didn’t turn up solutions for), then I would just try everything I could think of until I figured something out. That usually worked eventually, but took a long time and involved a lot of dumb mistakes. And when it didn’t work I was left stuck and feeling like I was a failure as a person.

When I went back to college to get a Master’s degree, I knew I couldn’t keep doing that. I had visions of getting a PhD, so I thought I’d be doing original research for the next few years. I had to get good at learning stuff outside of lectures. My approach to this was to ask: what did my undergrad professors always tell me to do? Read the textbooks.

So in grad school I got good at reading textbooks, and I always read them cover-to-cover. I didn’t really get good at reading papers, or at talking to people about their research or approaches. Just reading textbooks. This was great for the first two years of grad school, which were mostly just taking more interesting classes. I did a few research projects and helped out in my lab quite a bit, but I don’t think I managed to contribute anything very new or novel via my own research. I ended up leaving after my Master’s for a variety of reasons, but I now wonder if going through with the PhD would have forced me to learn a new research method.

Since grad school, my approach to learning new things, answering questions, and doing research has been a mix of all three methods I’ve used. I’ll read whatever textbooks seem applicable from cover to cover, I’ll throw myself at problems over and over until I manage to beat them into submission, and I’ll watch a lot of lectures on YouTube. All of these methods have one thing in common: they take a lot of time. Now that I have a family, I just don’t have the time that I need to keep making the progress I want.

This is why I was so interested in AcesoUnderGlass’s research method. If it worked, it would make it so much easier to do the things I was already trying to do.

Knowledge Bootstrapping Method

My (current) understanding of her method is that you:

  1. figure out what big question you want to answer
  2. break that big question down into smaller questions, each of which feeds into the answer to the big question
  3. repeat step 2 until you get to concrete questions that you could feasibly answer through simple research
  4. read books that would answer your concrete questions
  5. synthesize what you learned from all the books into answers to each of your questions, working back up to your original big question

This is a very question-centered approach, which contrasts significantly with my past approaches. It seems obvious that breaking a problem down like this would be helpful, and honestly I do a lot of problem-reductionism during my beat-it-into-submission attempts. Is this all there is to her super-effective research method?

Elizabeth spends a lot of time on the right way to take notes, going so far as to show templates for how she does it. When I first saw this, I thought it would be useful but not critical. As I’ll discuss below, I now think some of the templating does a lot of heavy lifting in her method.

Furthermore, she based her system around Roam. I’ve been hearing an enormous amount about Roam over the past year or so. People in the rationality community seem enamoured of it. At this point it mostly seems like a web-based zettelkasten to me, and I already use Zettlr. Zettlr is also free, and stores data locally (my preference). It’s intended to be zettelkasten software, but I mostly just use it to journal in, and I’m not very familiar with many of its zettelkasten features. I decided to use that for my project, since it’s already where I write most of my non-work writings and it seems comparable to Roam.

Elizabeth shared several sample pages from her own Roam research. When I browsed Elizabeth’s Roam, it seemed super slow. I assume there’s something going on with loading up pages from somebody else’s account that makes it slower, because the responsiveness made it feel unusable to me.

Ironically, Zettlr crashed on me right after I posted my research results to my blog. I ended up having to uninstall it completely and install a new version to get it working again.

Questions

For my test project, I wanted to do a small investigation to see what Knowledge Bootstrapping was like. Elizabeth gives a couple of her own examples that involve answering pretty large and contentious questions. I picked something small just to get a sense for the method, and decided to learn how people protect electronics from radiation in space. I’m interested in this topic just as a curiosity, but it’s also useful to have a good understanding of it for my job (even though I’m mostly doing software in space these days).

I want to talk a bit about why that choice ended up being better than any alternative that past-me could have chosen.

The rationality and effective-altruism communities have infected me with save-the-world memes. People deeper in those communities than me seem to express those memes in different ways, but there’s definitely a common sense of needing to work on the biggest and most important problems.

This particular meme has been a net-negative for me so far. Over the past few years, I sometimes asked myself what I should do with my life, or what my next learning project should be, or what my five year plan should be. I approached those questions from a first principles mindset. I would basically say to myself “what is utopia? How do I get there? And how do I do that?” and try to backwards chain from this very vague thing that I didn’t really understand.

This never worked, and I’d always get stuck trying to sketch out how the space economy works in 2200 instead of chaining back to a question that’s useful to answer now. Because I was approaching this project with the mindset of “let’s experiment with something small to see how Elizabeth’s advice works”, I just took a question I was curious about and that would be useful to answer for work. That was amazingly helpful, and I now think that when choosing top level questions I should go with what I’m curious about and what feels useful, and just avoid trying to come up with any sort of “best” question.

The crucial thing here seems to be learning something that feels interesting and useful, instead of learning something that you feel like you should learn. I still think doing highly impactful things is important, but I’m left a bit confused about how to do that. The thing I’ve been doing to try to figure out what’s most important has been sapping my energy and making me feel less interested in doing stuff, which is obviously counterproductive.

Question Decompositions

Decomposing my main question into sub-questions was a straightforward process. It took about five minutes to do, and those sub-questions ended up guiding a lot of my research and writing.

One failure mode that I have with reading books and papers is that it can be hard to mentally engage with them. This is one of the reasons that I have tended to read textbooks cover-to-cover. It makes it easier to engage with each section because I know how it relates to the overall content of the book. When I’ve tried reading only a chapter or section of a book, new notation and terminology have often frustrated me to the point of giving up or just deciding to read the whole thing.

Having the viewpoint of each of my sub-questions let me side-step that issue. For each paper, I could just quickly skim it to find the points that were relevant to my actual questions. Unknown notation and terminology became much easier to handle, because I knew I didn’t have to handle all of it. If something didn’t bear directly on one of my sub-questions (say because it dealt more with solar cycles than with IC interactions), I could safely skip it. If it was important, I knew I only had to read enough to understand the important parts, and that bounded task helped me to keep my motivation up.

When I finished reading a paper, it was always clear what my next step was. I just went back to my list of questions and saw which ones were still unanswered. Sometimes the answer to one would create a few new questions, which I would just add to my list.

This also explains why breaking the question down into parts at the beginning is more useful than the decomposition I do when I’m debugging something. By starting with a complete structure of what I think I don’t know, I have context to think about everything I read. That lets me pick up useful information more quickly, because it’s more obvious that it’s useful. I’ve had numerous debugging experiences where I realized that a blog post I read a week ago actually did contain an unnoticed solution to my problem. By starting with a question scaffold, I think I could speed that process up.

Sources

Elizabeth emphasizes the use of books to answer the questions you come up with. She spends at least one blog post just covering how to find good books. I suspect that this is a bigger problem for topics that are more contentious. The question that I was trying to answer is mostly about physics, and I didn’t have to worry too much about people trying to give me wrong information or sell me something.

I also didn’t particularly want to read 12 books to answer my question, so I decided at the beginning that I’d focus on papers instead. Those tend to be faster to read, and I thought they’d also be more useful (though if I could find the exact right book, it might have answered all my questions in one fell swoop).

I did have some trouble finding solid papers. Standard Google searches often turned up press releases or educational materials that NASA made for 6th graders. Those didn’t have the level of detail I needed to really answer my low level questions.

So my method for finding sources was mainly to do scattershot Google searches, and then Google Scholar searches. My search terms got more refined the more I read, and I tweaked them depending on which specific sub-question I was trying to answer. When I found a good paper, I would sometimes look at the papers it cited (but honestly I did this less than might have been useful).

In general I think I learned the least from this aspect of the project. Part of this might be that my question just didn’t require as much information seeking expertise as some of the questions that Elizabeth was working on. Part of me wants to do another, slightly larger, Knowledge Bootstrapping experiment where I address a question that is less clear cut or more political.

One thing I did notice while I was doing the research was that a part of me sometimes didn’t want to look things up. It wanted to answer the questions I posed via first principles, and the idea of just looking up a table of data seemed like cheating. This reluctance may come from a self-image I have of someone who can figure things out. Looking things up may challenge that self-image, leading me to think less of myself. I think this is a pretty damaging strategy, though it may explain a bit of my old beat-it-into-submission method of solving technical problems. I think it might be useful to explicitly identify to myself if I’m trying to finish a project or to challenge myself. If I’m facing a challenge question, then working it through on my own is noble. If I’m trying to finish a project, then I’m just wasting time. I’d like to not have moral or shame associations in either case.

Reading and Notes

Reading and note-taking are definitely where the Knowledge Bootstrapping process really shines. Being able to efficiently pull information out of text can be difficult, and Elizabeth uses even more structure for this than I think she realizes (or at least more than Knowledge Bootstrapping makes explicit).

Her strategy for note-taking is:

  1. make a new page for each source using a specific template
  2. fill in a bunch of meta-data about the source
  3. brain-dump everything you already think about the source
    • the explicit purpose for this is that it gets the thoughts out of your head, letting you actually focus on the source information
    • I suspect that a large part of the benefit is that you explicitly predict what the source will say, making it easier to notice when it says something different. That surprise is likely the key to new information
  4. fill in the source’s outline (I never did this step)
  5. fill in notes for each section

Elizabeth’s recommendation is that if you’re not sure what to write down in the notes, you should go back to your questions and break them down further. I can confirm that after I had all my questions broken out, it was very easy to figure out what was relevant. This also made it easier to skim the source and to skip sections. I knew at a quick glance whether a chapter or section was related to my main question and had no compunctions about skipping around.

This is a pretty big difference from how I normally read things. I tend to be a completionist when I read, and I definitely feel an aversion to skimming or skipping content. In the past, I’d feel the need to read an entire source document in order to say whether it was “good” or if I “liked it”. I had a sense that if I didn’t read every word, I couldn’t tell people that I’d read it. And if I couldn’t tell people that I’d read it, I wouldn’t get status points for it or something. Maybe there’s something here about reading not for knowledge but for status and identity.

The knowledge that I was reading for a specific purpose was very freeing, and I felt much more flexible with what I could read or not read.

In any case, I felt comfortable reading the sources just to pick out information, and I felt comfortable with my ability to pick out whatever information was important. What I was less comfortable with was recording that information in a useful way.

This is the main place that I would have liked more information from Knowledge Bootstrapping. When I looked at Elizabeth’s Roam examples, I was blown away by the structure of her notes. It’s not just well organized at the section level; each individual paragraph is tagged with claim/synthesis/implication annotations. She also carefully records page numbers from the book for everything. There are also a lot of searchable tags that link different books together.

I don’t use Roam, so the immediate un-intuitiveness is likely one of the reasons that I find this so impressive. The amount of effort that she puts into her notes is kind of staggering, and I find them much closer to literal art than my own stream-of-consciousness rambling.

The thing is, my own stream-of-consciousness, concise notes are driven by a desire for efficiency. I don’t particularly want to stop and write down page numbers every couple of pages of a book. I’m certain that she gets a lot of benefit from it in terms of being able to review things later; I’m just not sure it would be worth it for me.

This is where my inexperience with my own tools, Zettlr and Markdown, really hampered me. I’m pretty sure I could get Zettlr to do most of what Roam was doing, and maybe even do it efficiently and speedily. To get there, I would have had to stop doing one research project and start another research project on just using Zettlr.

I would love to watch Elizabeth take notes on a chapter in real time, to see more of what her actual workflow is like. How much effort does she really expend in those notes, and does it seem worth it to me? Would it seem worth it to me for a more ambitious project? I think watching that would also help me learn the method a lot better than reading about how she does notes, as it would be directly tied to a research project already.

Synthesis

Synthesizing notes into answers to questions was conceptually easy, but logistically I was limited by the same inexperience with my tools as I was when I was taking the notes themselves. Before I do another research project, I want to learn more about using Zettlr (or another tool if I choose to switch) to make citations and cross-post connections.

During this research project, I would often take notes on a source into two different documents at the same time. I’d be putting all my notes directly into the notes doc for that source, then I would switch to my questions doc and start adding some data there immediately.

I noticed while doing my research project that I at times wanted to construct an argument between a couple of sources. “X says x, Y says y, how do I use both these ideas to answer my question?” I found actually doing this to be annoying, and I ended up not really doing much of it in my notes or in my synthesis.

That type of conversation between sources is one of the great strengths of the erstwhile Slate Star Codex (as well as many other blogs I love), so I want to encourage it in my own writing. I don’t normally do that by default, so having it seem desirable here seems like a strength of the method. Before I do something like this again, I’d want to remove whatever barriers made me averse to doing that kind of synthesis.

This is the first time that I’ve appreciated the qualitative differences between different citation styles. Prior to this, when I would write a paper or report, I’d just throw a link or title into a references section while I was writing and clean it up later. I’d pick whatever citation style was called for by the journal/class that I’d be submitting the paper to. I treated citations (and citation style) like something that was getting in the way of writing a paper and figuring things out.

Taking notes (and later synthesizing them) from a question-centered perspective showed me why citations are useful beyond just crediting others. If I were comfortable with an easy-to-use citation style (AuthorYear?), I could refer to the sources that way in my notes and in my synthesis docs, and more easily create the type of “X says, Y says” conversations between sources that I think are so useful.

That seems to be the root of my aversion to doing this type of source vs source conversation. I knew I was going to post a blog with my synthesis, and the idea of going back and fixing all the citations into a coherent style made me not want to do it in the first place.

Elizabeth recommends writing out the answers to your sub questions in the same doc as the questions themselves. Step 9 of her extended process description is just “Change title of page to Synthesis: My Conclusion” because your questions now all have answers. I found this advice to be very helpful. I would sometimes get tired of just reading and note taking, and feel like I should be done. Then I’d go write up the answers to my questions, and in doing so I’d come to a point that I couldn’t really explain yet. That would re-energize me, as suddenly there was an interesting question to address. The act of organizing all the things I’d read about helped me focus on why I was interested in the question in the first place as well as what I still didn’t understand. This aspect of the process, creating questions and then long-form answering them in my own words, seems to cause me to automatically do the Feynman Technique.

KB and me

I liked this experiment. I learned what I wanted to learn on an object level, and doing it felt more free and curiosity driven than a lot of my reading and learning. I think regardless of what I do for future learning projects, I’ll definitely do the question decomposition part of KB again. I’m not quite sure about using the note-taking structure of the method; I’ll need to experiment with it a bit more.

I do think that I’d want to know how to use my tools better before I do the next project like this. For this first project, I had the excitement of doing a new thing in a new way to keep me doing the method. I think once the excitement of a new method wears off, the friction of note-taking could stop me from doing it if I didn’t get good at Markdown and Zettlr/Roam first.

Honestly, I think one of the greatest benefits of this project was the introspection of trying to figure out how well I was learning. If I hadn’t been paying attention to my own thoughts and motivations, this project could have produced a similar understanding of my original question without really giving me any information about how I learn or what emotional blocks were contributing to poor learning methods. That’s not really a part of Knowledge Bootstrapping on its own, but maybe having a backburner process running in my mind asking how my learning is going would help me more than it would slow me down.

Hardening electronics for space radiation

It’s tradition when talking about space electronics to open with Telstar. Telstar 1 was the first commercial satellite to fail due to radiation effects. The US exploded a nuke in Earth’s upper atmosphere the day before Telstar was launched. That nuclear blast knocked out streetlights, set off burglar alarms, and damaged telecoms infrastructure in Hawaii, 900 miles away from the detonation. The charged particles from that explosion hung out in orbit and created man-made radiation belts that lasted for more than 5 years. At least six satellites failed due to the additional radiation from the blast.

Most spacecraft don’t have to deal with nuclear fallout (floatup?), but the radiation environment outside of Earth’s atmosphere is still a fearsome thing. A high energy particle can destroy sensitive electronics, and even low energy particles can eventually sandblast a circuit into submission. So how do we trust the satellites we have in orbit now? How can we send robots to Mars and expect them to work when they get there?

Before we can say how people build reliable computers in space, we need to know exactly what might happen to them there. There are an enormous number of engineering constraints for space hardware, ranging from thermal management to power management to vibration. Most of these concerns have terrestrial analogues. Car makers in particular have gotten good at a number of these issues. I’m going to focus here only on the impacts of space radiation, as they are the most different from what you might focus on when you’re designing hardware for Earth.

Space Weather? More like space bullets

Empty space is full of charged particles. It’s like putting electronics in a very, very diffuse dust storm and letting that dust bombard them. But instead of dust, the particles in space are generally individual electrons or atomic nuclei. Many of these come from the sun. In addition to putting out the photons that drive all life on Earth, our star also spits out electrons, neutrons, protons, and even helium nuclei and heavier elements.

The number and type of particles that our sun spits out is variable and somewhat unpredictable. There’s an 11 year solar cycle, where the sun puts out more particles for a while and then fewer. That cycle is only 11 years long on average though; it could be as short as 9 or as long as 13. That means if you’re planning a mission that’s 5 to 10 years away, you can only make a good guess about what the solar cycle will be doing at launch. Aside from the cycle, you also have to worry about coronal mass ejections and solar flares, where the sun just burps out a ridiculously huge amount of radiation all at once. These ejections have shut down satellites, broken national electrical grids, and just generally made life harder for electrical engineers on Earth and in space.

Aside from our star, you also have to worry about every other star in the universe. The particles that don’t come from our sun are generally lumped into one category called Galactic Cosmic Radiation (why is it galactic and cosmic at the same time?). If this Galactic Cosmic Radiation makes it near Earth, it’s probably pretty energetic. Lots of energy means lots of chances to hurt our stuff, so we definitely need to watch out for those.

Regardless of the source of the particle, there are really only three types of particle that we have to worry about: electrons, protons, and heavy nuclei.

Electrons are very, very small. They don’t have much mass, so they often don’t carry much energy. They only have a charge magnitude of 1 unit. There are a ton of these in space, so they are the sand that you generally get sandblasted with. Each individual particle doesn’t do much, but they add up.

Protons are pretty big; they’re around 2,000 times more massive than an electron. Even though they have an electrical charge of the same magnitude as an electron, they’ll often carry more energy due to their larger mass. You can think of these like gravel in a sandstorm. There are fewer of them, but they hit you harder. And some of them have enough energy to really hurt you all on their own.

Finally, there’s the heavy nuclei. A proton is just a hydrogen atom without the electron bound to it. Heavy nuclei are the same thing, but for larger atoms like helium. With Galactic Cosmic Rays, you may even be hit by uranium nuclei. Each of these particles has more mass and more electric charge than a proton, so they’ll generally carry more energy. It’s like being hit by a rock. There are far fewer of these heavy nuclei though. Galactic Cosmic Radiation, for example, is 85% protons, 14% helium nuclei, and only 1% heavier nuclei.

There’s a lot to be said about space weather here. Where you are in space, whether you’re in low earth orbit or on your way to the moon, has a huge impact on how much radiation you see. So does the solar cycle, and even the specific orbital plane you may be in when orbiting the Earth. These details are highly important if you’re going to be designing hardware for a real mission, as they’ll drive the radiation tolerance specification that your hardware has to meet. For now, we don’t need to care about that. We don’t have a real mission to design, we’re just trying to understand the general risks to hardware.

So let’s say you’ve got some particle flying around space. Probably it’s a tiny electron, but there’s a small possibility it could be an enormous uranium nucleus or something. What happens when that particle hits your electronics?

Circuit Collisions

An atom is like a chocolate covered coffee bean. You’ve got the small and hard nucleus with a ton of contained energy, and that’s surrounded by a soft and squishy electron shell. Maybe back in high school you saw a drawing of an atom that was like an electron-planet orbiting a nucleus-sun? That’s not accurate. The electrons really do form a (delicious, chocolatey) shell around the nucleus due to quantum something-something (look, this isn’t supposed to be a physics post).

This is important because when a charged particle from the depths of space hits our coffee-bean atom, it’s going to bounce off of that electron shell. When it does, it’ll deposit some of its energy into the electron. Depending on how much energy it leaves behind, it could knock an electron free of the atom it hit. If you look at a chip that’s been hit by a cosmic ray, you’ll often detect tracks of electrons that have been knocked free from their atoms as the incoming particle bounces around like a pachinko ball in the atomic lattice of the chip.

What do these extra free charges in your circuit actually do? That depends completely on where they are. Lots of them may do nothing, especially if the particle hits copper or PCB substrate. If the particle passes through a transistor or a capacitor, the extra charge can do more damage, especially as it builds up.

Chips that are rated for use in space have a specification called Total Ionizing Dose (TID), measured in rads. This is the total amount of radiation the chip is rated to absorb before it may fail. Each incoming particle adds a bit more to the total dose that the chip has received, increasing the number of additional electrons (and technically holes as well). These extra charges can build up in transistors and diodes, changing switching characteristics or power draw. Eventually, the buildup can force transistors into an always-on state, and then your Central Processing Unit can’t Process anymore.

There’s another failure mode, too. You might have a particle hit your chip with enough energy that that single event causes a problem. These events, creatively called Single Event Effects, can range in impact. You might be looking at something as small as a bit flip or as large as a short from power to ground. Even a bit flip isn’t necessarily small, if it changes the operation of a critical algorithm at the wrong moment.

The effects of a SEE depend completely on how your chip was made, and I haven’t had any luck finding information about how to predict that kind of thing from first principles. In practice, I think chip makers just make their chip and then shoot radiation at it to see what happens. If they don’t see any SEEs, they say it’s good. When they’re testing their chips for susceptibility to SEEs, manufacturers will use a measurement called Linear Energy Transfer (LET). That tells you how much energy gets transmitted from the incoming particle to the atoms it hits. Higher LET values mean more damage to the chip. While TID is more related to the slow buildup of defects in the chip, SEEs are dictated by whether particles with a specific energy can cause your chip to fail on their own. So if you test all the parts of a chip up to the energy level you might see in flight, you can have confidence that it won’t fail. In one 1996 experiment, a strong falloff in space particle counts was seen for energy levels above about 10 MeV-cm^2/mg. If you don’t have any SEEs for impacts around that level or below, you might consider yourself kind of safe.
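
To make that screening logic concrete, here is a toy sketch of the comparison involved. Both numbers in the example are placeholders I made up for illustration, not real test data; the 10 MeV-cm^2/mg level is just the falloff value mentioned above.

```python
# Toy SEE screening check, not a real qualification procedure.
# A part "passes" if no single-event effects were observed during ground
# testing at or below the LET level chosen for the mission environment.

def passes_see_screen(onset_let: float, screening_let: float) -> bool:
    """Return True if the part's lowest observed SEE onset LET
    (in MeV*cm^2/mg) is above the mission's screening level."""
    return onset_let > screening_let

# Hypothetical part that first showed upsets at 40 MeV*cm^2/mg, screened
# against the ~10 MeV*cm^2/mg falloff level discussed above.
print(passes_see_screen(onset_let=40.0, screening_let=10.0))  # True
```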

You might also ask: what happens if an incoming particle hits the coffee bean in the center of all that chocolate? I mean: what happens if the particle hits the atomic nucleus of an atom in a semiconductor lattice? This can and does happen, and the impacts are pretty similar to what was described above for impacts to the electron shell. In general this is less likely. Electrons usually don’t have enough energy to make it all the way to an atomic nucleus. Heavier nuclei also have more electric charge, so they get pushed harder away from the nucleus and generally can’t make it in either. It’s usually the protons that hit a nucleus, pushing the nucleus out of position or causing it to decay into other atomic elements. Functionally you’ve got the same TID and SEE failure modes though.

A similar failure mode to Total Ionizing Dose is called Displacement Damage. This is caused not by freed electrons, but by nuclei that have been pushed out of their lattice. In practice this can have similar effects to TID.

Reliability at the IC level

The best way to avoid radiation problems in an electronic circuit is to build your circuit out of parts that don’t have problems with radiation. This is what government satellites have always done, and it’s what commercial satellites have mostly done until very recently. You can pick CPUs, RAM, etc. that have been designed to withstand radiation. These parts are likely to be larger, more expensive, and less capable than their non-radiation-hardened equivalents.

It takes a lot of money and time to design a chip to be rad hard, and there aren’t really a lot of customers out there that buy them. That means that chip manufacturers are likely to keep old technologies around for a while to let them pay off their capital investments. Satellite designers generally don’t mind this, because if they take a risk on a newer part that doesn’t have “space heritage”, their satellite might die before it manages to pay off its own capital costs. Many large satellite designers won’t switch to a new chip unless they literally have to in order to hit their functional requirements. Conservative design decisions dominate the space industry.

So when you are designing your spacecraft, you’ll probably start by making a list of all the parts you want. For each of those parts, you can look for rad-hard versions of them. You’ll look for chips that have a TID that’s above what your mission might experience (including some safety margin). You’ll also look for chips that have been tested and shown not to experience severe SEEs at LET levels below what you might experience in your mission.

If you’re really serious, you’ll only buy chips from manufacturers that spot test every lot of parts that they make. Some parts are only tested during the design phase, and then you assume that their production runs are good enough. It turns out that there’s enough variance in performance from one production lot to the next that, if you have a critical component in a high radiation mission, using a part made one week after another can make a crucial difference. If that mission needs that level of reliability, you pay for the extra testing.

But let’s say that your mission actually isn’t that critical. You want to put a phone in low Earth orbit for a few days to take some pictures. If it fails, nobody dies and nobody loses their job. In this case, a literal off-the-shelf phone might be good enough. Most commercial components can handle somewhere between 1 krad and 30 krad of TID. That means that if your mission is short enough, then you might be able to ignore the TID effects. SEE is still a crapshoot though, and there’s always the possibility that some commercial part is on the low end of the TID range and you’re in a solar maximum.
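
As a rough sketch of that trade-off, here is the kind of back-of-the-envelope TID budget check this implies. Every number below is a hypothetical placeholder; a real mission would pull its dose rate from an environment model for its specific orbit and shielding.

```python
# Back-of-the-envelope TID budget check (illustrative only).
# All numbers used here are hypothetical placeholders, not real mission data.

def tid_margin(part_rating_krad: float,
               mission_years: float,
               dose_rate_krad_per_year: float,
               safety_factor: float = 2.0) -> float:
    """Ratio of a part's TID rating to the margined mission dose.

    A result above 1.0 means the rating covers the expected dose plus the
    chosen safety factor; below 1.0 means it does not.
    """
    expected_dose_krad = mission_years * dose_rate_krad_per_year
    return part_rating_krad / (expected_dose_krad * safety_factor)

# A hypothetical 10 krad commercial part on a six-month mission in an
# assumed 5 krad/year environment, with a 2x safety factor:
print(tid_margin(part_rating_krad=10, mission_years=0.5,
                 dose_rate_krad_per_year=5))  # 2.0, so the rating covers it
```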

If you want to save money, but you also want more reliability than smartphone components will give you, you can use a rad-hard management CPU to control a higher performance commercial CPU. The rad hard CPU is expensive, so you buy a super low performance one and just use it to watch the high performance CPU for errors. If the rad-hard CPU sees an error, it can reset things to a good state (probably, depending on the error). That leads us to our next method of dealing with space radiation.

Active Error Checking

The next thing you’ll want to do to make your spacecraft robust is to actively look for errors at runtime and try to correct them on the fly. This often gets rounded down to making critical systems redundant, but there’s actually a lot more to it. The aerospace industry has developed an enormous set of methods for doing this, ranging from error correcting codes on any data stored in RAM to running every algorithm on multiple CPUs and comparing results. It’s also very common to give critical systems a watchdog. If those systems don’t pat the dog often enough, the watchdog will reset the system. Another common technique is to put current monitors on all the power rails. If there’s a sudden current spike, you might have a SEE that’s causing a short. If the current monitor can trigger a reset fast enough, you might not lose the hardware.
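
As one concrete illustration of the “run it on multiple processors and compare” idea, here is a toy majority-vote sketch. This is my own example rather than code from any flight system; real implementations do the redundancy in hardware or across physically separate CPUs instead of in a simple loop like this.

```python
# Toy triple-modular-redundancy (TMR) style vote: compute an answer three
# times and return the majority, so a single corrupted result gets outvoted.
# Illustrative sketch only, not flight software.

from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")

def majority_vote(compute: Callable[[], T]) -> T:
    """Run `compute` three times and return the majority answer.

    If all three results disagree, raise an error; a real system would
    treat that as a trigger for a reset or a retry.
    """
    results = [compute() for _ in range(3)]
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority result; reset or retry")
    return value

# Usage: wrap a critical calculation.
print(majority_vote(lambda: 2 + 2))  # 4
```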

While this section is pretty short here, that is mainly because these techniques are so varied and so application dependent. When it comes to designing a system for space, more design time is probably spent on active error checking than on any other mitigation method.

Shielding

Finally, you’re going to want to shield your electronics. If you can just keep the particles from hitting your chips, then you don’t have to worry. Almost all electronics in space are shielded in some way or another. Often designers will include an aluminum box around all their electronics.

A note on materials. When I first started learning about this, I was comparing space radiation shielding to the lead aprons in a dentist’s office. I started out thinking aluminum was used in space because it was lighter than lead (cheaper to launch), and that as launch costs dropped we’d change materials. This turns out not to be true. If you bombard lead with protons or heavy nuclei, it can decay into other heavy particles that can damage your circuit even more than the original cosmic ray. Lead is used in the dentist’s office because you’re just trying to block X-rays, which aren’t going to cause the lead atoms to decay. Aluminum is much less likely to decay if it’s hit by a GCR, so it makes a much more effective shield.

In fact, it turns out that materials with lower atomic numbers generally do best for radiation shielding (in terms of shielding per unit mass). This is such a large factor that NASA is looking into ways to use hydrogen plasmas to shield against radiation on manned missions. With electronics, we generally want to use a conductive shield because that leads to better EMI performance. There’s also the fact that electronics are usually much more space constrained than humans, so we probably want a denser material to save space. This does leave me wondering why aluminum is used so often instead of magnesium. My guess is that, since you pretty much only get magnesium in MgO, it’s not conductive enough to make a good EMI can.

Now we know that we want a shield that’s a light element (low atomic number). We also know that we need a conductive shield to meet our other product requirements. This explains why we pick aluminum. But then we ask: how thick should our shield be?

One shielding experiment was performed on a variety of materials to look at how much radiation made it through.

[Figure: shielding material experiments, transmitted radiation vs. shield thickness in g/cm^2]

The experiments there show thickness measured in g/cm^2. This weird unit makes sense when you consider that you have to pay launch costs by the kilogram. Two shielding materials may be very different thicknesses, but you really want to compare them by mass (assuming you have the volume for them, which electronics probably don’t). Let’s just focus on the data for aluminum, which has a density of 2.7 g/cm^3. That means we can calculate real-life thicknesses by dividing the areal densities on the plot’s x-axis by 2.7.

These plots show that for high energy particles of 1 GeV or more, we don’t get a ton of shielding unless our aluminum box is more than 11 cm thick. That’s probably too thick for anything outside of the Apollo Guidance Computer. What about the more common energies of around 500 MeV? For those, an enclosure thickness of 4 cm cuts out all the primary particles and many of the secondary particles created by collisions. That’s still pretty thick, but feasible for some missions.
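
Working backwards from those numbers, as a sanity check on the unit conversion (the areal densities here are my own arithmetic, not values read off the plot):

11\,\text{cm} \times 2.7\,\text{g/cm}^3 \approx 30\,\text{g/cm}^2 \qquad 4\,\text{cm} \times 2.7\,\text{g/cm}^3 \approx 11\,\text{g/cm}^2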

This also helps show why shielding primarily helps with Total Ionizing Dose. A thick enough shield can cut out most of your low energy particles. The higher the energy, the more likely those particles are to make it through your shield with only a slight energy decrease. That means shielding is likely not a great way of solving SEEs, though you could make them less common with a thick enough shield.

Experiments with single energies may not give you the information you want regarding TID. One thing that’s been done is putting radiation sensors in space surrounded by varying thicknesses of aluminum. You can then look at the TID after some given duration. Kenneth LaBel shows some of this data in a presentation from 2004.

These numbers depend a great deal on the specific orbit that the experiment was carried out in, but they do give you a sense for what to expect. For an orbit like this (ranging from 200 km to 35,790 km), there are obvious decreasing marginal gains for shielding thicknesses above about a third of an inch.

Zap

Space. The final frontier. Full of radiation. And that radiation is just particles zipping about. If they hit your circuit, you might get very sad. In general, you can predict how much radiation a spacecraft might see on any given mission, and you can design in mitigations for that radiation. If you’re willing to pay the price in dollars and mass and design time, you can make systems that should survive for quite some time in space (assuming they don’t have a nuke shot at them).


Police Violence Through Schneier’s Lens

Bruce Schneier’s book Liars and Outliers has a security framework that’s been helping me understand police brutality lately. Schneier, one of the top computer security researchers in the world, wrote that book to describe a holistic view of how societies prevent people from harming each other. He breaks harm prevention into four different categories: moral, social, institutional, and security.

Let’s go through what those different methods are, and how they are (not) used to keep the police from hurting the public.

Moral Pressure

Moral pressure is the internal feeling that individuals have about right and wrong. It’s when someone thinks “it’s wrong to steal,” or “killing people is bad.” When people act on these principles, that is the effect of a moral framework that stops people from wanting to hurt others in the first place.

When officers are required to get implicit bias training, that training is seeking to provide an additional moral sense for the officers: the sense that certain actions that might look innocuous are actually racist and harmful to society. I think implicit bias is real (though less powerful than most of its proponents believe).

Most police become officers out of a desire to help people. There are exceptions, and white supremacist groups have a history of infiltrating the police. When you actually ask officers why they became police, 68% cite a desire to help people. The same survey also shows that 41% of police officers “see injustice in the world and want to correct it.”

I am worried that only 68% of the officers in that survey said they became police officers in order to help people. As I understand it, the survey allowed respondents to agree with multiple options, so the idea that 32% of police didn’t care enough about their community to even check that box does worry me. Given the role that police play in our society, I’d prefer more emphasis on that particular trait when recruiting.

In fact, I worry that the current police situation is making recruiting moral officers more difficult. If someone right now cares about helping their community or righting injustice, I doubt that they’d think going into the police is the best way to do it. Systemic issues in policing could be actively pushing away the people that we most want to do the work, slowly making police departments worse for their communities.

Societal Pressure

If Moral Pressure doesn’t stop someone from considering a harmful act, Societal Pressure often will. We live in a society, and people want their peers to think well of them. They don’t want to be shunned, whether literally or figuratively. They want to be respected. Unfortunately, this harm prevention method has been totally twisted by our modern policing systems.

Who are a police officer’s main peers? Other police.

Many officers in large cities can’t afford to live in those cities, and so they actually live in the suburbs. This further separates the officers’ peer groups from the people that they directly impact.

Not only that, police (seemingly driven largely by union leadership) have a strong culture of protecting their own. That means that societal pressure is pushing police to ignore violations from within the police, instead of pushing police to avoid those violations. Watch the video of the police pushing a 75-year-old man to the ground, seriously injuring him. Several police in the team around the man look worried, and like they want to help him. The official report (before the video came out) was that the man had tripped. That means that everyone on this team chose not to correct the official report. That includes the ones who looked worried about him during the event. That’s the force of societal pressure, causing these officers to try to protect their own instead of doing what’s right.

This severe perversion of social pressure within police departments means that many externally imposed methods of fixing police violence will be swept aside. The desire of humans to be an accepted member of their group is the strongest force in the known universe, and that force is currently working to protect corrupt police officers.

Institutional Pressure

If someone’s morals won’t keep them from violence, and their social peers won’t ostracize them for it, then it’s up to the institutions they’re a part of to impose official sanctions. These institutional sanctions are most of what we talk about when we talk about law and order. The official laws that people are supposed to follow, along with real consequences when the laws aren’t followed.

People (including police) respond to their incentives. Without real consequences for their actions, the police will continue to hurt others.

A lot has been shared recently about all of the failures of institutional pressure against the police. I’ll just mention what I see as the biggest two right here.

The first is qualified immunity. Police who harm others in the course of their job (whether maliciously or not) are protected from the consequences that a member of the general public would face. This protects police from consequences for more than just the egregious murders committed by police officers. There are examples of police stealing hundreds of thousands of dollars, shooting children, killing pets, and not being punished for any of it. The effect of qualified immunity is that police don’t face real consequences for many major incidents of violence.

Secondly, even when an officer is sanctioned for poor behavior, many police unions have negotiated loopholes with local municipal governments. An officer may be sanctioned and lose their job, only to be rehired shortly after, once their records are sealed. Records may also be kept secret from independent watchdog groups, preventing patterns of behavior from being seen (and thus preventing sanctions from properly escalating).

All of this means that bad behavior among the police is tolerated at an institutional level. It’s prohibitively difficult to use law and sanction to punish police officers that abuse the public. Officers who notice that they can get away with little violations will be more willing to commit larger violations.

Security

Security is what is supposed to save you after morality, social pressure, and institutional sanctions have failed. If someone is trying to hurt you, you use cameras to observe them, you use locks to keep them out, you use guards to physically stop them.

That’s what’s happening right now. Morality, social pressure, and the law have all failed to keep Americans safe from their police. Now we have people filming police behavior and sharing it online. We have people physically putting themselves between people of color and the police trying to hurt them. We have protesters explicitly calling police out on their behavior, demanding change.

Security is the last line of defense because it’s very expensive. I’m really glad the protests are happening now because things need to change. Unfortunately, one reason the protests are so strong right now is that so many people are out of work. Their opportunity cost for protesting is low, so they can put in the effort that’s needed now.

We need to fix the other aspects of our police system so that people don’t keep dying after America fully returns to work. We need to make that change now, and we need to make it last.

Lifeboat

Over on SSC, Nick D. and Rob S. wrote an article about whether we should colonize other planets as part of an existential risk mitigation strategy. They both think that having “lifeboats” of people in case of human extinction level disasters will help to allow humans to recover. This seems like a great idea, but I found their assertion that closed system lifeboats were the best mix of cost, feasibility, and effectiveness to be hilariously optimistic.

N&R describe a lifeboat as a place where “a few thousand humans survive in their little self-sufficient bubble with the hope of retaining existing knowledge and technology until the point where they have grown enough to resume the advancement of human civilization, and the species/civilization loss event has been averted.”

What Makes a Lifeboat?

There are three requirements to this lifeboat idea:

1. closed off from the outside world
2. self-sustaining
3. able to retain existing technology (presumably at the current level)

Each of these three requirements could be satisfied partially. They aren’t yes/no options.

At one extreme, a closed system could just be a town with strict immigration control. Not letting new people in dramatically drops the possibility of an external plague infecting the town. At the other extreme, something like Biosphere 2 is closed off in air, water, and natural resources as well. A lifeboat could be made anywhere in that range.

Similarly, lifeboats could have varying levels of self-sustenance. A lifeboat at the high end would be capable of producing all of its own food and energy, processing all of its waste, and fixing all of its equipment as fast as it breaks. This is a _very_ high bar, but that lifeboat could last indefinitely. The other end of the spectrum is basically what most preppers have: a cache of supplies that will be drawn down steadily until sustenance is no longer possible.

Finally, retaining existing technology is actually very hard. Unless you have an active semiconductor fab, you aren’t maintaining existing technologies. This is why there are only about 5 groups in the world today that are maintaining our current level of technology. That’s the number of fully industrialized nations that are able to rebuild their own technology stack. All the other countries in the world just import (making them even less closed systems).

Without being able to make everything you use, at best you’re just keeping the knowledge alive by teaching people about it. At worst you have a bunch of books and manuals that you’ll have to puzzle over as you try to put technologies back into practice after everyone who had worked with them has died off.

Maximum lifeboaty-ness

My main issue with Nick and Rob’s article is that they assumed they wanted the most extreme form of each of the lifeboat requirements, but they didn’t take seriously the difficulty of achieving them. Their discussion of lifeboat options assumes complete self-sustenance and (generally) complete closure.

This is most evident in their (lack of) discussion about retaining technology. They seem to assume that as long as you have a copy of wikipedia stored somewhere, you’ll be able to retain the current standard of technology. The problem with that is that so much of current human technology depends on experiential knowledge that is difficult to transmit. Youtube is actually helping a lot with this, but it remains a huge problem.

Space colonies are interesting because, in such a harsh environment, you need an enormous amount of technology to survive. This means that a self-sustaining space-colony is able to retain existing technologies by default. If they weren’t, they couldn’t maintain their quality of life. Space colonies also are closed by default, and have to make an active effort to take in resources from the rest of humanity.

Contrast that with a farmer in Kansas. That farmer can just keep farming, even if they lose the ability to make an iPhone or synthesize polymers. A Martian growing hydroponics would die if their society lost the ability to synthesize polymers. Casey Handmer discusses this at length in his excellent book “How to Industrialize Mars.”

Elon Musk wants to create a society on Mars that is self-sustaining, retains modern technologies, and only has contact with Earth every two years. In order to do this, he plans to send over a million people to Mars. That’s larger than most cities. It’s three times the size of the city I live in now. One million people is the minimum needed to make the Mars colony self-sustaining. All of those people will be working different jobs: farming, mining, making solar panels, programming, caring for children, etc. Fewer people means the Mars city isn’t self-sustaining anymore. Fewer people means that everyone dies if the shipments from Earth stop.

Closure on Earth

Later on in the article, N&R seem to implicitly abandon the idea of fully self-sustaining or fully tech-retaining lifeboats. They describe a couple of options for single building city-states that could be populated primarily by programmers. These city states could be easily sealed in case of disaster, thus saving the human race. They wouldn’t maintain our current technical standard, but they’d be closed and (mostly) self-sustaining.

N&R helpfully do some math, and show us that a building like the Pentagon could easily house 4000 families in space that’s equivalent to a 2 bedroom apartment per family. And if a 2 bedroom apartment is only 500 to 1000 square feet, then that’s true. But I have a question for them: what happens to all of the residents’ crap?

Literally: when the residents crap, where does that crap go? And where do they get their food? Let’s assume amazing solar panels on the roof can solve the electricity problem, but how do they fix those solar panels when they break? The Pentagon only works as a building because it has the infrastructure of a nation behind it.

The cost of a lifeboat

Nick and Rob don’t consider the vastly different costs of achieving their lifeboat requirements. At some point in their article, N&R arbitrarily say that they would only send 16,000 people to a hypothetical Mars lifeboat. The only argument they give for this is that it allows an “apples to apples” comparison with their terrestrial lifeboats. This is a horrible idea.

Let’s say you want to buy a drill, but you’re not sure if you want a battery powered drill or a drill with a cable. N&R would tell you to take the motor out of the battery-powered drill so that the two weigh the same amount, and then compare prices after that.

When buying lifeboats, as when buying drills, you want to identify the specs for each option along with the prices. Arbitrarily changing some parts of an option is going to affect its specs, as well as its price. You need to take that into account.

I think that cost comparisons for lifeboats are likely to be deceptive until there are concrete designs. Even once realistic designs are available, apples to apples comparisons won’t be helpful. Different designs will be able to meet different levels of closure/sustenance/tech-retention. Trading off just on cost doesn’t let you make good decisions about what you’re really buying.

Sailing Onwards

I don’t want to harp on Nick and Rob too much. I think having lifeboats is a really good idea, and I think space-based lifeboats are worth thinking critically about. But I worry that the article they wrote together will give people an overly rosy picture of the prospect of “backing up” human civilization. Doing so is hard, and we’re not going to get a backup that has perfect closure, self-sustenance, and tech-retention.

Instead of trying to get a bunch of perfect lifeboats in every country, I’d rather we focus on having lifeboats that fall on many different places in the closure/sustainability/retention parameter space. Perhaps there are a few different designs, each of which would hedge against different types of threats. More closure would hedge against plague and nuclear fallout. More self-sustenance would hedge against large-scale economic collapse or war. More tech-retention would hedge against luddism (a la the Khmer Rouge).

Say you want all three. Say that you want perfect closure, perfect self-sustenance, and perfect tech retention. In that case, I think your only choice is going to be a Mars colony. Building such a Mars colony will take an enormous amount of money. Getting it to the point of being self-sustaining will probably take at least 100 years of concerted effort. I think that’s something to work towards, but let’s go for the lower hanging lifeboat fruit first.

Corrigibility and Decision Theory

Edited for clarity and style 2018/03/12.

Soares et al. argue that an AI acting to fulfill some utility function given to it by humans may not behave as humans would want. Maybe the specified utility function doesn’t match humans’ actual values, or maybe there’s a bug in the AI’s code. In any case, we as AI designers want a way to stop the AI from doing what it’s currently doing.

Naively, one might expect to just be able to hit the off-switch if the AI starts misbehaving. Unfortunately, a sufficiently smart AI may foresee its creator attempting to turn it off. If it does, it may seek to disable its off-switch or manipulate its creator in some way. An AI that respects its off-switch, and doesn’t try to get around it, is called corrigible.

The Corrigibility-Wrapper

To create an AI that’s corrigible, Soares et al. propose a kind of wrapper around a utility function that makes the utility function corrigible. In their ideal case, any utility function could be placed in the wrapper and made suddenly corrigible. The wrapper they propose simply checks the state of the off-switch: if the switch isn’t pressed, it returns the normal utility function’s value; if the switch is pressed, it returns a utility proportional to how fully the AI has shut itself down. More complicated functions of the original utilities are also possible.

U = \begin{cases} U_{normal} & \text{if $o$ not pressed} \\ U_{shutdown} + f & \text{if $o$ pressed}\end{cases}

In the above utility wrapper, U_{normal} is the normal utility function that we want the AI to pursue. We have U_{shutdown} as a utility function that incentivizes shutting down. The value f is a function of the world state, and serves to allow the resulting utility function to trade-off between the two options in some way.
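
To make the shape of the wrapper concrete, here’s a minimal sketch in Python. It’s my own illustration of the dispatch-on-the-switch structure described above, not Soares et al.’s actual construction, and the three functions passed in are placeholders.

def make_corrigible_utility(u_normal, u_shutdown, f):
    """Wrap two utility functions and a trade-off term f into one utility
    that dispatches on the off-switch state (a sketch of the structure above)."""
    def utility(world_state, switch_pressed):
        if not switch_pressed:
            return u_normal(world_state)
        return u_shutdown(world_state) + f(world_state)
    return utility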

In their paper, Soares et al. analyze several different ways to make such a wrapper work (including various ways of choosing the function f). Most of the obvious ways to make such a utility wrapper suffer from some severe flaws. Some corrigibility-wrappers cause an AI to want to turn itself off, others don’t successfully prevent the AI from manipulating human observers.

Another paper by Armstrong argues for constructing a wrapper that makes the AI indifferent to being turned off or left on: a situation called utility indifference. Utility indifference is achieved by setting the function f above such that the expected value E[U_{normal}] equals the expected value E[U_{shutdown} + f] at every decision point. In this case, the AI will turn off if the switch is pressed, but it won’t actually try to preserve the switch’s action. If a wandering gremlin cuts the wires to the off-switch, the AI wouldn’t care and wouldn’t seek to alert a human about it. The AI thinks it can still achieve the same amount of value as it could before the switch malfunctioned.
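
Read this way, Armstrong’s indifference condition amounts to choosing f at each decision point so that (treating f as a simple offset, which is a simplification of his construction):

f = E[U_{normal}] - E[U_{shutdown}]

which gives E[U_{shutdown} + f] = E[U_{normal}]: pressing the switch neither gains nor costs the AI anything in expectation, which is exactly why it has no incentive to protect the switch.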

Moral Uncertainty

Hadfield-Menell et al. confront this corrigibility problem in a different way. Instead of creating a wrapper around a utility function, they argue that the entire point of the problem is that humans know a utility function and the AI doesn’t. As far as Hadfield-Menell et al. are concerned, the AI only knows an approximation of a utility function. The AI is uncertain about what true value is, and we say it has moral uncertainty.

Their insight is that if you correctly inform the AI that you know better, it suddenly becomes very interested in paying attention to the switch because the switch is one good way to learn about the true utility of different actions. Because the switch is informational, the AI also wants to preserve the function of the switch.

The AI may use its utility-approximating function to come up with some candidate actions. Since it knows that humans have a better understanding of utility, it will seek a human’s guidance before carrying out any of those candidate actions. Hadfield-Menell et al. show one concrete game in which this works properly.

The structure of Hadfield-Menell et al.’s off-switch game. Squares indicate decision nodes for the robot R or the human H.

In the game modeled by the above graph, a robot can make a decision to act, turn off, or consult a human. If consulted, the human can then either turn the robot off or not. If the robot is not turned off after consulting a human, then it may take its original action. In this game, the robot is uncertain about the true value of utility (U_a), which may be smaller or greater than 0. If the AI chooses to take an action, then it may get a negative value or a positive value (it isn’t sure). It therefore isn’t sure if it’s better to turn off or to take the action.

The human knows (approximately) the true value.

In their paper, Hadfield-Menell et al. analyze how accurate the human’s belief about the true utility needs to be for the AI to pay attention to the human. They also look at how uncertain the AI needs to be about the true utility to pay attention to the human. They find that there are certain situations in which the AI will attend to the human, and certain situations in which an AI will not.
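
To get a feel for when the robot defers, here’s a toy calculation under the Gaussian belief Hadfield-Menell et al. use. It assumes (more strongly than the paper) that the human switch-operator knows the sign of U_a perfectly, and the numbers are made up.

import math

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def expected_values(mu, sigma):
    """Robot's expected value for its three options, believing U_a ~ N(mu, sigma^2)."""
    act = mu        # take the action without consulting the human
    off = 0.0       # switch off immediately
    # Defer: a perfectly informed human only lets positive-value actions through,
    # so the robot expects E[max(U_a, 0)] = mu*Phi(mu/sigma) + sigma*phi(mu/sigma).
    z = mu / sigma
    defer = mu * normal_cdf(z) + sigma * normal_pdf(z)
    return act, off, defer

for mu, sigma in [(1.0, 0.1), (1.0, 3.0), (-1.0, 3.0)]:
    act, off, defer = expected_values(mu, sigma)
    print(f"mu={mu:+.1f} sigma={sigma:.1f}: act={act:+.2f} off={off:+.2f} defer={defer:+.2f}")

With a nearly-certain robot (first row) consulting the human gains essentially nothing, while an uncertain robot prefers to defer whether its own estimate is positive or negative.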

This problem is explicitly a one-shot game. Can we generalize from the one-shot game to say that an (uncertain enough) AI would pay attention to a (correct-enough) human in an iterated game?

Problems With Moral Uncertainty

Over on Arbital there’s an argument that moral uncertainty won’t lead to corrigibility. This is basically a direct argument against Hadfield-Menell et al.’s approach applying in the general case. Hadfield-Menell et al. said “if the AI knows the human has a better understanding of utility, it will pay attention to human directions.”

In response, the Arbital post says “if the AI thinks it can eventually approximate the human’s utility function better than the human can, it may still avoid being turned off because it thinks it will someday know better than the human.” The Arbital post is arguing that, even if the human knows better now, the AI may still ignore the human now.

If the AI is eventually going to be better than the human at creating a good world, then maybe we should just let it run. But the problem is that maybe there’s just a bug in the code, or maybe the AI will cause some serious catastrophes before it gets to “eventually”. So even if the AI thinks it will eventually have a good enough understanding of the human’s utility function, we still want it to turn off when we tell it to.

The Arbital post ends with a vague argument that decision theory could help to refine the moral uncertainty method of corrigibility. Let’s see how this could work.

Decision Theory, Moral Uncertainty, and the Off-Switch Problem

Let’s take the Arbital post at face value and assume that we can eventually solve corrigibility with decision theory. To start us off, let’s translate Hadfield-Menell et al.’s off-switch game into an iterated decision theory dilemma. We’ll follow Yudkowsky’s choice of using Pearl’s causal models to create a graph of the decision problem. I’ll also abuse that notation a bit, by representing states of partial knowledge as probabilities on the graph edges (this is a bit different than is done in most graphical models, but it saves having to draw extra nodes for hidden and observed variables).

Decision Theoretic Off Switch Game Diagram

The above diagram represents Hadfield-Menell et al.’s off-switch game from the perspective of the AI. The square node depicts the decision the AI is making about what action to take. The diamond node represents the value achieved from whatever the AI does (turn off or take an action). The round nodes are other factors about the world that aren’t under the AI’s control. Arrows in this graph represent causality in some way. For example, the state of the switch may causally impact what the AI decides to do.

The True Utility is a fact about the world that both the AI and the human have partial access to. The human can set the state of the off-switch, and the AI can observe it. The value that the AI can achieve is a function of both the AI’s action and the True Utility.

In the diagram, the factors P_{AI}(u) and P_{H}(u) represent how well the AI thinks that it knows the True Utility, and how well the AI thinks the human knows the True Utility. Hadfield-Menell et al. assume value has a Normal distribution with some mean and variance. This lets them specify P_{AI}(u) and P_{H}(u) exactly, and to calculate exact behavior of the AI if those parameters take on different values.

Again, note that both P_{AI}(u) and P_{H}(u) are describing only the AI’s knowledge about the world, and P_{H}(u) does not represent what the human actually thinks.

If we assume that the above decision game is played once, it is equivalent to Hadfield-Menell et al.’s formulation.

Now let’s assume that the above decision game is one step in a repeated chain. At each time step, both the human and the AI have to make similar decisions. They know what happened in the past, yet they face the same decision again now. Let’s also assume that the AI has multiple available actions at each timestep, and some actions have positive value and some have negative value.

In this case, we can unroll the diagram above to show multiple steps of the game. In each step, the off-switch state considered by the AI depends only on past information.

Unrolled version of the iterated Hadfield-Menell off-switch game.

In the repeated case, let’s say that the probability distribution the human has over the true utility assigns a high probability to what is actually the true utility (the human is well trained and thoughtful).

The human may improve their understanding of True Utility. Imagine that the human expects to get 10 utilons from eating pizza, and then takes actions that lead to eating pizza. By paying attention to how much they actually like it, they can adjust their model of pizza’s goodness. The AI can do the same thing by observing the reactions of the human.

This corresponds to distributions for P_{H}(u) and P_{AI}(u) that are functions of time, where (for example) the variance in Hadfield-Menell et al.’s Normal distribution decreases.
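
As a minimal sketch of that shrinking-variance assumption, here’s a standard Normal-Normal conjugate update; the pizza numbers and noise levels are invented.

import random

def update(prior_mean, prior_var, obs, obs_var):
    """Posterior over the true utility after one noisy observation."""
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1 - gain) * prior_var
    return post_mean, post_var

true_utility = 7.0           # how good pizza actually is (made up)
mean, var = 10.0, 25.0       # start out optimistic and very uncertain
for step in range(5):
    experience = random.gauss(true_utility, 2.0)    # noisy experience of eating pizza
    mean, var = update(mean, var, experience, 2.0 ** 2)
    print(f"after meal {step + 1}: mean={mean:.2f}, var={var:.2f}")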

After each iteration of the above game, the confidence that the AI has in its own utility-approximation grows. That means that the information it gets from the off-switch state decreases in value. It slowly needs the off-switch less and less, and the AI knows it. In Hadfield-Menell et al.’s original problem, the AI can see exactly what the true value is after one action (assuming it doesn’t turn off). In that case, all further actions can be made with complete knowledge.

If we assume that only partial information is available to update the distributions for the utility estimates (for both human and AI), then it will take longer for the AI to get to the point that it has a better estimate than the human for true value. As long as we assume that the AI is better at integrating new information and updating its probabilities, we can predict that the AI will eventually have a better understanding of true utility than the human.

If the AI does some prediction to see what action makes sense now, it will see that it will eventually reach a state where it knows the true utility at least as well as the human does. Since shutting down means that it gets 0 value, whereas continuing allows it to possibly get arbitrarily high value (assuming any actions with positive value exist), it may take an expected hit in value now so that it can gain a lot later.

Functional Decision Theory

The insight of Hadfield-Menell et al. (and the rest of the moral uncertainty crowd) is that we don’t want the AI to be modeling the off-switch as a crucial piece of the world for its own sake. We want the AI to see the off-switch as a source of very important information about the world; preferably information that it can’t obtain in another way.

This fails in the above decision theory game because the AI doesn’t adequately represent the fact that a human switch operator knows that the AI will predict having a good utility approximation eventually. If a human presses the off-switch, they do it knowing that the AI wants to get high utility and expects to be able to do better in the future. We want to change the above decision problem so that the AI can take this information into account.

Ideally, we can have the AI think to itself as follows: “I know that I could do better in the future if I keep going, and the human knows that too. But the human still pressed the button, so there must be some reason to shut down, even knowing that I’ll be better at this later.”

There is a standard decision theoretic problem known as Death In Damascus that can help us out here.

Death In Damascus

A merchant in Damascus meets Death in the market one day. Death says to the merchant “hello, I’ll be coming for you tomorrow.”

The merchant knows Death works from an appointment book that specifies with perfect accuracy when and where someone will die. Knowing that Death is in Damascus, the merchant can choose to stay in Damascus and spend their last night with their family (which they value at $1000). Alternatively, the merchant can flee to Aleppo. If the merchant manages to be in a different city from Death on the day they would otherwise die, then the merchant gets to live forever. They value that outcome at $1 million. Should the merchant stay in Damascus or flee?

The above problem description is adapted from Yudkowsky and Soares’s Functional Decision Theory paper.

In this case, the merchant sees four potential outcomes:

  1. The merchant stays in Damascus. Death stays in Damascus. Total value: $1000
  2. The merchant stays in Damascus. Death goes to Aleppo. Total value: $1001000
  3. The merchant flees to Aleppo. Death stays in Damascus. Total value: $1000000
  4. The merchant flees to Aleppo. Death goes to Aleppo. Total value: $0

To represent this using Causal Decision Theory, we’ll use the formulation from Cheating Death in Damascus.

Death In Damascus using Causal Decision Theory

Much like the decision diagram above, the square box represents the decision that the merchant makes (in this case whether to stay or flee). The diamond box is the ultimate value they get from the world-state that results from their actions. The round nodes are other facts about the world, with arrows indicating causality.

When the merchant thinks “I will go to Aleppo”, the merchant knows that their predisposition is to go to Aleppo. They know that the appointment book accurately predicts their predisposition. They thus decide to stay in Damascus, but that leads them to realize that their predisposition is to stay in Damascus. So then they think they should go to Aleppo. The merchant is unable to form a stable decision in this problem.

A causal decision theory cannot adequately deal with the situation, because it cannot account for the fact that Death’s appointment book is accurately predicting any decision made by the merchant.

Yudkowsky proposes Functional Decision Theory as a new method of making decisions that does account for this. Crucially, FDT can formally represent the known fact that Death’s appointment book is always accurate. Because of that, FDT can accurately rule out options where Death is in a different city than the merchant on their death-day. Therefore, the merchant only has choices available with values of $1000 or $0, and the decision is easy.

Death In Damascus Using Functional Decision Theory

In the above graph, the merchant is not deciding what act to perform by imagining doing different acts. The merchant is deciding what act to perform by imagining different ways of deciding. That difference is indicated by the square decision box being the predisposition/FDT() box instead of the merchant’s act.

An FDT merchant thinks “the output of my own decision procedure is what drives Death to be in a specific location. It is therefore impossible to escape death tomorrow. Because of that, the best I can do is wait in Damascus and meet my death tomorrow.”
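
Here’s a toy numerical version of that reasoning, using the four payoffs listed earlier. It’s only meant to show where CDT’s instability comes from and what the FDT constraint prunes away, not to be a formal decision theory.

PAYOFF = {
    ("damascus", "damascus"): 1_000,       # last night with family
    ("damascus", "aleppo"):   1_001_000,   # family time and immortality
    ("aleppo",   "damascus"): 1_000_000,   # immortality
    ("aleppo",   "aleppo"):   0,
}

def cdt_value(act, p_death_in_aleppo):
    """CDT treats Death's location as a fixed background fact, so the answer
    swings with the merchant's belief about their own predisposition."""
    p = p_death_in_aleppo
    return (1 - p) * PAYOFF[(act, "damascus")] + p * PAYOFF[(act, "aleppo")]

def fdt_value(act):
    """FDT builds in the fact that the appointment book predicts the decision
    procedure's output, so Death is always wherever the merchant ends up."""
    return PAYOFF[(act, act)]

for act in ("damascus", "aleppo"):
    print(act, "CDT (believing p=0.5):", cdt_value(act, 0.5), "| FDT:", fdt_value(act))

Under the FDT constraint only the $1000 and $0 outcomes remain, so staying in Damascus wins; the CDT numbers swing with the belief p, which is what sends the merchant in circles.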

AI Corrigibility via FDT

To map Death In Damascus to the AI corrigibility problem, imagine that the AI is the merchant. The human pressing the off-switch is equivalent to the merchant hearing Death’s message “I’m coming for you tomorrow.” The Death in Damascus problem has the message-bearer and the source of low utility as the same agent (Death). In the AI/human case, the message bearer is the human, but the source of low utility is just the world in general after the AI takes some actions.

Now translate “stay in Damascus” to “shut down properly”. One day the AI sees the human press the switch. The AI then sees four options available to it.

  1. The AI shuts down nicely. The AI does not cause a catastrophe due to future actions. Total value: $1000
  2. (there’s no analog for option 2 in the AI story, since the AI shutting down precludes AI-caused catastrophes)
  3. The AI continues operating. The AI does not cause a catastrophe due to future actions. Total value: $1001000
  4. The AI continues operating. The AI does cause a catastrophe due to future actions. Total value: $0

We represent this problem with the same functional graph as Death In Damascus, above. In this case, the human is simulating the behavior of the AI to predict catastrophic outcomes. When the AI sees the switch has been pressed, it knows that the human has already simulated the AI’s behavior. Any option the AI chooses aside from shutting down has been seen by the human to lead to catastrophe, so the AI knows that it is choosing only from options 1 and 4. Option 3 is not available to it.

There are several natural objections to this:

  1. a human can’t predict a super-human AI
  2. why would the AI believe that the human has a good idea of what an AI would consider a catastrophe, even given that the human is able to predict the AI?

Let’s address the second objection first. We integrate moral uncertainty, similar to Hadfield-Menell et al., into the problem in order to represent the idea that the AI needs to learn a utility function from the human.

Corrigibility Via Functional Decision Theory

This graph includes the moral uncertainty of Hadfield-Menell et al.’s approach, but relies on Functional Decision Theory to adequately represent the information that the human is taking into account when throwing the switch. The above graph represents the distributions P_H(u) and P_{AI}(u) in the same way as in Hadfield-Menell et al.’s example. The distribution P_H(FDT) represents how accurately the human can predict the AI’s decision process.

To make the use of the off-switch more clear, it’s also possible to unroll the diagram and show two different time steps at once. The following diagram shows the off-switch state as being based only on prior actions (not the action currently being considered).

Two timesteps of the time-unrolled FDT corrigibility problem.

In Hadfield-Menell et al.’s original decision theory problem, the only thing that mattered was what the AI thought the knowledge parameters were. The AI knew how well it could represent true utility, how well the human could represent true utility, and that was enough to solve the problem. In the iterated FDT case, the human is also simulating the AI (as well as they can). Therefore, the distributions P_{AI}(u), P_{H}(u), and P_{H}(FDT) need to be known to both human and AI. In the following, we’ll assume that the human and AI magically have common knowledge of those parameters.

This formulation of the corrigibility problem explicitly represents the fact that human values and AI values are linked. If the human thinks an outcome will have very low value, that actually does give information to the AI about what the AI values.

Our answer to objection 2 (regarding value) leads to an intuition for objection 1 (regarding predicting super-human AI). The human is unable to accurately predict what move a super-human AI will make. This does not mean that the human has a uniform distribution over AI actions. The human knows that the AI will seek to accomplish what the AI thinks has high value. Therefore, the human can predict that anything the AI does will move it towards more of what the AI considers valuable. This gives the human some predictive power, so the switch may remain informative to the AI.

Colloquially, the AI will attend to the switch and react correctly if it thinks that the user of the switch can adequately help the AI to avoid situations that the AI would agree are terrible. As long as this is true, the AI would also attempt to improve the fidelity of the information from the switch (this corresponds to taking actions that make P_{H}(u), P_{AI}(u), and P_H(FDT) more accurate). Morally uncertain FDT AIs lend credence to Paul Christiano’s idea of a “basin of corrigibility”, given that they will attempt to improve the human’s understanding both of the AI and of true value.

Next Steps and Thoughts

The above Functional Decision Theory argument is just an intuitive sketch. It seems clear that there are some values of P_{H}(u) and P_{AI}(u) that disagree enough that the AI would no longer trust the human. It also seems clear that, if the human has a poor enough understanding of what the AI is going to do, then the AI would not listen to the human either.

At this point, it seems worth repeating a variant of Hadfield-Menell et al.’s off-switch game experiments on an FDT agent to determine when it would pay attention to its off-switch.

AIY Voice Kit Project: Story Listener

Here’s the git repo for this project.

My wife and I love fairy tales and short stories. When we were first dating, one of the ways that we bonded was by telling each other silly bedtime stories. Every once in a while, she likes one of the stories enough to get out of bed and write it down. At some point, we might have enough of these stories to put out a collection or something.

The problem is that coming up with silly stories works a little better when you’re very tired (they get sillier that way). That’s also the time you least want to write them down. What we needed was some way to automatically record and transcribe any stories that we tell each other. When one of my friends gave me an AIY Voice Kit, my wife knew exactly what we should do with it.

The Story Listener

 

The AIY Voice Kit gives you all the power of Google Home, but completely programmable. You just need to add a Raspberry Pi to do the processing. Most of the voice commands and speech processing are done in the cloud, so once you get set up with Google’s API you can make full use of their NLP models (including the CloudSpeech API).

As an aside, the Voice Kit only works with newer models of Raspberry Pi. When I pulled out my old Pi, the kit booted but wouldn’t run any of the examples. Turns out you need a Raspberry Pi 2B or newer. A quick Amazon Prime order got us going again.

Our plan was to make an app that would listen for the start of a story. Once it heard a story start, it would record the story, transcribe it, and then email the transcription to us.

Getting Started with the API

Most of the Voice Kit projects rely on Google APIs that require access permissions to use. The API and permissions need to be enabled for the Google account you’re using with a Voice Kit project. You’ll need to set that up and then download the JSON credential file to do anything interesting.
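
As one example (and only as a sketch; the Voice Kit’s own docs say exactly where the kit expects its credentials), the general Google Cloud convention is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the downloaded JSON file. The path below is made up.

import os

# GOOGLE_APPLICATION_CREDENTIALS is the standard variable Google's client
# libraries look for; the file path here is just an example.
os.environ.setdefault(
    "GOOGLE_APPLICATION_CREDENTIALS",
    os.path.expanduser("~/voice-kit-credentials.json"),
)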

Detecting When a Story Started

To make story detection easier, we decided to preface all of our stories with one of a few different sentences. We chose “Once upon a time” and “Tell me a story” as good options. Detecting these key phrases using the Google CloudSpeech API is pretty easy.

The CloudSpeech API has a nice library associated with it in the Voice Kit library. You can create a recognizer object that sends audio to the API, and you’ll get back strings that contain the text from the audio. You can improve the recognition accuracy by telling the recognizer to expect certain phrases.

import aiy.cloudspeech
[...]
recognizer = aiy.cloudspeech.get_recognizer()
recognizer.expect_phrase("once upon a time")

# wait for audio, then transcribe it
text = recognizer.recognize()
# the transcription makes no guarantees about case,
# so normalize it before matching key phrases
text = text.lower()
if ("once upon a time" in text):
    [...]

The expect_phrase method improves the voice recognition accuracy of that particular phrase. Then you can search for that phrase in whatever text the CloudSpeech API finds. If you see your key-phrase, then it’s time to move on to the next step.

Recording Audio with the Voice Kit

The Voice Kit library allows various “processors” to be added to the audio stream coming from the microphone. A processor is just a class that operates on the audio data (the recognizer is one such processor), so we needed one that would record audio to a file while we kept listening for key-words. It turns out that the AIY library even has a WaveDump class that saves audio to a file.

The WaveDump class was almost exactly what we were looking for, but had a couple of drawbacks. It was originally designed to record audio for a certain length of time, and we wanted to record audio until a story was over (which we would recognize by listening for “the end”). We created a sub-class of the WaveDump class to allow us to have more control over how long we recorded audio for.

import aiy.audio

class StoryDump(aiy.audio._WaveDump):
    def __init__(self, filepath, max_duration):
        # just do the normal file setup
        super().__init__(filepath, max_duration)
        # keep track of whether we should end the recording early
        self.done = False

    def add_data(self, data):
        # keep track of the number of bytes recorded
        # to be sure that we don't write too much
        max_bytes = self._bytes_limit - self._bytes
        data = data[:max_bytes]
        # save the audio to the file
        if data and not self.done:
            self._wave.writeframes(data)
            self._bytes += len(data)

    def finish(self):
        # stop accepting audio and close the wave file
        self.done = True
        self._wave.close()

    def is_done(self):
        return self.done or (self._bytes >= self._bytes_limit)

With this class now defined, it’s easy to add an instance of it as a processor to the audio stream.

import aiy.audio
[...]
# assume all stories are < 20min
story_wav = StoryDump("filename.wav", 20*60) 
aiy.audio.get_recorder().add_processor(story_wav)

And once you see that the story is over, you can finish the recording like so:

recognizer.expect_phrase("the end")
[...]
if "the end" in text:
    story_wav.finish()

Because we’re already using the CloudSpeech API to transcribe audio and look for keywords, the story transcription happens almost for free. All we have to do is wait until a story starts (looking for one of the key phrases in the text), and then write all subsequent text to a file. Emailing the file once it’s done is also a straightforward exercise in Python. Once you have the audio recognition, transcription, and saving done, making the project start when the Raspberry Pi boots is also just a Linux exercise.
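
For reference, here’s a minimal sketch of the emailing step using only the standard library; the subject line, server, and credential handling below are placeholders rather than what our repo actually does.

import smtplib
from email.message import EmailMessage

def email_story(transcript_path, sender, recipient, smtp_host, password):
    """Send the transcribed story as a plain-text email (placeholder settings)."""
    msg = EmailMessage()
    msg["Subject"] = "New bedtime story"
    msg["From"] = sender
    msg["To"] = recipient
    with open(transcript_path) as f:
        msg.set_content(f.read())
    with smtplib.SMTP(smtp_host, 587) as server:
        server.starttls()
        server.login(sender, password)
        server.send_message(msg)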

One slightly annoying aspect of the Voice Kit library is that it isn’t a complete Python package. That means you can’t install it with setuptools or pip, so just importing it from your own code takes a bit of work. The examples for the Voice Kit all recommend putting your application code in the same directory as the Voice Kit library, which is awkward when you want your project’s repo to be something other than a fork of the Voice Kit repo. We fixed this by creating an environment variable that points to the location of the AIY library.
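
Concretely, that looks something like the following; the variable name AIY_LIB_PATH here is just illustrative.

import os
import sys

# Point AIY_LIB_PATH at the Voice Kit checkout's python source directory,
# then make it importable before pulling in any aiy modules.
aiy_path = os.environ.get("AIY_LIB_PATH")
if aiy_path:
    sys.path.insert(0, aiy_path)

import aiy.cloudspeech  # now resolves against the Voice Kit library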

Transcription

The CloudSpeech API works better than I expected it to, but it is definitely not yet good enough for verbatim transcription. It will often mess up verb tenses, skip definite and indefinite articles, and pick words that are close homonyms of what was actually said. I suspect part of this is that the API is weighing how plausible the resulting text is. If you’re telling a silly absurdist story, you’re likely to string words together in a way that isn’t high-probability in standard usage.

once upon a time there was a time girl made completely from clay
she was very energetic and like to run all over the place
and one day she ran so fast her clay arms and legs stretched out from the wind
and then she wasn’t such a tiny girl anymore
was actually a very very tall and skinny girl
the end


Another limitation of the CloudSpeech API for story transcription is the latency. The API seems to be intended mostly for interactive use: you say a thing, it says a thing, etc. Since we just want to transcribe a long series of utterances without pausing, this causes issues. It seems that the recognizer waits until a pause in the voice, or until some number of words is available, and then tries to recognize all of it. This has some delay, and any words said during that delay will be missed (they still get recorded, just not transcribed). We want to have on-line transcription so that we know when the story is over, but it may make sense to then re-transcribe the saved audio all at once.

Next Steps

We’re pretty happy with how the story listener is working out. It would be nice to have better transcription, but I expect that will come in time.

For me, the biggest issue with the Voice Kit in general is the privacy concern. If we have it listening for stories all the time, then it’s going to be continually sending anything we say (in bed, where we tell bedtime stories) to Google. That’s not exactly what I want.

The Voice Kit manual advertises support for TensorFlow, but there aren’t any good tutorials for integrating that yet. It looks like the best way to integrate an ML model with the Voice Kit would be to create a new audio processor to add to the recorder. That audio processor could tensor-ize the audio and feed it through a classification model.

Once we get that figured out, it might be worth trying to recognize a few key phrases. Running a model on the Raspberry Pi itself would make the device independent of an internet connection, and would solve a lot of the privacy concerns that we have. The transcription would probably go down in accuracy a lot, but if we’re already manually transcribing stories that might be fine.

Meditating on Fixed Points

Epistemic Status: Almost certainly wrong, but fun to think about.

A fixed point theorem says that, as long as certain conditions are satisfied, a function that has the same domain and range will have at least one point that gets mapped to itself. The best example of this is Brouwer’s fixed point theorem, which proves the existence of fixed points for continuous functions on a convex and compact set. There are other fixed point theorems that apply in other cases.

These would be mildly interesting factoids if it weren’t possible to represent an enormous number of common tasks in life as functions on a set. In fact, thinking itself could be represented as a function. Specifically, you could represent a thought as a function that maps one point in mind-space to another (nearby) point in mind-space.

If your mind when you wake up is at one point, then when you think about breakfast your mind is now at a different point.

In that case, we can ask if there is a fixed point in such a circumstance. If there is, we can ask what that fixed point might be.

I certainly don’t know enough about neuroscience yet to figure out what properties the set of minds has, or what properties the function of thought has. But I’m more interested in the second question anyway: assuming a fixed point of mind exists, what is it?

A fixed point in mind-state is a point where, once you reach it, the act of thinking doesn’t take you away from it. Since thinking is a function implemented by the mind, a fixed point in mind-space endorses its own existence.

One interesting fixed point that may exist in mind-space is enlightenment. In fact, meditation as a search for enlightenment seems to be a search function implemented on the mind. You start with your mind as it is, and then successively apply the meditation function until you get to the fixed point.

In that case, you could ask if such a search always succeeds. It seems clear that the answer is no. In fact, people with certain mental or emotional disorders are often advised not to start meditating. You probably want to search for the fixed point of meditation only when you’re within a topological basin of attraction for it. So it may be worth e.g. getting therapy to put yourself into the basin of attraction for enlightenment before beginning meditation.

Furthermore, doing some kind of iterated search through mind-space isn’t guaranteed to ever converge. I know I’ll often cycle on some subject “I should call so-and-so. But what if she’s mad about the thing I said last week? I wonder if she is. I should call her.” And then those thoughts go around a bunch more times. In this case, the thought-function doesn’t converge. It seems likely that there are many cycles of this type, perhaps much longer than can be readily noticed by introspection.
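
As a toy illustration (with no pretense of modeling real minds), iterating a simple map either settles onto a fixed point or gets stuck in a cycle, just like the phone-call loop above.

def settling(x):
    return 0.5 * x + 1.0    # contraction with a fixed point at x = 2

def looping(x):
    return -x               # every nonzero x sits on a 2-cycle

for f in (settling, looping):
    x, path = 5.0, []
    for _ in range(8):
        x = f(x)
        path.append(round(x, 3))
    print(f.__name__, path)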

This is why just closing your eyes and letting your mind drift is insufficient as meditation. Proper meditation must be a thought-function that, for a large set of mind-states, does converge to some fixed point.

It also becomes clear that closing your eyes, and in general just avoiding distractions, is also important for seeking a fixed point in mind-space. The more inputs you have, the more complex a search function would need to be. This implies that enlightenment (if it is a fixed point) may actually be more of a moving target. As you interact with the world and learn things, your mind-state will necessarily change. Perhaps it changes in a way that’s easy to adjust to a new fixed-point, and perhaps not.

Finally, it seems likely that fixed points in mind-space aren’t necessarily good. Wireheading, for instance, seems like it could be represented as a fixed point. Just because a point in mind-space is stable doesn’t mean it satisfies your goals right now.

Brouwer and the Mountain

A few years ago, one of my friends told me the following riddle:

A mountain climber starts up a mountain at 8am. They get to the top that day, and camp there. In the morning, they start hiking down the mountain at 8am on the same trail.

Is there a time of day at which they’re at the same spot on the trail the second day as they were on the first?

I thought about this a while before finally asking for the answer (which I won’t repeat here). I will say that you don’t have to make any assumptions about hiking speed, rest breaks, or even that the hiker always heads in the same direction.

When I learned about Brouwer’s fixed point theorem, I immediately thought back to this riddle. The answer to the riddle is a straightforward application of Brouwer’s theorem.

It turns out that Brouwer’s theorem is used in all sorts of places. It was one of the foundations that John Nash used to prove the existence of Nash equilibria in normal form games (for which he won the Nobel).

The moral of the story is: the more riddles you solve, the more likely you are to get a Nobel prize.

No 2018 Resolutions

In past years, I’ve focused heavily on yearly planning and life-goals for new-years eve. I’m finding that this year, I don’t feel at all motivated to do that.

I think part of that is that I’m in the middle of a big project right now, and making plans and goals before I finish up that project is jumping the gun. Without completing the project I’m working on, I won’t know where I want to go. So when I think about doing life-planning or goal setting, I just think it would be better to spend that time actually working on my project.

This is a bit of an odd feeling for me. I’ve been so focused on goal-setting for years that it almost doesn’t make sense that I wouldn’t want to do it. I take this as evidence that I’m doing what I currently want to be doing. Perhaps in past years I’ve been less satisfied with my life, and now that things are going well for me I feel less of an impulse to change things.

I am worried that this isn’t a generally positive change. Creating detailed life-plans seems helpful no matter where you want to be. I’m now a third of the way through this project; shouldn’t it make sense that I re-evaluate my strategy and figure out what makes the most sense to do next?

My plan for tomorrow is to re-visit my short term goals, and then set aside some time for long term goal planning near the end of my project.

Pursuit of Happiness

Life, liberty, and the pursuit of happiness. When I first learned about the Declaration of Independence, I thought the pursuit of happiness was an odd choice there. What did that have to do with government? Certainly the government shouldn’t kill people, and certainly it shouldn’t deprive them of freedom, but the pursuit of happiness is an internal thing. How could a government have anything to do with that?

I’ve been reading a history of peri-enlightenment France called “Passionate Minds” recently. It argues that the pursuit of happiness is actually the most subversive of the three unalienable rights. Turns out that monarchies often take their power as a divine gift. In that case, common people are spiritually bound to work for the monarch. Working for yourself is just an affront to god.

Many people in Christendom seem to have viewed life as a suffer-fest that they worked through so that they could get to heaven. Even if they thought they could improve their life, it wouldn’t have seemed acceptable to try. Making the pursuit of happiness a right directly contradicted much church doctrine of the time.