[Disclaimer: Esoteric musings. This post probably won’t make sense if you aren’t familiar with “updateless decision theory.”]
I’ve been skeptical for a while of updateless decision theory, diachronic Dutch books, and dynamic consistency as a rational requirement. I think Hedden's (2015) notion of time-slice rationality1 nicely grounds the cluster of intuitions behind this skepticism.
(Note: Time-slice rationality is a normative standard, i.e., a statement of what it even means to “win” (if rationality = “winning”). This is not to say that winning with respect to time-slice rationality requires believing in time-slice rationality.)
According to this view, in principle “you” are not a unified decision-maker across different time points, any more than your decision-making is unified with other agents’. Yes, you share values with your future self (mostly). But insofar as you have different evidence, there isn’t any privileged epistemic perspective that all the yous at different time points can agree on. Rather:
“You” at time 0 (“0-you”) are a different decision-maker from “you” at time 1 (“1-you”).
What is rational for 1-you depends on 0-you only insofar as 0-you are another agent that 1-you might need to coordinate with. And vice versa.
This is consistent with 0-you being able to make commitments that influence 1-you. Or, e.g., just refusing an offer in a potential money pump that looks good to 0-you in the short term, because 0-you predict that 1-you will make decisions that are bad from 0-your perspective. More on this later.
Let’s say 0-you and 1-you share a value system V, and define a “diachronic (sure) loss” as a sequence of actions that (certainly) make things worse with respect to V than another sequence. (E.g., some sequence of bets in Sleeping Beauty that “an agent” endorsing EDT and SIA might take.)
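To make the definition concrete, here is a toy illustration with made-up numbers (a generic schematic, not the actual Sleeping Beauty construction): suppose 1-you's credence in some event H predictably drops from 0-you's 1/2 to 1/3. A bookie can then sell a bet to 0-you and buy it back from 1-you such that each trade looks positive to the time-slice making it, yet the pair is a guaranteed loss with respect to V compared to declining both trades.

```python
# Toy illustration with made-up numbers of a "diachronic sure loss" arising from
# a predictable credence shift; this is a generic schematic, NOT the specific
# Sleeping Beauty / EDT / SIA construction.

P0_H = 1/2   # 0-you's credence in some event H
P1_H = 1/3   # 1-you's credence in H (the shift is predictable to 0-you)

payout   = 10.00   # the ticket pays $10 if H, else $0
price_t0 = 4.90    # bookie sells the ticket to 0-you for this
buyback  = 3.40    # bookie buys the ticket back from 1-you for this

ev_buy_t0  = P0_H * payout - price_t0    # +0.10: positive by 0-you's lights
ev_sell_t1 = buyback - P1_H * payout     # +0.07: positive by 1-you's lights

# Because the ticket gets sold back, the sequence's net payoff doesn't depend
# on whether H happens: it's a guaranteed loss relative to declining both.
net_either_way = -price_t0 + buyback     # -1.50 in every state

print(ev_buy_t0, ev_sell_t1, net_either_way)
```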
On time-slice rationality, then, 1-you are not rationally obligated to make a decision that would avoid a diachronic loss given 0-your decision! 1-you just aren’t deciding a sequence. Rather, 1-you are deciding the time-1 action, and 1-you ought to decide an action that’s best from 1-your perspective.2 If a diachronic loss happens, this is a result of two agents’ decisions, which (I claim) neither 0-you nor 1-you can entirely control.3
Objection: “Consider a framing of Transparent Newcomb in which you just think about the problem at time 0, and then after seeing the boxes you either take one or two at time 1. Then, 0-you can decide for 1-you to one-box, right?”
Response: Sort of. If 0-you really are capable of psychologically binding 1-you to an action that (combined with 0-yours) will avoid a diachronic loss, 0-you ought to do so. (And if you can, then there’s no need for “updateless decision theory” per se here — you can justify diachronic loss avoidance just with time-slice rationality.) But merely intending for 1-you to do some action doesn’t bind 1-you to it. And forming this intention doesn’t rationally obligate 1-you to follow through on it, any more than declaring your endorsement of some social norm rationally obligates others to follow it. What makes 0-you so confident that 1-you will one-box?
Objection: “I know 1-me will one-box because they’ll recognize that’s the rational thing to do.”
Response: Isn’t rationality supposed to be about “winning”? 1-you wouldn’t win from their perspective by deciding to one-box. 0-you would win by having 1-you be predicted to one-box. I don’t see the independent justification for conceiving of “winning” from some perspective other than the agent’s own.
Objection: “Backing up: Isn’t following through on your intentions inherent to rational decisions? Realistically, there’s always some gap in time between when you form an intention and follow through on it, so it seems that what is rational for 1-you does depend on 0-your intention.”
Response: In the vast majority of cases, either: 1) Nothing relevant changes about your epistemic state between when 0-you form an intention and when 1-you decide whether to follow through. Or 2) 0-your “intention” is itself a decision that determines the subsequent behavior, so there’s nothing left for 1-you to decide. These are qualitatively different from a situation like Transparent Newcomb, where 1-you have different information than 0-you and have a decision to make.
Objection: “We nudge our future selves to do things they don’t want to do all the time. Why is it any harder to just bind 1-you to one-box by intending to do so?”
Response: I think in these mundane cases, the kinds of actions you want your future self to do, and the kinds of situations they’ll be in, are much less bizarre and contrary to our natural inclinations than Transparent Newcomb. If 0-you work up the willpower to go to the gym, the sunk-cost inertia and dream of getting swole make it less costly from 1-your perspective to work out than go home. If 0-you promise to keep a secret for a friend, then even if 1-you become reasonably confident 1-you’d get away with telling it when convenient, 1-your conscience will (hopefully) make 1-you not want to tell it. By contrast, I struggle to imagine hyping myself up to one-box, then seeing the open boxes right there and feeling worse about taking both boxes. (Especially if I imagine the True Transparent Newcomb, where the money is replaced with whatever I terminally care about.)
Objection: “But 1-me shares my values. Doesn’t 1-me want both of us to receive the $1 million that comes from one-boxing?”
Response: Indeed they do. But, if 0-you didn’t bind 1-you to one-box, then I don’t see how it’s possible for 1-you to make it more likely, via their decision, that the box contains the $1 million. 1-you are certain of the box’s contents — there is no sense in which 1-your decision not to take both boxes is better from 1-your perspective. Don’t blame 1-you, blame 0-your own inability to bind 1-you.
Objection: “1-me isn’t deciding the action ‘one-box,’ they’re deciding the policy ‘one-box given a Transparent Newcomb problem.’ This is the policy that maximizes money, even from 1-my perspective.”
Response: It only maximizes money with respect to the prior, i.e., a perspective that is uncertain of the boxes’ contents. By hypothesis, 1-you are not uncertain of this. So you’re begging the question in favor of updatelessness here.
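To spell out the Response with illustrative numbers (a 99%-accurate predictor and the usual $1,000 / $1,000,000 payoffs are assumptions here, and nothing hinges on the exact values): the one-boxing policy maximizes money relative to the time-0 prior, but conditional on 1-you seeing the full box, two-boxing is better by exactly $1,000.

```python
# Illustrative Transparent Newcomb numbers; the 99% predictor accuracy and the
# $1,000 / $1,000,000 payoffs are assumptions, not part of the original post.
acc = 0.99
big, small = 1_000_000, 1_000

# Policy EVs computed against the time-0 prior (boxes not yet seen):
ev_policy_onebox = acc * big + (1 - acc) * 0               # predictor right -> big box filled
ev_policy_twobox = acc * small + (1 - acc) * (big + small) # predictor right -> big box empty

# Action EVs from 1-you's perspective, conditional on seeing the big box full,
# so that its contents are no longer uncertain:
ev_onebox_given_full = big
ev_twobox_given_full = big + small

print(f"prior EV, policy 'one-box':      {ev_policy_onebox:>12,.0f}")
print(f"prior EV, policy 'two-box':      {ev_policy_twobox:>12,.0f}")
print(f"1-you EV, one-box | box is full: {ev_onebox_given_full:>12,.0f}")
print(f"1-you EV, two-box | box is full: {ev_twobox_given_full:>12,.0f}")
```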
Objection: “*Shrug*. From 0-my perspective, it’s good for 1-me to believe updatelessness is rational, even if from 1-my perspective it isn’t.”
Response: I’d agree with that! Convincing yourself of updatelessness might be reasonable ex ante. (And no, this doesn’t contradict time-slice rationality. See the note at the start.) But my concern is that the tails will come apart — there will be cases where this instrumental justification for believing in updatelessness doesn’t make sense. Some examples:
Anthropics: People have appealed to updatelessness / diachronic sure losses as a justification for certain epistemic views even when no 0-you who’d want to bind 1-you ever existed!
Logical updatelessness: There’s no 0-me who floated around in a void before knowing whether they’d exist.
Fake “commitments”: If I’m right about the gap between intentions and commitments, people might be systematically overestimating the acausal power of their intentions. I worry they’re making a mistake by deviating from time-slice rationality, in that they’re not really reaping the benefits of a commitment because they could’ve decided otherwise.
He applies this idea more directly to sequential decision-making in “Options and Diachronic Tragedy.”
Carlsmith writes, regarding Parfit’s hitchhiker: “Indeed: if, in the desert, I could set-up some elaborate and costly self-binding scheme – say, a bomb that blows off my arm, in the city, if I don’t pay — such that paying in the city becomes straightforwardly incentivized, I would want to do it. But if that’s true, we might wonder, why not skip all this expensive faff with the bomb, and just, you know, pay in the city?” From the time-slice rationality perspective, this question sounds like, “Imagine that your mom [who values your survival as much as you do] finds you and the driver in the city. And the driver [who is very frail and harmless, so nothing bad will happen if their demand is refused] demands that she burn $5. Why doesn’t she just, you know, burn the money?”
This is so good and important! It should be on LW!
I've basically been trying to hammer this point home for a while (albeit with slightly different language). Some LWers (including people whose job is decision theory?) seem to have the vague intuition that "there will be some nice clever way to reconcile the perspectives of 0-you and 1-you, and this will be a grand unified theory of how to be updateless". And I think the truth is you can't have your cake and eat it too: 0-you and 1-you just have different probability distributions over the world, and when you maximize expected value according to them, they recommend different actions! It is mathematically different to optimize over whole trajectories instead of over the last actions of those trajectories!
Sure, there might be some particular ways of "doing commitments"/"doing updating" that, for some contingent reason, work especially well in the set of environments you think you inhabit. Some of these will be boring, like Omega bonking you if you don't do it a specific way. Others could be more interesting, like emergent properties of physics/chaos theory/how our imperfect brains are wired. But this all becomes more a semi-empirical study of "which properties does our environment have, and what does this tell us about how embedded agents should store and update information and decision procedures", rather than a fully general theoretical solution (which is mathematically impossible).
(On a similar note, I think the name "Updateless Decision Theory" is misleading, because it is not a decision theory. Rather, "updateless" just means "hard-coding my agent so that it always uses the belief distribution of 0-you, independently of which specific decision rule it uses to turn that distribution into actions". You can take any decision rule and make it updateless by doing this. Caveat 1: This might be hard to do in practice in the real world, because the belief distribution of 0-you is fundamentally incomplete, and so you get awareness growth. Caveat 2: Also, even such a fully updateless hard-coded agent will sometimes decide to "update" on information, in the sense of making its action depend on an observation, because doing so looks optimal from the perspective of the prior.)
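A minimal sketch of that construction, with hypothetical toy interfaces (nothing here is canonical UDT code): a "decision rule" maps a belief distribution and a set of options to a choice; the updateful agent feeds it the updated beliefs and the set of actions, while the "updateless" wrapper feeds it the hard-coded time-0 prior and the set of whole policies.

```python
# Toy sketch (hypothetical interfaces) of the construction above: "updateless"
# fixes WHICH belief distribution gets fed to the decision rule (the hard-coded
# time-0 prior), not which rule turns beliefs into choices.
from itertools import product

def ev_maximizer(beliefs, options, utility):
    """One example decision rule: pick the option with highest expected utility.
    beliefs: dict world -> probability; utility(world, option) -> float."""
    return max(options, key=lambda o: sum(p * utility(w, o) for w, p in beliefs.items()))

def updateful_agent(rule, prior, obs_of, obs, actions, utility):
    """Condition the prior on the observation, then choose a single action."""
    posterior = {w: p for w, p in prior.items() if obs_of(w) == obs}
    z = sum(posterior.values())
    posterior = {w: p / z for w, p in posterior.items()}
    return rule(posterior, actions, utility)

def updateless_agent(rule, prior, obs_of, observations, actions, utility):
    """Feed the rule the un-updated prior and have it choose a whole policy
    (a map observation -> action). Per Caveat 2, the winning policy can still
    make the action depend on the observation when the prior says that's best."""
    policies = [dict(zip(observations, acts))
                for acts in product(actions, repeat=len(observations))]
    policy_utility = lambda w, pol: utility(w, pol[obs_of(w)])
    return rule(prior, policies, policy_utility)
```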
But I might disagree with you on the following. You seem to use these considerations as an argument against updatelessness. I can understand this psychologically/sociologically, because some LWers say "all time-slices want to be dynamically consistent (with past time-slices), and thus updatelessness is the best", and you have noticed that the antecedent is false. But from my perspective, you have just laid out some basic mathematical facts about how agents work. And we're still left to decide: from my current (0-me) perspective, what is the optimal move here? How much should I let my future actions depend on future observations (and which actions, and which observations)? How much can I bind myself, and how can I increase my binding power?
Going into further detail, I will note that the main difficulty here is my incomplete prior. If I truly already had a complete prior (and had enough computing power, and could bind myself), then I would just press the button to maximize over my future policy. What else is there to do, really? That's just what maximizes expected value. But I have an incomplete prior stored in my messy brain, which I don't know how to extrapolate to any sensible complete prior. So many times, I find myself in the situation of asking: "should I let my future actions depend on this information that I might learn?". And I can't even compute what my complete prior thinks about this! I need to think through it myself, and generate my prior on the go, or something? And in doing so, it's hard to ensure that all the normative intuitions about how I want my prior to be are satisfied. I think we need more work on the latter, and this might be semi-empirical trial-and-error work.
Now for some more direct notes on the post:
- It might be better for LWers if you emphasize further up that the central example of the "differing perspectives of 0-you and 1-you" is something as simple as 1-you having updated on information.
- I don't agree with your intuition that "I can see myself hyping me up to go to the gym, but not to one-box on the True Newcomb Problem". Of course, hopefully we'll be able to just implement better binding mechanisms before you find yourself in the room, so that we don't have to worry about this. But even if it happened right now, I feel like I could sustain the required double-think for a while, just because of some vaguely plausible-sounding motive like "0-Martín is thinking about it really hard and really does seem to be super convinced, so better do that!". But of course, I might be wrong, and even if I'm not, these would just be psychological differences.
- Your last point is especially interesting. I agree, as you point out, that "just the math of how agents work" doesn't force us to, for example, make our actions consistent with what a fictional reconstructed (-1)-me who never actually existed would have wanted (contrary to the opinion of some LWers). That said, there can be other pro tanto philosophical or contextual reasons why we might want to do something of that shape. Here are some:
+ Of course, it could just be that other agents will treat you better if you do something of that shape. This is not interesting, because we can say this about anything. But, for some reason, it does seem somewhat more likely that other agents will want us to do something of this shape rather than of some other arbitrary shape. Why? Well, there seems to be a weird continuum or resemblance between "what 0-you wants 1-you to do because of strict EV-maximization reasons" and "what (-1)-Alice wants 0-Bob to do because of fairness considerations favored by group evolution" (so much so that some LWers confuse the two). While these two considerations are very different, there might be some group-bargaining reasons why they tend to result in very similar behavior, and so we might end up implementing something that looks suspiciously close to following the advice of a fictional reconstructed (-100)-you.
I don't have many more concrete stories here, but maybe one is: some agents already had binding power at time 0, and they predicted that others might not have binding power until time 1; from their priors it also seemed like interacting with 1-others is on average worse than interacting with 0-others (this sounds unlikely); and so, when binding themselves at time 0, they included the clause "if you try your hardest to act as if you had bound yourself at time 0, I reward you".
+ It might also be that our utility function is especially likely to have a shape which makes us act as if we were following a reconstructed (-100)-you. For example, because you terminally value those notions of fairness, or because you still want to think the other branches exist, similar to quantum branches? I don't know if the latter can make sense.
+ If existence is binary, and I can clearly recognize whether I exist, then sure, I should always assume I exist. If instead we think in terms of quantitative existence-juice, but it is still the case that I should update strongly towards my observations having more existence-juice, then I think the quantitative version of "assume you exist" still goes through (although then there are already many situations where you do want to sacrifice future gains to give more existence-juice to branches including you). I wonder if there's some coherent position like "I might be an L-zombie, and I'd have no way to tell, so I shouldn't update much, or at all, on my current branch existing". Probably there isn't, or it just makes the purpose/definition of existence-juice nonsensical.
+ I too worry that Fake "commitments" might make many proposed acausal interventions net-negative, because humans obtain information while thinking they are bound, but they really aren't. I'm still not sure, though, whether this clearly points towards the net-negative direction, or just increases variance (modulo independent considerations like participation or variance being negative by default, which I'm also unsure about).
So there's still a version of time-slice FDT which one-boxes in non-transparent Newcomb, because it calculates the effect of its actions using subjunctive dependence.
But it disagrees with UDT in counterfactual mugging, because only the timeslice FDT 0-agent cares about the timeslice FDT 1-agent in both branches; the timeslice FDT 1-agent who sees tails doesn't care about the other branch (numbers sketched below).
& it disagrees with UDT & TDT on Transparent Newcomb, because the certainty of the box's contents "supersedes" subjunctive dependence.
Is this accurate & is there an existing term for timeslice FDT?
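For concreteness, the counterfactual-mugging arithmetic behind the second point above (assuming the standard $100 / $10,000 payoffs) is sketched here:

```python
# Counterfactual mugging with the usual (assumed) payoffs: Omega flips a fair
# coin; on heads it pays $10,000 iff it predicted you'd pay $100 on tails;
# on tails it asks you for the $100.
p_heads = 0.5
reward, cost = 10_000, 100

# UDT / 0-agent perspective: evaluate the whole policy under the prior.
ev_policy_pay    = p_heads * reward - (1 - p_heads) * cost   # +4,950
ev_policy_refuse = 0.0

# Timeslice FDT 1-agent who has seen tails: the heads branch gets no weight
# in this time-slice's decision, so paying is just a $100 loss.
ev_pay_given_tails    = -cost
ev_refuse_given_tails = 0.0

print(ev_policy_pay, ev_policy_refuse)            # 4950.0 0.0
print(ev_pay_given_tails, ev_refuse_given_tails)  # -100 0.0
```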