This is so good and important! It should be on LW!
I've basically been trying to hammer this point home for a while (albeit with slightly different language). Some LWers (including people whose job is decision theory?) seem to have the vague intuition that "there will be some nice clever way to reconcile the perspectives of 0-you with 1-you, and this will be a grand unified theory of how to be updateless". And I think the truth is you can't have your cake and eat it too: 0-you and 1-you just have different probability distributions about the world, and when you maximize expected value according to them, they recommend different actions! It is mathematically different to optimize over whole trajectories instead of last actions of said trajectories!
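To make that last sentence concrete, here's a minimal toy sketch in code (counterfactual mugging as the toy environment; the payoff numbers and helper names are mine, purely for illustration):

```python
# Minimal numeric sketch of "0-you and 1-you just maximize EV under different
# distributions", using counterfactual mugging as the toy setup: a fair coin;
# on tails you are asked to pay 100; on heads you get 10_000 iff the predictor
# expects you to pay on tails. Numbers are illustrative, not canonical.

HEADS_PRIZE, TAILS_COST = 10_000, 100

def trajectory_value(pay_on_tails: bool, coin: str) -> float:
    """Utility of a whole trajectory, given the policy chosen for the tails branch."""
    if coin == "heads":
        return HEADS_PRIZE if pay_on_tails else 0.0
    return -TAILS_COST if pay_on_tails else 0.0

# 0-you: optimize over whole trajectories (policies), under the prior.
prior = {"heads": 0.5, "tails": 0.5}
policy_ev = {
    pay: sum(p * trajectory_value(pay, coin) for coin, p in prior.items())
    for pay in (True, False)
}

# 1-you who has seen tails: optimize the last action, under the posterior.
posterior = {"heads": 0.0, "tails": 1.0}
action_ev = {
    pay: sum(p * trajectory_value(pay, coin) for coin, p in posterior.items())
    for pay in (True, False)
}

print(policy_ev, "-> 0-you pays")     # {True: 4950.0, False: 0.0}
print(action_ev, "-> 1-you refuses")  # {True: -100.0, False: 0.0}
```

Same utility function, same rule ("maximize EV"), different distributions, different recommended actions.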
Sure, there might be some particular ways of "doing commitments"/"doing updating" that, for some contingent reason, work especially well in the set of environments you think you inhabit. Some of these will be boring, like Omega bonking you if you don't do it a specific way. Others could be more interesting, like emergent properties of physics/chaos theory/how our imperfect brains are wired. But this all becomes more of a semi-empirical study of "which properties does our environment have, and what does this tell us about how embedded agents should store and update information and decision procedures", rather than a fully general theoretical solution (which is mathematically impossible).
(On a similar note, I think the name "Updateless Decision Theory" is misleading, because it is not a decision theory. Rather, "Updateless" just means "hard-coding my agent so that it always uses the belief distribution of 0-you, independently of which specific decision rule it uses to turn such distribution into actions". You can take any decision rule and turn it updateless by doing this. Caveat 1: This might be hard to do practically in the real world, because the belief distribution of 0-you is fundamentally incomplete, and so you get awareness growth. Caveat 2: Also, even such a fully updateless hard-coded agent will sometimes decide to "update" on information, meaning make its action depend on an observation, because this seems optimal from the perspective of the prior.)
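As a toy sketch of that "not a decision theory" point (the names like `make_updateless` are made up here, nothing canonical):

```python
# Toy sketch: any decision rule, viewed as a map from (belief distribution,
# candidate policies) to a chosen policy, becomes "updateless" by hard-coding
# the distribution it is fed to the time-0 prior. Names are illustrative.
from typing import Callable, Dict, List

Belief = Dict[str, float]          # world-state -> probability
Policy = Callable[[str], str]      # observation -> action
DecisionRule = Callable[[Belief, List[Policy]], Policy]

def make_updateless(rule: DecisionRule, prior_at_time_0: Belief) -> DecisionRule:
    """Return a version of `rule` that always evaluates candidate policies
    with the hard-coded time-0 prior, whatever has been observed since."""
    def updateless_rule(_current_posterior: Belief, policies: List[Policy]) -> Policy:
        # The current posterior is deliberately ignored (that's all that
        # "updateless" means here). Note the chosen Policy can still map
        # observations to actions, which is the "updating" of caveat 2.
        return rule(prior_at_time_0, policies)
    return updateless_rule
```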
But I might disagree with you on the following. You seem to use these considerations as an argument against updatelessness. I can understand this psychologically/sociologically, because some LWers say "all time-slices want to be dynamically consistent (with past time-slices), and thus updatelessness is the best", and you have noticed that the antecedent is false. But from my perspective, here you have just laid out some basic mathematical facts about how agents work. And we're still left to decide: from my current (0-me) perspective, what is the optimal move here? How much should I let my future actions depend on future observations (and which actions, and which observations)? How much can I bind myself, and how can I increase my binding power?
Going into further detail, I will note that the main difficulty here is my incomplete prior. If I truly already had a complete prior (and had enough computing power, and could bind myself), then I would just press the button to maximize over my future policy. What else is there to do, really? That's just what maximizes expected value. But I have an incomplete prior stored in my messy brain, which I don't know how to extrapolate to any sensible complete prior. So many times, I find myself in the situation of asking: "should I let my future actions depend on this information that I might learn?". And I can't even compute what my complete prior thinks about this! I need to think through it myself, and generate my prior on the go, or something? And in doing so, it's hard to ensure that all the normative intuitions about how I want my prior to be are satisfied. I think we need more work on the latter, and this might be semi-empirical trial-and-error work.
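(In symbols, using my own notation rather than anything from the post: the "button" in the complete-prior case is just
$$\pi^* \in \operatorname*{arg\,max}_{\pi \in \Pi} \; \mathbb{E}_{h \sim P_0(\cdot\mid\pi)}\big[U(h)\big],$$
where $\Pi$ is the set of maps from observation histories to actions, so "should this action depend on that observation?" gets settled inside the argmax. The incomplete-prior problem is that $P_0$ just isn't defined over some of the events that candidate policies differ on.)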
Now for some more direct notes on the post:
- It might be better for LWers if you emphasize further up that the central example of the "differing perspectives of 0-you and 1-you" is something as simple as 1-you having updated on information.
- I don't agree with your intuition that "I can see myself hyping me up for going to the gym, but not to One-box on the True Newcomb Problem". Of course, hopefully we'll be able to just implement better binding mechanisms before you find yourself in the room, so that we don't have to worry about this. But even if it happened right now, I feel like I could sustain the required double-think for a while, just because of vaguely plausible-sounding motives like "0-Martín is thinking about it really hard and really does seem to be super convinced, so better do that!". But of course, I might be wrong, and even if I'm not, these would just be psychological differences.
- Your last point is especially interesting. I agree, as you point out, that "just the math of how agents work" doesn't force us to, for example, make our actions consistent with what a fictional reconstructed (-1)-me, who never actually existed, would have wanted (contrary to the opinion of some LWers). That said, there can be other pro tanto philosophical or contextual reasons why we might want to do something of that shape. Here are some:
+ Of course, it could just be that other agents will treat you better if you do something of that shape. This is not interesting, because we can say this about anything. But, for some reason, it does seem somewhat more likely that other agents will want us to do something of this shape than other arbitrary shapes. Why? Well, there seems to be a weird continuum or resemblance between "what 0-you wants 1-you to do because of strict EV-maximization reasons" and "what (-1)-Alice wants 0-Bob to do because of fairness considerations favored by group evolution" (so much so that some LWers confuse the two). While these two considerations are very different, there might be some group-bargaining reasons why they tend to result in very similar behavior, and so we might end up implementing something that looks suspiciously close to following the advice of fictional reconstructed (-100)-you.
I don't have many more concrete stories here, but maybe one is: some agents already had binding power at time 0, but they predicted that others might not have binding power until time 1, and from their priors it also seemed like interacting with 1-others is on average worse than interacting with 0-others (this sounds unlikely), and so when binding themselves at time 0, they included the clause "if you try your hardest to act as if you had bound yourself at time 0, I reward you".
+ It might also be that our utility function is especially likely to have a shape which makes us act as if we were following a reconstructed (-100)-you. For example, because you terminally value those notions of fairness, or because you still want to think the other branches exist, similar to quantum branches? I don't know if the latter can make sense.
+ If existence is binary, and I can recognize clearly whether I exist, then sure, I should always assume I exist. If instead we think in terms of quantitative existence-juice, but it is still the case that I should update strongly towards my observations having more existence-juice, then I think the quantitative version of "assume you exist" still goes through (although then there are already many situations where you do want to sacrifice future gains to give more existence-juice to branches including you). I wonder if there's some coherent position like "I might be an L-zombie, and I'd have no way to tell, so I shouldn't update much, or at all, on my current branch existing". Probably there isn't, or it just makes the purpose/definition of existence-juice nonsensical.
+ I too worry that Fake "commitments" might make many proposed acausal interventions net-negative, because humans obtain information while thinking they are bound, but they really aren't. I'm still not sure, though, whether this clearly points towards the net-negative direction, or just increases variance (modulo independent considerations like participation or variance being negative by default, which I'm also unsure about).
So there's still a version of time-slice FDT which one-boxes in non-transparent Newcomb, because it calculates the effect of its actions using subjunctive dependence.
But it disagrees with UDT in counterfactual mugging, because only the timeslice-FDT 0-agent cares about the timeslice-FDT 1-agent in both branches; the timeslice-FDT 1-agent who sees tails doesn't care about the other branch.
& it disagrees with UDT & TDT on transparent Newcomb, because the certainty of the box's contents "supersedes" subjunctive dependence.
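A rough numeric sketch of the two Newcomb claims, under my own guess at a formalization where "subjunctive dependence" just means the prediction matches the decision with some accuracy (none of this is an established definition):

```python
# Rough sketch: timeslice-FDT-style EV for the two Newcomb variants, assuming
# "subjunctive dependence" = prediction matches the decision with accuracy P_ACCURACY.
P_ACCURACY = 0.99
BIG, SMALL = 1_000_000, 1_000

def ev_nontransparent(one_box: bool) -> float:
    """Non-transparent Newcomb: the big box's contents are subjunctively
    tied to the choice, so one-boxing raises the chance it is full."""
    p_big = P_ACCURACY if one_box else 1 - P_ACCURACY
    return p_big * BIG + (0 if one_box else SMALL)

def ev_transparent(one_box: bool, big_box_full: bool) -> float:
    """Transparent Newcomb: the contents are already observed, so (on the
    timeslice view above) their certainty supersedes subjunctive dependence."""
    return (BIG if big_box_full else 0) + (0 if one_box else SMALL)

print(ev_nontransparent(True), ev_nontransparent(False))        # 990000.0 11000.0 -> one-box
print(ev_transparent(True, True), ev_transparent(False, True))  # 1000000 1001000  -> two-box
```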
Is this accurate & is there an existing term for timeslice FDT?
I'm not sufficiently familiar with how FDT or "subjunctive dependence" are defined to say, but as far as I can tell that seems right. I think the closest analogue is this post's definition of TDT (https://www.lesswrong.com/posts/dmjvJwCjXWE2jFbRN/fdt-is-not-directly-comparable-to-cdt-and-edt), because TDT conditions on its observations.