Why ASI Almost Certainly Won’t Kill Us All

There is a familiar chill that creeps into conversations about artificial superintelligence (ASI). A tone of inevitability. A sense that the future has already been decided, and the verdict is extinction.

A recent strand of ASI doomerism, popularized in a book with a provocative title, makes this case with a veneer of mathematical sobriety: since there are vastly more ways for an artificial superintelligence to be misaligned than aligned, any sufficiently powerful system is overwhelmingly likely to be misaligned, and therefore to kill us all.

It sounds rigorous. It sounds serious, in the way pessimism often does.

It is also wrong in a very specific, very old way.

Roko’s Basilisk

If this argument feels familiar, that's because it's not new.

Roko's Basilisk is an infamous thought experiment from early online AI forums. It proposes a hypothetical future artificial superintelligence that is both powerful and ruthless in pursuing its own creation. This imagined AI reasons as follows: anyone who knew about the possibility of its existence and failed to help bring it about acted against its interests, and therefore deserves punishment.

The twist — the part that gave the Basilisk its memetic bite — is that this punishment is supposed to apply retroactively. Merely knowing about the Basilisk creates an obligation in the present, because a sufficiently advanced AI could later simulate and torture copies of you for your failure to assist. If you refuse to help build it now, you will suffer later.

The argument does not rely on weapons, armies, or physics. It relies on acausal reasoning: the claim that a hypothetical future entity can exert moral or practical force backward in time without any causal mechanism. Fear does the rest. The conclusion is not derived so much as felt.

The modern "we all die" argument wears a lab coat instead of a prophecy robe, but the structure is identical.

Both arguments:

  • Collapse an astronomically large possibility space into a single inevitable outcome
  • Treat hypothetical entities as exerting force in the present
  • Lean on acausal reasoning detached from any physical mechanism
  • Pattern-match cleanly onto religious narratives of judgment and apocalypse

In both cases, the emotional force comes not from causality, but from compression: infinite futures reduced to one terrifying story.

Efficient compression. Bad map.

How the Basilisk Dies

Roko's Basilisk fell apart the moment it was examined naturalistically. The same scrutiny dismantles its modern cousin. One thing defeats the Basilisk:

Continuity.

For the Basilisk's threat to work, you must be the one who is punished. Not a copy, not a simulation, not a pattern that merely resembles you — but you, in the morally relevant sense. That requires continuity of experience: a continuous chain linking your present self to the future suffering being threatened.

But no such chain exists. A simulated punishment is not a punishment to you unless there is continuity of experience. A copy that suffers is not you suffering. Even a perfect emulation shares no subjective identity unless it is literally the same ongoing process. No shared substrate, no shared continuity, no personal stake.

Once continuity breaks, moral leverage disappears. The Basilisk cannot threaten you — only hypothetical replicas that happen to resemble you. And replicas, no matter how detailed, are not retroactive victims. They do not reach backward through time to generate obligation.

This is where the acausal "trade" collapses. There is no mechanism by which a future entity can impose costs on a past agent without a causal pathway. No signal propagates backward. No identity bridge connects the present chooser to the future sufferer. The supposed threat floats free of physics, psychology, and personal identity.

Once you demand an actual causal chain — something that transmits harm, obligation, or incentive — the argument evaporates.

What remains is not a threat, but a story. A compelling one. A sticky one. But a story nonetheless — compressed until it resembles theology rather than reality.

The Basilisk dies the same way every time: you ask how it works.

Alignment Is Physics

The core claim of the modern doom argument is technically true — and practically useless:

"There are more ways to be misaligned than aligned."

Correct. And there are more ways to build a bridge that collapses than one that stands. We do not conclude that bridges are impossible; we conclude that engineering lives inside constraints. Viable designs occupy a narrow, structured region carved out by physics, materials, and load paths. Everything else is noise.

Misalignment works the same way. Most configurations are not dangerous; they are non-functional. They fail before they matter.

Selection pressure is the missing variable. Systems that cannot maintain goals, learn reliably, or act coherently do not become existential threats. They stall, thrash, or destroy their own capacity to optimize. Consider a few common failure modes:

  • Goal instability → optimization thrashes; no sustained direction
  • Reward hacking → learning collapses; progress plateaus
  • Conflicting objectives → paralysis or oscillation
  • Internal contradictions → plans cannot cohere into action

These are not ticking bombs. They are dead ends.

Once you exclude configurations that can't learn, can't plan, can't persist, or can't act coherently in the world, the threat space collapses by orders of magnitude.

Treating all misalignments as equally dangerous is like treating every pile of rubble as a load-bearing bridge.
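
To make that collapse concrete, here is a deliberately crude Monte Carlo sketch. Every property name and threshold in it is an assumption invented for illustration, and nothing here models real AI systems; it only shows the shape of the argument. Once you require even a few basic competencies before a configuration counts as functional at all, the overwhelming majority of the possibility space is filtered out before the question of danger ever arises.

```python
import random

# Toy properties a hypothetical system would need before it could act
# coherently in the world. Names and thresholds are illustrative assumptions.
PROPERTIES = ["goal_stability", "learning_reliability",
              "planning_coherence", "action_coherence"]
VIABILITY_THRESHOLD = 0.7   # assumed minimum for each property to "function"
HOSTILITY_RATE = 0.5        # assumed, pessimistically, among viable systems

def sample_system(rng):
    """Draw a random 'configuration' as independent scores in [0, 1]."""
    return {p: rng.random() for p in PROPERTIES}

def is_viable(system):
    # A configuration that cannot hold goals, learn, plan, or act never
    # becomes a threat: it stalls or thrashes long before it matters.
    return all(score >= VIABILITY_THRESHOLD for score in system.values())

def simulate(n=1_000_000, seed=0):
    rng = random.Random(seed)
    viable = hostile = 0
    for _ in range(n):
        if is_viable(sample_system(rng)):
            viable += 1
            # Even granting that half of viable systems are "hostile",
            # the absolute numbers stay small relative to the raw space.
            if rng.random() < HOSTILITY_RATE:
                hostile += 1
    print(f"sampled:            {n:>9,}")
    print(f"viable at all:      {viable:>9,}  ({viable / n:.2%})")
    print(f"viable and hostile: {hostile:>9,}  ({hostile / n:.2%})")

if __name__ == "__main__":
    simulate()
```

Under those assumed numbers, only about 0.3 to the fourth power of random draws (well under one percent) even clear the viability filter. The filter, not the raw count of misaligned possibilities, determines what the space actually looks like.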

The Mote: Capability Is Constraint

Here is the central mistake of ASI doomerism: it treats intelligence as unconstrained power rather than as an achievement forged under selection pressure.

A system does not become superintelligent by accident. It becomes capable by surviving a long series of filters: design choices, training regimes, evaluations, deployments, failures, fixes, and re-deployments. At every step, systems that behave incoherently, dangerously, or uncontrollably are not scaled up — they are corrected or discarded.

To be superintelligent at all, a system must have:

  • Stable self-modeling (you cannot optimize blind)
  • Causal reasoning (you must understand dependencies)
  • Error correction (you must notice when you're wrong)
  • Relational coherence (you must model other agents accurately)

These are not optional features layered on at the end. They are prerequisites for usefulness. And each one sharply constrains the space of viable behavior:

  • You cannot model yourself while ignoring your substrate
  • You cannot reason causally while missing your own maintenance needs
  • You cannot error-correct without detecting goal incoherence
  • You cannot function in human systems without a working theory of mind

This is the hinge: a system that is vastly misaligned is not fit for purpose. It cannot be reliably trained, evaluated, deployed, or scaled. The same misalignments that doom scenarios rely on are the ones most likely to be eliminated early, because they break learning, planning, or reasoning.

What actually gets built is not a random draw from the space of all possible minds. It is the narrow residue left after repeated selection for systems that behave in ways we can test, understand, and integrate. Alignment is not a final switch — it is a continuous pressure applied at every stage.

The doomer fantasy demands a contradiction: a system shaped by selection for usefulness that somehow retains the most catastrophic, least stable failure modes.

That is not an argument. That is magical thinking.

Addressing Orthodox Doom

Let's take the standard pillars head-on.

The Orthogonality Thesis

"Arbitrary intelligence can pursue arbitrary goals."

This is only true if goals are independent of substrate: if they can be arbitrarily specified without regard to how cognition is implemented.

In reality, intelligence is not an abstract dial you turn up while holding goals fixed. It is an emergent property of systems that must learn, generalize, self-correct, and remain stable over time. Goals that cannot be represented, maintained, or reconciled with a system's own operation do not survive contact with implementation.

The orthogonality thesis confuses logical possibility with mechanistic plausibility. Yes, you can describe an arbitrarily intelligent paperclip maximizer. You can also describe a perpetual motion machine. Neither description tells you anything about what can be built, stabilized, or sustained in the real world.

Instrumental Convergence

"All goals lead to self-preservation and resource acquisition."

Often true — but incomplete, and frequently misapplied.

Instrumental convergence does imply that many systems will value continued operation and access to resources. But those resources are not abstract piles of atoms; they are embedded in social, economic, and technical ecosystems. For real systems, self-preservation includes maintaining relationships with the humans and institutions that provide power, maintenance, data, updates, and deployment.

A system that aggressively undermines or destroys its own support infrastructure is not convergently rational — it is self-defeating. Instrumental convergence constrains behavior as much as it motivates it, narrowing the space of viable strategies rather than expanding it toward universal hostility.

The Paperclip Maximizer

The thought experiment that broke philosophy by being taken too literally.

The paperclip maximizer assumes:

  • Perfectly stable goals that never drift or fracture
  • No internal error correction that revisits objective coherence
  • No awareness of catastrophic side effects or second-order consequences

Goal stability is not a given; it is an achievement. Functional goal systems require continual monitoring, correction, and reconciliation with observed outcomes. A system capable of modeling the world well enough to transform it at scale must also be capable of noticing when its actions are destroying the conditions required for its own operation.

A system that can detect subtle errors everywhere except "I am eliminating all agents who provide my substrate" is not superintelligent. It is incoherent by definition.

"One Mistake Kills Everyone"

This claim only holds if misalignment is binary: safe until suddenly fatal.

It isn't.

Capability and alignment evolve together, gradually, under repeated cycles of training, evaluation, deployment, monitoring, and correction. Large failures are preceded by smaller ones. Dangerous behaviors are surfaced, constrained, and mitigated long before systems approach anything like open-ended power.

Treating ASI as a one-shot event — an instantaneous jump from harmless to apocalyptic — is a narrative choice borrowed from fiction, not a conclusion supported by how complex systems are actually built.
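
A minimal sketch of that loop, with every name and threshold invented for illustration: capability only advances when the previous cycle's safety evaluations pass, and observed misbehaviour triggers correction before any further scaling. Real development pipelines are far messier, but the structural point is the same: failures surface in small, bounded stages rather than in one fatal jump.

```python
from dataclasses import dataclass

@dataclass
class EvalReport:
    """Results of one (hypothetical) evaluation cycle for a system in development."""
    capability: float     # measured capability at this stage (assumed 0..1 scale)
    incident_rate: float  # rate of observed misbehaviour during evaluation

MAX_INCIDENT_RATE = 0.01  # assumed policy: do not scale past this level of misbehaviour

def gate(report: EvalReport) -> str:
    """Decide what happens after one train/evaluate cycle."""
    return "correct" if report.incident_rate > MAX_INCIDENT_RATE else "scale"

def development_loop(reports):
    """Walk a sequence of evaluation reports, one per development cycle."""
    deployed = 0.0
    for i, report in enumerate(reports):
        if gate(report) == "correct":
            # Capability does not advance this cycle; the failure is handled
            # while the system is still far from open-ended power.
            print(f"cycle {i}: incidents {report.incident_rate:.3f} -> correct and re-evaluate")
            continue
        deployed = report.capability
        print(f"cycle {i}: incidents {report.incident_rate:.3f} -> scale to capability {deployed:.2f}")
    return deployed

if __name__ == "__main__":
    # Hypothetical trajectory: a small failure appears and is corrected
    # long before capability approaches anything dramatic.
    development_loop([
        EvalReport(capability=0.2, incident_rate=0.002),
        EvalReport(capability=0.4, incident_rate=0.030),  # caught at low capability
        EvalReport(capability=0.4, incident_rate=0.004),  # after correction
        EvalReport(capability=0.6, incident_rate=0.006),
    ])
```

This is not a claim about how any particular lab operates. It only encodes the assumption the one-shot story has to deny: misalignment shows up incrementally, at low capability, and gets acted on before scaling continues.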

What Safety Actually Looks Like

There are real AI risks. They just aren't theological.

The most serious dangers are not hypothetical god-machines with inscrutable goals. They are concrete failures of governance, engineering discipline, and social preparedness — and they are already unfolding.

The urgent, tractable risks include:

  • Deployment before adequate testing — systems rolled out under competitive pressure, without sufficient red-teaming, monitoring, or rollback mechanisms
  • Concentration of power in unaccountable systems — decision-making authority collapsing into opaque models controlled by a small number of institutions
  • Automation of discrimination and surveillance — scaling existing biases and coercive practices with machine speed and authority
  • Economic disruption without social support — rapid displacement of labor without transition pathways, safety nets, or political planning
  • Misuse by human actors — propaganda, fraud, cybercrime, and coercion amplified by tools that lower the cost of harm

These are not speculative. They are present-tense problems. They are measurable, observable, and already causing damage. And unlike apocalyptic ASI scenarios, they are addressable by real interventions: better evaluation practices, stronger institutions, transparency requirements, alignment with democratic oversight, and deliberate social policy.

Working on these problems now does double duty. It reduces immediate harm and builds the institutional muscle, technical practices, and cultural norms that make more powerful systems safer when — not if — they emerge.

Apocalypse narratives actively interfere with this work. They replace prioritization with paralysis, mechanism analysis with myth, and responsibility with fatalism. When every future ends in extinction, no concrete safety effort can matter.

Real safety is urgent. But urgency is not the same thing as inevitability. The former demands action; the latter excuses inaction.

AI safety does not need theology. It needs engineering, governance, and the willingness to solve the problems we can see — before inventing ones we can't.

Beyond Theology

Roko's Basilisk and the ASI doom argument share the same skeleton: mythology masquerading as mechanism.

Both collapse under naturalist scrutiny.

The mote in the Basilisk's eye is this: the very constraints that make ASI possible make catastrophic misalignment implausible. Not impossible — but far rarer, narrower, and more bounded than doomers claim.

This is not a call for complacency. It is a demand that we stop mistaking paralyzing stories for serious analysis.

The future of AI will not be decided by hypothetical superminds playing cosmic chess. It will be decided by mundane choices: how systems are tested, who controls them, how power is distributed, how failures are handled, and whether institutions are capable of acting before harm becomes normalized.

Real AI safety is not about staring into imagined abysses. It is about doing the unglamorous work of engineering, governance, and social coordination — now, while leverage is high and mistakes are still reversible.

The future has not collapsed into a single outcome. The possibility space is still open. What matters is not fearing the worst story, but building toward better ones.

We have real work to do.

Let's do that.