MAIM is Probably Not a Good Deal for America
The Problem of Positional Incentives
Note: This post makes primarily descriptive rather than normative claims. While I believe MAIM could reduce x-risk in theory, I argue it’s unlikely to emerge or remain stable in practice due to structural incentive problems for the leader.
Also, a simplified version of this argument can be read here.
Introduction
Feel free to skip if familiar with MAIM
Mutually Assured AI Malfunction (MAIM) is currently the most prominent strategic framework for establishing international governance of superintelligence development. A deterrence strategy modeled on Cold War nuclear doctrine, the core premise of MAIM is that nations will threaten to sabotage each other’s superintelligence projects to prevent anyone from developing ASI unilaterally.
Co-authored by Dan Hendrycks, Eric Schmidt, and Alexandr Wang, the detailed report makes several descriptive claims about what conditions (e.g., ability to carry out attacks easily on datacenters) and incentives (e.g., strong incentive to prevent a rival state from achieving military dominance) make a deterrence framework likely to emerge by default, and then makes normative claims about what strategies states should actively employ to “preserve the mutual vulnerability of AI projects” so that MAIM can produce a safe, stable deterrence dynamic (h/t Adam Khoja, a major contributor to Superintelligence Strategy).
“During the standoff, states seeking the benefits from creating a more capable AI have an incentive to improve transparency and adopt verification measures, thereby reducing the risk of sabotage or preemptive attacks.”
A brief summary of how a stable MAIM equilibrium would look in practice:
1. Countries establish red lines for dangerous AI development and collaborate on an escalation ladder so that each side understands what actions will provoke retaliation and how severe that retaliation will be.
2. If a rival crosses those lines, other nations sabotage their AI infrastructure (cyberattacks, physical strikes on data centers, etc.)
3. The mutual threat creates incentives for transparency, coordination, and more cautious development.
This post is primarily intended to address Superintelligence Strategy’s normative claim that countries are inherently incentivized to maintain a MAIM equilibrium if it arises; I do not think this is true by default, particularly as it pertains to the leader of the AI race.
Core Thesis
The foundation of my argument is that strategically, MAIM doesn’t appear to be a good deal for America– the incentives seem to be misaligned even in a future where we assume the U.S. begins to take ASI misalignment concerns seriously.
Rather than centering my argument on why MAIM won’t emerge at all, I focus primarily on why it won’t remain stable. This is because the conditions that could produce MAIM by default (e.g. observable infrastructure, intelligence capabilities, infeasibility of preventing MAIM attacks) are precisely the conditions that give the leader maximum incentive to resist cooperation.
In this piece, I make two major claims:
The leader of the AI race is actively incentivized to undermine MAIM, rendering the normative sections of Superintelligence Strategy that require international cooperation near-impossible to attain.
Whether MAIM will emerge by default is uncertain— but paradoxically, conditions that make MAIM more likely to emerge by default also make MAIM a worse deal for the leader.
1. The leader is inherently disincentivized to pursue MAIM
Maintaining a stable MAIM equilibrium requires mutual visibility of AI projects– China needs to know where the U.S. stands in order to judge whether the U.S. is dangerously close to achieving superintelligence. Without that visibility, deterrence almost certainly fails; if China doesn’t know that the U.S. is nearing superintelligence, the chance of a MAIMing strike is virtually zero, and the U.S. can continue development without high risk of sabotage.
As it stands, mutual visibility on AI development projects is beneficial for the rest of the world, but actively disincentivized for the leader of the AI race.
Consider this scenario:
MAIMing strikes are likely to occur as any nation becomes dangerously close to superintelligence.
Nation X currently leads the race.
If all nations shared development information, Nations A, B, C, and Y would all have the strongest incentive to target Nation X— precisely because it’s furthest ahead.
In other words: transparency transforms the leader into everyone's primary target.
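The positional logic of this scenario can be sketched as a toy model. Everything below is invented for illustration– the progress scores and the targeting rule are my assumptions, not claims from Superintelligence Strategy:

```python
# Toy model of positional targeting incentives under full transparency.
# All numbers are invented for illustration.

# Hypothetical "progress toward ASI" scores, visible to everyone once
# development information is shared (0 = nowhere, 1 = at the threshold).
progress = {"X": 0.8, "Y": 0.6, "A": 0.3, "B": 0.2, "C": 0.1}

def preferred_target(attacker: str) -> str:
    """Assume each nation's best sabotage target is whichever rival is
    closest to the ASI threshold, i.e. most likely to gain a decisive
    advantage first."""
    rivals = {n: p for n, p in progress.items() if n != attacker}
    return max(rivals, key=rivals.get)

targets = {nation: preferred_target(nation) for nation in progress}
print(targets)
# Every nation except X itself selects X, the leader, as its target.
```

The specific numbers are irrelevant; under any assignment where X leads, the same targeting rule concentrates every rival's strike incentive on X.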
Now consider:
A. The third point above holds true even if Nation X sees Nation Y as a bigger safety threat because Nation Y has put fewer resources into safety research.
B. Ideologically, Nation X sees itself as the best candidate for achieving ASI first because it is a liberal democracy, while Nation Y is a communist authoritarian regime.
Overall, Nation X wants ASI first and thinks it should develop it first.
Nation X sees itself as having the greatest chance of both keeping ASI in check and ensuring human flourishing. It regards its closest competitor, Nation Y, as more likely to cause catastrophic risk and use the power of ASI to restrict the freedoms of the world.
However, MAIM means Nation X is the most likely candidate for sabotage.
Obviously, Nation X is the United States. Nation Y is China.
Under these circumstances, I find it highly unlikely that the U.S. would subscribe to the MAIM doctrine. Deterrence is only likely to be (1) effective and (2) stable in a world with clearly agreed-upon red lines and escalation ladders. So even if a MAIM equilibrium emerges without U.S. cooperation, many of the interventions detailed in Superintelligence Strategy that are intended to maintain its stability– clarifying the escalation ladder, building datacenters in remote locations, increasing transparency, AI-assisted inspections– require international cooperation, making a stable MAIM equilibrium exceedingly unlikely.
Ultimately, the information given to the U.S. via visibility into others’ projects is almost certainly not enough to outweigh the comparative advantage the U.S. has as a result of the ambiguity.
Note: The argument that the U.S. sees itself as the safest steward is not unique to the U.S. Every major power believes it’s the responsible actor– China also claims it takes AI safety seriously and views American development as reckless. The asymmetry isn’t ideological; it’s positional. Whoever is ahead will resist transparency. So even in a world where Nation X isn’t the United States, all of the above still applies.
What if the U.S. becomes sufficiently convinced of existential risk that safety concerns outweigh strategic competition? Wouldn’t this change the calculus?
MAIM assumes countries will continue racing while occasionally sabotaging each other. A genuinely x-risk-focused U.S. would either stop entirely, or focus on making American ASI safe rather than accepting a framework where rivals sabotage the country with the most safety investment.
The former case— a complete pause— is almost certainly not going to materialize unless every other country also stops their own development, and the latter case leads, again, to the conclusion that the U.S. will undermine MAIM.
2. Conditions that make MAIM more likely to emerge by default also make it a worse deal for the leader
Consider what’s required for MAIM to emerge without international cooperation. Mutual visibility could develop through espionage, but espionage-based visibility is unlikely to produce meaningful deterrence. Without agreed-upon frameworks, what one country views as a “legitimate deterrent strike” another views as an “unprovoked first strike”— creating extreme risk that any sabotage attempt triggers all-out war.
Essentially, for unilateral MAIM to emerge, countries must be willing to launch strikes that could trigger war. This requires:
High confidence in imminent breakthrough: Without agreed-upon red lines, sabotaging countries need extremely high confidence that a rival is about to achieve ASI or a capability breakthrough that confers decisive advantage. Such breakthroughs are notoriously difficult to predict.
Willingness to act on intelligence alone: The key question is whether covert intelligence provides sufficient confidence to launch sabotage operations—especially when the stakes are magnified by the absence of coordinated frameworks and the risk of being wrong is catastrophic.
Viable unilateral red lines: Some might argue that countries could unilaterally declare and enforce red lines through MAIMing strikes, even without the target's agreement. But this faces two problems:
First, as other critics have noted, defining concrete, verifiable red lines for dangerous AI development is extremely difficult.
Second, the MAIMing country is still risking all-out war in the name of deterrence; how does unilateral deterrence avoid the escalation trap?
This reinforces the idea that without cooperation, unilateral MAIM deterrence is highly prone to escalatory instability.
Could cyber-only operations solve this problem?
One possible counterargument: cyberattacks are not typically seen as provocations of war to the extent that kinetic attacks are– their lack of visibility and low salience make them a much less risky form of deterrence.
However, the MAIM framework itself assumes ASI development is seen as an existential national security issue. If that’s true, then attacks on ASI projects– even cyber ones– may not be treated like normal cyber operations; the stakes change the perceived gravity of the attack.
Moreover, cyber deterrence faces a fundamental paradox: if operations are deniable and subtle enough to avoid escalation, they’re unlikely to be very effective deterrents. If they’re effective enough to meaningfully halt ASI development, they will likely be visible and attributable, which makes them escalatory when targeting what both sides view as existential national priorities.
The cumulative effect of repeated tit-for-tat cyber sabotage while racing toward ASI would itself be dangerously destabilizing, potentially leading one side to conclude that cyber operations are insufficient, triggering escalation to other means.
The only scenario where I imagine unilateral MAIM might work– where the target cannot credibly deter the sabotage itself– is as follows (h/t Adam Khoja):
A. China MAIMs the United States using cyber operations only, believing that the U.S. is unlikely to escalate from cyber ops to kinetic strikes.
B. Even if the U.S. threatens to escalate in the event of a cyber attack, political realities make escalation beyond further cyber ops untenable– thus, the U.S. government is prevented from launching a kinetic strike.
In this scenario, taking ASI seriously as an existential national security threat is enough to justify MAIMing, but not enough to justify extremely risky escalation– especially given that most of the U.S. would likely oppose launching a kinetic counterstrike in response to a cyber op. Again, successive cyber ops still make for a very unstable dynamic, but for the sake of argument we’ll assume this kind of situation could emerge– I do think it’s plausible.
But this only reinforces why MAIM is such a bad deal for the leader.
MAIM only works in a world where ASI is perceived as (1) an existential security threat due to its ability to confer decisive strategic advantage, but (2) not important enough to risk all-out war over– and the MAIMing country has to essentially believe that their MAIM will not trigger escalatory countermeasures, OR that preventing another country from obtaining ASI is worth risking WW3 over.
If countries decide the possibility of triggering escalatory retaliation is too risky, no one MAIMs.
If one country thinks another country won’t escalate, then they MAIM.
If it’s true that the other country does not escalate, that means the leader of the ASI race is accepting an even worse deal by subscribing to MAIM– which gives them an even stronger incentive to avoid the conditions that enable MAIM in the first place.
If the U.S. can’t credibly deter MAIMing, it is essentially defenseless in a situation like this. The only option is to make MAIMing harder to carry out: hardening datacenters, improving cybersecurity, racing faster, and maintaining maximum opacity.
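One way to make the leader’s calculus concrete is a toy expected-value comparison. Every probability and payoff below is an invented assumption, chosen only to illustrate the structural trade-off– not an estimate of any real quantity:

```python
# Toy expected-value comparison for the race leader.
# All probabilities and payoffs are invented assumptions.

P_SABOTAGED_IF_TRANSPARENT = 0.6  # leader is everyone's primary target
P_SABOTAGED_IF_OPAQUE = 0.1       # rivals lack confidence to strike
WIN_PAYOFF = 10.0                 # value of achieving ASI first
LOSS_PAYOFF = -2.0                # value of a sabotaged, stalled project

def leader_ev(p_sabotaged: float) -> float:
    """Expected value: chance of finishing unmolested times the win,
    plus chance of being MAIMed times the loss."""
    return (1 - p_sabotaged) * WIN_PAYOFF + p_sabotaged * LOSS_PAYOFF

ev_cooperate = leader_ev(P_SABOTAGED_IF_TRANSPARENT)  # join MAIM regime
ev_undermine = leader_ev(P_SABOTAGED_IF_OPAQUE)       # harden and stay opaque
print(ev_cooperate, ev_undermine)
# Under these assumptions the leader strictly prefers opacity.
```

The exact numbers don’t matter; as long as transparency raises the leader’s probability of being struck, cooperation is dominated by opacity in this simple model.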
The bottom line is that if MAIM emerges, the leader of the ASI race is incentivized to do everything in their power to undermine it.
This concludes the section of the post focused on positional incentives; however, even setting aside these incentive problems, MAIM faces additional structural challenges that would prevent stable equilibrium from emerging. Particularly:
Even if the international community is somehow able to agree upon red lines and escalation ladders with U.S. buy-in, any agreed-upon escalation ladders cannot maintain stability for long.
Sub-argument A– lack of effective verification mechanisms
Based on the current lack of trusted, privacy-preserving verification mechanisms, countries– at least for the foreseeable future– will have to rely on easily verifiable, external metrics such as the amount of FLOP used to train a model. As more advanced capabilities are achieved with progressively lower compute and energy inputs, this becomes less and less effective at maintaining an equilibrium. (This reason may become obsolete once trusted, privacy-preserving verification systems for AI capabilities are developed and gain international acceptance.)
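To see why compute-based red lines erode, consider the common rough heuristic that dense-transformer training compute is approximately 6 × parameters × tokens. The threshold and model sizes below are invented examples, not proposals from Superintelligence Strategy:

```python
# Sketch of why external compute metrics erode as red lines.
# Uses the rough heuristic: training FLOP ~ 6 * params * tokens.
# The threshold and model sizes are invented for illustration.

RED_LINE_FLOP = 1e26  # hypothetical treaty threshold

def training_flop(params: float, tokens: float) -> float:
    return 6 * params * tokens

# A frontier-scale run trips the external metric...
big_run = training_flop(params=1e12, tokens=2e13)
assert big_run > RED_LINE_FLOP

# ...but algorithmic progress lets a later model reach comparable
# capability with far less compute, staying under the same line.
efficient_run = training_flop(params=7e10, tokens=2e12)
assert efficient_run < RED_LINE_FLOP
```

Any fixed FLOP threshold faces this problem: as compute efficiency improves, dangerous capability migrates below the line, while the line itself remains the only thing outsiders can verify.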
Sub-argument B– unpredictable development
Even if we assume– despite the points Oscar Delaney and others have brought up, and despite the fact that the U.S. is unlikely to have an incentive to do so– that the international community could establish concrete red lines for intelligence recursion and other volatile pathways, these are not necessarily the only pathways to destabilizing “leaps forward”; such leaps might also result from common development strategies, which cannot be included on escalation ladders. As evidence, consider that virtually all of the major breakthroughs we’ve had up to this point (transformers, GANs, AlphaFold) have emerged from conventional development methods like scaling, architectural innovations, and training improvements, not from AI systems recursively improving themselves.
And even if we go a step further and assume, optimistically, that the international community could somehow agree on concrete and verifiable red lines for superintelligence, this would still leave open a vast and dangerous gray zone. Models that fall short of ASI could nonetheless be strategically decisive. A country that attains early AGI– or even a pre-AGI system with highly asymmetrical capabilities– could gain an overwhelming geopolitical advantage long before any formal “superintelligence” threshold is crossed. Such an advantage could be sufficient to dominate or reshape the international order unilaterally, while technically remaining inside agreed-upon boundaries.
Moreover, similarly powerful AI systems developed en route to ASI could severely undermine deterrence. Any technology that dramatically enhances a state’s intelligence, economic power, cyber capacity, or military decision-making may shift the balance of power so rapidly that existing deterrence frameworks become irrelevant. Deterrence relies on rough capability parity, stable expectations, and predictable escalation incentives, but an abrupt leap in national capabilities could dissolve those conditions instantly.
Compounding the problem is the difficulty of detecting when an AI capability crosses from “competitive lead” into “strategic dominance.” While progress at the system level is often gradual, its consequences can be sharply nonlinear. Breakthroughs frequently appear as sudden qualitative shifts emerging from incremental quantitative improvements, and the geopolitical significance of such shifts is often apparent only in hindsight. In practice, this makes any warning window narrow, ambiguous, and contested. By the time it becomes broadly agreed that a particular capability enables a transformative strategic position, the opportunity for deterrence to stabilize the situation may already have passed.
I think the paper does the best job it can of managing this by proposing basing escalation off of risky training strategies most likely to cause volatile advances, but I still think the uncertainty makes MAIM near-impossible to maintain.
Conclusion
In summary, Superintelligence Strategy makes two claims:
Descriptive: “MAIM-like dynamics will emerge by default”
Prescriptive: “States should actively maintain MAIM equilibrium through cooperation”
This counterargument asserts that:
If (1) is true, then (2) requires the leader to accept a bad deal; and
If (2) is needed, then (1) probably won’t produce stability on its own.
If MAIM emerges by default, cooperation from the leader means volunteering to become the primary target for sabotage. Without cooperation, interventions that rely on international agreements to maintain stability become impossible, rendering the normative sections of Superintelligence Strategy obsolete.
The leader will rationally choose to harden defenses, maintain strategic ambiguity, and race toward ASI– precisely the behaviors that prevent MAIM from functioning as intended.
The prerequisites for MAIM to successfully maintain stability appear unlikely to materialize given the positional incentives at play.
Thank you very much to Jason Hausenloy, Adam Khoja, Rohan Selva-Radov, Ethan Kuntz, & Saheb Gulati for feedback and discussions.

