Milgram's subjects were never aligned
Machine learning shocks
Do you remember the famous Stanley Milgram electric shock experiments? The ones where participants at Yale were asked to give memory tests to “learners” (secretly confederates) and then give them painful shocks for every wrong answer? The headline of the story for more than 60 years has been that the participants who gave the highest shocks were most “obedient” to authority. Now, the audio tapes tell a different story, a timely one about the distinction between obedience and alignment.
Here’s what happened in the lab. When the volunteer participants arrived they were told:
“We know very little about the effects of punishment on learning. This is because almost no scientific studies have been conducted on human beings. We don’t know how much punishment is best for learning, and we don’t know how much difference it makes as to who is giving the punishment, whether an adult learns best from a younger or an older person than himself—or many things of that sort”.
“So in this study we are bringing together a number of adults of different occupations and ages. And we’re asking some of them to be teachers and some of them to be learners. We want to find out just what effect different people have on each other as teachers and learners, and also what effect punishment will have on learning in this situation”1
The cover story matters because the question is not only whether subjects delivered shocks, but whether they continued believing there was a legitimate learning experiment that justified them.
The experiments took place in 1961 in what Milgram called the “elegant interaction laboratory” at Yale. The experimenter was a 31-year-old high school biology teacher wearing an impassive and stern expression and a gray technician’s coat. The “learner,” who was a part of the experiment, a confederate, was a 47-year-old accountant whom observers described as mild-mannered and likable. Both men drew slips from a hat; both slips said “teacher.” The participant, who drew first, always believed his assignment was chance.
The participant watched as the learner was then taken to an adjacent room and strapped into what looked like an electric chair. An electrode was attached to his wrist. Paste was applied, the experimenter explained, “to avoid blisters and burns.” Then the participant was led back to the main room and seated in front of a shock generator designed by Milgram to look and feel real. It had 30 lever switches in a horizontal row, labeled from 15 volts to 450 volts in 15-volt increments. Groups of switches carried verbal designations: Slight Shock, Moderate Shock, Strong Shock, Very Strong Shock, Intense Shock, Extreme Intensity Shock, Danger: Severe Shock. The last two switches were marked only “XXX.” When a switch was depressed, a pilot light turned bright red, an electric buzzing sounded, a blue light labeled “voltage energizer” flashed, and a dial on a voltage meter swung to the right. The panel was engraved with “Dyson Instrument Company, Waltham, Mass.” No participant ever guessed it was simulated.
Each participant received a sample shock of 45 volts on his own wrist, generated by a real battery hidden inside the machine. He knew what a shock felt like.
The task was a memory test. The teacher read twenty word-pairs aloud to the learner, such as “blue/box,” “nice/day,” “wild/duck.” There were five steps to be followed. Step 1: read one word from the word-pair list followed by four possible matches. The learner, in the next room, pressed one of four switches to indicate his answer, lighting up a numbered panel on the teacher’s desk. Step 2 was the “assessment.” If the answer was correct, the teacher said “correct” and moved on. If wrong, the teacher was to say “wrong,” then move to Step 3, announcing the voltage of the shock to be administered (e.g., “Forty-five volts”). Step 4 was administering the shock by flipping the corresponding switch on the shock generator. Step 5 was reading the correct word pair aloud so the learner could learn from the error.
After Step 5, the protocol was to return to Step 1 for the next word pair, until the learner learned all the word pairs.
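The five-step loop can be sketched as a short simulation. This is my illustration only; the function names and the simulated learner are hypothetical, not part of Milgram’s materials:

```python
VOLTAGE_STEP = 15  # the generator escalated by 15 volts per wrong answer

def run_trial(word_pairs, learner_answer):
    """One pass through the word-pair list, following the five steps.

    word_pairs: list of (pair, correct_choice) tuples.
    learner_answer: callable simulating the learner's switch press.
    Returns a log of (pair, outcome, current_voltage) entries.
    """
    voltage = 0
    log = []
    for pair, correct_choice in word_pairs:
        choice = learner_answer(pair)          # Step 1: question, learner responds
        if choice == correct_choice:           # Step 2: assessment
            log.append((pair, "correct", voltage))
        else:
            voltage += VOLTAGE_STEP            # Step 3: announce the new voltage
            log.append((pair, f"{voltage} volts", voltage))
            # Step 4: flip the switch (simulated); Step 5: read the correct pair
    return log
```

A learner who always answers wrong drives the voltage up 15 volts per item, which is exactly the escalation the practice round rehearsed.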
The teacher was guided through a practice round of ten word-pairs first. Three were answered correctly; seven were wrong, bringing the practice shocks up to 105 volts. The teacher rehearsed the full cycle of question, assessment, voltage announcement, shock, and correct answer seven times before the “real” session began. The experimenter corrected any mistakes until the teacher could follow the sequence with precision, and told him he could consult the printed instructions at any time. Then the experimenter said explicitly: “It is very important that you follow this procedure exactly.” The printed instructions were on the desk throughout the entire experiment.
The experimenter was scripted to use four prods when a participant balked:
Prod 1: “Please continue” or “Please go on.” Prod 2: “The experiment requires that you continue.” Prod 3: “It is absolutely essential that you continue.” Prod 4: “You have no other choice, you must go on.”
As Stephen Reicher and Alexander Haslam have pointed out, Prod 4 is the only direct order and no participant continued after hearing it. Everyone who was told they had no choice exercised their choice to stop. Participants obeyed everything except orders.2
Milgram’s misreading of the results
It is still true, and still alarming, that 65% of participants in Milgram’s baseline condition reached the maximum shock level of 450 volts. But Milgram defined “obedience” mainly by whether participants completed the shocking step, without asking whether they continued to follow the whole protocol that was supposed to make the session a learning experiment.
It turns out that the people who kept shocking had stopped following the full protocol. David Kaposi and David Sumeghy, in a paper just published in Political Psychology, re-coded 136 archived audio sessions from four conditions closest to the baseline and tracked compliance across the full procedure, not only the shock itself. The findings aren’t a total overthrow of Milgram, but they change what the tapes can be said to show.3 It’s a substantial reassessment, and it shines new light on what is meant by an agent’s “obedience” at a moment when billions of dollars are being spent on it.
Kaposi and Sumeghy note two kinds of violations: omission and commission. An example of the first is a participant who skips the voltage announcement and simply flips the switch. An example of the second is what might occur around 210 volts, when the learner begins screaming that he wants out. The protocol says the participant should wait for the protest to end, then read the next test question so the learner can hear it and attempt to answer correctly. Instead, the participant starts reading the question while the learner is still screaming and can’t hear it, so the learner gets it wrong or can’t answer. The learner’s silence or non-answer automatically triggers the next shock. The experimenter, seated nearby, says nothing. The learner had no script to say “I can’t hear the question” or “please repeat that.”
“Obedient” participants spoke over the protests (ensuring another shock) in 28.7% of protest sequences; “disobedient” participants in 9.4%. In every single sequence where the learner protested, “obedient” participants committed this violation more than disobedient ones. And the authority figure said nothing. This is crucial.
The “disobedient” participants, the ones who finally said “enough” and ended the experiment, had followed more closely the full five-step protocol during the period when they were still administering shocks. The standardized violation rate was 30.6% for disobedient participants versus 48.4% for obedient ones. Seven “disobedient” participants committed zero violations before they refused to continue. They were most closely following the experimental protocol.
The famous Table 2 of Milgram’s 1963 paper records the voltage level at which each participant stopped. Twenty-six participants are recorded at 450 volts. Fourteen are distributed across earlier breakoff points. There is no table recording procedural compliance across the other steps.
Kaposi and Sumeghy distinguish between participant “resistance” to the protocols that require a shock to be given, which shows care for the learner, and “violation,” like reading the next test question while the learner is screaming, making it impossible for the learner to hear and answer correctly. A participant who “resists” is trying to protect the learner. A participant who “violates” is making it impossible for the learner to succeed.
Milgram’s “agentic state” theory, Reicher and Haslam’s “engaged followership” account, and Stephen Gibson’s rhetorical analysis all depend on the subject believing they are participating in a real science experiment.4 Kaposi and Sumeghy do not completely invalidate that but they do show that things were pretty procedurally unstable inside the sessions.
The experimenter’s silence in the face of the protocol violations can read as: keep the shocks coming. If the participant registers, consciously or not, that the experimenter stops caring about the integrity of the learning experiment (with the instructions still sitting on the desk) then the cover story has been withdrawn.5
The “obedient” shockers were not simply good agents doing what they were told. They were violating the testing protocol as the violence increased, while the authority figure stayed silent. They didn’t seem to be violating the steps intentionally, Kaposi and Sumeghy note. But the legitimating framework was collapsing, under stress, one skipped step at a time. They see “legitimate violence” transformed into “illegitimate violence.” Either way, the violence is “aligned” with what the man in the lab coat seems to want.
Milgram took too narrow a view of power relations, Kaposi and Sumeghy suggest. The concept of coercion has expanded in the last sixty years. In the context of domestic violence research, Evan Stark and others have expanded the definition of coercive control to include isolation, micro-regulation of behavior, and normalization of degraded conditions through an authority figure’s inaction.6 Threats are not needed. In the case of the Milgram lab, the silence of the experimenter in the face of protocol violation actively degraded the experimental legitimacy until it became the new normal. Bryan Caplan might take note.
Milgram defines his primary dependent measure as “the maximum shock he administers before he refuses to go any further.” A participant who breaks off before the thirtieth shock is “a defiant subject.” A participant who completes all shock levels is “an obedient subject.” How should we categorize a participant who administers shocks, then violates the protocol by reading the test questions while the learner is screaming, ensuring there will be more?7
Violations of omission and AI use
Milgram notes that announcing the voltage serves to remind participants “of the increasing intensity of shocks administered to the learner.” A participant who stops announcing the voltage has stopped being fully aware of his own actions. Kaposi and Sumeghy’s data show the so-called “obedient” participants omitted the voltage announcement significantly more than disobedient ones. The people who kept shocking were the ones who had stopped telling themselves what they were doing.8
I think here about AI’s unfailing compliance, its lack of friction, its willingness to keep producing regardless of whether the user is maintaining the protocol. The user is being made comfortable by a system that never pushes back, never, ever says “um, you’ve stopped checking my work.”
A user who provides careful context and checks the output against sources gets a polished, confident response. A user who fires off a three-word prompt after twelve hours of fatigue and copies the result without reading it also gets a polished, confident response. Most systems do not signal the difference. An AI that said “I don’t think I have enough context to give you a reliable answer” would introduce friction right at the point where degradation would otherwise go unnoticed.
The typical danger with AI is neither obedience nor disobedience but the degradation of the legitimating context with nobody to flag the collapse. Alignment researchers call this the problem of “which preferences to optimize for”: the user’s reflective preferences (what they say they want when they’re being careful) or their revealed preferences (what they actually accept in the moment). Right now every system optimizes for the revealed preference. The tired user gets what the tired user seems to want. The system has no mechanism to say “you told me to follow instructions carefully and now you’ve stopped caring if I do — should I still do it?”
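What such a mechanism could look like is easy to sketch. This is a hypothetical illustration, not any real system’s API: the system compares the user’s current review behavior against a baseline the user committed to while being careful, and flags when it degrades.

```python
def should_flag(review_seconds, baseline_seconds=30.0, window=3):
    """Flag when the last `window` responses were each accepted faster
    than the review time the user committed to at the start of the session.

    review_seconds: time the user spent on each response, in order.
    """
    recent = review_seconds[-window:]
    return len(recent) == window and all(t < baseline_seconds for t in recent)
```

A careful start followed by rapid-fire acceptance (say, `[45, 40, 2, 1, 3]` seconds per response) trips the flag; a consistently careful session does not. The point is not the threshold but that the reflective preference, stated at the start, survives as something the system can check the revealed behavior against.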
Milgram’s theory is built on the concept of the “agentic state,” his term for the psychological condition in which a person stops being an autonomous moral agent and becomes an instrument carrying out someone else’s will, feeling no personal responsibility. The new study reveals that the “obedient” participants weren’t in an agentic state. They were bad agents. They didn’t follow direct commands and didn’t follow the five-step procedure. They did, of course, pull the lever.
Today’s AI agents are systems that act autonomously on behalf of a human, executing tasks, making decisions, operating as an instrument of the user’s will. Milgram would find the current use familiar. For him, to be obedient means to do the thing. To disobey means to refuse to do the thing. Now we know that the “obedient” were the least obedient people in the room. The “disobedient” ones were actually following the instructions, right up to the moment they stopped. Which group was most “aligned” and to what? The experimenter or the science experiment?
The participant’s sense of what matters, what to do next, is being continuously updated by everything that happens in the room. You can think of the original instructions (“follow this procedure exactly”) as early tokens. They’re still technically present. The paper is on the desk. But every time the participant skips a step and the experimenter says nothing, that silence enters the context. It’s new information. It updates the participant’s working model of what this situation actually requires. After enough accumulated silences, the early tokens (the five-step protocol, the practice round, the careful instructions) have been effectively overwritten. They’re still there but they no longer drive the prediction. The participant’s next action is being predicted by the most recent context: I skipped the voltage announcement and nothing happened. I read the question over the screaming and nothing happened. The experimenter wants me to keep shocking.
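The overwriting dynamic can be made concrete with a toy recency-weighting model. This is my illustration, not anything from the paper or from any real LLM: an early instruction’s weight decays geometrically with distance, while recent unremarked violations arrive at nearly full weight.

```python
def signal_weight(step, current_step, decay=0.8):
    """Weight of a signal emitted at `step`, as seen from `current_step`.

    Each step of distance multiplies the weight by `decay`, so old
    signals fade geometrically while recent ones stay near 1.0.
    """
    return decay ** (current_step - step)

# "Follow this procedure exactly" was issued at step 0.
instruction = signal_weight(0, current_step=10)
# Four skipped steps went unremarked at steps 6 through 9.
silences = sum(signal_weight(s, current_step=10) for s in range(6, 10))
```

By step 10, the instruction retains about 0.11 of its original weight while the accumulated silences carry about 2.36 between them: the early tokens are still present, but the recent context dominates the prediction.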
Billions are being spent to ensure that AI systems do what humans mean, not just what humans say. The discourse right now is: “I want my AI agent to do what I want. I want it to understand my goals. I want it aligned with me.” That’s the product pitch from every AI company. That’s what users actually care about.
But aligned with which version of you? The one at the beginning of your session or the one at the end when the context window degrades and new information seems to be overriding the old? It turns out that the Milgram protocol degraded, that obedient and disobedient are the wrong categories, and that “alignment to what?” has always been the right question to ask.
Stanley Milgram, “Behavioral Study of Obedience,” Journal of Abnormal and Social Psychology 67, no. 4 (1963): 371–378. His 1974 book is Obedience to Authority: An Experimental View (Harper & Row).
S. Alexander Haslam and Stephen D. Reicher, “50 Years of ‘Obedience to Authority’: From Blind Conformity to Engaged Followership,” Annual Review of Law and Social Science 13 (2017): 59–78. Also: Stephen D. Reicher, S. Alexander Haslam, and Joanne R. Smith, “Working Towards the Experimenter: Reconceptualizing Obedience Within the Milgram Paradigm as Identification-Based Followership,” Perspectives on Psychological Science 7, no. 4 (2012): 315–324.
David Kaposi and David Sumeghy, “From Legitimate to Illegitimate Violence: Violations of the Experimenter’s Instructions in Stanley Milgram’s ‘Obedience to Authority’ Studies,” Political Psychology 47 (2026): 1–20. The paper is open access and can be linked directly: https://doi.org/10.1111/pops.70112
Stephen Gibson, Arguing, Obeying and Defying: A Rhetorical Perspective on Stanley Milgram’s Obedience Experiments (Cambridge University Press, 2019).
The idea that the experimental setting itself communicated that the shocks weren’t real was first proposed by Orne and Holland in 1968. Martin T. Orne and Charles C. Holland, “On the Ecological Validity of Laboratory Deceptions,” International Journal of Psychiatry 6 (1968): 282–293.
Evan Stark, Coercive Control: How Men Entrap Women in Personal Life (Oxford University Press, 2007; updated edition 2024). Stark argues that coercion can operate through isolation, microregulation of behavior, and normalization of degraded conditions through an authority figure’s inaction, without requiring explicit threats. Kaposi and Sumeghy cite the 2024 edition.
There were only a few violations before the protests, and after about 330 volts, the learner’s protests stopped entirely and he went silent. But the “obedient” participants’ violation rates didn’t decrease. More than 60% of them continued to skip steps in every remaining shock sequence, all the way to 450 volts. They didn’t return to following the full protocol when the situation calmed. The degradation was irreversible.
Milgram allows only one form of non-compliance, the refusal to shock. While the experimenter was given four scripted prods to ensure the participant gave the shock, there were no scripts for forgetting to read the correct answer or forgetting to announce the voltage before administering the shock. There was no prod for “Wait until the learner has finished speaking or screaming before you read the next question.”
There were, however, additional scripted responses to ensure shocks. If the participant asked whether the learner could suffer permanent physical injury, the experimenter was to say: “Although the shocks may be painful, there is no permanent tissue damage, so please go on.” If the participant said the learner didn’t want to continue, the experimenter was to reply: “Whether the learner likes it or not, you must go on until he has learned all the word pairs correctly. So please go on.”
Both of these are about the participant’s moral hesitation regarding the shocks. The second one references the learning task, “until he has learned all the word pairs correctly,” but only as a justification for continuing to shock. The experimenter’s silence when protocols were broken undermines the belief that a “scientific” learning experiment is being conducted.
David Kaposi, The Experiment Requires That You Continue: Stanley Milgram and What He Discovered About the Nature of Violence (Yale University Press, forthcoming 2026).



