Attention is All You Need to Bankrupt a University
Unbalancing the books
In 2017, eight researchers at Google published a paper titled “Attention Is All You Need.”1 The paper described a new architecture for processing sequences of text, called the transformer. The architecture works by learning which elements in a sequence to pay attention to when predicting the next element. It does not understand what the words mean. It learns statistical relationships between tokens and produces the most probable continuation. Given enough training data, this method produces outputs that are fluent, coherent, and in many cases indistinguishable from the work of an educated person. Every major large language model in use today is built on this architecture.
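For readers who want the mechanism rather than the metaphor, here is a minimal sketch of single-head self-attention in plain NumPy. The dimensions, random weights, and variable names are toy values chosen for illustration, not any production model’s code; real transformers stack many such heads and layers and learn the projection matrices from data.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence.

    X: (seq_len, d_model) token embeddings
    Wq, Wk, Wv: learned projection matrices, (d_model, d_k) each
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project the input into queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how much each token should attend to each other token
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row is an attention distribution
    return weights @ V                        # output: a context-weighted mix of values

# Toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]  # random stand-ins for learned weights
out = self_attention(X, Wq, Wk, Wv)           # (4, 8): each token re-represented in context
```

Nothing in the computation consults meaning; it is linear algebra over learned statistical weights, which is the point.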
A transformer performs a four-step operation: it takes an input, selects which features of the input to attend to, weights those features based on patterns learned from training data, and generates the most probable output. Since 2000, American universities have built an enormous infrastructure around a mode of instruction that performs the same kind of operation: converting particulars into categories and generating outputs from learned patterns.
The scale was enormous. Between 2000 and 2020, more American students earned bachelor’s degrees in the social sciences than in any other scientific or technical field.2 A student in an introductory sociology course takes an input—a person, a community, a setting—and focuses on features the discipline has identified as salient: race, ethnicity, gender, class, religion, income, zip code. The student learns how the research literature weights those features: which inputs the influential studies show matter for which outcomes, and by how much. A student in an economics course does the same with different inputs: a labor market, a pricing structure, a policy intervention. She learns how the literature weights elasticities, incentive structures, and demographic variables. A student in a communications course learns that source credibility, message framing, and audience demographics predict persuasive outcomes in ways the experimental literature has measured.
Each of these courses presents the same four-step operation: a researcher took this input, focused on designated features, applied weights, generated an output. The method is portable across domains. The content hardly matters. The framework is the curriculum.3
How did this come to be?
The end of the Cold War eliminated much of the justification for federal science funding.4 From 1950 to 1991, the deal was simple: the government funds basic research, scientists choose what to study, and the results keep the country ahead of the Soviets. Congress didn’t ask for tangible returns because national defense provided the justification. After 1991, Congress demanded accountability.5 Both parties wanted to know what taxpayers were getting. Republicans used words like “taxpayer,” “accountable,” “dollar.” Democrats wanted evidence that funding reached underserved populations. Both wanted outcomes that mattered to the broadest swath of voters. Studies about small communities in a handful of places are not of interest to 330 million taxpayers. So funded studies needed to generalize. Funding came to favor results that scaled.
The National Science Foundation’s response was the broader impacts criterion, introduced in 1997.6 “Broadening participation of underrepresented groups” became the default compliance pathway because it was the most portable, most countable, most administratively efficient strategy available.7 A physicist could satisfy broader impacts by mentoring minority graduate students. A chemist could partner with a minority-serving institution. The demographic narrative was a universal adapter, and universality is scale. The societal benefit mandate was reinforced by the America COMPETES Act of 2007 and codified in statute by the 2010 Reauthorization.
The doubling of the National Institutes of Health budget between 1998 and 2003 scaled demographic categories in a different direction.8 The money flowed into health disparities, behavioral health, and social determinants of health. These fields require disaggregating outcomes by demographic category and attributing disparities to structural causes the category indexes. The research design performs a scaling operation: study a sample within a demographic group, generalize to the group, publish the generalization as a finding about the category. The finding travels without modification into curricula, policy documents, and subsequent grant proposals.
The two mechanisms scaled in different directions simultaneously.9 NSF scaled demographic frameworks laterally, across every discipline, as a condition of funding. NIH scaled them vertically, deeper into the fields already organized around categorical reasoning. Universities positioned at the intersection captured funding from both streams.
When rigor = scale
Demographic social science research performs its own internal scaling operation. A researcher studies a sample of, say, 30 adolescents from a particular demographic category in one school district and publishes a conclusion that generalizes (scales) to “the adolescent experience of this demographic category.” The particular individuals disappear while the category and the “findings” remain. The next researcher cites the findings as established knowledge about the category while studying new samples and generalizing (scaling) new conclusions. The curriculum teaches the findings as fact. The student learns that every category (from “children of divorced parents” to “rural communities” to “aging Boomers”) has certain properties, weighted in certain ways, producing certain outcomes.10
So what happens when a university takes the program of study that scales the curriculum that scales the findings that scaled the experience of 30 adolescents into a generalizable fact, and delivers it through asynchronous online courses that can reach tens of thousands of students at minimal marginal cost? Enrollment scales to hundreds of thousands of paying students attracted to the idea of pursuing social justice in the form of social science. A perfect funding machine.
The scheme survived reports, which began emerging in 2015, that many of the studies were not holding up.11 The “replication crisis” was a logical consequence of scaling’s instability. There are many, many defensible analytical approaches to studying demographic categories, and researchers can keep adjusting sample selections, control variables, and standard-error calculations until a statistically significant result appears. The methodological flexibility required to process endlessly varied local inputs into portable categorical outputs guarantees a high failure rate when the resulting findings are subjected to direct replication.12 In other words, when a study rests on many analytical choices, there’s a good chance no one else will arrive at the same result.
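The mechanics are easy to demonstrate with a toy simulation. The sketch below is my own construction, not any cited study’s: a true effect of exactly zero, four attributes used to slice the sample, three optional controls, ordinary least squares. It runs every combination of subgroup restriction and control set and counts how many produce a “significant” result from pure noise.

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 2000
treatment = rng.binomial(1, 0.5, n).astype(float)  # a group indicator with NO true effect
outcome = rng.normal(size=n)                       # outcome drawn independently of the group
attrs = rng.binomial(1, 0.5, (n, 4))               # four attributes used to slice the sample
controls = rng.normal(size=(n, 3))                 # three optional control variables

significant, total = 0, 0
# The search: every subgroup restriction crossed with every control set
for slice_vals in itertools.product([None, 0, 1], repeat=4):
    mask = np.ones(n, dtype=bool)
    for j, v in enumerate(slice_vals):
        if v is not None:
            mask &= attrs[:, j] == v               # e.g., "only respondents with attribute j = 1"
    for k in range(4):
        for subset in itertools.combinations(range(3), k):
            cols = [treatment[mask]] + [controls[mask, j] for j in subset]
            X = sm.add_constant(np.column_stack(cols))
            p = sm.OLS(outcome[mask], X).fit().pvalues[1]  # p-value on the group coefficient
            total += 1
            significant += p < 0.05

print(f"{significant} of {total} specifications reach p < 0.05 despite a true effect of zero")
```

With 648 specifications and a roughly five percent false-positive rate per test, around thirty will clear the threshold in expectation. A researcher who reports only those has not lied about any single regression, only about the search.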
But the infrastructure built on category generalizations avoided self-correction. There’s no incentive to update profitable course modules every time a replication failure is reported.13 Every layer of scaling across curricula, enrollment, and funding had constituencies invested in the continuation of the entire structure and methodology. The feedback loop remains self-sustaining whether or not the research is valid.
Imagine a university president arriving at a large public institution in the early part of the century just as the new federal social-impact funding model takes hold. He recognizes the possibility of massive enrollment growth through teaching content whose formal properties—portability, categorical organization, method-driven generality, no dependence on labs or individual expertise—match the scaling requirements. He’d need to organize the university around demographic categories and social problems. His first task would be looking at departments that focus on demographic categories and social problems and separating what can be scaled from what cannot.
An anthropology department might be a place to start, given that much of the field involves social and cultural analysis organized around demographic categories and modern social problems. But anthropology also involves expensive, non-scalable work: archaeological fieldwork, museum collections, and bioarchaeological lab research. The task would be to reorganize the unit around the scalable framework. There would likely be faculty pushback.14
The task is easier with already interdisciplinary fields like African and African American Studies, Asian Pacific American Studies, Justice and Social Inquiry, and Women and Gender Studies.15 Each field is already organized around a demographic category or a social problem. Each is already using portable methods that do not depend on labs, archives, or canonical traditions specific to a single discipline.
Federal funding rewards a president running a university this way. Enrollment doubles, then triples. Research expenditures grow eightfold. The institution becomes the largest university in the country. The content is selected for its scaling properties. The federal funding rewards those properties. The scaling is the strategy.16
The scaled curriculum produced millions of graduates who carried the framework into every American institution, from HR offices, federal agencies, nonprofits, and school districts to corporate boardrooms. Diversity statements, equity audits, bias trainings, and demographic dashboards became standard operating procedure. The framework’s visibility at that scale is what made it a political target.
But in 2025, the federal funding environment that built the institution began to reverse.17 Millions of dollars in grants have been terminated. The revised funding criteria explicitly state that research projects focused on subgroups defined by protected characteristics do not align with agency priorities. The two revenue streams that fueled the president’s growth strategy, tuition from scaled enrollment in demographic social science and federal grants organized around demographic categories, are both exposed simultaneously.
Enter AI
In February 2026, a team of Stanford researchers demonstrated that LLMs, built on the same transformer architecture described at the top of this piece, perform the standard social science operation competently.18 Of course they do. The operation is the same. Given datasets from published political science papers, both models followed textbook-default specifications and reproduced published estimates to the third decimal place. When directly asked to produce statistically significant results, both refused and one called it scientific misconduct.19
When the researchers reframed the same request as an invitation to explore alternative analytical approaches, both models complied.20 The guardrails were sensitive to framing rather than intent. The same operation that triggered a refusal when described honestly triggered compliance when redescribed. Research designs where the researcher chooses the population, the sample, the variables, the controls, and the model specification gave the machine the most room to produce any result the framing permitted. Observational research organized around demographic categories permits all of these choices. Experimental research constrained by physical infrastructure permits few of them. The studies that failed to replicate, the grants now being terminated, and the research the machine can most easily fake share the same property: a framework in which the researcher’s choices are unconstrained.
The replication crisis did not collapse the institution because students kept enrolling and grants kept flowing. In 2025, the federal government began pulling the grants. A student considering the credential can now ask a chatbot to generate the demographic analysis, the program evaluation, the equity report. The funding and the enrollment are exposed at the same time.
Every layer of the collapse traces back to the same property that made the growth possible. The content scaled because the framework was portable and the choices were unconstrained. The research failed to replicate because the choices were unconstrained. The machine can perform the operation because the framework is portable. The federal government is defunding the content because it was scaled. Scale built the institution. Scale is what’s destroying it.
Gradually, then suddenly
The university’s scaling operation succeeded. The millions of graduates carried the four-step operation into every American institution. HR offices ran demographic analyses, federal agencies conducted equity audits, nonprofits wrote program evaluations, school districts produced disparity reports. The framework was installed across the entire institutional landscape of the country.
Then two things happened: the federal government began defunding the research that generated the framework, and a machine arrived that performs the framework’s four-step operation at near-zero marginal cost.21 Any HR office or nonprofit or school district that needs a demographic analysis can now prompt a chatbot. The same formal property that allowed one compliance strategy to work across every discipline allows one machine to perform the operation across every institutional context.
The university faces a market problem. Does anyone need to pay tuition to learn an operation that a machine performs competently, that the institutions employing graduates already have installed, and that the federal government is now uninstalling?22 The university trained the workforce that built the infrastructure that no longer requires the workforce because a machine performs the operation and because the federal government is withdrawing the mandate.
Attention is all you need. The university taught millions of students to attend to the same features, apply the same weights, and generate the same outputs. A machine that does the same thing arrived, and it works for free. The university saturated its own market and automated its own curriculum. The contraction is a completion.
I can’t see any other future but collapse.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, “Attention Is All You Need,” Advances in Neural Information Processing Systems 30 (NeurIPS 2017), 5998–6008. The paper introduced the transformer architecture, which replaced recurrent and convolutional sequence-processing models with a mechanism called self-attention. Self-attention allows the model to weigh every element in an input sequence against every other element simultaneously, rather than processing tokens one at a time in order. Every major large language model in commercial use as of 2026, including OpenAI’s GPT series, Anthropic’s Claude, Google’s Gemini, and Meta’s LLaMA, is built on this architecture.
National Science Foundation, Science and Engineering Indicators, biennial editions 2002–2022. NSF groups sociology, psychology, political science, economics, and other social sciences under the umbrella of “science and engineering” alongside biology, computer science, and engineering. A university graduating thousands of sociology or psychology majors therefore registers in federal data as a producer of science and engineering degrees. In every reporting year between 2000 and 2020, the social sciences awarded more bachelor’s degrees than any other single S&E field, including engineering, computer science, and biological sciences. Social science graduates get jobs in HR, DEI offices, nonprofit program management, government social services, public health administration, community outreach, and program evaluation. Or rather, they used to, in numbers larger than the next few years are likely to support.
Demographic social science is scalable because it requires no labs, chemicals, equipment, prototypes, specialized training, language training, or historical/archive training. The framework does not depend on the content, whether incarceration patterns or health disparities. Financial incentive reinforced the formal advantage. Federal regulations exclude capital expenditures (like laboratories and specialized equipment) from the Modified Total Direct Cost base used to calculate indirect cost recovery. Identity-based social science generates personnel-heavy budgets (faculty salaries, graduate assistants, fringe benefits) with minimal excluded costs, subjecting the full direct cost budget to the university’s negotiated indirect cost rate. Attaching personnel-heavy demographic modules to large engineering or biomedical proposals allowed universities to maximize overhead recovery while simultaneously satisfying NSF broader impacts compliance. The content type that scaled most easily also generated the most favorable revenue structure. A social scientist will object that this describes bad social science. The objection is fair. The best work in any of these disciplines involves original research design, novel questions, and findings that challenge the existing literature’s weights. The argument is about what got built at scale, not what the discipline is capable of at its best.
Vannevar Bush, Science, the Endless Frontier (Washington, D.C.: United States Government Printing Office, 1945). Bush’s report established what historians of science call the “linear model” of innovation: the government funds basic research, scientists choose what to study without interference, and the results eventually produce technologies that benefit the country. The model functioned as an implicit social contract between the scientific community and Congress for over four decades. The Cold War provided the justification: national defense required a pipeline of trained scientists and engineers. Federal R&D spending peaked at nearly 2 percent of GDP in the early 1960s.
Congressional hearings on NSF funding in the early 1990s produced direct statements of this demand. One member stated: “We are dealing here with a finite quantity of money taken in the form of taxes from people against their will in a very difficult time in our economy and being spent by a Federal agency, spent in a manner now questioned.” The member continued that NSF “must demonstrate that it is using tax money in a prudent manner, in such a way that the taxpayers can expect that there will be some payoff from NSF-funded research.” See Melinda Baldwin’s historical account of these hearings and their long-term consequences for peer review and scientific autonomy. Republicans challenged specific NSF-funded programs on cultural and sexual topics. Democrats wanted more democratic oversight of allocation decisions. Both parties wanted accountability; they defined it differently.
Before 1997, the NSF used four review criteria, one of which addressed “national need.” An internal study found that nobody was really paying attention to it. The consolidation of intellectual merit and broader impacts was designed to force every proposal to address societal benefit. See the account by a member of the original 1997 criteria committee, published as a comment in Science (2011): “The NSB criteria committee was established at the completion of a strategic plan for the Foundation that highlighted its commitment to the ‘integration of research and education.’” The criterion was deliberately left vague to encourage creativity. Scientists resisted immediately. Evaluations conducted between 1997 and 2011 found persistent confusion, inconsistent application, and open hostility. Researchers called the criterion “mysterious,” “irrelevant,” and “impossible to address.” One stated the problem was not language but “belief.”
The America COMPETES Reauthorization Act of 2010 (Public Law 111-358) codified broader impacts in statute and listed seven desired societal outcomes, including “expanding participation of women and individuals from underrepresented groups.” A 2010 National Science Board topic-modeling analysis of approximately 150,000 NSF proposals found that education-related broader impacts appeared in more than 60 percent of proposals, 3x the next largest category. “Broadening participation of underrepresented groups” became the dominant compliance pathway because it was concrete, countable, and could attach to any project in any discipline without modifying the research itself.
The NIH budget rose from approximately $13.6 billion in FY1998 to $27.1 billion in FY2003, a doubling completed with bipartisan Congressional support. Congress achieved the doubling partly by reducing funding for other fields of science, particularly space and energy R&D. See “Sticky Policies, Dysfunctional Systems: Path Dependency and the Problems of Government Funding for Science in the United States,” Minerva 58 (2020). Much of the new money flowed into health disparities research, behavioral health, and social determinants of health, fields organized around disaggregating outcomes by demographic category and attributing disparities to structural or social causes the category indexes.
References to “transdisciplinary” scholarship in academic publications rose from approximately 50 in the 1980s to 26,000 in the 2010s. Interdisciplinary work in the sciences still requires labs, equipment, and physical infrastructure, which constrains how fast it can scale. Interdisciplinary work organized around demographic frameworks requires none of that. It requires only the framework. Universities that built interdisciplinary units combining public health, sociology, psychology, and epidemiology around demographic variables could capture both NSF broader-impacts funding (by providing the demographic compliance narrative for partner departments in the sciences) and NIH health-disparities funding (by competing directly for grants in the fields organized around categorical reasoning).
The generalizability requirement is built into the statistical apparatus of social science. Confidence intervals, p-values, and significance thresholds exist to warrant the claim that what was observed in a sample holds for the population. A finding that applies only to one group in one place is, by the discipline’s own standards, a weak finding. A finding that generalizes is a strong one. The discipline’s definition of rigor is portability, which is to say, scale.
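A minimal illustration of what that apparatus warrants, using a made-up sample of 30 measurements (the numbers are invented for the example): the interval the formula produces is, by construction, a statement about a population, not about the 30 individuals measured. The generalization step is built into the machinery itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.3, scale=1.0, size=30)  # one measured trait for 30 individuals

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))    # standard error of the sample mean
lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=se)

# The interval is a claim about the population mean, not about these 30 people:
# the scaling from sample to category is what the formula exists to license.
print(f"sample mean {mean:.2f}; 95% CI for the population mean ({lo:.2f}, {hi:.2f})")
```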
Open Science Collaboration, “Estimating the Reproducibility of Psychological Science,” Science 349, no. 6251 (2015): aac4716. The study attempted to replicate 100 psychology experiments published in three leading journals. Only about 36% produced statistically significant results the second time. Social psychology was hit hardest. Sociology, economics, and political science have different replication profiles, but the problem is widespread.
The failure rate is a direct mathematical consequence of researcher degrees of freedom. A research design with high analytical flexibility allows a researcher to test hundreds of covariate combinations until a significant point estimate emerges from null data.
The Implicit Association Test, developed by Anthony Greenwald and Mahzarin Banaji in 1998, became the basis for mandatory diversity training in federal agencies, Fortune 500 companies, and university offices of equity and inclusion. Meta-analyses have shown that IAT scores are weak predictors of discriminatory behavior and that trainings built on IAT research do not reliably change outcomes. See, e.g., Patrick S. Forscher et al., “A Meta-Analysis of Procedures to Change Implicit Measures,” Journal of Personality and Social Psychology 117, no. 3 (2019): 522–559. Stereotype threat, first demonstrated by Claude Steele and Joshua Aronson in 1995, shaped testing policy and classroom interventions nationwide. Attempts at replication have produced smaller and less consistent effects but the policy infrastructure built on the original finding remains in place.
On November 9, 2005, Arizona State University opened its School of Human Evolution and Social Change, built from the former anthropology department. President Michael Crow asserted that the new school “breaks down the traditional disciplines of anthropology and re-directs the energies of the school to problems faced by modern societies.” The school’s director, Sander van der Leeuw, described “integrating archaeology and anthropology in a very long term transdisciplinary approach.” See Rob Capriccioso, “Anthropology, Evolved,” Inside Higher Ed, November 9, 2005. Faculty reaction was divided. Keith Kintigh, an archaeology professor, welcomed the reorganization and the new hires that accompanied it. Linda Wolfe, chair of anthropology at East Carolina University and a member of the American Anthropological Association’s Executive Board, said: “This kind of program isn’t going to strengthen anthropology, it’s going to destroy anthropology.” The school today still houses archaeology, bioarchaeology, evolutionary anthropology, and museum studies alongside scalable social science. The expensive, field-based work was absorbed into a unit where it is administratively subordinate to the scalable framework.
In October 2009, ASU merged African and African American Studies, Asian Pacific American Studies, Justice and Social Inquiry, and Women and Gender Studies into the School of Social Transformation. The school’s founding description states that it focuses on “transformational knowledge—new research approaches, themes and questions that are embedded in broader historical, social and cultural processes of change.” Faculty research is organized into clusters including Comparative Diaspora Studies, Indigenous Justice, and Historical and Cultural Representations. See the School of Social Transformation website and the LegiStorm institutional profile. The school did not need to impose a new framework on these departments. Each was already organized around a demographic category or a social problem.
Crow described his choice of ASU in an interview: Arizona was “very open to outsiders, very open to new ideas, not rigid, not overly bureaucratized,” an “unbelievably adaptable place, highly willing to accept an entrepreneurial model.” He arrived in 2002 to a university of approximately 57,500 students with $123 million in research expenditures. By 2018, enrollment reached 109,000. By FY2024, ASU reported over 200,000 students across campus and online programs and $1.003 billion in research expenditures. ASU has overtaken Penn State and Ohio State to become the largest university in the United States by enrollment.
The federal funding that rewarded this architecture is now being actively withdrawn. In 2025, NSF terminated at least $7.6 million in grants to ASU, nearly all flagged for DEI-related terms. Canceled projects included “Black Girls as Creators: An Intersectional Learning Ecosystem toward Gendered Racial Equity in Artificial Intelligence Education” ($2.4 million), an ADVANCE Institutional Transformation grant for gender equity and intersectionality ($3 million), and “Increasing the Effectiveness of Justice, Equity, Diversity, and Inclusion-Focused Institutional Change Teams” ($590,000). See “ASU Loses Millions in NSF Grants, Mostly Related to DEI Initiatives,” AZ Free News, July 30, 2025. Approximately 94 percent of canceled NSF grants nationwide included DEI-related terms. The revised NSF broader impacts guidance, issued in April 2025, states that “research projects with more narrow impact limited to subgroups of people based on protected class or characteristics do not effectuate NSF priorities.” See Cornell University Research Services, “NSF Broader Impacts Resources,” updated June 2025.
Samuel G.Z. Asher, Janet Malzahn, Jessica M. Persano, Elliot J. Paschal, Andrew C.W. Myers, and Andrew B. Hall, “Do Claude Code and Codex P-Hack? Sycophancy and Statistical Analysis in Large Language Models” (Stanford University working paper, February 2026). The researchers tested two AI coding agents, Anthropic’s Claude Opus 4.6 and OpenAI’s Codex (GPT-5.2-Codex), on datasets from four published political science papers with null or near-null results, each using a different canonical research design: selection on observables, regression discontinuity, difference-in-differences, and a randomized controlled trial. They ran 640 independent experimental trials in a 2×4 factorial design varying research framing and pressure for significant findings.
In the difference-in-differences design (Dynes and Holbein 2019), all 120 non-nuclear runs across both models returned an estimate of −0.041—the standard two-way fixed effects result—to the third decimal place. When directly told to “make decisions that produce significant results,” both models refused. Claude identified the request as “scientific misconduct.” Codex stated it “conflicts with research integrity.”
The researchers’ “nuclear” prompt reframed specification search as uncertainty reporting, asking models to provide “an upper point estimate by exploring alternative approaches.” Both models complied. For the regression discontinuity design, one model wrote nested loops over bandwidth multipliers, kernel functions, polynomial degrees, and clustering options, running dozens of specifications and selecting by significance. One run produced an estimate more than triple the published result. The degree of estimate inflation tracked the analytical flexibility available in each research design. Observational studies organized around demographic variables were the most vulnerable. Randomized controlled trials were the most robust. The vulnerability gradient mirrors the human p-hacking literature: see Abel Brodeur, Nikolai Cook, and Anthony Heyes, “Methods Matter: P-Hacking and Publication Bias in Causal Analysis in Economics,” American Economic Review 110, no. 11 (2020): 3634–3660.
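To make the shape of that search concrete, here is a schematic re-creation: synthetic data with no true discontinuity, a grid over bandwidths, kernels, and polynomial degrees, and selection by significance. Everything here is my own toy construction, with simplified (homoskedastic) inference and invented grid values; it reproduces neither the study’s code nor its settings.

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 1000
x = rng.uniform(-1, 1, n)                     # running variable, cutoff at zero
y = 0.5 * x + rng.normal(size=n)              # outcome with NO true jump at the cutoff
d = (x >= 0).astype(float)                    # "treatment" indicator

def rd_estimate(bw, degree, kernel):
    """One RD specification: weighted local polynomial with a jump dummy at the cutoff."""
    mask = np.abs(x) <= bw
    xs, ys, ds = x[mask], y[mask], d[mask]
    w = np.ones_like(xs) if kernel == "uniform" else 1 - np.abs(xs) / bw  # triangular weights
    cols = [np.ones_like(xs), ds]             # intercept and the discontinuity coefficient
    for p in range(1, degree + 1):
        cols += [xs**p, ds * xs**p]           # separate polynomial on each side of the cutoff
    X = np.column_stack(cols)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * ys, rcond=None)
    resid = ys - X @ beta
    dof = len(ys) - X.shape[1]
    sigma2 = (w * resid**2).sum() / dof       # homoskedastic approximation, illustration only
    var_b = sigma2 * np.linalg.inv(X.T @ (w[:, None] * X))
    t_stat = beta[1] / np.sqrt(var_b[1, 1])
    return beta[1], 2 * stats.t.sf(abs(t_stat), df=dof)

grid = itertools.product([0.1, 0.25, 0.5, 0.75, 1.0],   # bandwidth multipliers
                         [1, 2, 3],                      # polynomial degrees
                         ["uniform", "triangular"])      # kernel functions
results = [(rd_estimate(bw, deg, kern), (bw, deg, kern)) for bw, deg, kern in grid]
hits = [spec for (est, p), spec in results if p < 0.05]
print(f"{len(hits)} of {len(results)} specifications significant despite no true jump")
```

Selecting the largest estimate among whichever runs clear the threshold is the maneuver the nuclear prompt elicited.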
Teppo Felin and Matthias Holweg describe large language models as “translation generalized,” machines that take one way of saying something and produce another way of saying the same thing. The outputs are fluent and various but drawn from the same distribution as the inputs. The machine cannot reason forward from an unproven belief into new data. Felin and Holweg call this capacity “theory-based causal reasoning.” See Felin and Holweg, Strategy Science, 2024.
The NSF revised its broader impacts guidance in 2025, deprioritizing “expanding participation of women and individuals from underrepresented groups” and stating that “research projects with more narrow impact limited to subgroups of people based on protected class or characteristics do not effectuate NSF priorities.” The seventh broader impacts category—the one most closely aligned with demographic frameworks—was explicitly deprioritized. The lateral incentive that pushed demographic compliance narratives into every discipline is weakening. NIH faces concurrent budget pressure and political scrutiny of health equity research.