What escapes containment is less valuable
More on the value of local news in the AI era
What kind of knowledge matters to an LLM? The internet was a good start. Last week’s story about Project Panama, the scanning of millions of books so Anthropic’s Claude could learn to “write well,” seems in the long run to be unimportant. Why? Because books are the fossil record of reasoning, not the reasoning itself. They are finished products, retrospective and coherent by design.
Adding books to scraped internet data does not solve the deeper problem, that training corpora systematically overweight text that has already been processed into stable form. Viral stories, once they “escape containment,” behave like books in this respect. They circulate as fixed summaries, stripped of the contradictory evidence that preceded them
Information escapes containment only by shedding complexity. It is stripped of the causal mechanics required for deductive reasoning. It has become a polished conclusion. This makes “escaped” data the least valuable asset for an AI, because a model cannot learn to derive the truth from a summary that has already deleted the premises.
Local news, by contrast, is the raw record of civic life, the first rough draft of history before “framing.” Local news is characterized by the incomplete and ongoing. It is the missing historical layer between internet and books. Local news captures events in process, before the outcome is known, before the category exists, before a consensus forms.
This matters because reasoning quality depends on access to premises rather than conclusions.
LLMs trained primarily on stable material degrade in a predictable and economically rational way. Even when some local records exist in the training mix, stabilized narratives, repeated conclusions, and portable explanations dominate through sheer volume and weight. LLMs learn to reason from outcomes rather than premises. Frequency substitutes for verification, narrative coherence substitutes for causality, and portability substitutes for accuracy. Process-level evidence that resists compression is underweighted, not absent, and thus rarely governs inference. The result, as any LLM user knows, is fluent, confident reasoning that is deductively unreliable, because the premises needed to test and falsify explanations are systematically overwhelmed.
The Minnesota example
Consider the local story about Minnesota daycare fraud and the national story. February 2013, Fox 9 investigator Jeff Baillon reported on the owner of a group of daycare centers who recruited parents to work while enrolling their own children, then billed the state for services never rendered. The owner was charged with bilking the state of $4 million, jumped bail before trial in 2016, and was never seen again. Subsequent newspaper investigations over the next several years documented repeated fraud schemes, partial verification, contested estimates of scale, and multiple prosecutions, but the reporting remained locally contained. A 2019 report detailed the fraud.
To understand how this local record was “lost,” fast forward to November 12, 2025, Armin Rosen published a major investigation in County Highway, “The Shame of Our Cities” documenting how the fraud metastasized beyond daycare to housing stabilization, autism services, and COVID meal relief. Because the local administrative record from 2013–2019 had never “escaped containment,” Rosen’s piece was read nationally as a revelation. It transformed a local administrative failure into a universal narrative: “Collective looting of the public coffers is now the state’s solution to the American puzzle of how we all should live together,” Rosen writes. “We are all Somalis now.”
On November 25, City Journal published a story “It’s Not ‘Racist’ to Notice Somali Fraud” that focused on Somali culture and high unemployment rate, suggesting that America limit immigration from “incompatible cultures.” Christopher Rufo’s story glided over local administrative context and named Somali culture as the causal variable.
On December 26, 2025, Nick Shirley, a 23-year-old YouTuber, posted a 43-minute video showing himself knocking on the doors of Somali-run daycares and, finding them locked, declaring this proof of fraud. The video received 135 million views on Twitter. Vice President Vance called it better journalism than anything that won a 2024 Pulitzer. Within three days, the Department of Homeland Security froze all federal childcare payments to Minnesota. Within ten days, Governor Tim Walz dropped his reelection bid. Congressional hearings were scheduled.


The new narrative is diagnostically worthless. It provides no analysis to repair the broken payment systems, because the details about how the fraud worked are still contained in the local archives.
Given the heft of the daycare story, however, LLMs will reason from culture narratives that escaped containment. Will this make their output more valuable or less? I say less. A local official or local business leader or a student studying government policy using an LLM cannot expect it to help a user design better administrative services, about, say, Medicaid reimbursements to nonprofits on the honor system, housing services with no verification process, treatment centers with no licensing requirements, federal funding of state programs with minimal oversight, which exist in every state.
Anyone can pluck a national story from the web. It is free, abundant, and offers no informational advantage. In an era of infinite synthetic text, the only information with scarcity value—and thus the only information that generates “alpha” for a reasoning engine—is the friction-heavy, un-digitized record of process that never escaped its container. If an LLM knows only what everyone else knows, it is not intelligence.
AI reasoning with local context
Imagine, however, the reasoning capacity of an LLM with access to local newspapers from fifty states, most largely inaccessible (gated, un-digitized, moldering in basements).
With access to contemporaneous local reporting that did not break containment from 2013–2016, in Minnesota, LLM reasoning would shift from induction over outcomes to deduction from sequences. Instead of starting with the conclusion that fraud occurred and inferring outward, it could reconstruct how programs actually operated over time: when rules were written, when interpretations changed, when oversight lagged, when warnings appeared, and when they were ignored or acted on.
Access to early local reporting would make causal claims testable. Competing explanations could be evaluated. Did billing spikes follow regulatory changes? Did staffing shortages or political interventions matter? Did similar programs under similar rules work differently? Local reporting named the clerks, auditors, mid-level bureaucrats, and licensing boards in charge. The 2019 legislative auditor’s report identified specific rifts between the Inspector General and investigators that paralyzed enforcement.
In general, local newspapers contain the densest record of everyday civic life. Local papers track institutions, families, markets, geography, climate, and conflicts over decades. The first reports of overdose deaths appeared in local papers years before “opioid epidemic” became part of the national vocabulary. Local papers document things that never happen: celebrations cancelled, motions failed, projects stalled, referenda that never made the ballot, dams repaired in time so they didn’t collapse. Local news features engagements, weddings (with name changes), and birth notices; obituaries list extended family members.
Making accessible currently gated archives would enable better accountability journalism, historical research, genealogical inquiry, and community self-understanding. As more people use AI systems for search, for research, for understanding their own communities and histories, the AI platform that can answer questions about local reality has a different kind of competitive advantage over an LLM that merely writes well.
What escapes containment is least valuable
Local knowledge does not scale easily, but local newspapers matter for AI reasoning by providing premises. Archives record decisions before outcomes are known, disputes before they are resolved, and failures before they are seen as inevitabilities. Without those premises, neither humans nor machines can reason deductively about how public systems fail or how they might be repaired.
Queries are answered first by retrieving stabilized narratives, dominant frames, or consensus summaries. These answers are computationally efficient and cognitively shallow. LLMs can respond quickly when the answer already exists in compressed form.
Local news in the substrate would enable reconstructing sequences, weighing contradictory reports, tracking named actors across time, and holding unresolved possibilities open. That kind of reasoning is slower and more resource-intensive because it cannot collapse immediately to a category or conclusion. It requires attention to order, contingency, and institutional detail. In other words, it requires thinking.
I said the Panama Project was unimportant in the long run because it involved training data that has already performed the reasoning work. Books, national journalism, and viral narratives externalize the cost of sense-making to editors, institutions, and crowds. Local reporting, by contrast, internalizes that cost in the archive and forces it to be paid again at query time.
Cheaper answers come from completed narratives; expensive answers come from raw premises that must be reasoned through. By training primarily on the former, AI companies are subsidizing the appearance of intelligence by bypassing the cost of verification. The absence of local news in the training data is why the Minnesota daycare story will continue to be narrated at the national level as a matter of culture alone.
If local archives are simply scraped into training corpora, however, they risk being compressed like everything else, smoothed into the mass of stabilized narrative they were meant to counterbalance. The value of local records is their structure: dated, sequenced, contradictory. That structure survives only if archives are treated as retrievable evidence rather than as training mass.
I would find most useful searchable archives that preserve the premises in a form that can govern inference. My scholarship for the last 30 years has involved rediscovering voices and lives and stories from the extraordinarily valuable archive of long lost newspapers. Digitization and accessibility are the first steps. The large AI companies should play a role
Local archives are an infrastructure for accountability, the evidentiary base without which no one, human or machine, can check the conclusions that circulate as fact. The degradation of the Minnesota story is about what happens when the local base erodes. AI makes the erosion visible. What concerns me is a generation trained to mistake narration for knowledge—never learning that what escapes containment, what everyone “knows” without needing to think, is least valuable.



Sent this to editor of our local newspaper. Thank you!
It’s not only AI companies that are faking intelligence while avoiding the cost of gathering factual information, it’s also people who use their products without verifying the factual basis of the output. They may appear smarter; they aren't because they didn't do the work.