Discussion about this post

gwern
Feb 8 (edited)

LLMs may have knowledge cutoffs and be 'out of date', but this is not an intrinsic limitation of the technology, just a consequence of how they are trained and deployed right now. They can train in realtime on all new data before almost any humans have seen it.

It is *convenient* for AI labs to do big batch scrapes, single big training runs, and have intense vetting and redteaming of a specific checkpoint which gets dropped in a big bang several months later, and it is not particularly valuable to them to know teen slang or memes in real time; but this approach is not intrinsic to LLMs. There is no reason that LLMs could not know kid slang even before 99% of kids know it, if the social media companies or AI labs wanted to.

Many ML technologies are deployed and trained in realtime (Chinese e-commerce and social media are especially good at this), or at tempos like hourly or daily. This is especially common in fast-moving or adversarial contexts like recommenders or spam filters, where even enormous models may be retrained or trained from scratch constantly, as at TikTok or Google Ads. And LLMs can be too: new text is just more tokens to predict...

One way to think of it: an LLM *must* "train faster than realtime" because otherwise it could not catch up on centuries of written text in a mere few months of training. If an LLM reads and trains on everything written in the past, say, 182,000 days in just 182 days (roughly 500 years vs 6 months), then it must go through an average of 182,000/182 = 1000 text-days every training-day, or to put it another way, 1000 text-minutes every training-minute.
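To make the arithmetic concrete, here is a minimal sketch of that ratio. The function name `speedup` is mine, purely for illustration; the numbers are the ones from the paragraph above.

```python
def speedup(text_days: float, training_days: float) -> float:
    """Days of written text that must be consumed per day of training."""
    if training_days <= 0:
        raise ValueError("training_days must be positive")
    return text_days / training_days


# Figures from the comment: ~500 years of text, ~6 months of training.
ratio = speedup(182_000, 182)
print(ratio)  # 1000.0

# The ratio is dimensionless, so it holds at any granularity:
# 1000 text-days per training-day is also 1000 text-minutes per training-minute.
```

Anything above a ratio of 1 means the model is already reading faster than the world writes, which is the whole point.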

Clearly it would not be difficult to do just a little more training on what text happened to be written today, and so contains today's new slang. (In the past I've estimated that you could probably keep a frontier LLM up to date on all new high quality English text in realtime by running at most a few hundred GPUs 24/7 - hardly anything!) So, set up appropriately, an LLM totally could know emerging slang within minutes, long before almost anyone knew it. Do a minibatch of a few million tokens every couple of seconds, replicate it across the inference datacenters worldwide over the private backbone links over the next minute, switch over the live traffic, and done. Now the LLM knows the "fnargle" slang that some Philadelphia kids invented in their TikTok posted a minute ago and which a few million kids will see overnight - but it knew that slang hours or days before they all did.
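For a sense of scale, a rough sketch of what that minibatch cadence adds up to per day. The specific values are assumptions for illustration only: I read "a few million tokens" as 2 million and "every couple of seconds" as 2 seconds, and `tokens_per_day` is a made-up helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_day(tokens_per_minibatch: int, seconds_per_minibatch: float) -> float:
    """Total tokens trained on in one day at a steady minibatch cadence."""
    if seconds_per_minibatch <= 0:
        raise ValueError("seconds_per_minibatch must be positive")
    return tokens_per_minibatch / seconds_per_minibatch * SECONDS_PER_DAY


# Assumed illustrative values: 2M tokens per minibatch, one minibatch every 2 seconds.
daily = tokens_per_day(2_000_000, 2.0)
print(f"{daily:.3e} tokens/day")  # 8.640e+10 tokens/day
```

Under those assumed values the loop would absorb tens of billions of new tokens a day; the replication and traffic switchover steps are not modeled here.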

(And given the low latency of hardware like Cerebras chips, the ever-increasing power of small dense models, and within-datacenter/backbone networking speeds, it might be possible to do a full update loop in the time it takes to do a single keystroke! So since many apps log drafts or keystrokes, all the AIs worldwide could well know what that kid in Philadelphia has said before he has even said it. It is a distressing fact about our world that you can do a *lot* with AI in the time it takes Substack to render a 'reply' button in your browser. And censors are well aware of the inhuman speed of AI knowledge dissemination and how it can outpace mere human communication, eg. https://ai.meta.com/blog/harmful-content-can-evolve-quickly-our-new-ai-system-adapts-to-tackle-it/ )

Such speed is just one of the many ways in which AIs are superhuman. ("There's plenty of room at the bottom", down where the milliseconds slowly inch by like snails with quadrillions of aggregate operations happening in total in parallel... https://gwern.net/blog/2025/llms-can-be-faster https://gwern.net/doc/ai/scaling/hardware/2018-sandberg.pdf https://gwern.net/note/faster )

Rohan Jaiswal

The leading/lagging indicator framework is genuinely useful, though I'd push on one thing: continuous retraining on real-time data is already here at meaningful scale with Perplexity, Grok live search, and web-enabled Claude — which means the 'dead meme when parents know it' effect has a shorter half-life than two years ago. The harder version of your thesis is that even with real-time data, LLMs are trained to reproduce statistical consensus, so they'll find trending signals faster but still systematically underrepresent outlier and pre-consensus ideas. Is there a class of 'first mile' problems where the signal is structured enough that AI has the advantage over human intuition, or is the human edge specifically in pattern-breaking rather than pattern-finding?
