11 Comments
Cathie Campbell's avatar

Such an interesting invitation to see “tokens all the way down” and the definition of token - “the thing that stands for another thing”. To imagine AI numerizing (my word) its words and calculating. But does this tokenization philosophically calculate beyond numerical efficiency? Sensory information has tone, lilt, flatness, urgency, etc. As machines make minds, will the delivery allow words to offer “sense” in the “felt” hearing?

Hollis Robbins's avatar

This is excellent! "On its face...." the piece begins. And it's about "the hypothesis that explicit sensory prompting can surface this latent structure, bringing a text‑only LLM into closer representational alignment with specialist vision and audio encoders." Surfacing latent structures.

Sam Walker's avatar

I always appreciated the wall of bas reliefs in the student union at UIUC. But I think you have made a category error here.

Tokens are... unimportant. Ultimately. They are the substrate. The ink and paper of the work. What _matters_ is _what is written with them_.

The tokens themselves have very clear, distinct, defined meanings: the specific textual sequence they represent. That is ALL they represent. 3321-84-7592 means - exclusively - "put characters together to spell ling-u-istics". The idea of "the study of language" is nowhere to be found in tokens, any more than you could sift the pigments of the Mona Lisa into color piles and find a trace of "beauty". Grind a man into atoms and you will never find a speck of "life".

Tokens encode text (or whatever). Text encodes speech. Speech encodes an approximation of thought. Thought encodes an approximation of meaning.

Meaning, thought, speech, text, tokens -> meaning.

It's taking a comic strip in a foreign language, pressing it into a wad of silly putty, and pressing that into another piece of paper. Maybe you could never read the original directly, but you're pretty sure Ziggy is still swearing in Urdu or whatever.

Hollis Robbins's avatar

I love this and tell the NVIDIA people! :)

Sam Walker's avatar

Eh, 90% of the work in the field is misguided right now. Computer scientists who have forgotten that LLMs aren't Turing machines. I spend all day every day teaching folks how to talk to AIs and vice versa, and coders are absolutely the worst! Inherent -20 to prompting and it takes considerable effort or rare insight for one to git gud. They spend all day hammering kleenex into a shiv, trying to move mountains with tweezers, then congratulating themselves on how well they get the nondeterministic thought processor to pretend it's a deterministic logic machine.

Sigh. They never understand polysemy - that EVERY encoded pattern gets reflected and ramified, no matter how high-order abstract it is. Tone, format, vocabulary, intent, attitude - it's ALL included. You can never just wall off a little package of information in a stateless chunk called "DATA", stick it in an envelope covered in "INSTRUCTIONS". You have to worry about what the paper tastes like and the smell of the ink. Did you make the instructions pretty or are they boring? Have you tried perfuming the letter? It's all flat, and the picture of bigfoot is always just a bunch of little gray dots.

As to NVIDIA... heh. With what's been going on with photonic-quantum/classical hybrid chips, and, I suspect, the advent of workable thermodynamic and stochastic computing, not to mention the UVA stuff from Japan, combined with the fact that the market seems to have just caught on to the giant circular money pit of AI... well. Good luck to them.

Marcie Geffner | Mostly Books's avatar

The "Invention of Artificial Intelligence" image is incredibly, chillingly weird. A woman from antiquity holds up a human brain, as if AI was invented 3,000 years ago. The brain is bigger than her head. It's positioned as equal to her. And it's glowing! In the background, a giant crane hovers, Godzilla-like, over a group of buildings. There's a caduceus, a car, a Christian cross, a Red Cross, and a bunch of things that IDK what they are. It looks like a bas-relief, but it doesn't actually exist or represent anything in the real world.

Rob Nelson's avatar

Love this as it inspires me to insert Charles Peirce in your story. He was the first to use the term “token” in the sense it is now used with AI models and spent time at Hopkins (when Dewey was there). No memorials of Peirce at Hopkins though… too shady for Gilman to give him a permanent job.

Joshua Corey's avatar

He has an angry wrenlike vigilance,

a greyhound's gentle tautness;

he seems to wince at pleasure,

and suffocate for privacy.

Robert Lowell on Saint-Gaudens’s Shaw, and much else, from “For the Union Dead”

https://www.poetryfoundation.org/poems/57035/for-the-union-dead

Sydney Van Morgan's avatar

Gilman! Great piece.