
LM·LAB
Learn how ChatGPT works — from the very beginning.
Not a course. Not a tutorial. A walk through the ideas.
In 1948, Claude Shannon asked a deceptively simple question: can we predict the next letter in a sentence, given only the ones that came before?
The answer took the next seventy-five years to unfold. It took counting, then learning, then attention, and finally scale. Each era solved what the last one couldn't, and each one left a fingerprint you can still find inside the models you use every day.
This is a quiet walk through those four ideas. Not a tutorial, not a pitch. Just the notebook of someone who wanted to understand, written in case you do too.
Just count.
Counting letters sounds too simple to work. It isn't.
Bigrams and N-grams can predict text, generate language, and reveal the hidden structure of any corpus — all without a single neural weight. Shannon's idea, sharpened over decades of statistical NLP, is still the baseline every modern system is quietly measured against.
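Here is the whole trick in a dozen lines of Python, a sketch rather than anything official: count which character follows which in a toy corpus, then generate text by sampling from those counts. The corpus and every name in it are made up for illustration.

```python
import random
from collections import Counter, defaultdict

# A toy corpus; any text works. (Made up for illustration.)
corpus = "the cat sat on the mat. the cat ran."

# Count which character follows which: that is the whole model.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev):
    """Pick a next character in proportion to how often it followed `prev`."""
    chars, weights = zip(*counts[prev].items())
    return random.choices(chars, weights=weights)[0]

# Generate: predict the next character, append it, repeat.
text = "t"
for _ in range(40):
    text += sample_next(text[-1])
print(text)
```

No weights, no training loop. Just counting, and it already babbles something shaped like its corpus.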
Then it learns.
Counting has a ceiling. What if, instead of memorising patterns, the machine could figure them out on its own, straight from raw data?
Layers of simple operations, stacked on each other, begin to discover structure no human wrote down. It took thirty years and the patience of a few researchers for the idea to become practical. It changed what a model could be.
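To see what "layers of simple operations" means in practice, here is a sketch, assuming nothing beyond plain numpy: two stacked layers learning XOR, a pattern no single linear rule can express. The sizes, seed, and four-point dataset are all illustrative.

```python
import numpy as np

# XOR: a pattern no single linear layer can capture.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # layer 1: a simple linear map
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # layer 2: another one, stacked on top

for step in range(5000):
    # Forward pass: two simple operations with a nonlinearity between them.
    h = np.tanh(X @ W1 + b1)
    logits = h @ W2 + b2
    p = 1 / (1 + np.exp(-logits))                # sigmoid output

    # Backward pass: nudge every weight a little to reduce the error.
    grad_logits = p - y                          # sigmoid + cross-entropy gradient
    grad_W2 = h.T @ grad_logits
    grad_h = grad_logits @ W2.T * (1 - h**2)     # back through the tanh
    grad_W1 = X.T @ grad_h
    W2 -= 0.1 * grad_W2; b2 -= 0.1 * grad_logits.sum(0)
    W1 -= 0.1 * grad_W1; b1 -= 0.1 * grad_h.sum(0)

print(p.round(2))  # approaches [[0], [1], [1], [0]]: structure no one wrote down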
The model learns to look.
"Attention Is All You Need" changed everything.
Instead of reading word by word, the model learns which parts of the input matter for each prediction. Transformers were born — and with them, the GPT era. A single move that quietly replaced almost everything that came before.
Self-attention: every word looks at every other word — and decides how much each one matters.
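That one sentence is almost a specification. Below is a minimal sketch of scaled dot-product self-attention in numpy; the shapes, seed, and projection matrices are illustrative, not taken from any particular model.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Every row of X (one word) attends to every row of X (every word).

    X: (n_words, d_model); Wq, Wk, Wv: (d_model, d_head) projections.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # what each word asks, offers, carries
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how much word i cares about word j
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                         # each word becomes a weighted mix of all words

# Four "words", each an 8-dimensional vector (random, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one updated vector per word
```

The whole mechanism is three matrix multiplications and a softmax. Everything else in a Transformer is scaffolding around that move.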
GPT Grid (coming soon)
Take the Transformer. Make it enormous. Train it on everything.