A personal notebook

LM·LAB

Learn how ChatGPT works — from the very beginning.

Not a course. Not a tutorial. A walk through the ideas.

Enter the lab
05 chapters · 80 years · ~45 min
PROLOGUE

In 1948, Claude Shannon asked a deceptively simple question: can we predict the next letter in a sentence, given only the ones that came before?

The answer took the next eighty years to unfold. It took counting, then learning, then attention — and finally, scale. Each era solved what the last one couldn't, and each one left a fingerprint you can still find inside the models you use every day.

This is a quiet walk through those four ideas. Not a tutorial, not a pitch. Just the notebook of someone who wanted to understand, written in case you do too.

THE JOURNEY
ERA I · 1948 — 1990s

Just count.

Counting letters sounds too simple to work. It isn't.

Bigram and n-gram models can predict text, generate language, and reveal the hidden structure of any corpus — all without a single neural weight. Shannon's idea, sharpened over decades of statistical NLP, is still the baseline every modern system is quietly measured against.
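To see how little machinery that takes, here is a minimal sketch in Python. It is in the spirit of Shannon's experiment, not the lab's own code, and the toy corpus is an invented stand-in: count which letter follows which, then sample from the counts.

```python
# A minimal character-bigram model in the spirit of Shannon's experiment.
# The toy corpus is an invented stand-in, not the lab's data.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat saw the rat"

# Count: how often does each character follow each other character?
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

# Generate: sample each next character in proportion to its count.
def generate(start="t", length=40):
    out = [start]
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

print(generate())  # e.g. "the rat sat and the cat sa..." (random each run)
```

Even on a few words of text, the samples start to look like language: common pairs dominate, rare ones fade.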

BIGRAM · FREQUENCY
hover any cell — which letter follows?
ERA II · 1986 — 2017

Then it learns.

Counting has a ceiling. What if, instead of memorising every pattern, the machine could figure them out on its own — from raw data?

Layers of simple operations, stacked on each other, begin to discover structure no human wrote down. It took thirty years and the patience of a few researchers for the idea to become practical. It changed what a model could be.
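For the curious, a forward pass is roughly this. A minimal sketch: the layer sizes, ReLU, and softmax are my assumptions for illustration, and random weights stand in for what training would learn.

```python
# A tiny MLP forward pass: input → hidden → prediction.
# Sizes, ReLU, and softmax are illustrative assumptions; the weights are
# random stand-ins for what training would learn.
import numpy as np

rng = np.random.default_rng(0)

W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)  # input (8) → hidden (4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)  # hidden (4) → output (3)

x = rng.normal(size=8)                   # one input vector

h = np.maximum(0, x @ W1 + b1)           # hidden layer: linear map + ReLU
logits = h @ W2 + b2                     # output layer: another linear map
probs = np.exp(logits - logits.max())    # softmax turns scores...
probs /= probs.sum()                     # ...into a probability distribution

print(probs)  # three probabilities summing to 1: the prediction
```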

MLP · FORWARD PASS
input → prediction
ERA III · 2017

The model learns to look.

"Attention Is All You Need" changed everything.

Instead of reading word by word, the model learns which parts of the input matter for each prediction. Transformers were born — and with them, the GPT era. A single move that quietly replaced almost everything that came before.
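The whole move fits in a few lines. Here is a sketch of scaled dot-product self-attention, with random vectors standing in for trained embeddings and projections; the dimension d is an assumption for illustration.

```python
# Scaled dot-product self-attention: every token scores every other token,
# softmax turns the scores into weights, and each token becomes a weighted
# mix of the others. Random vectors stand in for trained parameters.
import numpy as np

rng = np.random.default_rng(0)

tokens = ["the", "cat", "sat", "on", "the", "mat"]
d = 8
X = rng.normal(size=(len(tokens), d))    # one vector per token

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv         # queries, keys, values

scores = Q @ K.T / np.sqrt(d)            # every token scores every other token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1

output = weights @ V                     # each token: a weighted mix of values

print(weights[1].round(2))               # how much "cat" looks at each token
```

Row by row, the weights matrix is the "looking": the entry at (i, j) says how much token i attends to token j.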

05 · Transformer · Read →

Self-attention: every word looks at every other word — and decides how much each one matters.

06 · GPT Grid · Soon

Take the Transformer. Make it enormous. Train it on everything. Coming soon.

ATTENTION · FLOW
hover any token — see where it looks
the · cat · sat · on · the · mat
source → weighted targets