Interactive Lab

Attention in LLMs

Large Language Models are essentially massive neural networks built around an attention mechanism.

To better understand what's going on under the hood, I built a couple of visualizers. I hope they're somewhat illustrative for you!

Self-Attention

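The foundation of modern LLMs. Understand the quadratic memory wall: the attention score matrix has one entry per pair of tokens, so its size grows with the square of the sequence length, which is why context length is limited on a single device.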
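To make the quadratic growth concrete, here is a minimal sketch of naive single-head self-attention in plain Python. The function names (`self_attention`, `score_matrix_bytes`) and the 2-bytes-per-entry default are assumptions chosen for illustration, not code taken from the visualizer.

```python
import math

def softmax(row):
    # Subtract the row max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(q, k, v):
    """Naive single-head attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    # scores is n x n: one entry for every (query, key) pair.
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    dim = len(v[0])
    out = [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(dim)]
           for row in weights]
    return out, weights

def score_matrix_bytes(n_tokens, bytes_per_entry=2):
    # Memory for one full score matrix (one head, one layer), assuming 16-bit entries.
    return n_tokens * n_tokens * bytes_per_entry

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[2.0, 0.0], [4.0, 2.0]]
out, weights = self_attention(q, k, v)
```

Doubling the number of tokens quadruples `score_matrix_bytes`. Under the 16-bit assumption, a single fully materialized score matrix for 131,072 tokens would take 2^35 bytes, or 32 GiB, per head per layer.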

Ring Attention

The distributed solution for very long context: the context length you can handle scales with the number of devices. See how KV blocks rotate through a cluster without information loss.

Switch between 'Story Mode' for intuition and 'Tech Mode' for precision.

Watch Online Softmax update its running maximum and normalizer as context rotates through the cluster.

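Learn why Ring Attention computes exact attention (identical to full attention up to floating-point rounding), unlike RAG or sliding windows, which drop or approximate part of the context.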
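The online softmax idea can be sketched for a single query in plain Python. This is a minimal illustration under stated assumptions, not the visualizer's code or a distributed implementation: `blockwise_attention` is a hypothetical name, and the KV blocks are simply visited in a loop rather than passed between devices.

```python
import math

def dot_scores(q, keys):
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(q, k)) for k in keys]

def full_attention(q, keys, values):
    # Reference: softmax over all keys at once.
    scores = dot_scores(q, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(dim)]

def blockwise_attention(q, kv_blocks):
    """Visit KV blocks one at a time, keeping only running statistics."""
    dim = len(kv_blocks[0][1][0])
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * dim
    for keys, values in kv_blocks:
        scores = dot_scores(q, keys)
        new_max = max(running_max, max(scores))
        # Rescale earlier partial sums to the new maximum (0.0 on the first block).
        rescale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        denom = denom * rescale + sum(weights)
        acc = [a * rescale + sum(w * v[c] for w, v in zip(weights, values))
               for c, a in enumerate(acc)]
        running_max = new_max
    return [a / denom for a in acc]

q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [-1.0, 3.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
blocks = [(keys[:2], values[:2]), (keys[2:], values[2:])]
rotated = blocks[1:] + blocks[:1]  # same blocks, different arrival order

exact = full_attention(q, keys, values)
ring = blockwise_attention(q, blocks)
ring_rotated = blockwise_attention(q, rotated)
```

Both `ring` and `ring_rotated` agree with `exact` up to floating-point rounding, regardless of the order in which blocks arrive. No key or value is dropped; only the running maximum, the normalizer, and the weighted sum need to be kept between blocks.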