AI and LLMs
From neurons to transformers. Everything you need to understand how large language models are built, trained, and deployed.
25 concepts · Interactive diagrams
01
The Neuron
The basic computational unit of a neural network: a weighted sum followed by a non-linearity.
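A minimal sketch of a single neuron in plain Python. It assumes ReLU as the non-linearity, though any activation could be substituted. The function name `neuron` and the example numbers are illustrative only.

```python
def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus a bias, then a non-linearity."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # weighted sum
    return max(0.0, z)  # ReLU non-linearity (an assumed choice; others work too)

print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # 0.1
```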
02
Layers & Deep Networks
How stacking layers lets a network learn increasingly abstract representations.
03
Activation Functions
ReLU, GELU, sigmoid, and the other non-linearities that give neural networks their expressive power.
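A minimal sketch of the three activations named above, using only the Python standard library. Here `gelu` is the exact form x·Φ(x), where Φ is the standard normal CDF. Some implementations use a tanh approximation instead.

```python
import math

def relu(x):
    return max(0.0, x)  # zero for negatives, identity for positives

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # squashes any input into (0, 1)

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF written via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, 0.0, 2.0):
    print(f"{x:+.1f}  relu={relu(x):.3f}  sigmoid={sigmoid(x):.3f}  gelu={gelu(x):.3f}")
```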
04
Training & Backpropagation
How gradients flow backwards through the network to update weights via gradient descent.
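A toy illustration of gradient descent. It assumes a single linear neuron, a squared-error loss, and hypothetical data generated from y = 2x + 1. With only one layer, the backward pass reduces to applying the chain rule by hand. In a deep network the same rule is applied layer by layer, from the output back to the input.

```python
# Hypothetical toy data drawn from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-1.0, 0.0, 1.0, 2.0]]

w, b = 0.0, 0.0   # parameters to learn
lr = 0.1          # learning rate

for step in range(500):
    grad_w = grad_b = 0.0
    for x, y in data:
        pred = w * x + b        # forward pass
        err = pred - y          # dLoss/dpred for loss = 0.5 * err**2
        grad_w += err * x       # chain rule: dLoss/dw = err * x
        grad_b += err           # chain rule: dLoss/db = err
    n = len(data)
    w -= lr * grad_w / n        # gradient descent step on the mean loss
    b -= lr * grad_b / n

print(round(w, 3), round(b, 3))  # approaches 2.0 and 1.0
```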
05
Overfitting & Generalization
Why a model that memorises training data fails on new examples, and how to prevent it.
06
Tokenization
How raw text is split into tokens (often sub-words) and mapped to integer IDs that a model can process.
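A simplified sketch of turning text into integer IDs. It assumes a hypothetical hand-written vocabulary `VOCAB` and greedy longest-match splitting. Production tokenizers learn their vocabularies with algorithms such as byte-pair encoding, so this shows the idea of sub-word IDs, not any particular model's tokenizer.

```python
# Hypothetical toy vocabulary mapping sub-word strings to integer IDs.
VOCAB = {"un": 0, "believ": 1, "able": 2, "token": 3, "ize": 4, "s": 5}

def tokenize(word, vocab=VOCAB):
    """Greedy longest-match split of `word` into sub-word IDs (illustrative only)."""
    ids, i = [], 0
    while i < len(word):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return ids

print(tokenize("unbelievable"))  # [0, 1, 2]
print(tokenize("tokenizes"))     # [3, 4, 5]
```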
07
Embeddings
Dense vector representations that encode semantic meaning in a continuous space.
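A minimal sketch of measuring semantic closeness between embeddings. It assumes hypothetical 3-dimensional vectors in `EMB`, whereas real models learn hundreds or thousands of dimensions. Cosine similarity is one common choice of measure.

```python
import math

# Hypothetical toy embeddings; the values are made up for illustration.
EMB = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(round(cosine(EMB["cat"], EMB["dog"]), 3))  # related words: close to 1
print(round(cosine(EMB["cat"], EMB["car"]), 3))  # unrelated words: much lower
```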
08
The Attention Mechanism
How a model learns which parts of the input to focus on when producing each output.
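A minimal sketch of scaled dot-product attention, softmax(QKᵀ/√d)·V, which is the form used in transformers. It is written with plain Python lists. The matrices `Q`, `K` and `V` are hypothetical toy values, and real implementations use batched tensor operations.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention, computed one query row at a time."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # how strongly this query attends to each position
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # output leans towards V[0], whose key matches the query
```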
09
Multi-Head Attention
Running attention in parallel across multiple representation subspaces.
10
The Transformer Block
Multi-head attention and a feed-forward network, combined with residual connections and layer normalisation.
11
Stacking Layers (The Full Model)
How dozens of transformer blocks are composed to build a full LLM.
12
The Language Modeling Objective
Predicting the next token, the self-supervised task that drives pre-training.
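A sketch of the next-token objective as an average cross-entropy. It assumes the model has already produced one vector of logits per position. The name `next_token_loss`, the 3-token vocabulary and the logits are hypothetical.

```python
import math

def next_token_loss(logits_per_position, tokens):
    """Average cross-entropy of predicting tokens[t + 1] from position t.

    logits_per_position[t] holds unnormalised scores over the vocabulary after
    reading tokens[0..t]; the final position has no next token, so it is unused.
    """
    total = 0.0
    for t in range(len(tokens) - 1):
        logits = logits_per_position[t]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # log-sum-exp
        total += log_z - logits[tokens[t + 1]]  # -log p(next token)
    return total / (len(tokens) - 1)

tokens = [0, 2, 1]                  # hypothetical token IDs
uniform = [[0.0, 0.0, 0.0]] * 3     # a model with no preference at all
print(next_token_loss(uniform, tokens))  # log(3), about 1.0986
```

A model that puts high scores on the correct next tokens drives this loss towards zero. A uniform guess over a vocabulary of size V scores log V.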
13
Pre-training Data
The web-scale corpora, filtering pipelines, and data mixtures used to train LLMs.
14
The Pre-training Loop
Batching, forward passes, loss computation, and distributed training at scale.
15
Compute & Scale
Scaling laws, GPU clusters, and the relationship between compute, data, and model size.
16
Fine-tuning & Instruction Tuning
Adapting a pre-trained model to follow instructions using supervised fine-tuning.
17
RLHF — Reinforcement Learning from Human Feedback
Using human preference data to align model outputs with desired behaviour.
18
The Inference Loop
How a trained model generates text one token at a time using the KV cache.
19
Decoding Strategies
Greedy decoding, beam search, top-k sampling, and nucleus (top-p) sampling for controlling output diversity.
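A minimal sketch of greedy, top-k and nucleus selection over a single hypothetical next-token distribution `probs`. Beam search is left out because it tracks several candidate sequences rather than picking one token at a time. Seeding `random.Random` makes runs repeatable.

```python
import random

def greedy(probs):
    """Pick the single most likely token ID."""
    return max(range(len(probs)), key=lambda i: probs[i])

def top_k_sample(probs, k, rng):
    """Keep the k most likely tokens and sample among them in proportion to probability."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return rng.choices(top, weights=[probs[i] for i in top])[0]  # choices renormalises weights

def top_p_sample(probs, p, rng):
    """Nucleus sampling: keep the smallest set of top tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept])[0]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
rng = random.Random(0)
print(greedy(probs))                  # 0
print(top_k_sample(probs, 2, rng))    # 0 or 1
print(top_p_sample(probs, 0.9, rng))  # 0, 1 or 2
```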
20
Context Window
The maximum sequence length a model can attend to and how it shapes capabilities.
21
RAG — Retrieval-Augmented Generation
Grounding LLM outputs by retrieving relevant documents at inference time.
22
AI Agents & Tool Use
Giving LLMs the ability to call tools, browse the web, and take multi-step actions.
23
Prompt Engineering
Techniques for eliciting better outputs through careful input design.
24
Multimodal Models
Extending transformers to handle images, audio, and other modalities alongside text.
25
What LLMs Can't Do (Yet)
Persistent limitations around reasoning, grounding, and reliability in current models.