AI and LLMs

From neurons to transformers. Everything you need to understand how large language models are built, trained, and deployed.

25 concepts · Interactive diagrams

The Neuron

The basic computational unit of a neural network: a weighted sum followed by a non-linearity.

Layers & Deep Networks

How stacking layers lets a network learn increasingly abstract representations.

Activation Functions

ReLU, GELU, sigmoid, and the other non-linearities that give neural networks their expressive power.

Training & Backpropagation

How gradients flow backwards through the network to update weights via gradient descent.

Overfitting & Generalization

Why a model that memorises training data fails on new examples, and how to prevent it.

Tokenization

How raw text is split into sub-word tokens and mapped to integer IDs that a model can process.

Embeddings

Dense vector representations that encode semantic meaning in a continuous space.

The Attention Mechanism

How a model learns which parts of the input to focus on when producing each output.

Multi-Head Attention

Running attention in parallel across multiple representation subspaces.

The Transformer Block

How multi-head attention, feed-forward layers, residual connections, and layer normalization combine into a single block.

Stacking Layers (The Full Model)

How dozens of transformer blocks are composed to build a full LLM.

The Language Modeling Objective

Predicting the next token, the self-supervised task that drives pre-training.

Pre-training Data

The web-scale corpora, filtering pipelines, and data mixtures used to train LLMs.

The Pre-training Loop

Batching, forward passes, loss computation, and distributed training at scale.

Compute & Scale

Scaling laws, GPU clusters, and the relationship between compute, data, and model size.

Fine-tuning & Instruction Tuning

Adapting a pre-trained model to follow instructions using supervised fine-tuning.

RLHF — Reinforcement Learning from Human Feedback

Using human preference data to align model outputs with desired behaviour.

The Inference Loop

How a trained model generates text one token at a time, using the KV cache to avoid recomputing attention over earlier tokens.

Decoding Strategies

Greedy decoding, beam search, top-k, and nucleus (top-p) sampling, and how each trades off output quality against diversity.

Context Window

The maximum sequence length a model can attend to and how it shapes capabilities.

RAG — Retrieval-Augmented Generation

Grounding LLM outputs by retrieving relevant documents at inference time.

AI Agents & Tool Use

Giving LLMs the ability to call tools, browse the web, and take multi-step actions.

Prompt Engineering

Techniques for eliciting better outputs through careful input design.

Multimodal Models

Extending transformers to handle images, audio, and other modalities alongside text.

What LLMs Can't Do (Yet)

Persistent limitations around reasoning, grounding, and reliability in current models.