Glossary
AI terms defined clearly and concisely. The definitions that appear in article popovers come from here.
A
Angel's Share
Also: Pruning, Model Pruning
In whisky-making, the angel's share is the spirit lost to evaporation as the barrel ages. In machine learning, pruning is the deliberate removal of redundant weights or neurons from a model, trading a small amount of accuracy for significant gains in speed and size. Like the angel's share, what's lost is a worthwhile trade for what remains.
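A minimal sketch of magnitude pruning (the function name and the threshold rule are illustrative, not taken from any particular library): the weights with the smallest absolute values are the first to go.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest |value|."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the cut-off.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.9, -0.05, 0.4, 0.01, -0.7, 0.02])
pruned = prune_by_magnitude(w, sparsity=0.5)  # half the weights evaporate
```

In practice the model is usually fine-tuned briefly after pruning so the surviving weights can compensate.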
Category: training
Attention
Also: Self-Attention, Multi-Head Attention
A mechanism that allows a model to weigh the importance of different parts of its input when producing each part of its output. Self-attention lets every token in a sequence attend to every other token, capturing long-range dependencies.
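A sketch of scaled dot-product self-attention in NumPy (a single head with random weights; real models use many heads and learned projections):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Every token's output is a weighted mix of every token's value vector."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # each token scores every other token
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))            # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = rng.normal(size=(3, 8, 8))
out, attn = self_attention(X, Wq, Wk, Wv)
```

The attention matrix `attn` is 4x4: one row per token, showing how much it attends to each of the four tokens, including itself.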
Category: architecture
C
Context Window
Also: Context Length, Context Limit
The maximum amount of text (measured in tokens) a model can process in a single interaction, counting both the input prompt and the generated output. Larger context windows allow models to work with longer documents, but attention costs scale quadratically with sequence length, making this a key trade-off in model design.
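The budget arithmetic can be sketched in a few lines (the token counts are illustrative; the real count depends on the model's tokenizer):

```python
def fits_context(prompt_tokens: int, max_new_tokens: int, context_window: int) -> bool:
    """Prompt and generation share one budget: their sum must fit the window."""
    return prompt_tokens + max_new_tokens <= context_window

# An 8,192-token window leaves room for this request...
ok = fits_context(prompt_tokens=6000, max_new_tokens=2000, context_window=8192)
# ...but not for this one.
too_big = fits_context(prompt_tokens=7000, max_new_tokens=2000, context_window=8192)
```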
Category: architecture
E
Embedding
Also: Vector Embedding
A numerical representation of text (or other data) as a dense vector in high-dimensional space. Similar concepts end up close together in this space, enabling semantic search, clustering, and similarity comparisons.
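A toy illustration of "close together in this space" (the 4-dimensional vectors are made up; real embeddings come from a model and have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Closeness in embedding space: 1.0 = same direction, near 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.9, 0.8, 0.1, 0.0])
kitten = np.array([0.85, 0.9, 0.15, 0.05])  # near `cat` in this space
train  = np.array([0.0, 0.1, 0.9, 0.8])     # far from both

similar = cosine_similarity(cat, kitten)
distant = cosine_similarity(cat, train)
```

Semantic search is essentially this comparison run against every vector in a database, returning the nearest neighbours.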
Category: architecture
F
Fine-tuning
Also: Fine-tune, SFT
The process of further training a pre-trained model on a smaller, task-specific dataset. Fine-tuning adapts a general model to perform better on particular tasks or to follow specific instructions and formats.
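As a toy illustration (a linear model in NumPy stands in for a neural network; all names are made up), fine-tuning simply continues gradient descent from the pretrained weights instead of starting from scratch:

```python
import numpy as np

def fine_tune(w, X_task, y_task, lr=0.01, steps=200):
    """Continue gradient descent from pretrained weights on a small task set."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X_task.T @ (X_task @ w - y_task) / len(y_task)
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 3))                 # a small task-specific dataset
w_true = np.array([1.0, -2.0, 0.5])          # the task's ideal weights
y = X @ w_true
w_pretrained = w_true + rng.normal(scale=0.5, size=3)  # general but off-task

w_tuned = fine_tune(w_pretrained, X, y)
err_before = np.linalg.norm(w_pretrained - w_true)
err_after = np.linalg.norm(w_tuned - w_true)
```

The point of the sketch: a small dataset and a modest number of steps are enough, because the starting point is already close.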
Category: training
K
Knowledge Distillation
Also: Distillation, Model Distillation
A training technique where a smaller "student" model learns to mimic the behavior of a larger "teacher" model. Instead of training on raw data alone, the student learns from the teacher's output probabilities, compressing knowledge into a more compact form — the same principle behind this site's name.
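The core of the idea fits in a few lines (the logits and temperature value are illustrative): the student is scored against the teacher's softened output distribution, not just hard labels.

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy of the student against the teacher's soft targets.
    T > 1 softens both distributions, exposing the teacher's 'dark knowledge'
    about which wrong answers are nearly right."""
    soft_targets = softmax(teacher_logits, T)
    log_student = np.log(softmax(student_logits, T))
    return float(-(soft_targets * log_student).sum())

teacher = np.array([4.0, 1.0, 0.5])
good_student = np.array([3.8, 1.1, 0.4])   # roughly mimics the teacher
bad_student = np.array([0.2, 3.0, 1.0])    # disagrees with the teacher

loss_good = distillation_loss(teacher, good_student)
loss_bad = distillation_loss(teacher, bad_student)
```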
Category: training
L
LLM
Also: Large Language Model
A neural network trained on massive text corpora that can generate, analyze, and transform text. Modern LLMs use transformer architectures and can contain billions of parameters, enabling sophisticated language understanding and generation.
Category: models
M
Model Collapse
Also: Data Contamination Loop
A degenerative phenomenon where models trained on AI-generated data progressively lose quality and diversity over successive generations. Like a distillery reusing its own waste as input, each generation amplifies errors and narrows output distribution until the model produces only generic, repetitive text.
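A stylized simulation of the narrowing (the sharpening exponent is an assumption standing in for retraining on your own most-probable outputs): a healthy spread of outputs collapses toward a single token within a few generations.

```python
import numpy as np

def next_generation(p, gamma=1.5):
    """One generation of training on the previous model's output, modelled
    as over-weighting whatever was already most probable."""
    q = p ** gamma
    return q / q.sum()

p = np.array([0.4, 0.3, 0.2, 0.1])  # generation 0: a diverse output distribution
history = [p]
for _ in range(10):
    p = next_generation(p)
    history.append(p)
```

By the final generation nearly all probability mass sits on one output: the "generic, repetitive text" failure mode in miniature.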
Category: training
Q
Quantization
Also: GGUF, GPTQ
The process of reducing a model's numerical precision — for example, converting 16-bit floating-point weights to 4-bit integers. This dramatically shrinks model size and memory usage, making it possible to run large models on consumer hardware at the cost of a small accuracy trade-off.
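A sketch of symmetric 8-bit quantization (real schemes such as GPTQ are more sophisticated, using per-group scales and calibration data, but the principle is the same):

```python
import numpy as np

def quantize_int8(w):
    """Store int8 values plus one float scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_ratio = q.nbytes / w.nbytes        # 1 byte per weight instead of 4
max_error = float(np.abs(w - w_hat).max())  # bounded by half a quantization step
```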
Category: deployment
R
RAG
Also: Retrieval-Augmented Generation
A pattern that combines information retrieval with language model generation. The model first retrieves relevant documents from an external knowledge base, then uses them as context to generate grounded, factual responses.
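A minimal sketch of the pattern (the bag-of-words `embed` function is a stand-in; a real system would call an embedding model and a vector store):

```python
import numpy as np

def embed(text):
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    vocab = ["whisky", "barrel", "angel", "model", "token", "attention"]
    words = text.lower().split()
    return np.array([words.count(v) for v in vocab], dtype=float)

def retrieve(query, documents):
    """Return the document most similar to the query in embedding space."""
    q = embed(query)
    scores = [q @ embed(d) for d in documents]
    return documents[int(np.argmax(scores))]

docs = [
    "the angel share is whisky lost from the barrel",
    "attention lets every token attend to every token",
]
question = "why does the barrel lose whisky"
context = retrieve(question, docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

The retrieved document is stitched into the prompt, so the model's answer is grounded in the knowledge base rather than in whatever it memorized during training.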
Category: architecture
S
SLM
Also: Small Language Model
A language model deliberately designed to be compact — typically under 10 billion parameters — while maximizing capability through high-quality training data and distillation techniques. Models like Phi, Gemma, and Mistral 7B demonstrate that a well-distilled small model can outperform much larger ones on targeted tasks.
Category: models
Slop
Also: AI Slop
A colloquial term for low-quality, generic AI-generated content — the kind that reads like it was produced without thought, review, or editorial standards. In the distillery metaphor, slop is what comes off the still when nobody is paying attention to the cut.
Category: evaluation
T
Temperature
Also: Sampling Temperature
A parameter that controls randomness in a model's output. A temperature of 0 makes the model always pick the most likely next token (deterministic), while higher values introduce more variation and creativity. Named after the temperature parameter in statistical mechanics, where higher temperatures make a system's states more uniformly likely.
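In code, temperature simply divides the logits before the softmax (the logit values are illustrative):

```python
import numpy as np

def sample_probs(logits, temperature):
    """Low T sharpens the distribution toward the argmax; high T flattens it."""
    z = logits / max(temperature, 1e-8)  # guard against T = 0
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
cold = sample_probs(logits, 0.1)  # near-deterministic: mass piles on the top token
hot = sample_probs(logits, 2.0)   # flatter: more variation when sampling
```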
Category: architecture
Transformer
Also: Transformer Architecture
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel. Transformers are the foundation of most modern language models, including GPT, Claude, and Gemini.
Category: architecture