Glossary
AI terms defined clearly and concisely. The definitions that appear in article popovers come from here.
A
Angel's Share
Also: Pruning, Model Pruning
In whisky-making, the angel's share is the spirit lost to evaporation as the barrel ages. In machine learning, pruning is the deliberate removal of redundant weights or neurons from a model, trading a small amount of accuracy for significant gains in speed and size. Like the angel's share, what's lost is a worthwhile trade for what remains.
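A minimal sketch of magnitude pruning (the function name and the threshold rule are illustrative, not taken from any particular library): the weights with the smallest absolute values are the first to go.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest |value|."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the cut-off.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.9, -0.05, 0.4, 0.01, -0.7, 0.02])
pruned = prune_by_magnitude(w, sparsity=0.5)  # half the weights evaporate
```

In practice the model is usually fine-tuned briefly after pruning so the surviving weights can compensate.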
Category: training
Attention
Also: Self-Attention, Multi-Head Attention
A mechanism that allows a model to weigh the importance of different parts of its input when producing each part of its output. Self-attention lets every token in a sequence attend to every other token, capturing long-range dependencies.
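A sketch of scaled dot-product self-attention in NumPy (a single head with random weights; real models use many heads and learned projections):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Every token's output is a weighted mix of every token's value vector."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # each token scores every other token
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))            # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = rng.normal(size=(3, 8, 8))
out, attn = self_attention(X, Wq, Wk, Wv)
```

The attention matrix `attn` is 4x4: one row per token, showing how much it attends to each of the four tokens, including itself.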
Category: architecture
C
Context Window
Also: Context Length, Context Limit
The maximum amount of text (measured in tokens) a model can process in a single interaction, counting both the input prompt and the generated output. Larger context windows allow models to work with longer documents, but attention costs scale quadratically with sequence length, making this a key trade-off in model design.
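The budget arithmetic can be sketched in a few lines (the token counts are illustrative; the real count depends on the model's tokenizer):

```python
def fits_context(prompt_tokens: int, max_new_tokens: int, context_window: int) -> bool:
    """Prompt and generation share one budget: their sum must fit the window."""
    return prompt_tokens + max_new_tokens <= context_window

# An 8,192-token window leaves room for this request...
ok = fits_context(prompt_tokens=6000, max_new_tokens=2000, context_window=8192)
# ...but not for this one.
too_big = fits_context(prompt_tokens=7000, max_new_tokens=2000, context_window=8192)
```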
Category: architecture
E
Embedding
Also: Vector Embedding
A numerical representation of text (or other data) as a dense vector in high-dimensional space. Similar concepts end up close together in this space, enabling semantic search, clustering, and similarity comparisons.
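A toy illustration of "close together in this space" (the 4-dimensional vectors are made up; real embeddings come from a model and have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Closeness in embedding space: 1.0 = same direction, near 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.9, 0.8, 0.1, 0.0])
kitten = np.array([0.85, 0.9, 0.15, 0.05])  # near `cat` in this space
train  = np.array([0.0, 0.1, 0.9, 0.8])     # far from both

similar = cosine_similarity(cat, kitten)
distant = cosine_similarity(cat, train)
```

Semantic search is essentially this comparison run against every vector in a database, returning the nearest neighbours.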
Category: architecture
F
Fine-tuning
Also: Fine-tune, SFT
The process of further training a pre-trained model on a smaller, task-specific dataset. Fine-tuning adapts a general model to perform better on particular tasks or to follow specific instructions and formats.
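As a toy illustration (a linear model in NumPy stands in for a neural network; all names are made up), fine-tuning simply continues gradient descent from the pretrained weights instead of starting from scratch:

```python
import numpy as np

def fine_tune(w, X_task, y_task, lr=0.01, steps=200):
    """Continue gradient descent from pretrained weights on a small task set."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X_task.T @ (X_task @ w - y_task) / len(y_task)
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 3))                 # a small task-specific dataset
w_true = np.array([1.0, -2.0, 0.5])          # the task's ideal weights
y = X @ w_true
w_pretrained = w_true + rng.normal(scale=0.5, size=3)  # general but off-task

w_tuned = fine_tune(w_pretrained, X, y)
err_before = np.linalg.norm(w_pretrained - w_true)
err_after = np.linalg.norm(w_tuned - w_true)
```

The point of the sketch: a small dataset and a modest number of steps are enough, because the starting point is already close.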
Category: training
K
Knowledge Distillation
Also: Distillation, Model Distillation
A training technique where a smaller "student" model learns to mimic the behavior of a larger "teacher" model. Instead of training on raw data alone, the student learns from the teacher's output probabilities, compressing knowledge into a more compact form — the same principle behind this site's name.
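The core of the idea fits in a few lines (the logits and temperature value are illustrative): the student is scored against the teacher's softened output distribution, not just hard labels.

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy of the student against the teacher's soft targets.
    T > 1 softens both distributions, exposing the teacher's 'dark knowledge'
    about which wrong answers are nearly right."""
    soft_targets = softmax(teacher_logits, T)
    log_student = np.log(softmax(student_logits, T))
    return float(-(soft_targets * log_student).sum())

teacher = np.array([4.0, 1.0, 0.5])
good_student = np.array([3.8, 1.1, 0.4])   # roughly mimics the teacher
bad_student = np.array([0.2, 3.0, 1.0])    # disagrees with the teacher

loss_good = distillation_loss(teacher, good_student)
loss_bad = distillation_loss(teacher, bad_student)
```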
Category: training
L
LLM
Also: Large Language Model
A neural network trained on massive text corpora that can generate, analyze, and transform text. Modern LLMs use transformer architectures and can contain billions of parameters, enabling sophisticated language understanding and generation.
Category: models
M
Model Collapse
Also: Data Contamination Loop
A degenerative phenomenon where models trained on AI-generated data progressively lose quality and diversity over successive generations. Like a distillery reusing its own waste as input, each generation amplifies errors and narrows output distribution until the model produces only generic, repetitive text.
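A stylized simulation of the narrowing (the sharpening exponent is an assumption standing in for retraining on your own most-probable outputs): a healthy spread of outputs collapses toward a single token within a few generations.

```python
import numpy as np

def next_generation(p, gamma=1.5):
    """One generation of training on the previous model's output, modelled
    as over-weighting whatever was already most probable."""
    q = p ** gamma
    return q / q.sum()

p = np.array([0.4, 0.3, 0.2, 0.1])  # generation 0: a diverse output distribution
history = [p]
for _ in range(10):
    p = next_generation(p)
    history.append(p)
```

By the final generation nearly all probability mass sits on one output: the "generic, repetitive text" failure mode in miniature.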
Category: training
Q
Quantization
Also: GGUF, GPTQ
The process of reducing a model's numerical precision — for example, converting 16-bit floating-point weights to 4-bit integers. This dramatically shrinks model size and memory usage, making it possible to run large models on consumer hardware at the cost of a small accuracy trade-off.
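A sketch of symmetric 8-bit quantization (real schemes such as GPTQ are more sophisticated, using per-group scales and calibration data, but the principle is the same):

```python
import numpy as np

def quantize_int8(w):
    """Store int8 values plus one float scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_ratio = q.nbytes / w.nbytes        # 1 byte per weight instead of 4
max_error = float(np.abs(w - w_hat).max())  # bounded by half a quantization step
```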
Category: deployment
R
RAG
Also: Retrieval-Augmented Generation
A pattern that combines information retrieval with language model generation. The model first retrieves relevant documents from an external knowledge base, then uses them as context to generate grounded, factual responses.
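A minimal sketch of the pattern (the bag-of-words `embed` function is a stand-in; a real system would call an embedding model and a vector store):

```python
import numpy as np

def embed(text):
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    vocab = ["whisky", "barrel", "angel", "model", "token", "attention"]
    words = text.lower().split()
    return np.array([words.count(v) for v in vocab], dtype=float)

def retrieve(query, documents):
    """Return the document most similar to the query in embedding space."""
    q = embed(query)
    scores = [q @ embed(d) for d in documents]
    return documents[int(np.argmax(scores))]

docs = [
    "the angel share is whisky lost from the barrel",
    "attention lets every token attend to every token",
]
question = "why does the barrel lose whisky"
context = retrieve(question, docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

The retrieved document is stitched into the prompt, so the model's answer is grounded in the knowledge base rather than in whatever it memorized during training.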
Category: architecture
S
SLM
Also: Small Language Model
A language model deliberately designed to be compact — typically under 10 billion parameters — while maximizing capability through high-quality training data and distillation techniques. Models like Phi, Gemma, and Mistral 7B demonstrate that a well-distilled small model can outperform much larger ones on targeted tasks.
Category: models
Slop
Also: AI Slop
A colloquial term for low-quality, generic AI-generated content — the kind that reads like it was produced without thought, review, or editorial standards. In the distillery metaphor, slop is what comes off the still when nobody is paying attention to the cut.
Category: evaluation
T
Temperature
Also: Sampling Temperature
A parameter that controls randomness in a model's output. A temperature of 0 makes the model always pick the most likely next token (deterministic), while higher values introduce more variation and creativity. Named after the temperature parameter in statistical mechanics, where higher temperatures make a system's states more uniformly likely.
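In code, temperature simply divides the logits before the softmax (the logit values are illustrative):

```python
import numpy as np

def sample_probs(logits, temperature):
    """Low T sharpens the distribution toward the argmax; high T flattens it."""
    z = logits / max(temperature, 1e-8)  # guard against T = 0
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
cold = sample_probs(logits, 0.1)  # near-deterministic: mass piles on the top token
hot = sample_probs(logits, 2.0)   # flatter: more variation when sampling
```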
Category: architecture
Transformer
Also: Transformer Architecture
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel. Transformers are the foundation of most modern language models, including GPT, Claude, and Gemini.
Category: architecture