
Definition

Transformer

A transformer is a neural network architecture that uses attention to model relationships between tokens in text, code, images, or other data.

Also known as: transformer architecture, attention model

Short definition

A transformer is a model architecture that became central to modern AI. It uses attention mechanisms to decide which parts of an input are most relevant to each other, making it especially effective for language and sequence tasks.

How it works

Text is split into tokens, converted into numerical representations (embeddings), and processed through attention layers. These layers let the model weigh relationships across the entire input, such as which word a pronoun refers to or which function definition a call corresponds to.
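The attention step above can be sketched in a few lines. This is a minimal illustration of scaled dot-product attention with random toy values, not a full transformer: the embeddings and projection matrices here are random stand-ins for what a real model learns.

```python
import numpy as np

np.random.seed(0)

# Toy input: 4 tokens, each represented as an 8-dimensional vector.
# (Real models learn these embeddings; random values are illustrative.)
tokens, d = 4, 8
x = np.random.randn(tokens, d)

# Hypothetical learned projections for queries, keys, and values.
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Scaled dot-product attention: each token scores every token,
# softmax turns scores into weights, and the output mixes the values.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V

print(weights.shape, output.shape)  # (4, 4) (4, 8)
```

Each row of `weights` sums to 1 and describes how much one token attends to every other token; the output for each token is a weighted blend of all the value vectors, which is how distant parts of the input influence each other.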

Example

When a model summarizes a long paragraph, transformer attention helps it connect distant pieces of information instead of only reading words one by one.

Why it matters

Transformers power most large language models and multimodal systems. They scale well, handle long context effectively, and adapt to many data types. Their size and compute requirements, however, make cost and efficiency important design concerns.