Definition

Embedding

An embedding is a numerical representation of text, images or other data that captures meaning for search, comparison and AI workflows.

Also known as: vector embedding, semantic vector

Short definition

An embedding is a list of numbers that represents the meaning or features of an item. Text passages, images, products, users and documents can all be converted into embeddings so software can compare them mathematically.

How it works

An embedding model maps input data into a vector space. Items with similar meaning are placed closer together. This makes it possible to search by meaning rather than exact keywords.

Example

In a support knowledge base, the questions "password reset failed" and "cannot access my account" may have similar embeddings even though they use different words. A retrieval system can use that similarity to find helpful documents.
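
The retrieval step can be sketched in a few lines of Python. This is only an illustration: the three-dimensional vectors are made up by hand rather than produced by a real embedding model, and the document titles and the `most_similar` helper are hypothetical.

```python
import math

# Hypothetical, hand-made 3-dimensional vectors. A real embedding model
# would typically produce hundreds or thousands of dimensions.
DOCUMENTS = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Updating your billing details": [0.1, 0.9, 0.1],
    "Exporting reports to CSV": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vector, documents):
    """Return the title whose vector is most similar to the query vector."""
    return max(
        documents,
        key=lambda title: cosine_similarity(query_vector, documents[title]),
    )


# Assumed query vectors for two differently worded questions about one issue.
password_reset_failed = [0.8, 0.2, 0.1]
cannot_access_account = [0.7, 0.3, 0.0]

print(most_similar(password_reset_failed, DOCUMENTS))
print(most_similar(cannot_access_account, DOCUMENTS))
```

Both queries resolve to the password reset article because their assumed vectors point in nearly the same direction as that document's vector, even though the wording differs.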

Why it matters

Embeddings are core to semantic search, recommendation systems and retrieval-augmented generation (RAG). Their usefulness depends on the embedding model, the data domain and how vectors are indexed and evaluated.

How embeddings are compared

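Common similarity measures include cosine similarity, dot product and Euclidean distance. The metric should match the one the embedding model was trained with or that its documentation recommends. Documents and queries must be encoded with the same model, because vectors from different models are not comparable. Switching models therefore normally requires re-embedding the documents and rebuilding the index.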
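
The three measures named above can be written out directly. The following is a minimal sketch in plain Python with example vectors chosen for easy arithmetic. Production systems would normally rely on a numerical library or a vector index instead.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products. Grows with both alignment and length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both lengths. Depends on direction only."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points. Lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the length

print(dot_product(a, b))         # 18.0
print(cosine_similarity(a, b))   # 1.0
print(euclidean_distance(a, b))  # 3.0
```

Cosine similarity ignores vector length, so it reports 1.0 for two vectors that point in the same direction, while the dot product and Euclidean distance both change with magnitude. When vectors are normalized to unit length, all three measures produce the same ranking.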

Common limitations

An embedding may capture general meaning but struggle with numbers, negation or specialized terminology. Semantic search is therefore often combined with filters and lexical search. Privacy also matters: a vector is not readable like a sentence, but it still represents source content and should be protected when that content is sensitive.
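
One way to combine semantic search with filters and lexical search is sketched below. Everything here is an assumption made for illustration: the records, the hand-made two-dimensional vectors, the `hybrid_search` helper, the simple word-overlap score and the equal weighting. The first two documents deliberately share one vector to mimic an embedding that fails to distinguish two error codes.

```python
import math

# Hypothetical records: text, a metadata field and a hand-made vector.
DOCS = [
    {"text": "reset password for error 4012", "lang": "en", "vector": [0.9, 0.1]},
    {"text": "reset password for error 4021", "lang": "en", "vector": [0.9, 0.1]},
    {"text": "update billing address", "lang": "en", "vector": [0.1, 0.9]},
    {"text": "passwort neu setzen fehler 4012", "lang": "de", "vector": [0.9, 0.1]},
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lexical_overlap(query, text):
    """Fraction of query words that appear verbatim in the text."""
    query_words = query.lower().split()
    text_words = set(text.lower().split())
    return sum(word in text_words for word in query_words) / len(query_words)


def hybrid_search(query, query_vector, docs, lang, alpha=0.5):
    """Filter by metadata, then rank by a weighted mix of both scores."""
    candidates = [doc for doc in docs if doc["lang"] == lang]

    def score(doc):
        semantic = cosine_similarity(query_vector, doc["vector"])
        lexical = lexical_overlap(query, doc["text"])
        return alpha * semantic + (1 - alpha) * lexical

    return sorted(candidates, key=score, reverse=True)


results = hybrid_search("error 4012", [0.8, 0.2], DOCS, lang="en")
print(results[0]["text"])
```

The filter removes the German record before ranking, and the lexical score breaks the tie between the two documents whose vectors are identical, so the exact error code decides the top result.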