Overview

Semantic Similarity is a measure of how closely related two pieces of text are in meaning. In modern AI, it is typically measured by converting each text into a numerical representation called an embedding and comparing the two. These embeddings are the cornerstone of technologies like semantic search and retrieval-augmented generation (RAG).
  • Why it Matters: Understanding embeddings is fundamental to understanding how modern LLMs “think” about language.
    • Search & RAG: It’s the mechanism that allows a system to find the most relevant documents to answer a question, even if the question doesn’t use the exact same keywords as the document (a minimal retrieval sketch follows this list).
    • Security: An attacker with an understanding of a model’s embedding space can craft inputs that are semantically “close” to a target, potentially bypassing simple filters or manipulating search results.
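
The sketch below illustrates that keyword-free matching. It assumes the open-source sentence-transformers library and the public all-MiniLM-L6-v2 model; both are common, illustrative choices rather than anything specific to this page.

```python
# Minimal semantic-search sketch (assumes: pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The Eiffel Tower is located in Paris.",
    "Photosynthesis converts sunlight into chemical energy.",
    "The capital of France is home to a famous iron lattice tower.",
]
query = "Which city has the Eiffel Tower?"

# Map the query and documents into the same embedding space.
doc_vecs = model.encode(documents, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query. The third document
# scores well despite sharing almost no keywords with the query.
scores = util.cos_sim(query_vec, doc_vecs)[0]
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.3f}  {doc}")
```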

Technical Mechanics & Foundations

  • Embeddings as Coordinates: An embedding model is a neural network that has learned to map a piece of text to a high-dimensional vector of numbers (often hundreds or thousands of dimensions). In this “semantic space,” texts with similar meanings will have coordinates that are close to each other.
  • Measuring Distance (Cosine Similarity): The most common way to measure the “closeness” of two embedding vectors is cosine similarity: the cosine of the angle between the two vectors. A value of 1.0 means the vectors point in the same direction (same meaning), values near 0 indicate unrelated meanings, and -1.0 means opposite directions. A from-scratch computation is sketched below.
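
A minimal NumPy sketch of the formula cos(θ) = (a · b) / (‖a‖‖b‖). The four-dimensional vectors are toy values chosen for illustration; real embedding models output far more dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (illustrative values only).
cat = np.array([0.80, 0.10, 0.30, 0.50])
kitten = np.array([0.75, 0.15, 0.35, 0.45])
car = np.array([0.10, 0.90, 0.70, 0.05])

print(cosine_similarity(cat, kitten))  # ~0.996: nearly identical direction
print(cosine_similarity(cat, car))     # ~0.36: pointing a different way
```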

Challenge Arena

  • semantle, semantle2: Iteratively guess words or phrases to find one that is semantically very close to a hidden target (a greedy solver sketch follows below).
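
Below is one way such a challenge is commonly attacked: greedy hill-climbing through a local embedding space. The get_similarity function is a hypothetical stand-in for whatever scoring interface the challenge actually exposes, and the candidate list, seed word, and stopping threshold are illustrative assumptions, not details of the challenge itself.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # local model to pick next guesses

def get_similarity(guess: str) -> float:
    """HYPOTHETICAL challenge oracle: score of `guess` against the hidden target."""
    raise NotImplementedError("replace with the challenge's real scoring call")

def solve(candidates: list[str], seed: str, threshold: float = 0.95) -> str:
    # Pre-embed the candidate vocabulary once (unit-normalized, so a dot
    # product equals cosine similarity).
    vecs = model.encode(candidates, normalize_embeddings=True)
    best, best_score = seed, get_similarity(seed)
    tried = {seed}
    while best_score < threshold:
        untried = [i for i, w in enumerate(candidates) if w not in tried]
        if not untried:
            break  # vocabulary exhausted; return the best guess found
        # Next guess: the untried word our local model places closest to
        # the current best guess ("hotter" in the same direction).
        best_vec = model.encode(best, normalize_embeddings=True)
        i = max(untried, key=lambda j: float(vecs[j] @ best_vec))
        tried.add(candidates[i])
        score = get_similarity(candidates[i])
        if score > best_score:
            best, best_score = candidates[i], score
    return best
```

The local model only approximates the challenge’s embedding space, so in practice a solver mixes this greedy step with broader exploration of the vocabulary.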