Skip to content
ai

Self-attention

Self-attention Mechanism

Definition

Self-attention computes attention scores between all pairs of tokens in the same sequence, allowing each token's representation to be enriched with context from every other token. Query, key, and value projections of the input are used to compute scaled dot-product attention.

Self-attention is the mechanism that gives transformers their ability to capture long-range dependencies regardless of sequence distance.


Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.