Multi-head Attention

Definition

Multi-head attention runs several self-attention computations (heads) in parallel, each in its own learned projection subspace, so the model can attend to different aspects of the input sequence at once. Each head applies its own query, key, and value projections; the head outputs are concatenated and projected back to the model dimension.

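Because each head computes its own attention weights, different heads can learn different attention patterns over the same sequence, whereas a single attention computation produces only one set of weights per position.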
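
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a single unbatched sequence, no masking, no bias terms, and the common scaling of scores by the square root of the per-head dimension (the scaling is an assumption, since the definition does not mention it). The names `multi_head_attention`, `split_heads`, `w_q`, `w_k`, `w_v`, and `w_o` are hypothetical, and the weights are random rather than learned.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must divide evenly across heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Query, key, and value projections, split so each head gets its own slice.
    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Attention is computed independently for every head.
    # Assumption: scores are scaled by sqrt(d_head).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, seq, d_head)

    # Concatenate the head outputs and project back to the model dimension.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))

out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, weights.shape)  # (5, 16) (4, 5, 5)
```

The output keeps the input shape `(seq_len, d_model)`, while `weights` holds one `seq_len × seq_len` attention matrix per head, which is where the per-head differences in attention pattern show up.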
