Skip to content
ai

BPE

Byte Pair Encoding

Definition

BPE is a tokenization algorithm that iteratively merges the most frequent pair of bytes or characters in a training corpus, building a vocabulary of subword units. It balances vocabulary size against the ability to represent rare words as sequences of subword tokens.

BPE is used by GPT models; SentencePiece offers a similar approach with language-agnostic segmentation.


Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.