

Model Vocabulary

Definition

A language model's vocabulary is the fixed set of tokens it can represent, determined by the tokenizer used during training. Vocabulary sizes for modern LLMs typically range from 32,000 tokens (LLaMA and LLaMA 2) to roughly 200,000 (GPT-4o); GPT-4 itself uses about 100,000.
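
To make the idea of a fixed token set concrete, here is a minimal Python sketch with a tiny hypothetical vocabulary (`TOY_VOCAB`) and a hypothetical `encode` helper. It uses greedy longest-match lookup, which is a simplification; BPE tokenizers, for example, apply learned merge rules instead. Whatever the algorithm, every output ID indexes an entry in the vocabulary. In this sketch, text with no matching entry falls back to an unknown token, whereas many production tokenizers fall back to raw bytes so that no input is unrepresentable.

```python
# A toy, hypothetical vocabulary: the complete set of tokens this "model" can represent.
# Real vocabularies are learned from data and hold tens of thousands of entries.
TOY_VOCAB = {tok: i for i, tok in enumerate(
    ["<unk>", "a", "b", "c", "d", "e", "r", "ab", "ra", "abra", "cad"]
)}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match encoding (a simplification, not how BPE applies merges)."""
    max_len = max(len(t) for t in vocab)
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, shrinking until one is in the vocabulary.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            # No vocabulary entry matches this character: emit the unknown token.
            ids.append(vocab["<unk>"])
            i += 1
    return ids


print(encode("abracadabra", TOY_VOCAB))
print(encode("abz", TOY_VOCAB))
```

Here `"abracadabra"` encodes to three IDs (`abra`, `cad`, `abra`), while `"abz"` yields the ID for `ab` followed by `<unk>`, because `z` is not in the vocabulary.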

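A larger vocabulary lets the tokenizer cover more words and subwords with single tokens, so the same text needs fewer tokens (shorter sequences). The cost is larger embedding and output projection layers: each is a V × d matrix, where V is the vocabulary size and d is the model's hidden size, so their parameter count grows linearly with V.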
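
The parameter side of this trade-off is easy to estimate. The sketch below is illustrative only: `vocab_layer_params` is a hypothetical helper, the hidden size of 4,096 is an assumed value, and the input and output matrices are treated as untied by default (some models share a single matrix between them, which halves the cost). Biases are ignored.

```python
def vocab_layer_params(vocab_size: int, d_model: int, tied: bool = False) -> int:
    """Parameters in the input embedding plus the output projection (biases ignored).

    Each layer is a vocab_size x d_model matrix; with tied weights they share one.
    """
    per_matrix = vocab_size * d_model
    return per_matrix if tied else 2 * per_matrix


# Hypothetical hidden size, chosen for illustration.
D_MODEL = 4096

for v in (32_000, 128_000, 200_000):
    print(f"V={v:>7,}: {vocab_layer_params(v, D_MODEL):>13,} params (untied)")
```

At this hidden size, growing the vocabulary from 32,000 to 200,000 tokens adds roughly 1.38 billion parameters across the two layers, which is why vocabulary size is weighed against the sequence-length savings it buys.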

