Model Vocabulary
Definition
A language model's vocabulary is the fixed set of tokens it can represent, determined by the tokenizer used during training. Vocabularies for modern LLMs typically range from about 32,000 tokens (LLaMA and Llama 2) through roughly 100,000 (GPT-4) to around 200,000 (GPT-4o).
A larger vocabulary reduces the number of tokens needed to represent a given text, which shortens sequences, but it increases the size of the embedding and output projection layers, each of which grows linearly with the vocabulary size.
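The parameter cost of a larger vocabulary can be estimated with simple arithmetic. The sketch below assumes a hidden size of 4096 and no bias terms, and the helper name `vocab_layer_params` is hypothetical; it is meant as an illustration of the scaling, not a description of any particular model's layout.

```python
def vocab_layer_params(vocab_size: int, d_model: int, tied: bool = False) -> int:
    """Count parameters in the token embedding plus the output projection.

    Assumes both layers are plain (vocab_size x d_model) matrices with no bias.
    If the weights are tied, the output projection reuses the embedding matrix.
    """
    embedding = vocab_size * d_model
    output_projection = 0 if tied else vocab_size * d_model
    return embedding + output_projection


# d_model=4096 is an assumed hidden size chosen only for illustration.
for vocab_size in (32_000, 100_000, 200_000):
    n = vocab_layer_params(vocab_size, d_model=4096)
    print(f"{vocab_size:>7} tokens -> {n / 1e6:.0f}M parameters")
```

Under these assumptions, the cost scales linearly with the vocabulary size, and tying the embedding and output weights halves it.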