BLEU

Bilingual Evaluation Understudy

Definition

BLEU is an automatic metric that scores machine-generated text by measuring its n-gram overlap with one or more human-written reference texts. It was originally designed for machine translation. Scores range from 0 to 1 (often reported scaled to 0-100), with higher scores indicating a closer match to the references.
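To make the n-gram overlap idea concrete, here is a minimal sketch of a sentence-level BLEU score in Python. It follows the commonly described formulation (clipped n-gram precision for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty), but several details are assumptions made for illustration: the function names `ngrams` and `sentence_bleu` are hypothetical, tokenization is plain whitespace splitting, and no smoothing is applied, so any n-gram order with zero matches gives a score of 0. Established toolkits handle tokenization and smoothing differently and will not necessarily produce the same numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU sketch (whitespace tokenization assumed)."""
    cand = candidate.split()
    refs = [ref.split() for ref in references]
    if not cand or not refs:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            # Candidate is shorter than n tokens.
            return 0.0

        # For each n-gram, the highest count seen in any single reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)

        # Clip candidate counts so repeated words cannot inflate precision.
        clipped = sum(
            min(count, max_ref_counts[gram]) for gram, count in cand_counts.items()
        )
        if clipped == 0:
            # No smoothing: one empty n-gram order zeroes the whole score.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the candidate
    # length (ties resolved toward the shorter reference).
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Calling `sentence_bleu("the cat sat on the mat", ["the cat is on the mat"])` returns 0.0 under this sketch because no 4-gram matches, which illustrates why practical implementations usually add smoothing or report BLEU at the corpus level rather than per sentence.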

BLEU is widely criticized for correlating poorly with human judgment on open-ended generation tasks, where many acceptable outputs share few n-grams with any single reference.

Related Terms

ROUGE

Recall-Oriented Understudy for Gisting Evaluation

Perplexity

Language Model Perplexity

Evaluation

LLM Evaluation
