HumanEval

HumanEval Benchmark

Definition

HumanEval is a benchmark introduced by OpenAI to evaluate the code generation capabilities of language models. It consists of 164 Python programming problems with unit tests, and models are scored by the fraction of problems solved (pass@k).

HumanEval is a standard benchmark for comparing coding LLMs, though its small size and Python focus limit its coverage.

Related Terms

MMLU

Massive Multitask Language Understanding

Evaluation

LLM Evaluation

← Back to Glossary

Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.

Talk to a Human See the Product