Skip to content
ai

HumanEval

HumanEval Benchmark

Definition

HumanEval is a benchmark introduced by OpenAI to evaluate the code generation capabilities of language models. It consists of 164 Python programming problems with unit tests, and models are scored by the fraction of problems solved (pass@k).

HumanEval is a standard benchmark for comparing coding LLMs, though its small size and Python focus limit its coverage.


Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.