Batch inference

Definition

Batch inference runs multiple inputs through a model together in a single forward pass rather than handling each request independently. Grouping requests into batches improves GPU utilization and throughput, but it can increase latency for individual items, because each item may wait for the batch to fill and for the rest of the batch to finish.
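As a rough illustration of the grouping described above, the sketch below splits a list of inputs into fixed-size batches and calls a model once per batch instead of once per item. The names `make_batches`, `toy_model`, and `batch_infer` are hypothetical and not taken from any library; `toy_model` is only a stand-in for a real batched forward pass.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def make_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def toy_model(batch: List[str]) -> List[int]:
    """Hypothetical stand-in for a model: returns one output per input in the batch."""
    return [len(text) for text in batch]


def batch_infer(
    items: Sequence[str],
    batch_size: int,
    model: Callable[[List[str]], List[int]] = toy_model,
) -> Tuple[List[int], int]:
    """Run `model` once per batch; return all outputs and the number of model calls."""
    outputs: List[int] = []
    calls = 0
    for batch in make_batches(items, batch_size):
        outputs.extend(model(batch))  # one call handles the whole batch
        calls += 1
    return outputs, calls


if __name__ == "__main__":
    outputs, calls = batch_infer(["a", "bb", "ccc", "dddd", "eeeee"], batch_size=2)
    print(outputs, calls)
```

With five inputs and a batch size of 2, the model is called three times rather than five. On real hardware the saving comes from the accelerator processing each batch in parallel, which this toy example does not model.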

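Continuous batching, used by LLM serving systems such as vLLM, adds new requests to a running batch as soon as earlier requests finish, instead of waiting for the whole batch to complete. This keeps batch slots filled and improves hardware efficiency when requests generate outputs of different lengths.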
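The difference from fixed batches can be sketched with a small simulation. This is a simplified model under stated assumptions, not vLLM's actual scheduler: each request needs a known number of decode steps, every step produces one token for every active request, and prefill cost and memory limits are ignored. The function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def static_batching_steps(token_counts: List[int], max_batch: int) -> int:
    """Fixed batches: each batch runs until its longest request finishes."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    steps = 0
    for start in range(0, len(token_counts), max_batch):
        steps += max(token_counts[start:start + max_batch])
    return steps


def continuous_batching_steps(token_counts: List[int], max_batch: int) -> int:
    """Continuous batching: free slots are refilled before every decode step."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    waiting: Deque[Tuple[int, int]] = deque(enumerate(token_counts))
    active: Dict[int, int] = {}  # request id -> tokens still to generate
    steps = 0
    while waiting or active:
        # Admit waiting requests into any free slots.
        while waiting and len(active) < max_batch:
            req_id, remaining = waiting.popleft()
            active[req_id] = remaining
        # One decode step: every active request produces one token.
        for req_id in list(active):
            active[req_id] -= 1
            if active[req_id] <= 0:
                del active[req_id]  # finished; its slot is free next step
        steps += 1
    return steps


if __name__ == "__main__":
    counts = [2, 8, 2, 2]  # tokens to generate per request (made-up values)
    print(static_batching_steps(counts, max_batch=2))
    print(continuous_batching_steps(counts, max_batch=2))
```

For these made-up request lengths, the fixed-batch schedule takes 10 decode steps, because the short request paired with the 8-token request leaves its slot idle, while the continuous schedule takes 8 steps by refilling that slot as soon as it frees up.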

Related Terms

Inference: Model Inference

vLLM: vLLM Inference Engine

GPU: Graphics Processing Unit

Streaming: LLM Streaming Output
