GPTQ

Generative Pre-trained Transformer Quantization

Definition

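GPTQ is a one-shot post-training quantization method for LLMs: it quantizes the weights of an already trained model in a single pass, with no retraining. Working layer by layer, it uses approximate second-order information (a Hessian estimated from a small set of calibration inputs) to adjust the not-yet-quantized weights so that they compensate for the error each quantized weight introduces. This yields 4-bit and 3-bit models with little accuracy loss relative to the full-precision original, shrinking the weights enough that many large models fit on a single consumer GPU.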
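As a rough illustration of the layer-by-layer, Hessian-based procedure described above, here is a minimal NumPy sketch of a GPTQ-style update for a single linear layer. It is a simplification under stated assumptions, not the reference implementation: it uses a symmetric per-row 4-bit grid, processes columns in their natural order, and omits blocking and grouping. The function names (quantize_to_grid, gptq_quantize_layer), the damp parameter, and the random calibration data are all illustrative.

```python
import numpy as np


def quantize_to_grid(w, scale, bits=4):
    """Round values to the nearest point of a symmetric integer grid."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def gptq_quantize_layer(W, X, bits=4, damp=0.01):
    """GPTQ-style quantization of one layer (illustrative sketch).

    W: (d_out, d_in) weight matrix.
    X: (d_in, n_samples) calibration inputs to the layer.
    Returns the quantized weights and the per-row scales.
    """
    W = W.astype(np.float64).copy()  # work on a copy; the caller's W is untouched
    d_out, d_in = W.shape
    qmax = 2 ** (bits - 1) - 1
    # Assumption: one fixed scale per output row, taken from the original weights.
    scale = np.maximum(np.abs(W).max(axis=1), 1e-8) / qmax

    # Hessian of the layer-wise squared error, with damping for stability.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)

    # Upper-triangular Cholesky factor of the inverse Hessian.
    Hinv = np.linalg.inv(H)
    Hinv = (Hinv + Hinv.T) / 2.0  # remove numerical asymmetry
    U = np.linalg.cholesky(Hinv).T

    Q = np.zeros_like(W)
    for i in range(d_in):
        q = quantize_to_grid(W[:, i], scale, bits)
        Q[:, i] = q
        # Spread this column's quantization error over the remaining columns.
        err = (W[:, i] - q) / U[i, i]
        W[:, i:] -= np.outer(err, U[i, i:])
    return Q, scale


# Toy comparison against plain round-to-nearest on correlated inputs.
rng = np.random.default_rng(0)
d_out, d_in, n = 8, 16, 256
X = rng.normal(size=(d_in, d_in)) @ rng.normal(size=(d_in, n))
W = rng.normal(size=(d_out, d_in))

Q, scale = gptq_quantize_layer(W, X)
rtn = quantize_to_grid(W, scale[:, None])
print("round-to-nearest output error:", np.linalg.norm(W @ X - rtn @ X))
print("GPTQ-style output error:      ", np.linalg.norm(W @ X - Q @ X))
```

The two printed numbers measure how far each quantized layer's output is from the original layer's output on the calibration inputs. This is the quantity that the error compensation step is designed to keep small.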

GPTQ is widely used with the AutoGPTQ library for serving quantized open-source models.
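The memory saving that lets a large model fit on a single GPU can be estimated with simple arithmetic. The sketch below counts weight storage only. It ignores quantization metadata (scales and zero points), activations, and the KV cache, so real footprints are somewhat larger. The 7-billion-parameter figure and the helper name weight_gigabytes are hypothetical.

```python
def weight_gigabytes(n_params: int, bits: int) -> float:
    """Approximate weight storage in GB (1e9 bytes), ignoring metadata."""
    return n_params * bits / 8 / 1e9


n_params = 7_000_000_000  # hypothetical 7B-parameter model
for bits in (16, 4, 3):
    print(f"{bits:>2}-bit weights: {weight_gigabytes(n_params, bits):.2f} GB")
```

Under these assumptions the weights take roughly 14 GB at 16 bits and 3.5 GB at 4 bits, a 4x reduction.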

Related Terms

Quantization

Model Quantization

AWQ

Activation-aware Weight Quantization

INT4

4-bit Integer Quantization

Model serving

LLM Model Serving
