
GPTQ

Generative Pre-trained Transformer Quantization

Definition

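GPTQ is a one-shot, post-training, weight-only quantization method for LLMs. It quantizes the model layer by layer, using approximate second-order information (a Hessian of each layer's output reconstruction error, estimated from a small calibration set) to adjust the weights that are not yet quantized so that they compensate for the error already introduced. The result is 4-bit and 3-bit quantized models with minimal accuracy loss relative to the full-precision original, which reduces memory use enough that large models can fit on a single GPU, in some cases a consumer-grade one.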
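
The layer-wise error compensation described above can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the AutoGPTQ implementation: it quantizes one weight matrix column by column, uses a plain matrix inverse instead of the Cholesky-based formulation, and omits the blocked updates and grouping that practical implementations rely on. The function names, the per-row symmetric scale, and the damping value are all hypothetical choices made for this example.

```python
import numpy as np


def quantize_rtn(w, scale, bits=4):
    """Round-to-nearest onto a symmetric signed grid with the given scale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), qmin, qmax) * scale


def gptq_like_quantize(W, X, bits=4, damp=0.01):
    """Quantize W (n_out x n_in) given calibration inputs X (n_in x n_samples).

    Each column is rounded in turn, and the rounding error is pushed onto
    the remaining columns using the inverse of the layer Hessian.
    Returns the quantized weights and the per-row scales.
    """
    W = np.asarray(W, dtype=np.float64).copy()
    n_out, n_in = W.shape

    # Hessian of the layer-wise squared error ||W X - Q X||^2 w.r.t. one row.
    H = 2.0 * X @ X.T
    # Damping (an assumed value) keeps H well conditioned and invertible.
    H += damp * np.mean(np.diag(H)) * np.eye(n_in)
    Hinv = np.linalg.inv(H)

    # One symmetric scale per output row (an assumption of this sketch).
    qmax = 2 ** (bits - 1) - 1
    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True) / qmax, 1e-12)

    Q = np.zeros_like(W)
    for i in range(n_in):
        q = quantize_rtn(W[:, i:i + 1], scale, bits)[:, 0]
        Q[:, i] = q
        # Scaled error for column i, shared across all output rows.
        err = (W[:, i] - q) / Hinv[i, i]
        # Compensate: shift the not-yet-quantized columns to absorb the error.
        W[:, i + 1:] -= np.outer(err, Hinv[i, i + 1:])
        # Remove coordinate i from the inverse Hessian (Gaussian elimination).
        Hinv -= np.outer(Hinv[:, i], Hinv[i, :]) / Hinv[i, i]
    return Q, scale
```

To see the effect, one can compare the output reconstruction error `np.linalg.norm(W @ X - Q @ X)` of this function against plain `quantize_rtn(W, scale)` on the same inputs. The compensation step is what distinguishes the approach from simple rounding, although how much it helps depends on how correlated the calibration inputs are.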

GPTQ is widely used with the AutoGPTQ library for serving quantized open-source models.

