AWQ

Activation-aware Weight Quantization

Definition

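AWQ is a post-training, weight-only quantization method. It uses activation magnitudes observed on a small calibration set to identify the small fraction of weight channels most important for accuracy. Rather than keeping those weights at higher precision, it protects them by scaling the corresponding channels up before quantization and applying the inverse scale to the activations, so every weight is stored in the same low-bit format. This lets AWQ quantize weights to 4 bits with accuracy close to that of the original model.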
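The scaling idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference AWQ implementation: the function names (`quantize_dequantize`, `awq_scale_search`), the group size, the grid of exponents, and the random calibration data are all hypothetical choices made for the example. It searches for a per-input-channel scale of the form (mean activation magnitude) raised to a power alpha and keeps the alpha that gives the smallest output error after 4-bit quantization.

```python
import numpy as np


def quantize_dequantize(w, n_bits=4, group_size=32):
    """Asymmetric round-to-nearest quantization, then dequantization.

    w has shape (in_features, out_features). Each output column is split
    into groups of `group_size` consecutive input channels, and each group
    gets its own step size and offset.
    """
    in_features, out_features = w.shape
    g = w.reshape(in_features // group_size, group_size, out_features)
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    q_max = 2 ** n_bits - 1
    step = np.maximum(w_max - w_min, 1e-8) / q_max
    q = np.clip(np.round((g - w_min) / step), 0, q_max)
    return (q * step + w_min).reshape(in_features, out_features)


def awq_scale_search(x, w, n_bits=4, group_size=32, n_grid=20):
    """Grid-search an exponent alpha for per-input-channel scales.

    x: calibration activations, shape (n_samples, in_features).
    w: weights, shape (in_features, out_features).
    Returns (best_error, best_alpha, best_scales).
    """
    act_mag = np.maximum(np.abs(x).mean(axis=0), 1e-4)  # per input channel
    y_ref = x @ w                                       # full-precision output
    best_err, best_alpha, best_s = np.inf, None, None
    for alpha in np.linspace(0.0, 1.0, n_grid + 1):
        s = act_mag ** alpha                 # alpha = 0 means no scaling
        s = s / np.sqrt(s.max() * s.min())   # keep scales centered around 1
        # Scale weights up, quantize, and undo the scale on the activations.
        w_q = quantize_dequantize(w * s[:, None], n_bits, group_size)
        err = np.mean(((x / s) @ w_q - y_ref) ** 2)
        if err < best_err:
            best_err, best_alpha, best_s = err, alpha, s
    return best_err, best_alpha, best_s


# Toy calibration data (an assumption): a few input channels carry much
# larger activations than the rest, which is the case AWQ targets.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, :4] *= 20.0
w = rng.normal(size=(64, 48))

plain_err = np.mean((x @ quantize_dequantize(w) - x @ w) ** 2)
awq_err, alpha, scales = awq_scale_search(x, w)
print(f"round-to-nearest error: {plain_err:.4f}")
print(f"scaled error: {awq_err:.4f} (alpha = {alpha:.2f})")
```

Because the grid includes alpha = 0 (no scaling), the searched result is never worse than plain round-to-nearest on the calibration data. In a real deployment the inverse scale is typically folded into the preceding layer rather than applied to activations at run time.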

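Because 4-bit weights take roughly a quarter of the memory of 16-bit weights, AWQ makes it practical to deploy large models on consumer GPUs, with faster inference where generation is limited by memory bandwidth.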
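As a rough illustration of the memory saving, the arithmetic can be sketched as follows. The 7-billion-parameter model size, the group size of 128, and the per-group overhead (one 16-bit scale plus one 4-bit zero point) are assumptions chosen for the example, and `weight_memory_gb` is a hypothetical helper. The figure covers weights only, not activations or the KV cache.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 7_000_000_000  # assumed model size

# FP16 baseline: 16 bits per weight.
fp16_gb = weight_memory_gb(n_params, 16)

# 4-bit weights plus assumed per-group metadata:
# one 16-bit scale and one 4-bit zero point per 128 weights.
group_size = 128
int4_bits = 4 + (16 + 4) / group_size
int4_gb = weight_memory_gb(n_params, int4_bits)

print(f"FP16 weights:  {fp16_gb:.2f} GB")
print(f"4-bit weights: {int4_gb:.2f} GB")
```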

Related Terms

Quantization

Model Quantization

GPTQ

Generative Pre-trained Transformer Quantization

INT4

4-bit Integer Quantization

Model serving

LLM Model Serving
