
AWQ

Activation-aware Weight Quantization

Definition

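AWQ is a post-training quantization method that identifies the small fraction of weight channels most important for accuracy and protects them while quantizing the model's weights to low precision. Importance is judged by the magnitude of the activations each channel sees during calibration, not by the magnitude of the weights themselves. Rather than keeping the important channels in higher precision, AWQ scales them up before quantization and applies the inverse scale to the matching activations, which reduces their relative rounding error and achieves near-lossless 4-bit weight quantization.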
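The scaling idea can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the reference AWQ implementation: the function names `quantize_rtn` and `awq_style_quantize`, the group size, the scale normalization, and the grid of exponents are all chosen here for the example. It treats a single linear layer, takes per-channel scales to be a power `alpha` of the mean activation magnitude, and picks the `alpha` that minimizes output error on the calibration data.

```python
import numpy as np


def quantize_rtn(w, n_bits=4, group_size=32):
    """Round-to-nearest group quantization; returns the dequantized weights.

    w has shape (out_features, in_features); in_features must be divisible
    by group_size (an assumption of this sketch).
    """
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    q_max = 2 ** n_bits - 1
    step = np.maximum(w_max - w_min, 1e-8) / q_max
    q = np.clip(np.round((g - w_min) / step), 0, q_max)
    return (q * step + w_min).reshape(out_f, in_f)


def awq_style_quantize(w, x_calib, alphas=np.linspace(0.0, 1.0, 11)):
    """Search for per-input-channel scales driven by activation magnitude.

    x_calib has shape (n_samples, in_features). Returns the dequantized
    weights (with scales folded back in), the chosen alpha, and the
    output mean squared error on the calibration data.
    """
    act_mag = np.abs(x_calib).mean(axis=0) + 1e-8  # one value per input channel
    ref = x_calib @ w.T                            # full-precision layer output
    best_err, best_alpha, best_w = np.inf, None, None
    for alpha in alphas:
        s = act_mag ** alpha                       # alpha = 0 is plain rounding
        s = s / np.sqrt(s.max() * s.min())         # keep scales centered around 1
        # Scale weights up, quantize, then undo the scale. In a real deployment
        # the inverse scale is folded into the preceding operation instead.
        w_q = quantize_rtn(w * s) / s
        err = float(np.mean((x_calib @ w_q.T - ref) ** 2))
        if err < best_err:
            best_err, best_alpha, best_w = err, float(alpha), w_q
    return best_w, best_alpha, best_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 128))
    x[:, :2] *= 20.0                               # two high-magnitude channels
    w = rng.normal(size=(64, 128))
    _, alpha, err = awq_style_quantize(w, x)
    baseline = float(np.mean((x @ quantize_rtn(w).T - x @ w.T) ** 2))
    print(f"alpha={alpha:.1f}  scaled error={err:.4f}  plain error={baseline:.4f}")
```

Because `alpha = 0` is part of the grid and corresponds to plain round-to-nearest quantization, the search in this sketch can never do worse than the unscaled baseline on the calibration data.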

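Because weights are stored in 4 bits instead of 16, AWQ cuts weight memory roughly fourfold, which makes fast, memory-efficient deployment of large models practical on consumer GPUs.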

