Pruning

Neural Network Pruning

Definition

Pruning reduces model size by removing weights, neurons, attention heads, or layers that contribute little to the model's accuracy. Structured pruning removes entire components, so the remaining model stays dense and runs faster on standard hardware. Unstructured pruning zeroes individual weights, which produces sparse weight matrices that need sparse-aware kernels or storage formats to deliver a speedup.
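To make the distinction concrete, here is a minimal sketch using magnitude as the importance criterion. Magnitude is an assumption chosen for illustration (the definition above does not prescribe a criterion), and the function names `unstructured_prune` and `structured_prune` are hypothetical, not taken from any library. A weight matrix is represented as a list of rows, one row per output neuron.

```python
import math

def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of individual weights.

    The matrix keeps its shape; it just becomes sparse.
    """
    # Collect (magnitude, row, col) for every weight and sort ascending.
    positions = sorted(
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    )
    k = int(len(positions) * sparsity)  # number of weights to zero
    pruned = [row[:] for row in weights]
    for _, i, j in positions[:k]:
        pruned[i][j] = 0.0
    return pruned

def structured_prune(weights, n_remove):
    """Remove the n_remove rows (neurons) with the smallest L2 norm.

    The matrix gets smaller but stays dense.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    order = sorted(range(len(weights)), key=lambda i: norms[i])
    drop = set(order[:n_remove])
    return [row[:] for i, row in enumerate(weights) if i not in drop]

# Illustrative 3x3 weight matrix (made-up values).
W = [
    [0.9, -0.05, 0.4],
    [0.01, 0.02, -0.03],
    [-0.7, 0.6, 0.1],
]

sparse_W = unstructured_prune(W, 0.5)  # same shape, 4 of 9 weights zeroed
small_W = structured_prune(W, 1)       # 2x3 matrix, weakest neuron removed
```

In this sketch the unstructured result still occupies a full 3x3 matrix, while the structured result is a smaller dense matrix. In a real network, removing a neuron also requires removing the matching input column in the following layer, which is omitted here.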

Pruned models typically need fine-tuning or distillation to recover the accuracy lost during pruning, and they can be compressed further with quantization.

Related Terms

Distillation

Knowledge Distillation
