Alignment

AI Alignment

Definition

Alignment is the research and engineering discipline focused on ensuring AI systems behave in accordance with human values, intentions, and safety requirements. Misaligned models may optimize for proxy objectives, produce harmful outputs, or deceive users.

Techniques such as RLHF, DPO, and Constitutional AI are used to steer model behavior toward desired outcomes.

Related Terms

RLHF

Reinforcement Learning from Human Feedback

DPO

Data Protection Officer

Guardrails

Cloud Governance Guardrails

Red teaming

AI Red Teaming

← Back to Glossary

Ship secure code faster

Crash Override integrates security into the developer workflow. No context switching, no waiting on reviews.

Talk to a Human See the Product