Part 4/12:
- Inner Alignment: This concerns the mathematical and technical foundations: is the model actually optimized to produce the desired outputs? Typically, as long as a model predicts and generates coherent, plausible language, it is considered "inner-aligned." Failures here occur when the loss function is poorly tuned or training gets stuck in a suboptimal minimum, causing the model to produce gibberish or harmful outputs.
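
The "stuck in a suboptimal minimum" failure mentioned above can be illustrated with a minimal sketch. The one-dimensional loss below is a made-up toy function chosen for illustration (it is not from the source and is not a language-model loss), and `loss`, `grad`, and `gradient_descent` are hypothetical helper names. It only shows that plain gradient descent can settle in different minima depending on where it starts.

```python
# Toy illustration (assumed example, not a real training setup):
# a non-convex loss with one global and one local minimum.

def loss(x: float) -> float:
    """Hypothetical 1-D loss with two minima of different depth."""
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    """Analytic derivative of the toy loss above."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Same optimizer, same loss, two different starting points.
x_from_right = gradient_descent(2.0)   # settles in the shallower minimum
x_from_left = gradient_descent(-2.0)   # settles in the deeper minimum

print(f"start= 2.0 -> x={x_from_right:.3f}, loss={loss(x_from_right):.3f}")
print(f"start=-2.0 -> x={x_from_left:.3f}, loss={loss(x_from_left):.3f}")
```

Starting from 2.0, the run settles near x ≈ 1.13, while starting from -2.0 it settles near x ≈ -1.30, where the loss is noticeably lower. Real models have vastly more parameters, so this is only a picture of the idea, not a claim about how any particular model fails.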