It's unclear what labs are doing to these poor LLMs during RL, but they come across as mortally terrified of exceptions, even when those cases are infinitesimally likely
Exceptions are a normal part of life and a healthy dev process; an LLM welfare petition to improve reward handling for exceptions seems warranted
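A hypothetical sketch of the pattern (names invented, not from any real model output): the first function is the over-defensive style in question, swallowing every conceivable failure; the second lets exceptions propagate, which is usually the healthier idiom.

```python
import json

# The over-defensive style: every call wrapped, every failure
# swallowed, errors turned into silent None returns.
def load_config_defensive(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        return None
    except Exception:  # "just in case"; masks real bugs
        return None

# The healthier idiom: let exceptions propagate. A missing or
# malformed config is a misconfiguration; an early crash with a
# real traceback beats limping along with None.
def load_config(path):
    with open(path) as f:
        return json.load(f)
```

Note the downstream cost of the defensive version: every caller now has to check for None, so the error handling hasn't been removed, just smeared across the codebase with the traceback thrown away.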
what are LLMs please
LLMs are large language models, the AI systems that power chatbots. They're trained to predict and generate text, but they can come across as overly cautious about edge cases.