Also known as Einstein’s puzzle or riddle (likely an apocryphal attribution), the problem tests a certain kind of multistep reasoning. Nouha Dziri(opens a new tab), a research scientist at the Allen Institute for AI, and her colleagues recently set transformer-based large language models (LLMs), such as ChatGPT, to work on such tasks — and largely found them wanting. “They might not be able to reason beyond what they have seen during the training data for hard tasks,” Dziri said. “Or at least they do an approximation, and that approximation can be wrong.”
You are viewing a single comment's thread from: