Part 13/19:
A crucial debate is whether diminishing returns will set in as models grow larger and more complex. So far, scaling up has been effective: larger models like GPT-3 and GPT-4 perform better as they are given more data and parameters. But there are signs that dense scaling may plateau beyond a certain point, and that models may need to become sparse (activating only a fraction of their total parameters for any given input) to keep scaling efficiently, as sketched below.
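To make the diminishing-returns point concrete, here is a minimal sketch assuming a hypothetical power-law scaling curve; the constant and exponent are made-up illustrative values, not measurements from GPT-3 or GPT-4. Under such a curve, each further 10x in parameters buys a smaller absolute improvement in loss.

```python
# Hypothetical power-law scaling curve (illustrative constants only).
def loss(n_params, a=20.0, alpha=0.08):
    return a * n_params ** (-alpha)

prev = None
for n in [1e9, 1e10, 1e11, 1e12]:  # 1B -> 1T parameters
    l = loss(n)
    gain = (prev - l) if prev is not None else float("nan")
    print(f"{n:.0e} params  loss={l:.2f}  improvement over previous={gain:.2f}")
    prev = l
```

Each row multiplies the parameter count by 10, yet the printed improvement shrinks from one step to the next, which is the pattern people mean when they talk about diminishing returns from scale.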
The speaker hypothesizes that GPT-4 might already be a sparse model: huge in total parameter count but efficient to run, which would allow continued growth without a proportional increase in computational cost. The trend so far suggests growth on a logarithmic scale, and it could either slow down or continue accelerating.
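For the "sparse" idea itself, here is a minimal mixture-of-experts-style sketch, assuming a simple top-k routing scheme; the class name SparseMoELayer and every size in it are illustrative assumptions and do not describe how GPT-4 is actually built.

```python
import numpy as np

rng = np.random.default_rng(0)

class SparseMoELayer:
    """Toy sparse layer: many experts exist, but only the top-k experts
    are evaluated for each input token, so compute grows with k rather
    than with the total number of experts (parameters)."""

    def __init__(self, d_model=16, n_experts=8, k=2):
        self.k = k
        # Each "expert" is just a dense weight matrix in this sketch.
        self.experts = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                        for _ in range(n_experts)]
        # Router that scores how relevant each expert is for a token.
        self.router = rng.normal(size=(d_model, n_experts)) / np.sqrt(d_model)

    def __call__(self, x):
        # x: (n_tokens, d_model)
        scores = x @ self.router                        # (n_tokens, n_experts)
        top_k = np.argsort(-scores, axis=1)[:, :self.k]  # indices of best experts
        # Softmax over only the selected experts' scores.
        sel = np.take_along_axis(scores, top_k, axis=1)
        weights = np.exp(sel - sel.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)

        out = np.zeros_like(x)
        for i, token in enumerate(x):
            # Only k of the n_experts weight matrices are touched per token.
            for w, e in zip(weights[i], top_k[i]):
                out[i] += w * (token @ self.experts[e])
        return out

layer = SparseMoELayer()
tokens = rng.normal(size=(4, 16))
print(layer(tokens).shape)  # (4, 16): same output shape, but sparse compute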