The cost of having an AI take a given role will become just the cost of the compute it consumes. This will change our understanding of which roles are scarce.
Future AI firms won’t be constrained by what's scarce or abundant in human skill distributions – they can optimize for whatever abilities are most valuable. Want Jeff Dean-level engineering talent? Cool: once you’ve got one, the marginal copy costs pennies. Need a thousand world-class researchers? Just spin them up. The limiting factor isn't finding or training rare talent – it's just compute.
So what becomes expensive in this world? Roles that justify massive amounts of test-time compute. The CEO function is perhaps the clearest example. Would it be worth it for Google to spend $100 billion annually on inference compute for mega-Sundar? Sure! Just consider what this buys you: millions of subjective hours of strategic planning, Monte Carlo simulations of different five-year trajectories, deep analysis of every line of code and technical system, and exhaustive scenario planning.
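To get a feel for the scale, here's a rough back-of-envelope sketch. The price per million tokens and the tokens-per-subjective-hour figure are placeholder assumptions, not estimates:

```python
# Back-of-envelope: how much "mega-Sundar" thinking does $100B/year of
# inference buy? All numbers below are placeholder assumptions.

annual_budget_usd = 100e9             # hypothetical inference budget
price_per_million_tokens = 10.0       # assumed $/1M tokens
tokens_per_subjective_hour = 50_000   # assumed tokens per hour of deliberate thought

tokens_per_year = annual_budget_usd / price_per_million_tokens * 1e6
subjective_hours = tokens_per_year / tokens_per_subjective_hour

print(f"{tokens_per_year:.1e} tokens/year ~ {subjective_hours:,.0f} subjective hours")
# With these made-up numbers: 1e16 tokens, roughly 200 billion subjective
# hours. "Millions of subjective hours" survives even if the assumptions
# are off by several orders of magnitude.
```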
Imagine mega-Sundar contemplating: "How would the FTC respond if we acquired eBay to challenge Amazon? Let me simulate the next three years of market dynamics... Ah, I see the likely outcome. I have five minutes of datacenter time left – let me evaluate 1,000 alternative strategies."
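Here's a toy sketch of that "spend the remaining compute evaluating alternatives" loop: split a fixed rollout budget across candidate strategies and keep the one with the best simulated outcome. The payoff function is a stand-in for an actual market simulation:

```python
import random

def simulate_outcome(strategy_id: int) -> float:
    """Placeholder for one Monte Carlo rollout of a multi-year scenario."""
    true_quality = (strategy_id % 17) / 17      # fake underlying quality
    return true_quality + random.gauss(0, 0.3)  # noisy simulated payoff

def pick_best_strategy(n_strategies: int, rollout_budget: int) -> int:
    """Spread a fixed compute budget evenly and return the best-scoring strategy."""
    rollouts_each = rollout_budget // n_strategies
    scores = {
        s: sum(simulate_outcome(s) for _ in range(rollouts_each)) / rollouts_each
        for s in range(n_strategies)
    }
    return max(scores, key=scores.get)

# "Five minutes of datacenter time left" becomes a fixed rollout budget
# spread across 1,000 alternative strategies.
best = pick_best_strategy(n_strategies=1_000, rollout_budget=100_000)
print("best candidate strategy:", best)
```

A smarter allocator would spend more rollouts on promising candidates (bandit-style) rather than splitting the budget evenly, but the point is the same: more compute, better decision.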
The more valuable the decisions, the more compute you'll want to throw at them. A single strategic insight from mega-Sundar could be worth billions. An overlooked risk could cost tens of billions. However many billions Google should optimally spend on inference for mega-Sundar, it's certainly more than one.
Distillation
What might distilled copies of AI Sundar (or AI Jeff) be like? Obviously, it makes sense for them to be highly specialized, especially when you can amortize the cost of that domain-specific knowledge across all copies. You can give each distilled data center operator a deep technical understanding of every component in the cluster, for example.
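For concreteness, here's what distillation means mechanically in the standard teacher-student setup: a small specialist is trained to match the softened output distribution of a large generalist on its domain data. A minimal sketch, with placeholder models, shapes, and data:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student output distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction and T^2 scaling are the usual distillation conventions.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Usage sketch: run both models on the specialist's domain data (cluster
# telemetry, runbooks, ...) and backprop only through the small student.
student_logits = torch.randn(8, 32_000, requires_grad=True)  # small specialist
teacher_logits = torch.randn(8, 32_000)                      # frozen generalist
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```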
I suspect you’ll see a lot of specialization in function, tacit knowledge, and complex skills, because those seem expensive to sustain in terms of parameter count. But I think the different models might share a lot more factual knowledge than you might expect. It’s true that plumber-GPT doesn’t need to know much about the Standard Model in physics, nor does physicist-GPT need to know why the drain is leaking. But storing raw information is so unbelievably cheap (and only getting cheaper) that Llama-7B already knows more about the Standard Model and leaky drains than any non-expert.