The more valuable the decisions, the more compute you'll want to throw at them. A single strategic insight from mega-Sundar could be worth billions. An overlooked risk could cost tens of billions. However many billions Google should optimally spend on inference for mega-Sundar, it's certainly more than one.
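The "certainly more than one" claim can be made concrete with a toy expected-value model. Every number below (probabilities, dollar values, the diminishing-returns curve) is an illustrative assumption, not a figure from the text:

```python
# Back-of-envelope EV model for CEO-level inference spend (all $ in billions).
# The parameters here are invented for illustration only.

def value_of_inference(spend, p_insight_per_billion=0.10,
                       insight_value=2.0, p_miss_base=0.20, risk_cost=20.0):
    """Expected net value of spending `spend` $B on inference compute.

    Assumes each extra $1B of compute adds a 10% chance of one $2B
    strategic insight, and shrinks a 20% baseline chance of missing
    a $20B risk with diminishing returns.
    """
    insights = p_insight_per_billion * spend * insight_value
    p_miss = p_miss_base / (1 + spend)  # more compute, fewer missed risks
    avoided_losses = (p_miss_base - p_miss) * risk_cost
    return insights + avoided_losses - spend

# Sweep a grid of spend levels to find the (toy) optimum.
best_spend = max((s / 10 for s in range(0, 51)), key=value_of_inference)
```

Under these made-up parameters the optimum lands a bit above $1B, i.e. the downside-risk term, not the upside-insight term, dominates the calculation.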
Distillation
What might distilled copies of AI Sundar (or AI Jeff) be like? Obviously, it makes sense for them to be highly specialized, especially when you can amortize the cost of that domain-specific knowledge across all copies. You can give each distilled data center operator a deep technical understanding of every component in the cluster, for example.
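Distillation here means the standard technique of training a small student model to match a large teacher's softened output distribution. A minimal numpy sketch of the usual soft-target objective (the logits and temperature are arbitrary illustrative choices):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between temperature-softened teacher and student outputs.

    Minimized when the student reproduces the teacher's distribution;
    a higher temperature exposes more of the teacher's 'dark knowledge'
    about relative probabilities of non-top answers.
    """
    t = softmax(teacher_logits / temperature)
    log_s = np.log(softmax(student_logits / temperature))
    return -(t * log_s).sum(axis=-1).mean()
```

By Gibbs' inequality the loss is smallest when the student's logits match the teacher's, which is what drives the student toward the teacher's behavior at a fraction of the parameter count.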
I suspect you’ll see a lot of specialization in function, tacit knowledge, and complex skills, because they seem expensive to sustain in terms of parameter count. But I think the different models might share a lot more factual knowledge than you might expect. It’s true that plumber-GPT doesn’t need to know much about the standard model in physics, nor does physicist-GPT need to know why the drain is leaking. But the cost of storing raw information is so unbelievably cheap (and it’s only decreasing) that Llama-7B already knows more about the standard model and leaky drains than any non-expert.
If human-level intelligence is more than 1 trillion parameters, is it so much of an imposition to keep around what will, at the limit, be much less than 7 billion parameters to have most known facts right in your model? (Another helpful data point here is that “Good and Featured” Wikitext is less than 5 MB. I don’t see why all future models—except the esoteric ones, the digital equivalent of tardigrades—wouldn’t at least have Wikitext down.)
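The arithmetic behind this is easy to check with rough numbers; the fp16 bytes-per-parameter figure and the parameter budgets below are assumptions made for illustration:

```python
# Back-of-envelope: how big is "most known facts" next to a frontier model?
# Parameter counts and bytes-per-parameter are illustrative assumptions.

TRILLION = 10**12
BILLION = 10**9

model_params = 1 * TRILLION      # hypothesized human-level model size
facts_params = 7 * BILLION       # generous budget for a factual-knowledge core
wikitext_bytes = 5 * 10**6       # "Good and Featured" Wikitext, < 5 MB

# Fraction of the model a 7B-parameter facts budget would occupy:
facts_fraction = facts_params / model_params        # 0.007, i.e. under 1%

# Wikitext's bytes vs. the raw size of that budget (2 bytes/param in fp16):
wikitext_fraction = wikitext_bytes / (facts_params * 2)
```

Under these assumptions the entire factual core is well under 1% of the model, and Wikitext itself is a rounding error within that core, which is the essay's point about how cheap broad knowledge is.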
This evolvability is also the key difference between AI and human firms. As Gwern points out, human firms simply cannot replicate themselves effectively - they're made of people, not code that can be copied. They can't clone their culture, their institutional knowledge, or their operational excellence. AI firms can.
If you think human Elon is especially gifted at creating hardware companies, you simply can’t spin up 100 Elons, have them each take on a different vertical, and give them each $100 million in seed money. As much of a micromanager as Elon might be, he’s still limited by his single human form. But AI Elon can have copies of himself design the batteries, be the car mechanic at the dealership, and so on. And if Elon isn’t the best person for the job, the person who is can also be replicated, to create the template for a new descendant organization.