A lot is being said about compute these days.
In an age where AI is taking over, it is easy to see why the likes of Nvidia are doing so well. As the leading provider of GPUs, the company is poised to benefit from trillions of dollars in infrastructure spending over the next decade.
Until recently, most of the focus was on training. AI models require enormous compute to process the growing amounts of data needed to produce each new version. Over time, we saw training datasets swell to the point where the major models have gobbled up most of the Internet.
While the result is massive super clusters of compute, the majority of that capacity is not going to training. In fact, over the next decade, training will end up as a small fraction of total compute.
The market dominator will be inference. This is where companies are scrambling to ensure they can meet the demand that is certain to arise.

Inference Compute Is The New Gold
If you do not have inference, there is no AI.
Before getting into that, what is inference? Here is what Rafiki provided:
Inference compute refers to the computational resources used to run a trained AI or machine learning model to generate predictions or outputs from new data. It's the "runtime" phase after training, where the model applies its learned patterns—e.g., classifying images or generating text.
In other words, it is the application of the model. Each time someone (or a system) prompts the model, inference is used.
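To make the distinction concrete, here is a minimal sketch using a tiny scikit-learn model, purely as an illustration of the two phases (production LLMs obviously operate at a vastly larger scale, and this is not anything the big labs actually run):

```python
# Minimal illustration of the training vs. inference split.
from sklearn.linear_model import LogisticRegression
import numpy as np

# --- Training phase: heavy, one-time (per model version) compute ---
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# --- Inference phase: runs every single time the model is used ---
new_data = np.array([[1.5]])
prediction = model.predict(new_data)
print(prediction)  # the "runtime" output generated from new data
```

Training happens once per model version; inference happens on every prompt, forever. That asymmetry is the whole story.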
If we look at ChatGPT, we can see how this tilts away from training.
Whatever compute is used to train each version of ChatGPT pales in comparison to the prompts submitted by an estimated 700 million users. The compute required to process all those queries is astronomical.
This is, of course, something that is only going to expand as time passes.
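A rough back-of-the-envelope sketch shows the scale. The per-user numbers below are purely illustrative assumptions, not reported OpenAI figures:

```python
# Back-of-the-envelope estimate of daily inference demand.
# All per-user figures are assumptions for illustration, not reported data.
users = 700_000_000           # estimated user base cited above
prompts_per_user_per_day = 5  # assumption
tokens_per_prompt = 1_000     # assumption: prompt plus generated response

daily_tokens = users * prompts_per_user_per_day * tokens_per_prompt
print(f"{daily_tokens:,} tokens per day")  # 3,500,000,000,000 (3.5 trillion)
```

Even with conservative assumptions, that is trillions of tokens processed every single day, and every one of them is inference.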
The amount of AI used will never decline. We are seeing it embedded into search engines, email applications, and messaging applications. Then we have the generative AI software that produces text, images, and video.
We would also be remiss if we did not mention the future being agentic. AI agents are going to require massive inference compute. Everything they do will utilize it. Consider life with billions of transactions occurring each day via agents.
This is why companies such as xAI, Tesla, and Google are building such massive data centers. They are powered by GPUs (or, in Google's case, TPUs), but not all of that compute goes to training. There is likely a point where the iterative progress between model releases slows.
That said, inference is only going to accelerate.
Tesla Gearing Up For Real World AI
Tesla is a company that is ahead of the curve on this one. How well others fare remains an open question.
We have to keep in mind that Tesla is not building a chatbot or general AI. Instead, we are dealing with autonomous driving.
Obviously, for a vehicle to drive itself, inference compute is necessary. Over the past decade, Tesla has included a high-end computer in each vehicle sold. The inference hardware is purchased by the customer, removing that cost from the company's expenses.
Compare this to an OpenAI or xAI, where the company pays for the GPUs and hopes to recoup the cost. Tesla pushes that expense onto the buyers of the vehicles as part of the purchase price. And as the software develops, more of that onboard compute can be put to use.
Jensen Huang predicted that inference would grow 1 billion-fold over the next decade. That should give everyone pause as to what will take place. This is the new gold.
Models are going to be a dime a dozen. Who is going to have the ability to run what is necessary? Fortunately, the cost of compute (per token) keeps dropping. This is key.
By the end of the decade, inference compute, on a per prompt basis, will be a fraction of what it is today. That means AI services will get less expensive to operate. This is why it is such a factor when looking at jobs. If companies can replace someone for $100 per year, that is the direction they will go.
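The arithmetic of compounding declines makes "a fraction of what it is today" concrete. The annual rate below is an assumption chosen only to show the math, not a forecast:

```python
# Illustration of compounding cost declines (the annual rate is an assumption).
cost_today = 1.00       # normalized cost per prompt today
annual_decline = 0.40   # assume costs fall 40% per year
years = 5

cost_future = cost_today * (1 - annual_decline) ** years
print(f"{cost_future:.2%} of today's cost")  # ~7.78%
```

At that assumed pace, a prompt that costs a dollar today would cost under a dime by the end of the decade.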
Inference is the new gold.
Posted Using INLEO