It's also the point (starting to emerge) where you need 1 billion dollars to go from Grok 4 to Grok 5, versus staying on Grok 4 for X amount of time for 1 million and selling another product for X millions.
This is not true.
The same already happens with GPUs... GPUs from 2 years ago (even last year's, if you are a large buyer) are already operationally more expensive, because their performance per watt is no longer competitive, and datacenter costs add up for long-term deployments like those.
You can use the same GPU for multiple model generations. GPT5 and GPT5.1 are not the same model, nor are GPT 5.2 or the codex models.
You can't fine-tune these cards.
To produce a codex model you would have to produce these cards again; to produce a new iterative fine-tune you would need to produce them yet again.
And you already have alternatives like Groq and Cerebras for fast inference without the drawbacks of these cards.
They are Dead on Arrival.
Models are not like Bitcoin ASICs, where you optimize the process for one single thing. Each model requires a different process, due to anything from changes in parameter count to changes in architecture.
This is like comparing apples to oranges.
Just because something can be done does not mean it is a viable process.
Man... you are "almost" speaking like there are no costs involved in anything. My perspective has nothing to do with technology; it's just business!
Anyhow, I don't have the brains for much discussion today. Catch you next time, when the brains cool down.
Right, now that it is announced, coming to my point of why I shared this...
Check the GTC talk by NVIDIA, and check out another "version" of these, called an LPU. This is sort of where I was trying to take the conversation here: ASIC specialization!
No, LPUs are not another version of this. Their architecture is vastly different, which I mentioned in the comment above.
I know they are different (hence why I said "version"), but it's a signal of microprocessor specialization, which is what I am trying to frame in this conversation, along with how the generalized compute market usually evolves.
This only signals that GPUs are not solving some parts of inference as efficiently as desired, so new specialized microprocessors, and eventually ASICs, get produced instead of continuing to adapt the mainstream hardware (in this case the GPU) to every problem.
Anyhow, eager to see how this evolves... even at CPU levels for small inference workloads. Everything is going mega fast...
You are extrapolating this too much. LPUs work because they are not as specialized as Taalas. You can switch the model that is running on them.
Taalas is too specialized: it can only run the model it was designed to run. Sure, it is very efficient at running that model, but it isn't efficient in a general sense, especially when you are getting new models every 3 months, not to mention fine-tuned models.
An ASIC like Taalas is not very adaptable.
Maybe. Let's see in a few good months.
History has taught us that if something is too complex, it eventually dies or splits into more pieces. And that's how I feel this might go.