I know they are different (hence why I said "version"), but it's a signal of microprocessor specialization, which is what I'm trying to frame in this conversation: it's how the generalized compute market usually evolves.
It signals that GPUs are not solving some parts of inference as efficiently as desired, so new specialized microprocessors, and eventually ASICs, get produced instead of endlessly adapting the mainline general-purpose hardware (in this case the GPU) to every problem.
Anyhow, I'm eager to see how this evolves... even at the CPU level for small inference workloads. Everything is moving mega fast...
You are extrapolating this too far. LPUs work because they are not as specialized as Taalas: you can switch the model running on them.
Taalas is so specialized it can only run the model it was designed for. Sure, it's very efficient at running that one model, but it isn't efficient in a general sense, and that matters when new models arrive every three months, not to mention fine-tuned variants.
An ASIC like Taalas is not very adaptable.
Maybe. Let's see in a few months.
History tells us that if something is too complex, it eventually dies or splits into more pieces. That's how I suspect this will play out.