I use MiniMax locally; their cloud sucks. They charge 2x for Highspeed, but it doesn't even hit the speeds advertised for standard. I'm using their cloud right now until the m2.7 weights drop, and it's been disappointing: 34 t/s on standard and 44 t/s on Highspeed, when it should be 50/100.
Interesting. Do you use it via minimax io or Alibaba? They changed the 500 thing too (it was a different usage maximum 3 weeks ago; more, but I can't remember the details).
I really wonder why your token speed is low, I've never experienced that lmao.
I like to run local Qwen quant versions for different tasks, but only the big ones, of course.
Qwen 3.5 27B is probably the best choice for local; it will be slow, but it's a solid model with much lower memory demands than most.
and 1M context window
27B is only 262K context window.
80b has 1M :)
Are you talking about Qwen3-Coder-Next 80B? It's only 262K too, but either way these start to get context rot after 100K.
Btw, since you're also deeper into it: I experimented a bit with memory files (like Claude's). Any experience with them? So far, with stuff like opencode + plugins like rp1 or others, it works well for staying up to date.
Check out mem0
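The memory-file idea itself is simple enough to sketch without any framework: append notes to a file during a session, then prepend its contents to the next prompt. This is just an illustration of the pattern, not mem0's actual API, and the `AGENT_MEMORY.md` filename is hypothetical:

```python
from pathlib import Path

MEMORY_FILE = Path("AGENT_MEMORY.md")  # hypothetical filename

def remember(note: str) -> None:
    """Append one bullet to the memory file."""
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def recall() -> str:
    """Return all accumulated notes, to prepend to the next prompt."""
    return MEMORY_FILE.read_text(encoding="utf-8") if MEMORY_FILE.exists() else ""
```

Tools like mem0 layer retrieval and summarization on top of this, so the context doesn't just grow unboundedly.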
I use it via MiniMax direct. I signed up when I heard they were open-sourcing the m2.7 weights, something that wasn't looking likely.
The token speed is low due to demand. They promise 100+ tokens/sec on high speed, but it's barely faster than standard. You do get a lot of usage compared to others, though, and it's a good model. I'm waiting for the m2.7 weights to drop so I can run it on my RTX 6000 Pros. Should be any day now.
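Throughput claims like these are easy to sanity-check yourself: count tokens as they come off the stream and divide by wall-clock time. A minimal sketch; `fake_stream` is a hypothetical stand-in for a real streaming API response:

```python
import time

def tokens_per_sec(stream) -> float:
    """Count tokens from an iterable and divide by elapsed wall-clock time."""
    start = time.perf_counter()
    count = sum(1 for _ in stream)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

def fake_stream(n: int, delay: float = 0.001):
    """Stand-in for an API token stream: yields n tokens, one per `delay` seconds."""
    for _ in range(n):
        time.sleep(delay)
        yield "tok"
```

With a real client you would pass the streaming response iterator instead of `fake_stream`.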