You are viewing a single comment's thread from:

RE: Nvidia RTX 6000 Pro power efficiency testing

in #technology • 19 hours ago

I know you had/have a Strix Halo that you were testing out, but now I see you have some RTX Pro 6000 Workstation editions, and it got me thinking...

Have you gotten your hands on a DGX Spark (or two), or maybe an AMD Instinct?

I know a Spark is not built for fast TPS; it was built for capacity, mainly for light development on the go before loading those same workloads onto proper DGX servers. So while an RTX Pro will blow it out of the water with something like 8 times the memory bandwidth, it also has double the TDP (not counting running 2 RTX Pros). So I'm curious whether you have done a true deep-dive comparison across multiple GPUs, like a cost-wise comparison of 2 Sparks against 1 RTX Pro, or something like that.
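For a rough sense of what that cost-wise comparison could look like, here's a back-of-envelope sketch. Every spec and price below is a made-up placeholder for illustration, not a measured or quoted number; swap in real figures before drawing any conclusions.

```python
# Back-of-envelope: 2x DGX Spark vs 1x RTX Pro 6000.
# ALL numbers below are placeholders, NOT verified specs or prices.

def efficiency(name, bandwidth_gbs, tdp_w, price_usd):
    """Memory bandwidth per watt and per dollar -- a crude proxy for
    LLM decoding throughput, which is usually bandwidth-bound."""
    return {
        "name": name,
        "gbps_per_watt": bandwidth_gbs / tdp_w,
        "gbps_per_usd": bandwidth_gbs / price_usd,
    }

# Two Sparks pooled (ignoring interconnect overhead) vs one RTX Pro.
two_sparks = efficiency("2x DGX Spark",
                        bandwidth_gbs=2 * 273,   # placeholder
                        tdp_w=2 * 140,           # placeholder
                        price_usd=2 * 4000)      # placeholder
one_rtx = efficiency("1x RTX Pro 6000",
                     bandwidth_gbs=1800,         # placeholder
                     tdp_w=600,                  # placeholder
                     price_usd=8500)             # placeholder

for rig in (two_sparks, one_rtx):
    print(f"{rig['name']}: {rig['gbps_per_watt']:.2f} GB/s per W, "
          f"{rig['gbps_per_usd']:.3f} GB/s per $")
```

Bandwidth per watt/dollar is only a decode-side proxy; it says nothing about prompt-processing speed, which is compute-bound and is exactly where the small boxes fall over.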

I only get to play with DGX H200s, B200s, and B300s at work (still waiting on my Spark to arrive eventually), and there is no way to compare apples to apples between a DGX B300 and an RTX Pro or Strix Halo lol


The DGX Sparks and the Strix are very similar, and the Strix is a lot cheaper, almost half the price.

I hated the Strix. It was fun to tinker with, but even though I got "good" speeds with gpt-oss-120b (50 tokens a second), it was still painfully slow for anything.

Just chatting was OK: say hi and it responds back. But with anything agentic, or even real work, you can see the problems with slow prompt processing.

For example, I wrote a CLI tool called please, so I can type "please stop process on port 8000" and it asks the LLM for the command to do it and lets you execute it by just pressing one key.
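A tool like that might look like the sketch below. This is a hypothetical reimplementation, not the author's actual code: the endpoint URL, model name, and function names are all assumptions, and it presumes an OpenAI-compatible chat-completions server.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a 'please'-style CLI: ask an LLM for a shell
command, show it, and run it only if the user presses 1."""
import json
import subprocess
import sys
import urllib.request

# Assumed OpenAI-compatible endpoint (placeholder URL and model name).
API_URL = "http://localhost:11434/v1/chat/completions"
MODEL = "gpt-oss-120b"

def build_prompt(request: str) -> str:
    """Keep the prompt tiny -- long prompts are what make slow
    prompt processing hurt the most."""
    return f"Reply with only a single shell command that will: {request}"

def ask_llm(request: str) -> str:
    """POST a chat-completion request and return the suggested command."""
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": build_prompt(request)}],
    }).encode()
    req = urllib.request.Request(
        API_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"].strip()

def should_run(answer: str) -> bool:
    """Only execute if the user typed exactly '1'."""
    return answer.strip() == "1"

def main() -> None:
    command = ask_llm(" ".join(sys.argv[1:]))  # e.g. stop process on port 8000
    print(f"Suggested: {command}")
    if should_run(input("Press 1 to run it: ")):
        subprocess.run(command, shell=True)

if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Even in this shape you can see why prompt-processing latency dominates: the prompt is one short sentence, so nearly all of the wall-clock time is the model ingesting it and emitting a one-line answer.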

Very small context, 1,000 tokens at best. When pointing at a cloud API using the same model, it takes a second or so to get a response. Using the Strix Halo, it would take around 10 seconds.

The amount of context needed to answer this simple query should be tiny, yet it was still 10x slower than using the cloud, and this is an incredibly simple task. Image generation, coding, or any other agentic task was so slow it was unusable. I thought about using it just to run a small model for reasoning or other tasks, and it just becomes a bottleneck.

I was hoping that using a 3090 via eGPU would help with the prompt-processing part, but it was just too slow when they were forced to work together.
