
Battlestar Galactica 2003 Remake
Ok, not that number six, this number 6.

This is my new AI server, named after Number Six from Battlestar Galactica (2003 reboot), one of the best sci-fi shows ever made. I tried to come up with a cool AI name based on a movie or show, and this one was an easy pick.
What is this thing?
Technically, it's my old computer from before my recent PC upgrade. I've been looking for a decent local AI server that is fast enough that I don't want to throw it out the window.
I had a lot of fun tinkering with the Strix Halo, but it is just a toy and extremely overrated for any actual workload. It is great for experimenting with much larger models than you usually can.
Specs
AMD 5950X
Asus Dark Hero AM4 Motherboard
32GB DDR4 RAM
Dual Nvidia RTX 6000 Pro 600W Workstation Edition
It doesn't look like much until you see the GPUs. These are essentially faster 5090s with a full 96GB of VRAM each, giving me a total of 192GB of VRAM to work with. This system was just sitting in the closet and turned out to be a perfect home for two RTX 6000 Pros, with only minor performance loss compared to a top-of-the-line AM5 system.
I did run into a few problems; the major one was that the PSU did not fit in this case. It is considerably larger than a typical PSU, so it is currently sitting on the desk behind the case. I needed a larger power supply, as well as one that supports two 12V-2x6 cables.
I am planning to get another H9 Flow case like my main system's, which will fit the larger PSU and give it even more airflow.

While 32GB of DDR4 RAM is not impressive, I do not plan to do any CPU offloading and will use only the GPUs. I used to have more RAM in this system, but it isn't needed.
What am I running on it?
I am still testing models and tweaking performance. Right now I am running GLM 4.5 Air FP8 on SGLang while I wait for GLM 4.6 Air to be released.
GLM 4.5 Air is a 106B-parameter model from Z.ai and is considered the best model at this size. The model weights alone come in at around 118GB, and with everything else like the KV cache and context on top, you can easily fill 192GB.
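For anyone curious, a minimal SGLang launch for a two-GPU setup like this might look like the sketch below. The Hugging Face repo name and port are assumptions on my part, so adjust them for your own environment:

```shell
# Sketch of a two-GPU SGLang launch; the repo name and port are assumptions.
# --tp 2 shards the model across both GPUs (tensor parallelism), and
# --context-length matches the model's 131,072-token maximum.
python -m sglang.launch_server \
  --model-path zai-org/GLM-4.5-Air-FP8 \
  --tp 2 \
  --context-length 131072 \
  --port 30000
```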

I have a few other models I want to test. I have been using GPT-OSS-120B locally for a while; I was able to get 50 tokens/sec on the Strix Halo, which is really good, but it does really poorly under real usage because the slow RAM makes prompt processing very slow.
While GLM 4.5 Air is a slightly smaller model at 106B parameters compared to GPT-OSS-120B's 120B, it has more than double the active parameters when in use. These models are called MoE, or Mixture of Experts, models. They are really popular because you can get some amazing speeds, since only some of the parameters are active at a time. GPT-OSS-120B activates only about 5B parameters per token, where GLM 4.5 Air uses 12B. I am also running GLM 4.5 Air at FP8 quantization, which is twice as large per parameter as GPT-OSS-120B's MXFP4. In other words, GLM 4.5 Air is considerably more demanding to run.
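As a rough sketch of why that matters: using GPT-OSS-120B's published figure of roughly 5.1B active parameters, and treating FP8 as 1 byte per parameter and MXFP4 as half a byte (ignoring quantization scales and other overhead), you can compare both the weight footprints and the per-token weight traffic:

```python
# Back-of-envelope comparison of the two MoE models discussed above.
# Precisions are approximations: FP8 = 1 byte/param, MXFP4 = 0.5 byte/param,
# ignoring quantization scales and other overhead.

def weight_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB for a model."""
    return total_params_b * bytes_per_param

glm_total, glm_active, glm_bpp = 106, 12, 1.0    # GLM 4.5 Air at FP8
oss_total, oss_active, oss_bpp = 120, 5.1, 0.5   # GPT-OSS-120B at MXFP4

print(f"GLM 4.5 Air weights: ~{weight_gb(glm_total, glm_bpp):.0f} GB")
print(f"GPT-OSS-120B weights: ~{weight_gb(oss_total, oss_bpp):.0f} GB")

# Per-token weight traffic scales with active parameters times precision:
ratio = (glm_active * glm_bpp) / (oss_active * oss_bpp)
print(f"GLM 4.5 Air moves ~{ratio:.1f}x more weight bytes per token")
```

So even though the total parameter counts are similar, GLM 4.5 Air reads several times more weight bytes per generated token, which is why it needs this much GPU to stay fast.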
I have been seeing as much as 138 tokens/second peak from this rig on GLM 4.5 Air, with most requests giving me 100-120 tokens/second. Even at 122K context, I am still seeing around 75 tokens/second. Prompt processing speed, however, is very high, making the time to first token (TTFT) really quick.

Here you can see it summarizing a book that represents around 127,000 tokens, very close to the maximum 131,072 tokens this model is capable of. As you fill up the context window with data, models get a lot slower. I am still able to reach an impressive 77 tokens per second at max context.
This thing is a beast. Ideally I want to be running the full 357B-parameter GLM 4.6, but until DDR6 is released, I will stick with this two-GPU setup.
Power usage
Here is where things get interesting. Summarizing that 127K-token book trips my UPS at almost 1400W of draw.

The PSU can handle 1500W, but there are other things on it.
If I power limit the two GPUs to 300W, I can drastically reduce the power draw to 784W.
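For reference, the cap can be set with nvidia-smi. The GPU indices below assume a two-card system, and the limit resets on reboot unless you script it:

```shell
# Cap each GPU at 300 W (assumes GPUs 0 and 1; requires root).
sudo nvidia-smi -pm 1          # enable persistence mode so the setting sticks
sudo nvidia-smi -i 0 -pl 300   # power limit GPU 0 to 300 W
sudo nvidia-smi -i 1 -pl 300   # power limit GPU 1 to 300 W
```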

Surely this means I am getting around half the speed?

I lost a whole 3 tokens/second! I lost about 3.9% in performance but reduced power usage by 43.43%!
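A quick sanity check of those numbers, assuming the 3 tokens/second were lost off the 77 tokens/second max-context figure above, and inferring the ~1386W baseline from the stated 43.43% reduction (the UPS reading was only "almost 1400W"):

```python
# Sanity check of the power/performance trade-off described above.
# The 1386 W baseline is inferred from the stated 43.43% reduction;
# the 77 -> 74 tokens/second drop is the quoted 3 tokens/second loss.

baseline_w, limited_w = 1386, 784
power_saved = 1 - limited_w / baseline_w
print(f"power reduction: {power_saved:.2%}")

baseline_tps, limited_tps = 77, 74   # tokens/second at max context
perf_lost = 1 - limited_tps / baseline_tps
print(f"performance lost: {perf_lost:.1%}")
```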
Quite a fair trade, I must say. Nvidia does make a Max-Q variant of the card that is fixed to 300W and has a different fan style. I didn't want to pay the same amount for less potential power in case I decide to use them differently.
Why?

Good question. My primary reason is analyzing stock data. I do not believe AI can predict price action; it can, however, churn through a massive amount of data and, if driven properly, increase your edge or alpha. I already use AI heavily for trading, much of it through cloud providers, but I don't want my data going to third parties.
You would be surprised how interested many of us are after finding your last two posts about your setup.
I like to post updates when I change things.
Nice
So much of this was over my head, but it sounds pretty cool and I am excited for you.
😄 mostly the AI stuff. The hardware stuff I get.
That power saving is massive for the performance loss damn! Hope energy is pretty cheap there, or you got solar to help, I'd cry looking at my bill with that kinda power usage haha, it's still fairly bad here in the UK.
This post has also reminded me that I need to fetch and watch battlestar galactica!
Power is crazy here, $0.256/kWh; it has gone up from $0.15 years ago.
I keep thinking about solar, but I hate that it takes 20 years to break even and by then you need to replace it. Plus I live with snow, so I got to pay someone every year to clean the snow off in bad storms. I am thinking about putting some off to the side though.
The reboot is sooooo good. Some of the best TV you will see.
I think the same a lot of the time, that and being able to afford it in the first place. But if I could afford it I'd probably do it on a moral basis and kinda ignore the ROI prospects. If it could manage 70% of the way to breaking even I'd be fine with that honestly. Though in a few years' time, when the newest tech gets more affordable, this could change.
Caprica was a better show than BSG. I really enjoyed where it was going with the themes of Post-Humanism in the lead up to number 6. I am still sad it was cancelled. One of my favourite shows, but the BSG reboot is close.
When do you get a rambling hybrid installed in a bathtub in the spare room for more organic tokens per second?
I forget if I got around to watching Caprica. I plan on rewatching the series soon, I almost never re-watch shows. I'll add it to the list.
So glad these GPUs arrived for you, and weren't part of the cargo plane crash :)
We can get up to 2400W at the wall here, but I do not have the funds to obtain 3 of these cards :P
I was honestly worried they wouldn't show up.
Particularly regarding financial and business decisions. It has been a weakness of the financial system that a variety of gatekeepers (lenders, brokers, factors, etc.) necessarily had access to strategic and other proprietary information in order to facilitate operations. That has been enormously improved by digitization in many ways, but using cloud services dramatically worsens that data insecurity.
It has been one of the most alarming features to me of the surveillance state that has arisen that business information, including metadata regarding principals, their plans, key hurdles, etc., is all harvested and available to an assortment of analysts and parties with such execrable ethical and moral standards that they'd work in that industry. The notorious collection of >16M penis pics by one such analyst well characterizes that ilk, and sensitive business information being at their fingertips is ill-advised, IMHO. I am actually stunned that we aren't inundated daily with reports of people being ruined, lucrative trades based on insider information, and massive wealth concentration in the wallets of analysts. OTOH, the lack of such reports doesn't indicate the lack of such swindles, given the nature of that beast.
The more you know, the less you want others to know what you know.
That's a pretty impressive AI setup. I'm sure you have equally impressive uses for it.
Thanks!
Watched both BSG shows with my son. Does AI really help your trading?
Yes
Wow
And you really trust Ai for your trading??
seems to be working
Okay. Great.
And stop downvoting me please
I don’t know what to do
Hm, I guess you use AI for crypto/market trading, but it could be useful for sports trading on exchanges 🤔
Friend, can you tell me why you downvoted my post without any explanation?