Part 4/11:
The creator set himself a fun challenge: to see whether Larry (the Spark) could handle AI tasks like language-model inference and image generation better than Terry, his high-end dual RTX 4090 setup.
Language Model Performance
Starting with smaller models such as Qwen 3 8B (8 billion parameters), Terry easily outperformed Larry, delivering noticeably more tokens per second and therefore faster inference. Scaling up to larger models like Llama 3 70B (70 billion parameters), Terry remained dominant. These initial tests showed that Larry, despite its impressive specs, couldn't match Terry's inference speed on large language models, mainly because of how heavily large-model inference leans on GPU VRAM and hardware optimization.
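The metric behind this comparison is tokens per second during generation. The video doesn't show how it was measured, but as a minimal sketch, here is one way to compute it when a model is served locally through Ollama; the endpoint, model tag, and prompt below are illustrative assumptions, not details from the video.

```python
# Sketch: compute generation tokens/sec from a local Ollama server.
# Assumes Ollama is running at localhost:11434 and the model tag is already pulled.
import requests


def tokens_per_second(model: str, prompt: str) -> float:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # Ollama reports the number of generated tokens (eval_count)
    # and the generation time in nanoseconds (eval_duration).
    return data["eval_count"] / (data["eval_duration"] / 1e9)


if __name__ == "__main__":
    rate = tokens_per_second("llama3:70b", "Explain VRAM in one paragraph.")
    print(f"{rate:.1f} tokens/sec")
```

Running the same prompt and model on both machines and comparing the reported rate is essentially the comparison the video describes.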