The cool thing is that you can run these models on the GPU, on the CPU, or split inference between the two. The GPU is preferable because it's much faster, but it's entirely possible to run the whole model out of CPU RAM alone.
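
To make that concrete, here's a minimal sketch using llama-cpp-python. That's an assumption on my part, since the comment doesn't name a runtime, but llama.cpp-style layer offloading is one common way to do exactly this kind of GPU/CPU split; the model path and layer count below are placeholders:

```python
from llama_cpp import Llama

# n_gpu_layers controls the GPU/CPU split:
#   0  -> run entirely on the CPU (weights live in system RAM)
#   35 -> offload 35 transformer layers to the GPU, keep the rest on CPU
#   -1 -> offload every layer to the GPU (fastest, needs enough VRAM)
llm = Llama(
    model_path="model.gguf",  # hypothetical path to a quantized model file
    n_gpu_layers=35,          # partial offload: split inference between GPU and CPU
)

# Generate a short completion; layers on the GPU run fast, the rest fall back to CPU.
out = llm("Q: Why is GPU inference faster? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

In practice you'd tune the layer count to whatever fits in your VRAM and let the remainder spill to system RAM.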