That's strange. I just reimplemented the push notifications from this episode on a few of my hobby apps and didn't run into that issue (even when generating new keys).
I wonder if you have another view or partial that is rendering the full layout, which would trigger another request back to the Rails app for the manifest.
It depends heavily on the model's requirements. Some models can even run on just the CPU and system RAM. If you do need a GPU, AWS, Azure, and other providers offer plenty of GPU VMs, and you'd deploy the model much the same way you deploy your other applications.
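
If it helps to put a number on "requirements", here's a rough back-of-the-envelope sketch in Python. It only estimates the memory needed to hold the weights (parameter count times bytes per parameter); it ignores activations, context/cache, and runtime overhead. The function names and the 1.2 headroom factor are my own assumptions for illustration, not anything from a particular library.

```python
def estimate_weight_memory_gb(num_parameters: int, bytes_per_parameter: float) -> float:
    """Rough memory needed for the model weights alone, in GB (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


def fits_in_memory(
    num_parameters: int,
    bytes_per_parameter: float,
    available_gb: float,
    headroom: float = 1.2,  # assumed safety margin for everything besides weights
) -> bool:
    """Check whether the weights (plus an assumed headroom) fit in the available memory."""
    needed_gb = estimate_weight_memory_gb(num_parameters, bytes_per_parameter) * headroom
    return needed_gb <= available_gb


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model

    # 2 bytes per parameter (16-bit weights) vs. 0.5 bytes (4-bit quantized weights)
    print(estimate_weight_memory_gb(params, 2))
    print(estimate_weight_memory_gb(params, 0.5))

    # Would each variant fit on a machine with 16 GB of RAM or VRAM?
    print(fits_in_memory(params, 2, available_gb=16))
    print(fits_in_memory(params, 0.5, available_gb=16))
```

Treat the result as a lower bound: if the estimate already exceeds your system RAM, a GPU VM (or a smaller or quantized model) is probably the way to go.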