David Kimura PRO said about 1 year ago on LLM Insights :

dan.legrand That's correct. I personally use Ollama with an HAProxy server running in front of it. That lets me use multiple computers/GPUs to serve small models (typically 8B-parameter ones). HAProxy acts as a load balancer between the computers, so I can handle multiple requests concurrently and get high availability.

Ollama is a self-hosted solution, so the model and all of the inference run on the machine you host it on. No prompts or inference requests go out to other providers.
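
For anyone who wants to try the concurrent setup described above, here is a rough client-side sketch. It assumes the load balancer is reachable at `http://localhost:11434` and that a model tagged `llama3.1:8b` has been pulled on every backend; both values are placeholders, not details from the setup above. It uses Ollama's `/api/generate` endpoint with streaming turned off, and the helper names (`build_request`, `generate`, `generate_many`) are made up for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: the load balancer listens here and forwards to the Ollama hosts.
PROXY_URL = "http://localhost:11434"  # hypothetical address
# Assumption: this model has been pulled on every backend machine.
MODEL = "llama3.1:8b"  # hypothetical model tag


def build_request(prompt, model=MODEL, base_url=PROXY_URL):
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send one prompt through the proxy and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]


def generate_many(prompts, workers=4):
    """Send several prompts at once; the proxy decides which host serves each."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate, prompts))


if __name__ == "__main__":
    for answer in generate_many(["Why is the sky blue?", "What is HAProxy?"]):
        print(answer)
```

The client only ever talks to the proxy address, so adding or removing a GPU machine behind it needs no change on the client side.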
