Streaming LLM Responses | Drifting Ruby

Episode #445 by David Kimura, Mar 3, 2024

Summary

In this episode, we look at running a self-hosted Large Language Model (LLM) and consuming it from a Rails application. We will use a background job to make API requests to the LLM and then stream the responses to the browser in real time.
Tags: rails, ai, artificial intelligence, machine learning, background processing
Length: 24:10

Chapters

Introduction (0:00)
Installing an LLM (3:23)
Creating a new Rails app (5:22)
Creating the Chat form (5:33)
Creating the route (7:17)
Creating the Chat controller (7:28)
Creating the Chat job (8:43)
Building the API Request (9:05)
Broadcasting the initial div (11:38)
Making the API Request to the LLM (13:49)
Processing the chunk (15:22)
Formatting the response (16:58)
Demo (21:47)
Final thoughts (21:59)

Resources

Ollama - https://github.com/ollama/ollama
Source - https://github.com/driftingruby/445-streaming-llm-responses

Summary

# Terminal
brew install ollama
brew services start ollama
ollama list
ollama pull mistral:latest
rails g controller chats
rails g job chat
rails g stimulus markdown-text
yarn add marked
yarn add highlight.js
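
Before wiring up the Rails side, it can help to confirm that the Ollama service is running and that the model was pulled. The following is a minimal sketch, assuming Ollama listens on its default address (localhost:11434) and that its /api/tags endpoint returns a JSON object with a "models" array. The helper names `installed_models` and `model_installed?` are hypothetical and not part of the episode's source.

```ruby
require "json"
require "net/http"

# Assumption: Ollama's default address. Adjust if your install differs.
OLLAMA_TAGS_URI = URI("http://localhost:11434/api/tags")

# Hypothetical helper: extracts model names from an /api/tags response body.
def installed_models(body)
  JSON.parse(body).fetch("models", []).map { |model| model["name"] }
end

# Hypothetical helper: true when the named model is available locally.
# Returns false if the server is unreachable or the body is not valid JSON.
def model_installed?(name, uri: OLLAMA_TAGS_URI)
  installed_models(Net::HTTP.get(uri)).include?(name)
rescue SystemCallError, SocketError, Timeout::Error, JSON::ParserError
  false
end
```

Calling `model_installed?("mistral:latest")` from a Rails console should return true once `ollama pull mistral:latest` has finished.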

# app/views/welcome/index.html.erb
<%= turbo_stream_from "welcome" %>
<div id="messages"></div>
<%= render "form" %>

# app/views/welcome/_form.html.erb
<%= form_with url: chats_path, html: { id: "chat_form", class: "mt-3" } do |f| %>
  <div class="row">
    <div class="col-11">
      <%= f.text_area :message, class: "form-control w-100", placeholder: "Your message", autofocus: true %>
    </div>
    <div class="col-1">
      <%= f.submit "Send", class: "btn btn-primary w-100" %>
    </div>
  </div>
<% end %>

# config/routes.rb
resources :chats, only: :create

# app/controllers/chats_controller.rb
class ChatsController < ApplicationController
  def create
    ChatJob.perform_later(params[:message])
    render turbo_stream: turbo_stream.replace("chat_form", partial: "welcome/form")
  end
end
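
The controller above enqueues whatever the form submits, so an empty submission would still send a request to the model. One possible guard, sketched in plain Ruby, is shown below. The helper name `normalize_prompt` and the 4,000 character cap are assumptions made for illustration and do not come from the episode.

```ruby
# Hypothetical helper: returns a cleaned prompt, or nil when there is
# nothing worth sending to the model.
def normalize_prompt(raw, max_length: 4_000)
  text = raw.to_s.strip
  return nil if text.empty?

  # Truncate very long input; the limit is an arbitrary illustrative value.
  text[0, max_length]
end
```

In the `create` action, this could be used as `prompt = normalize_prompt(params[:message])` followed by `ChatJob.perform_later(prompt) if prompt`.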

# app/jobs/chat_job.rb
require 'net/http'

class ChatJob < ApplicationJob
  queue_as :default

  def perform(prompt)
    uri = URI("http://localhost:11434/api/generate")
    request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
    request.body = {
      model: "mistral:latest",
      prompt: context(prompt),
      options: { temperature: 1 }, # Ollama reads sampling parameters from "options"
      stream: true
    }.to_json

    Net::HTTP.start(uri.hostname, uri.port) do |http|
      rand = SecureRandom.hex(10)
      broadcast_message("messages", message_div(rand))
      http.request(request) do |response|
        response.read_body do |chunk|
          # Rails.logger.info "✅ #{chunk}"
          process_chunk(chunk, rand)
        end
      end
    end
  end

  private

  def context(prompt)
    "[INST]#{prompt}[/INST]"
  end

  def message_div(rand)
    <<~HTML
      <div id='#{rand}'
        data-controller='markdown-text'
        data-markdown-text-updated-value=''
        class='bg-primary-subtle p-2 rounded-lg mb-2 rounded'></div>
    HTML
  end

  def broadcast_message(target, message)
    Turbo::StreamsChannel.broadcast_append_to "welcome", target: target, html: message
  end

  def process_chunk(chunk, rand)
    json = JSON.parse(chunk)
    if json["done"]
      # Changing the "updated" value triggers updatedValueChanged in the Stimulus controller.
      message = "<script>document.getElementById('#{rand}').dataset.markdownTextUpdatedValue = '#{Time.current.to_f}';</script>"
    else
      # Empty or whitespace-only tokens are sent as a line break.
      message = json["response"].to_s.strip.empty? ? "<br>" : json["response"]
    end
    broadcast_message(rand, message)
  end
end
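
The job above parses each chunk from `read_body` as one JSON document. Ollama streams newline-delimited JSON, and a network chunk is not guaranteed to line up with a single line: it may carry several lines or only part of one. The following is a minimal sketch of a line buffer that handles both cases. The class name `NdjsonBuffer` is hypothetical and is not part of the episode's source.

```ruby
require "json"

# Hypothetical helper: accumulates raw chunks and yields one parsed Hash
# for every complete, newline-terminated JSON line.
class NdjsonBuffer
  def initialize
    @buffer = +""
  end

  def push(chunk)
    @buffer << chunk
    # Consume complete lines; any trailing partial line stays buffered
    # until a later chunk completes it.
    while (index = @buffer.index("\n"))
      line = @buffer.slice!(0..index).strip
      yield JSON.parse(line) unless line.empty?
    end
  end
end
```

Inside `response.read_body`, the job could create one buffer per request and call `buffer.push(chunk) { |json| ... }`, moving the `JSON.parse` step out of `process_chunk` so that it receives an already parsed Hash.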

# app/javascript/controllers/markdown_text_controller.js
import { Controller } from "@hotwired/stimulus"
import { marked } from "marked"
import hljs from "highlight.js"

// Connects to data-controller="markdown-text"
export default class extends Controller {
  static values = { updated: String }

  updatedValueChanged() {
    const markdownText = this.element.innerText || ""
    const html = marked.parse(markdownText)
    this.element.innerHTML = html
    this.element.querySelectorAll("pre").forEach((block) => {
      hljs.highlightElement(block)
    })
  }

}

# app/assets/stylesheets/application.bootstrap.scss
@use "highlight.js/styles/github-dark.css";
@import 'bootstrap/scss/bootstrap';
@import 'bootstrap-icons/font/bootstrap-icons';
@import 'drifting_ruby';

pre {
  padding: 10px;
}