How to Deploy LLMs (Phi-3) on Ubuntu 24.04 GPU Server Using Ollama & WebUI

A complete, step-by-step guide to installing Ollama, running the Phi-3 model, and building a browser-based WebUI for it from scratch.

Large Language Models (LLMs) have fundamentally changed how we interact with artificial intelligence, powering everything from advanced coding assistants to everyday conversational bots. However, relying on cloud-based giants like OpenAI or Google means sending your private data over the internet and paying for ongoing API costs. The solution is running AI locally. Highly efficient models like Microsoft's Phi-3 are leading this revolution, proving that you no longer need massive data centers to achieve incredible results. Phi-3 delivers state-of-the-art reasoning and performance while being compact enough to run smoothly on your own private GPU server.

To harness the power of models like Phi-3 on an Ubuntu 24.04 GPU server, we will use Ollama. Ollama is a powerful, developer-friendly tool that dramatically simplifies downloading, managing, and running open-source LLMs. Instead of wrestling with complex Python environments, complicated dependencies, and manually loading model weights, Ollama acts as a streamlined local API server. It handles the heavy lifting of model inference in the background, automatically utilizing your server's GPU to ensure lightning-fast generation speeds.

Finally, an AI model isn't very accessible if you can only interact with it through a command-line terminal. That's where the WebUI comes in. In this tutorial, we will build a lightweight, Flask-based Web User Interface that connects directly to your local Ollama instance. This setup will give you a sleek, browser-based chat experience just like ChatGPT but hosted entirely on your own hardware. By the end of this guide, you will have a fully functional, highly performant AI ecosystem that offers complete control over your privacy, performance, and costs.

Prerequisites

  • An Ubuntu 24.04 server with an NVIDIA GPU.
  • A non-root user or a user with sudo privileges.
  • NVIDIA drivers installed on your server.

Step 1: Install Ollama

Ollama provides a streamlined, lightweight environment for running powerful large language models (such as Phi-3) entirely locally. It simplifies the AI lifecycle by automatically managing model downloads, caching, and API serving. Plus, setting it up on Ubuntu 24.04 is a remarkably quick and frictionless process.
1. Ollama offers an automated shell script that installs everything you need in just a few moments. All you have to do is run the following command in your terminal:

BASH
curl -fsSL https://ollama.com/install.sh | sh
You will see output like this:
output
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed.
2. Reload your shell environment:
BASH
source ~/.bashrc
3. Enable and start the Ollama service:
BASH
sudo systemctl enable ollama
sudo systemctl start ollama
4. Run the command below to confirm Ollama is running:
BASH
curl http://localhost:11434
You should see the following output:
output
Ollama is running

Step 2: Run a Phi-3 Model with Ollama

With the Ollama backend up and running, you can now load the Phi-3 model: a compact, high-speed model ideal for local deployment on GPU-powered servers.
1. Start the Phi-3 model with Ollama by running the following command:
BASH
ollama run phi3

If Phi-3 isn’t already on your system, this command will:

  • Download the model locally
  • Launch an interactive terminal session
  • Let you type messages for Phi-3 to respond to in real time
output
>>> What is Ollama?
Ollama seems to be a term that doesn't correspond with well-known technology or concepts as of my last update in April 2023. It might either refer to an emerging tool, application, system, project, acronym, organization, or even fictional element not recognized within the industry up until then.

2. Press Ctrl+D to exit the session.


Step 3: Use Ollama Programmatically with cURL

Beyond the terminal, Ollama exposes a local REST API, which is perfect for:

  • Building custom tools
  • Integrating with your apps
  • Testing prompts with curl
You can interact with the Ollama backend by sending a prompt to the Phi-3 model. This lets you get AI responses programmatically or via the terminal:
BASH
curl http://localhost:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "Explain what is GPU? Answer in short paragraph",
  "stream": false
}'
You will get a JSON response like:
output
{"model":"phi3","created_at":"2025-07-30T14:44:34.276761484Z","response":"A Graphics Processing Unit, or GPU, is a specialized electronic circuit designed to accelerate the creation of computer graphics and facilitate complex calculations for scientific research. Traditionally used solethy primarily by professional graphic designers and video game developers as it excels at processing large blocks of data simultaneously which allows rendering high-quality images with great speed; modern GPUs have evolved into multipurpose processors that can run a wide array of applications including machine learning, cryptocurrency mining, media encoding/decoding, scientific simulations. They are typically found on personal computers as graphics cards or in mobile devices such as smartphones and tablets where computational efficiency is paramount for performance but their use extends far beyond just rendering visuals - they can perform general-purpose calculations quickly due to a parallel processing architecture that's significantly more efficient at handling the matrix and vector operations commonly found in today’s computing 
tasks.","done":true,"done_reason":"stop","context":[32010,29871,13,9544,7420,825,338,22796,29973,673,297,3273,14880,32007,29871,13,32001,29871,13,29909,29247,10554,292,13223,29892,470,22796,29892,338,263,4266,1891,27758,11369,8688,304,15592,403,278,11265,310,6601,18533,322,16089,10388,4280,17203,363,16021,5925,29889,18375,17658,1304,14419,21155,19434,491,10257,3983,293,2874,414,322,4863,3748,18777,408,372,5566,1379,472,9068,2919,10930,310,848,21699,607,6511,15061,1880,29899,29567,4558,411,2107,6210,29936,5400,22796,29879,505,15220,1490,964,6674,332,4220,1889,943,393,508,1065,263,9377,1409,310,8324,3704,4933,6509,29892,24941,542,10880,1375,292,29892,5745,8025,29914,7099,3689,29892,16021,23876,29889,2688,526,12234,1476,373,7333,23226,408,18533,15889,470,297,10426,9224,1316,408,15040,561,2873,322,1591,1372,988,26845,19201,338,1828,792,363,4180,541,1009,671,4988,2215,8724,925,15061,7604,29879,448,896,508,2189,2498,29899,15503,4220,17203,9098,2861,304,263,8943,9068,11258,393,29915,29879,16951,901,8543,472,11415,278,4636,322,4608,6931,15574,1476,297,9826,30010,29879,20602,9595,29889],"total_duration":5109776741,"load_duration":6017051,"prompt_eval_count":19,"prompt_eval_duration":20751507,"eval_count":189,"eval_duration":5082086586}
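The timing fields in the response are reported in nanoseconds, so you can derive the generation speed directly. For example, plugging in the `eval_count` and `eval_duration` values from the sample response above:

```shell
# tokens/sec = eval_count / (eval_duration converted to seconds)
# 189 tokens generated over ~5.08 s of evaluation time.
awk 'BEGIN { printf "%.1f tokens/sec\n", 189 / (5082086586 / 1e9) }'
# prints: 37.2 tokens/sec
```

This is a handy way to compare throughput across models or quantization levels on your GPU.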

Step 4: Create a Flask WebUI for Chat Interface

Make interacting with Phi-3 more intuitive by creating a simple Flask WebUI. This interface lets you:

  • Send prompts to the model
  • View responses instantly
  • Experience Phi-3 like a local ChatGPT!
1. Set up a Python virtual environment.
BASH
python3 -m venv ollama-env
source ollama-env/bin/activate
2. Install Flask and other required dependencies.
BASH
pip install flask flask-cors requests
3. Create a Flask application.
BASH
nano app.py
Add the below code
PYTHON
from flask import Flask, request, jsonify, render_template
from flask_cors import CORS
import requests

app = Flask(__name__)
CORS(app)  # allow cross-origin requests to the /chat endpoint

# Local Ollama REST endpoint
OLLAMA_API = "http://localhost:11434/api/generate"

@app.route("/")
def home():
    return render_template("index.html")

@app.route("/chat", methods=["POST"])
def chat():
    # Forward the user's prompt to Ollama and return the full JSON reply
    user_input = request.json.get("prompt", "")
    response = requests.post(OLLAMA_API, json={
        "model": "phi3",
        "prompt": user_input,
        "stream": False  # wait for the complete answer instead of streaming
    })
    return jsonify(response.json())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
4. Create Templates Folder
  • Make a folder named templates
  • Inside it, create a file called index.html

This will hold the web interface for your Flask WebUI.

BASH
mkdir templates
nano templates/index.html
Add the below code
HTML
<!DOCTYPE html>
<html>
<head>
  <title>Phi-3 Chatbot</title>
</head>
<body>
  <h1>Chat with Phi-3</h1>
  <textarea id="input" rows="4" cols="50"></textarea><br>
  <button onclick="send()">Send</button>
  <pre id="output"></pre>

  <script>
    async function send() {
      const prompt = document.getElementById("input").value;
      const response = await fetch("/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt })
      });
      const data = await response.json();
      document.getElementById("output").textContent = data.response;
    }
  </script>
</body>
</html>
5. Start the Flask app.
BASH
python3 app.py
6. Open your browser and go to: http://your-server-ip:5000
7. In the WebUI, type your prompt, for example: “Give me 3 startup ideas related to climate change.”
8. Click Send, and Phi-3's response will appear right below the input box.
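With the app running, you can also sanity-check the `/chat` endpoint from a second terminal before opening the browser. This mirrors the request the WebUI's `fetch()` call sends (it requires both `app.py` and the Ollama service to be running):

```shell
# POST a prompt to the Flask /chat endpoint and print the JSON reply.
curl -s http://localhost:5000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Say hello in five words."}'
```

A JSON body with a `response` field confirms the full chain (browser → Flask → Ollama → Phi-3) is wired up correctly.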

