How to Build Your Own Private AI: Deploying Phi-3 with Ollama and a WebUI on Ubuntu 24.04
Large Language Models (LLMs) have fundamentally changed how we interact with artificial intelligence, powering everything from advanced coding assistants to everyday conversational bots. However, relying on cloud-based giants like OpenAI or Google means sending your private data over the internet and paying ongoing API fees. The solution is running AI locally. Highly efficient models like Microsoft's Phi-3 are leading this shift, proving that you no longer need massive data centers to achieve incredible results. Phi-3 delivers state-of-the-art reasoning and performance while being compact enough to run smoothly on your own private GPU server.
To harness the power of models like Phi-3 on an Ubuntu 24.04 GPU server, we will use Ollama. Ollama is a powerful, developer-friendly tool that completely simplifies the process of downloading, managing, and running open-source LLMs. Instead of wrestling with complex Python environments, complicated dependencies, and manually loading model weights, Ollama acts as a streamlined local API server. It handles the heavy lifting of model inference in the background, automatically utilizing your server's GPU to ensure lightning-fast generation speeds.
Finally, an AI model isn't very accessible if you can only interact with it through a command-line terminal. That's where the WebUI comes in. In this tutorial, we will build a lightweight, Flask-based Web User Interface that connects directly to your local Ollama instance. This setup will give you a sleek, browser-based chat experience just like ChatGPT but hosted entirely on your own hardware. By the end of this guide, you will have a fully functional, highly performant AI ecosystem that offers complete control over your privacy, performance, and costs.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user or a user with sudo privileges.
- NVIDIA drivers installed on your server.
Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> NVIDIA GPU installed.
source ~/.bashrc
sudo systemctl enable ollama
sudo systemctl start ollama
curl http://localhost:11434
Ollama is running
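You can also verify the service from a script rather than the shell. Below is a minimal sketch using only the Python standard library; it checks the same base URL as the curl command above (the `is_ollama_up` helper is an illustrative name, not part of Ollama):

```python
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def is_ollama_up(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers on its base URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            # The root endpoint returns the plain text "Ollama is running".
            return resp.status == 200 and b"Ollama is running" in resp.read()
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Ollama is up" if is_ollama_up() else "Ollama is not reachable")
```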
Run a Phi-3 Model with Ollama
ollama run phi3
If Phi-3 isn’t already on your system, this command will:
- Download the model locally
- Launch an interactive terminal session
- Let you type messages for Phi-3 to respond to in real time
>>> What is Ollama?
Ollama seems to be a term that doesn't correspond with well-known technology or concepts as of my last update in April 2023. It might either refer to an emerging tool, application, system, project, acronym, organization, or even fictional element not recognized within the industry up until then.
Press Ctrl+D (or type /bye) to exit the session.
Use Ollama Programmatically with cURL
Beyond the terminal, Ollama exposes a local REST API, which is perfect for:
- Building custom tools
- Integrating with your apps
- Testing prompts with curl
curl http://localhost:11434/api/generate -d '{
"model": "phi3",
"prompt": "Explain what is GPU? Answer in short paragraph",
"stream": false
}'
{"model":"phi3","created_at":"2025-07-30T14:44:34.276761484Z","response":"A Graphics Processing Unit, or GPU, is a specialized electronic circuit designed to accelerate the creation of computer graphics and facilitate complex calculations for scientific research. Traditionally used solethy primarily by professional graphic designers and video game developers as it excels at processing large blocks of data simultaneously which allows rendering high-quality images with great speed; modern GPUs have evolved into multipurpose processors that can run a wide array of applications including machine learning, cryptocurrency mining, media encoding/decoding, scientific simulations. They are typically found on personal computers as graphics cards or in mobile devices such as smartphones and tablets where computational efficiency is paramount for performance but their use extends far beyond just rendering visuals - they can perform general-purpose calculations quickly due to a parallel processing architecture that's significantly more efficient at handling the matrix and vector operations commonly found in today’s computing 
tasks.","done":true,"done_reason":"stop","context":[32010,29871,13,9544,7420,825,338,22796,29973,673,297,3273,14880,32007,29871,13,32001,29871,13,29909,29247,10554,292,13223,29892,470,22796,29892,338,263,4266,1891,27758,11369,8688,304,15592,403,278,11265,310,6601,18533,322,16089,10388,4280,17203,363,16021,5925,29889,18375,17658,1304,14419,21155,19434,491,10257,3983,293,2874,414,322,4863,3748,18777,408,372,5566,1379,472,9068,2919,10930,310,848,21699,607,6511,15061,1880,29899,29567,4558,411,2107,6210,29936,5400,22796,29879,505,15220,1490,964,6674,332,4220,1889,943,393,508,1065,263,9377,1409,310,8324,3704,4933,6509,29892,24941,542,10880,1375,292,29892,5745,8025,29914,7099,3689,29892,16021,23876,29889,2688,526,12234,1476,373,7333,23226,408,18533,15889,470,297,10426,9224,1316,408,15040,561,2873,322,1591,1372,988,26845,19201,338,1828,792,363,4180,541,1009,671,4988,2215,8724,925,15061,7604,29879,448,896,508,2189,2498,29899,15503,4220,17203,9098,2861,304,263,8943,9068,11258,393,29915,29879,16951,901,8543,472,11415,278,4636,322,4608,6931,15574,1476,297,9826,30010,29879,20602,9595,29889],"total_duration":5109776741,"load_duration":6017051,"prompt_eval_count":19,"prompt_eval_duration":20751507,"eval_count":189,"eval_duration":5082086586}
Create a Flask WebUI for Chat Interface
Make interacting with Phi-3 more intuitive by creating a simple Flask WebUI. This interface lets you:
- Send prompts to the model
- View responses instantly
- Experience Phi-3 like a local ChatGPT!
python3 -m venv ollama-env
source ollama-env/bin/activate
pip install flask flask-cors requests
nano app.py
from flask import Flask, request, jsonify, render_template
import requests

app = Flask(__name__)

OLLAMA_API = "http://localhost:11434/api/generate"

@app.route("/")
def home():
    return render_template("index.html")

@app.route("/chat", methods=["POST"])
def chat():
    user_input = request.json.get("prompt")
    response = requests.post(OLLAMA_API, json={
        "model": "phi3",
        "prompt": user_input,
        "stream": False
    })
    return jsonify(response.json())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
- Make a folder named templates
- Inside it, create a file called index.html
This will hold the web interface for your Flask WebUI.
mkdir templates
nano templates/index.html
<!DOCTYPE html>
<html>
<head>
<title>Phi-3 Chatbot</title>
</head>
<body>
<h1>Chat with Phi-3</h1>
<textarea id="input" rows="4" cols="50"></textarea><br>
<button onclick="send()">Send</button>
<pre id="output"></pre>
<script>
async function send() {
const prompt = document.getElementById("input").value;
const response = await fetch("/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ prompt })
});
const data = await response.json();
document.getElementById("output").textContent = data.response;
}
</script>
</body>
</html>
python3 app.py
Then open http://your-server-ip:5000 in your browser and try a prompt such as:
“Give me 3 startup ideas related to climate change.”
Click Send, and Phi-3's response will appear right below the input box in your WebUI.
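If you later want to script against the WebUI rather than clicking through the browser, the /chat route can be called directly. Here is a minimal stdlib-only sketch, assuming the Flask app from app.py is running on port 5000 (`ask` and `build_chat_body` are illustrative names):

```python
import json
import urllib.request

WEBUI_CHAT = "http://localhost:5000/chat"  # Flask dev server from app.py

def build_chat_body(prompt: str) -> bytes:
    """JSON body expected by the /chat route in app.py."""
    return json.dumps({"prompt": prompt}).encode("utf-8")

def ask(prompt: str, url: str = WEBUI_CHAT) -> str:
    """POST a prompt to the Flask WebUI and return Phi-3's reply text."""
    req = urllib.request.Request(
        url,
        data=build_chat_body(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # app.py forwards Ollama's JSON, so the text lives under "response"
        return json.loads(resp.read()).get("response", "")

if __name__ == "__main__":
    print(ask("Give me 3 startup ideas related to climate change."))
```

Adjust the host in `WEBUI_CHAT` if you are calling the server from another machine.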