Ollama on Google Colab with Free GPU

This guide explains how to set up and run Ollama on Google Colab, taking advantage of the free T4 GPU tier. We'll also cover how to expose your Ollama instance to the web using ngrok, making it accessible from anywhere.

This setup is perfect for experimenting with large language models without needing powerful local hardware.

Key Configuration Steps

To successfully run and expose Ollama on Google Colab, there are a few critical steps you'll need to follow.

1. Exposing Ollama to the Network

By default, Ollama binds to 127.0.0.1 and only accepts requests from the local machine. To make it reachable from outside the Colab environment, you need to set two environment variables before starting the Ollama server:

OLLAMA_HOST='0.0.0.0' OLLAMA_ORIGINS='*'
  • OLLAMA_HOST='0.0.0.0' tells Ollama to listen on all available network interfaces.

  • OLLAMA_ORIGINS='*' allows cross-origin requests from any domain, so Ollama doesn't reject requests that arrive through the ngrok URL. A quick way to verify the setup follows below.
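Once the server is running (Cell 1 below starts it), you can sanity-check the configuration from inside the notebook. This is a minimal sketch, assuming the default port 11434 and Ollama's standard /api/version endpoint:

import requests

# Quick sanity check: assumes the server from Cell 1 is already
# running on the default port 11434
resp = requests.get("http://127.0.0.1:11434/api/version")
print(resp.json())  # e.g. {'version': '...'}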

2. Exposing the URL with ngrok

Google Colab instances are not directly accessible from the public internet. This is where ngrok comes in. It creates a secure tunnel from a public URL to your local Colab environment.

To use ngrok, you'll need to:

  1. Sign up for a free account at ngrok.com.

  2. Get your authtoken from your ngrok dashboard.

  3. Add the token to Google Colab's Secrets Manager under the name NGROK_TOKEN. This keeps your token secure and allows the notebook to access it without hardcoding.

The Python script will then use this token to authenticate with ngrok and create the public URL.
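In isolation, that authentication step is just a couple of lines (the same pattern the full script in Cell 1 uses):

from google.colab import userdata
from pyngrok import ngrok

# Read the token from Colab's Secrets Manager and register it with pyngrok
ngrok.set_auth_token(userdata.get('NGROK_TOKEN'))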

Google Colab Notebook

Here is the complete script to run in your Google Colab notebook. You can copy and paste these cells to get started quickly.

Cell 1: System Setup and Server Launch

This cell installs necessary packages, sets up authentication, and starts the Ollama server.

# --- 1. SYSTEM SETUP & GPU FIX ---
# Install the missing tool so Ollama sees the GPU
!sudo apt-get install -y pciutils

# Verify the GPU is actually attached (should show a Tesla T4)
!nvidia-smi

# --- 2. INSTALL OLLAMA ---
!curl -fsSL https://ollama.com/install.sh | sh
!pip install pyngrok

# --- 3. SETUP AUTH & TUNNEL ---
from google.colab import userdata
from pyngrok import ngrok
import subprocess
import time

# Kill any zombie servers from previous runs
!pkill -f ollama

# Get the ngrok token from Colab Secrets
try:
    token = userdata.get('NGROK_TOKEN')
    ngrok.set_auth_token(token)
    print("✅ Auth token set.")
except Exception:
    print("❌ Error: 'NGROK_TOKEN' not found in Secrets.")

# --- 4. START OLLAMA (The Robust Way) ---
# Write the command as a single string with the env vars baked in,
# so the shell (not Python) is responsible for passing them along.
cmd = "OLLAMA_HOST='0.0.0.0' OLLAMA_ORIGINS='*' ollama serve"
print("⏳ Starting Ollama Server...")

# subprocess.Popen with shell=True runs the command exactly as typed
process = subprocess.Popen(cmd, shell=True,
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
time.sleep(5)  # give the server a moment to bind to the port

# --- 5. OPEN TUNNEL ---
try:
    public_url = ngrok.connect(11434, host_header="rewrite").public_url
    print(f"🚀 Ollama is ONLINE at: {public_url}")
except Exception as e:
    print(f"❌ Tunnel Error: {e}")
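A note on the tunnel call: host_header="rewrite" tells ngrok to rewrite the Host header to match the local server, which keeps Ollama from rejecting tunneled requests. Once the URL is printed, you can reach the server from any machine. A quick test, with the placeholder standing in for the public_url Cell 1 printed:

import requests

# Placeholder: substitute the public_url value printed by Cell 1
OLLAMA_URL = "https://your-tunnel.ngrok-free.app"

# /api/tags lists the models installed on the server
resp = requests.get(f"{OLLAMA_URL}/api/tags")
print([m["name"] for m in resp.json()["models"]])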

Cell 2: Pulling Models

Use this cell to download the models you want to use. You can uncomment or add any models from the Ollama library.

# Pull the Chat Model (Small & Fast)
#!ollama pull llama3.2:1b

# Pull the Embedding Model (Google's new 300M model)
#!ollama pull embeddinggemma

# Pulling Better Models
#!ollama pull qwen2.5:14b
#!ollama pull qwen3:14b

!ollama pull granite3.1-moe
!ollama pull nomic-embed-text
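If you prefer to drive this over HTTP instead of the CLI, Ollama also exposes a pull endpoint. A minimal sketch against the local server (the model name is just an example):

import requests

# Pull a model via the REST API; stream=False blocks until the
# download finishes and returns a single status object
resp = requests.post("http://127.0.0.1:11434/api/pull",
                     json={"model": "nomic-embed-text", "stream": False})
print(resp.json())  # {'status': 'success'} on completion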

Cell 3: List Installed Models

This cell lists all the models you have downloaded to confirm they are ready.

!ollama list
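
With the models in place, you can query them through the tunnel from anywhere. A minimal generation request, assuming you pulled granite3.1-moe in Cell 2 (replace the placeholder with your tunnel's URL):

import requests

# Placeholder: substitute your actual ngrok URL from Cell 1
OLLAMA_URL = "https://your-tunnel.ngrok-free.app"

resp = requests.post(f"{OLLAMA_URL}/api/generate",
                     json={"model": "granite3.1-moe",
                           "prompt": "Say hello in one sentence.",
                           "stream": False})
print(resp.json()["response"])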