Using Ollama from the command line

Using Ollama from the command line lets you take advantage of LLMs from within your programs and shells. This guide assumes that Ollama is running in Docker.

Access Ollama via CLI

Check which models are installed.

$ docker exec -it ollama ollama list
NAME               ID              SIZE      MODIFIED
llama3.1:latest    42182419e950    4.7 GB    3 weeks ago
phi3:latest        4f2222927938    2.2 GB    3 weeks ago

Models that are not yet installed are downloaded automatically the first time you run them, but you can also pull them explicitly.

$ docker exec -it ollama ollama pull llama3.2
pulling manifest
pulling dde5aa3fc5ff... 100% ▕████████████████████████████████████████████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕████████████████████████████████████████████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████████████████████████████████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████████████████████████████████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕████████████████████████████████████████████████████████▏   96 B
pulling 34bb5ab01051... 100% ▕████████████████████████████████████████████████████████▏  561 B
verifying sha256 digest
writing manifest
success

Run a model from the command line.

$ docker exec ollama ollama run "llama3.2" "what is AI? Explain in one sentence."
Artificial Intelligence (AI) refers to the development of computer systems that can perform tasks typically requiring human intelligence, such as learning, problem-solving, and decision-making, using algorithms and data.
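You can also pipe the prompt in through standard input, which is convenient inside shell scripts. A rough sketch (note the -i flag so that docker exec forwards stdin; ollama run then treats the piped text as the prompt):

$ echo "What is Docker? Explain in one sentence." | docker exec -i ollama ollama run llama3.2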

Access Ollama via API

Confirm that the API is accessible.

$ curl -X POST http://localhost:11434/api/generate -d '{
  "model":"llama3.2",
  "prompt": "Hi?"
}'

The output looks like this:

{"model":"llama3.2","created_at":"2024-11-29T11:39:40.55612798Z","response":"How","done":false}
{"model":"llama3.2","created_at":"2024-11-29T11:39:40.569218685Z","response":" can","done":false}
{"model":"llama3.2","created_at":"2024-11-29T11:39:40.581301676Z","response":" I","done":false}
{"model":"llama3.2","created_at":"2024-11-29T11:39:40.592935427Z","response":" assist","done":false}
{"model":"llama3.2","created_at":"2024-11-29T11:39:40.605632629Z","response":" you","done":false}
{"model":"llama3.2","created_at":"2024-11-29T11:39:40.618116021Z","response":" today","done":false}
{"model":"llama3.2","created_at":"2024-11-29T11:39:40.631825046Z","response":"?","done":false}
{"model":"llama3.2","created_at":"2024-11-29T11:39:40.64372069Z","response":"","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,13347,30,128009,128006,78191,128007,271,4438,649,358,7945,499,3432,30],"total_duration":126834961,"load_duration":12344583,"prompt_eval_count":27,"prompt_eval_duration":7000000,"eval_count":8,"eval_duration":106000000}

If you see an error like the one below, replace localhost with the IP address of the Ollama container. (Other containers on the same Docker network can reach it at http://ollama:11434 instead.)

$ curl -X POST http://localhost:11434/api/generate -d '{
  "model":"llama3.2",
  "prompt": "Hi?"
}'
curl: (7) Failed to connect to localhost port 11434 after 0 ms: Connection refused

You can check the IP address of the Ollama container:

$ docker exec -it ollama hostname -i
172.18.0.2

Replace localhost with the IP address and strip the trailing newline characters:

$ curl -X POST http://`docker exec -it ollama hostname -i | tr -d '\n\r'`:11434/api/generate -d '{
  "model":"llama3.2",
  "prompt": "Hi?"
}'
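Alternatively, docker inspect can print the address without the carriage return that the -t flag introduces (a sketch using the container name from above):

$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' ollama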

Now we want the answer in plain text.

$ curl -s -X POST http://`docker exec -it ollama hostname -i | tr -d '\n\r'`:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt":"Hi?",
  "stream": false
 }' | jq -r ".response"

This kind of response is ideal for programmatic access.

How can I assist you today?
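For use inside scripts, the call can be wrapped in a small shell function. A minimal sketch (the function name ask, the hard-coded model, and the endpoint URL are just examples; point OLLAMA_URL at the container IP if localhost is refused):

# Endpoint of the Ollama API; adjust to the container IP if needed
OLLAMA_URL="http://localhost:11434"

# ask "question" -- prints the model's plain-text answer
ask() {
  curl -s -X POST "$OLLAMA_URL/api/generate" \
    -d "$(jq -n --arg p "$1" '{model: "llama3.2", prompt: $p, stream: false}')" \
    | jq -r '.response'
}

Call it like this:

$ ask "What is AI? Explain in one sentence."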

Accessing Ollama from another machine

By default, Ollama listens on 127.0.0.1:11434, so access is restricted to the local machine. To allow external access, set the environment variables OLLAMA_HOST and OLLAMA_ORIGINS.

OLLAMA_HOST=0.0.0.0
OLLAMA_ORIGINS=192.168.0.*
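For a native (non-Docker) install, these variables are read by the server process, so one way to apply them is to export them before starting the server (a sketch; a systemd-managed install would set them in the service environment instead):

$ export OLLAMA_HOST=0.0.0.0:11434
$ export OLLAMA_ORIGINS="192.168.0.*"
$ ollama serve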

If Ollama runs in Docker, the container port can instead be mapped to the host. For example, if the host IP is 192.168.0.123, the API can be reached at 192.168.0.123:11434.

services:
  ollama:
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:${OLLAMA_DOCKER_TAG-latest}
    ports:
      - "11434:11434"   # publish the API port on the host
    environment:
      - 'OLLAMA_HOST=0.0.0.0:11434'   # listen on all interfaces inside the container
    deploy:
      resources:
        reservations:
          devices:
            - driver: ${OLLAMA_GPU_DRIVER-nvidia}
              count: ${OLLAMA_GPU_COUNT-1}
              capabilities:
                - gpu

volumes:
  ollama: {}   # named volume referenced above
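Once the port is published, you can check from another machine on the network that the API responds (192.168.0.123 is the example host IP from above; jq just makes the output readable):

$ curl -s http://192.168.0.123:11434/api/tags | jq -r '.models[].name'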