Observability for On‑Prem LLMs: Using Arize Phoenix with Ollama





How to Use Arize Phoenix with Ollama

A Practical Guide for LLM‑Ops Engineers and Data Scientists


1. Introduction

Arize Phoenix is an open‑source observability platform that lets teams monitor, debug, and evaluate large‑language‑model (LLM) applications. It can record traces, run automatic evaluations, and surface visual insights that help you spot drift, bias, or performance regressions.

Ollama is a lightweight runtime for serving open‑weight LLMs locally, and it exposes an OpenAI‑compatible API. By running Ollama locally, you can keep your data in‑house, cut inference costs, and experiment quickly.
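
Because the endpoint is OpenAI‑compatible, any OpenAI SDK can talk to a local model. A minimal sketch, assuming the openai Python package is installed and a model such as llama3.1:8b has already been pulled:

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server.
# The API key is required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.1:8b",  # any model you have pulled with Ollama
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)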

Combining Phoenix with Ollama gives you:

| Feature | Phoenix | Ollama |
| --- | --- | --- |
| Trace collection | OTLP‑compatible ingestion | Works with any SDK that speaks the OpenAI API |
| Model evaluation | Pre‑built templates (relevance, faithfulness, toxicity, etc.) | Your local model can serve as the evaluator |
| Visualization | Embedding heatmaps, trace graphs, metrics dashboards | Immediate feedback on local prompts |
| Cost | Free, open source | Zero cloud‑usage costs |

2. What Is Arize Phoenix?

Phoenix is built on OpenTelemetry and ingests traces over OTLP (the OpenTelemetry protocol). It provides:

  • Trace ingestion – collect request‑response pairs from any framework (LangChain, LlamaIndex, DSPy, etc.).
  • Automatic evaluation – run your LLM output against a prompt or reference set using a library of templates (faithfulness, toxicity, coherence, etc.).
  • Embeddings visualizer – cluster analysis and dimensionality reduction of user queries or knowledge‑base documents.
  • Dashboards – metrics such as latency, error rates, accuracy, and drift alerts.

Phoenix is intentionally “playground‑first”: you can spin up a local UI and test everything before deploying to production.
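
For quick experiments you can launch that playground straight from Python. A minimal sketch, assuming the arize-phoenix package from Section 5 is installed:

import phoenix as px

# Start a local Phoenix server (UI plus OTLP collector) in the background
session = px.launch_app()
print(session.url)  # typically http://localhost:6006/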


3. Why Combine Phoenix with Ollama?

| Pain Point | Why Phoenix Helps | Why Ollama Helps |
| --- | --- | --- |
| Latency | Visualize and compare latency distributions across models | Run inference locally, no network round‑trip |
| Data privacy | Store traces locally, no third‑party transmission | Keep data on‑premises |
| Cost | Free tooling | Zero cloud inference cost |
| Rapid iteration | Playground allows instant parameter tweaks | Quick local inference without API throttling |

4. Prerequisites

  1. Python 3.10+ (recommended in a virtual environment).
  2. Docker (optional, for running Phoenix locally).
  3. Ollama installed locally – see https://ollama.ai/ (a quick connectivity check follows this list).
  4. An API key for a hosted provider (e.g., OpenAI), only if you also want to run evaluations against a cloud model (optional).
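
The connectivity check mentioned above can be a couple of lines against Ollama's native /api/tags endpoint (a sketch; the requests package is assumed):

import requests

# Ollama lists the locally available models at /api/tags
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available models:", models)  # e.g. ['llama3.1:8b', 'nomic-embed-text:latest']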

5. Installing Phoenix

Phoenix can be installed as a Python package or run in a Docker container.
The Python route is easiest for experimentation:

python -m venv venv
source venv/bin/activate
pip install "arize-phoenix[evals,llama-index]"  # pulls in core, evals, and LlamaIndex integration

Alternatively, run the prebuilt Docker image:

docker run -d -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest

Once the container is running, open the UI at http://localhost:6006 (port 4317 accepts OTLP traces over gRPC).


6. Configuring Phoenix to Use Ollama

Phoenix treats any OpenAI‑compatible endpoint as a “provider.”
Ollama exposes an OpenAI‑compatible endpoint at http://localhost:11434/v1.

6.1 Set Environment Variables

export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama  # any non-empty value works; Ollama does not check the key

6.2 Create a Prompt Playground Session

  1. In the Phoenix UI, click Playground → New Session.
  2. Under AI Provider, select Custom.
  3. Enter the base URL and API key above.
  4. Choose a model from the list (e.g., llama3.1:8b).

You can now send prompts directly to your local Ollama instance from the Phoenix UI and immediately see the trace, latency, and evaluation results.


7. Sending Traces from Your Own Code

Phoenix integrates with frameworks such as LlamaIndex and LangChain. In current releases this is done with an OpenInference instrumentor that exports every LLM and retrieval call over OTLP (older LlamaIndex versions shipped a dedicated Phoenix callback handler). A sketch for LlamaIndex, assuming the packages from Section 5 are installed:

from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Export spans to the local Phoenix instance and instrument LlamaIndex
tracer_provider = register(project_name="ollama-rag", endpoint="http://localhost:6006/v1/traces")
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)

# Use local Ollama models for both generation and embeddings
Settings.llm = Ollama(model="llama3.1:8b", base_url="http://localhost:11434", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text", base_url="http://localhost:11434")

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical folder with your documents
index = VectorStoreIndex.from_documents(documents)       # build your RAG index

query = "Explain the benefits of using local LLMs."
response = index.as_query_engine().query(query)
print(response)

All requests will be automatically sent to Phoenix via OTLP.
You’ll see each trace appear in the Traces tab, complete with timestamps, request/response payloads, and any evaluation metrics you have configured.
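
If you call Ollama through the plain OpenAI client rather than a framework, the same pattern works with the OpenAI instrumentor. A sketch, assuming the openinference-instrumentation-openai package is installed:

from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

# Export spans to the local Phoenix instance and instrument the OpenAI client
tracer_provider = register(project_name="ollama-raw", endpoint="http://localhost:6006/v1/traces")
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
reply = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "What is observability?"}],
)
print(reply.choices[0].message.content)  # the call now appears in the Traces tab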


8. Evaluating Responses with Phoenix

Phoenix ships with a rich library of evaluation templates, e.g., RAG_RELEVANCY_PROMPT_TEMPLATE. You can also write your own.

8.1 Using a Built‑In Template

import pandas as pd
from phoenix.evals import (
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    OpenAIModel,
    llm_classify,
)

# Assume `query` and `retrieved_text` are strings: one row per (query, document) pair to judge
df = pd.DataFrame({"input": [query], "reference": [retrieved_text]})

judge = OpenAIModel(
    model="llama3.1:8b",
    base_url="http://localhost:11434/v1",  # passed through to the underlying OpenAI client
    api_key="ollama",
)

relevance = llm_classify(
    dataframe=df,
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    model=judge,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
print(relevance[["label", "explanation"]])

The labels are not uploaded automatically; log them back to Phoenix so you can compare evaluations across runs.
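
For example, a minimal sketch, assuming the evaluation dataframe is indexed by the span IDs of the traces it refers to:

import phoenix as px
from phoenix.trace import SpanEvaluations

# Attach the relevance labels to their spans so they appear alongside each trace
px.Client().log_evaluations(
    SpanEvaluations(eval_name="RAG Relevancy", dataframe=relevance)
)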

8.2 Custom Evaluation Prompts

Create a prompt that asks the LLM to score its own answer:

CUSTOM_PROMPT = """
You are evaluating the following answer to a user query:
Q: {query}
A: {answer}
Rate the answer on a scale of 0–10 for relevance and factual accuracy.
Return a JSON object: {{"relevance": int, "accuracy": int}}
"""

import json
import pandas as pd
from phoenix.evals import OpenAIModel, llm_generate

df = pd.DataFrame({"query": [query], "answer": [answer]})  # column names match {query}/{answer} in the template
scores = llm_generate(
    dataframe=df,
    template=CUSTOM_PROMPT,
    model=OpenAIModel(model="llama3.1:8b", base_url="http://localhost:11434/v1", api_key="ollama"),
    output_parser=lambda output, row_index: json.loads(output),  # parse the JSON the model returns
)
print(scores)  # dataframe with 'relevance' and 'accuracy' columns

9. Visualizing Embeddings

Phoenix’s Embedding Visualizer helps you understand how your data is clustered.

  1. Generate embeddings for your queries or documents (e.g., with a local embedding model served by Ollama; a sketch follows this list).
  2. Load them into Phoenix by describing your dataframe with a schema and launching the app:

import pandas as pd
import phoenix as px

# df has one row per document: "text", "vector" (the embedding), plus optional label columns such as "topic"
schema = px.Schema(
    embedding_feature_column_names={
        "document_embedding": px.EmbeddingColumnNames(
            vector_column_name="vector",
            raw_data_column_name="text",
        )
    }
)
inferences = px.Inferences(dataframe=df, schema=schema, name="my_docs")
px.launch_app(primary=inferences)

  3. In the UI, open the my_docs inferences and click the document_embedding feature.
     You’ll see a 2‑D/3‑D scatter plot, cluster boundaries, and the ability to filter by label.
     Use this to spot outliers or verify that your RAG knowledge base covers the query space.
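
A sketch of step 1 using Ollama’s native embeddings endpoint, assuming an embedding model such as nomic-embed-text has been pulled:

import pandas as pd
import requests

texts = ["Local LLMs keep data on-premises.", "Phoenix visualizes traces and embeddings."]

def embed(text: str) -> list[float]:
    # POST /api/embeddings returns {"embedding": [...]} for a single prompt
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["embedding"]

df = pd.DataFrame({"text": texts, "vector": [embed(t) for t in texts]})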

10. Advanced Use Cases

| Scenario | How Phoenix Helps | Tips |
| --- | --- | --- |
| RAG system debugging | Trace each retrieval step, compare retrieved docs to ground truth | Use the LlamaIndex instrumentation from Section 7 to see which docs were fetched |
| Bias & fairness monitoring | Run periodic evaluations with labeled prompts | Store evaluation metrics in Phoenix, alert on drift |
| Latency SLA enforcement | Continuous latency dashboards, threshold alerts | Feed an external alerting tool (e.g., PagerDuty) from Phoenix metrics |
| Multi‑model comparison | Store traces for several Ollama models | Compare accuracy and latency across models side by side in the Phoenix UI |

11. Troubleshooting Common Issues

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Traces don’t appear | Phoenix OTLP endpoint unreachable | Verify the Docker port mapping (-p 6006:6006 -p 4317:4317) or the local address (http://localhost:6006) |
| Model requests fail | Wrong OPENAI_BASE_URL | Ensure it points to Ollama’s v1 endpoint (http://localhost:11434/v1) |
| Evaluation metrics missing | Evaluation template or model misconfigured | Pass the correct template and make sure OpenAIModel has the right model name and base_url |
| Embedding upload errors | Mismatched vector dimensions | All vectors in a dataset must share one dimension (e.g., 768); re‑embed everything with the same model |

12. Summary

Arize Phoenix turns a local Ollama deployment into a production‑grade LLM observability platform. By pointing Phoenix at the Ollama endpoint and enabling the built‑in instrumentation, you gain:

  • Instant trace visualization
  • Automated evaluation with a library of templates
  • Embedding insights for data coverage and drift detection
  • Dashboards that surface latency, accuracy, and error rates

Because both tools are open source, you can keep all data on‑premise and avoid costly cloud usage while still enjoying the benefits of a modern observability stack.

Happy building! 🚀

