Phoenix

Arize Phoenix is an open-source LLM tracing and evaluation tool that helps AI developers evaluate, experiment with, and optimize AI applications in real time.

Built on OpenTelemetry, it provides flexible, vendor-neutral application tracing, customizable evaluation templates, and rich data visualizations that enable developers to quickly improve application performance and reliability.

Highlights:

Tracing - Trace your LLM application's runtime using OpenTelemetry-based instrumentation.
Evaluation - Leverage LLMs to benchmark your application's performance using response and retrieval evals.
Datasets - Create versioned datasets of examples for experimentation, evaluation, and fine-tuning.
Experiments - Track and evaluate changes to prompts, LLMs, and retrieval.
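Retrieval evals typically produce a relevance label for each document the retriever returned. Those labels are then summarized with ranking metrics. The sketch below is a minimal illustration, assuming binary labels (1 = relevant, 0 = irrelevant) that an LLM judge has already assigned and that are listed in retrieval order. It computes three common metrics: precision@k, hit rate@k, and NDCG@k. The function names are hypothetical. This is not Phoenix's API or its exact implementation.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k retrieved documents labeled relevant.

    Divides by k even if fewer than k documents were retrieved,
    which is one common convention (others divide by len(top)).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def hit_rate_at_k(relevance: Sequence[int], k: int) -> float:
    """1.0 if at least one of the top-k documents is relevant, else 0.0."""
    return 1.0 if any(relevance[:k]) else 0.0


def ndcg_at_k(relevance: Sequence[int], k: int) -> float:
    """Normalized discounted cumulative gain over the top-k documents."""

    def dcg(labels: Sequence[int]) -> float:
        # Rank 0 gets discount log2(2) = 1, rank 1 gets log2(3), and so on.
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(labels))

    actual = dcg(relevance[:k])
    # Ideal ordering: every relevant document ranked ahead of every irrelevant one.
    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical judge labels: the top document is irrelevant,
    # and the next two are relevant.
    labels = [0, 1, 1]
    print(precision_at_k(labels, 2))       # 0.5
    print(hit_rate_at_k(labels, 2))        # 1.0
    print(round(ndcg_at_k(labels, 2), 3))  # 0.387
```

NDCG penalizes the irrelevant document sitting at rank 1, even though precision@2 and hit rate cannot distinguish `[0, 1, 1]` from `[1, 0, 1]`. For that reason, ranking-aware and set-based metrics are often reported together.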

Phoenix is vendor- and language-agnostic, with out-of-the-box support for popular frameworks (🦙LlamaIndex, 🦜⛓LangChain, Haystack, 🧩DSPy) and LLM providers (OpenAI, Bedrock, MistralAI, VertexAI, LiteLLM, and more). For details on auto-instrumentation, see the OpenInference project.

Phoenix can run in a Jupyter notebook, on your local machine, in a containerized deployment, or in the cloud.