Arize Phoenix is an open-source LLM tracing and evaluation tool that lets AI developers evaluate, experiment with, and optimize AI applications in real time.
Built on OpenTelemetry, it provides flexible, vendor-neutral application tracing, customizable evaluation templates, and rich data visualizations that enable developers to quickly improve application performance and reliability.
Highlights:
- Tracing - Trace your LLM application's runtime using OpenTelemetry-based instrumentation.
- Evaluation - Leverage LLMs to benchmark your application's performance using response and retrieval evals.
- Datasets - Create versioned datasets of examples for experimentation, evaluation, and fine-tuning.
- Experiments - Track and evaluate changes to prompts, LLMs, and retrieval.
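
To make the tracing highlight more concrete, here is a minimal, stdlib-only Python sketch of the kind of nested span data that OpenTelemetry-style instrumentation records. It does not use the Phoenix or OpenTelemetry APIs: `Span`, `Tracer`, `answer`, and the `retrieve`/`generate` steps are hypothetical names chosen for illustration, and the retriever and LLM call are replaced by stand-in values.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    """One timed unit of work, loosely modeled on an OpenTelemetry span."""
    name: str
    parent: Optional[str]
    attributes: Dict[str, object] = field(default_factory=dict)
    duration_s: float = 0.0


class Tracer:
    """Hypothetical in-memory tracer; a real tracer exports spans to a collector."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: object) -> Iterator[Span]:
        # The innermost open span (if any) becomes the parent of the new one.
        parent = self._stack[-1].name if self._stack else None
        current = Span(name=name, parent=parent, attributes=dict(attributes))
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_s = time.perf_counter() - start
            self._stack.pop()
            self.finished.append(current)


def answer(tracer: Tracer, question: str) -> str:
    """Toy retrieval-then-generation pipeline with one span per step."""
    with tracer.span("rag_pipeline", question=question):
        with tracer.span("retrieve") as s:
            docs = ["Phoenix is built on OpenTelemetry."]  # stand-in for a retriever
            s.attributes["num_documents"] = len(docs)
        with tracer.span("generate") as s:
            reply = f"Based on {len(docs)} document(s): {docs[0]}"  # stand-in for an LLM call
            s.attributes["output_chars"] = len(reply)
    return reply


tracer = Tracer()
reply = answer(tracer, "What is Phoenix built on?")
for span in tracer.finished:
    print(span.name, span.parent, span.attributes)
```

Child spans finish before their parent, so the root `rag_pipeline` span appears last in `tracer.finished`. Per-step attributes such as document counts are the sort of detail that makes a trace useful when debugging retrieval or generation.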
Phoenix is vendor- and language-agnostic, with out-of-the-box support for popular frameworks (🦙LlamaIndex, 🦜⛓LangChain, Haystack, 🧩DSPy) and LLM providers (OpenAI, Bedrock, MistralAI, VertexAI, LiteLLM, and more). For details on auto-instrumentation, check out the OpenInference project.
Phoenix can run in a Jupyter notebook, on your local machine, in a containerized deployment, or in the cloud.
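
Because Phoenix can run in several places, application code usually needs to know where to send its traces. The sketch below shows one way to resolve that address from configuration. It is a stdlib-only illustration, not part of the Phoenix API: the helper name `resolve_phoenix_endpoint` is hypothetical, and both the `PHOENIX_COLLECTOR_ENDPOINT` variable name and the `http://localhost:6006` default are assumptions that should be checked against the Phoenix documentation for your version.

```python
import os
from typing import Mapping, Optional

# Assumed default address of a locally running Phoenix instance.
DEFAULT_ENDPOINT = "http://localhost:6006"


def resolve_phoenix_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured endpoint, falling back to the assumed local default."""
    env = os.environ if env is None else env
    # Assumed variable name; treat blank values the same as unset.
    endpoint = env.get("PHOENIX_COLLECTOR_ENDPOINT", "").strip()
    # Drop any trailing slash so paths can be appended consistently.
    return (endpoint or DEFAULT_ENDPOINT).rstrip("/")


# Local or notebook use: nothing configured, so the default applies.
print(resolve_phoenix_endpoint({}))

# Containerized or cloud use: the deployment supplies its own address.
print(resolve_phoenix_endpoint({"PHOENIX_COLLECTOR_ENDPOINT": "https://phoenix.example.com/"}))
```

Reading the address from the environment keeps the application code the same across notebook, local, container, and cloud setups; only the deployment configuration changes.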