Skill v1.0.1
current · Automated scan: 100/100 · frank-luongt/faos-skills-marketplace/phoenix
──Details
Published: May 16, 2026 at 08:48 AM
Content Hash: sha256:6c1d3e2c1af8b033...
Git SHA: 11d3404e01a8
Bump Type: patch
──Files
Files (1 file, 11.1 KB)
SKILL.md · 11.1 KB · active
SKILL.md · 477 lines · 11.1 KB
version: "1.0.1"
<!-- AUTO-GENERATED by export-skills.py — DO NOT EDIT -->
name: phoenix
description: Arize Phoenix open-source LLM observability and evaluation. Use when tracing LLM calls, analyzing embeddings, or running evals locally.
Phoenix - AI Observability Platform
Open-source AI observability and evaluation platform for LLM applications with tracing, evaluation, datasets, experiments, and real-time monitoring.
When to use Phoenix
Use Phoenix when:
- Debugging LLM application issues with detailed traces
- Running systematic evaluations on datasets
- Monitoring production LLM systems in real-time
- Building experiment pipelines for prompt/model comparison
- Running self-hosted observability without vendor lock-in
Key features:
- Tracing: OpenTelemetry-based trace collection for any LLM framework
- Evaluation: LLM-as-judge evaluators for quality assessment
- Datasets: Versioned test sets for regression testing
- Experiments: Compare prompts, models, and configurations
- Playground: Interactive prompt testing with multiple models
- Open-source: Self-hosted with PostgreSQL or SQLite
Use alternatives instead:
- LangSmith: Managed platform with LangChain-first integration
- Weights & Biases: Deep learning experiment tracking focus
- Arize Cloud: Managed Phoenix with enterprise features
- MLflow: General ML lifecycle, model registry focus
Quick start
Installation
bash
pip install arize-phoenix

# Optional extras and companion packages
# (extras are quoted so shells such as zsh do not expand the brackets)
pip install "arize-phoenix[embeddings]"  # Embedding analysis
pip install arize-phoenix-otel           # OpenTelemetry config
pip install arize-phoenix-evals          # Evaluation framework
pip install arize-phoenix-client         # Lightweight REST client
Launch Phoenix server
python
import phoenix as px

# Launch in notebook (ThreadServer mode)
session = px.launch_app()

# View UI
session.view()      # Embedded iframe
print(session.url)  # http://localhost:6006
Command-line server (production)
bash
# Start Phoenix server
phoenix serve

# With PostgreSQL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host/db"
phoenix serve --port 6006
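Passwords that contain characters such as `@`, `:` or `/` break a hand-written connection URL. The sketch below shows one way to assemble the value for `PHOENIX_SQL_DATABASE_URL` with percent-encoded credentials. The helper name `build_postgres_url` and the sample host and credentials are hypothetical, not part of Phoenix.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str,
                       database: str, port: int = 5432) -> str:
    """Assemble a postgresql:// URL, percent-encoding the credentials.

    Hypothetical helper for illustration; it is not a Phoenix API.
    """
    # safe="" makes quote() escape "@", ":" and "/" as well.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"postgresql://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


# Sample values are made up for the example.
url = build_postgres_url("phoenix", "p@ss:word/1", "db.internal", "phoenix")
print(url)
```

The printed value can then be exported as `PHOENIX_SQL_DATABASE_URL` before starting the server.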
Basic tracing
python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Configure OpenTelemetry with Phoenix
tracer_provider = register(
    project_name="my-llm-app",
    endpoint="http://localhost:6006/v1/traces"
)

# Instrument OpenAI SDK
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# All OpenAI calls are now traced
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
Core concepts
Traces and spans
A trace represents a complete execution flow, while spans are individual operations within that trace.
python
from phoenix.otel import register
from opentelemetry import trace

# Setup tracing
tracer_provider = register(project_name="my-app")
tracer = trace.get_tracer(__name__)

# `query`, `retriever`, and `llm` are placeholders for your own
# application objects; they are not defined by Phoenix.

# Create custom spans
with tracer.start_as_current_span("process_query") as span:
    span.set_attribute("input.value", query)

    # Child spans are automatically nested
    with tracer.start_as_current_span("retrieve_context"):
        context = retriever.search(query)

    with tracer.start_as_current_span("generate_response"):
        response = llm.generate(query, context)

    span.set_attribute("output.value", response)
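To make the trace/span relationship concrete, here is a minimal stdlib-only sketch of the data model: every span records the ID of the span that was active when it started, and all spans in one flow share a trace ID. `ToyTracer` and `Span` are hypothetical teaching classes, not the OpenTelemetry or Phoenix implementation.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: Dict[str, str] = field(default_factory=dict)


class ToyTracer:
    """Hypothetical in-memory tracer that mimics span nesting."""

    def __init__(self) -> None:
        self.finished: List[Span] = []   # spans in completion order
        self._stack: List[Span] = []     # currently open spans

    @contextmanager
    def start_as_current_span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        span = Span(
            name=name,
            # A root span starts a new trace; children inherit the trace ID.
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
        )
        self._stack.append(span)
        try:
            yield span
        finally:
            self._stack.pop()
            self.finished.append(span)


tracer = ToyTracer()
with tracer.start_as_current_span("process_query") as root:
    root.attributes["input.value"] = "What is Phoenix?"
    with tracer.start_as_current_span("retrieve_context"):
        pass
    with tracer.start_as_current_span("generate_response"):
        pass

for span in tracer.finished:
    print(span.name, "parent:", span.parent_id)
```

Children finish before their parent, so the root span appears last in `tracer.finished`.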
Projects
Projects organize related traces:
python
import os

os.environ["PHOENIX_PROJECT_NAME"] = "production-chatbot"

# Or set the project when registering the tracer provider
from phoenix.otel import register

tracer_provider = register(project_name="experiment-v2")
Framework instrumentation
OpenAI
python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register()
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
LangChain
python
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

tracer_provider = register()
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# All LangChain operations traced
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
response = llm.invoke("Hello!")
LlamaIndex
python
from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

tracer_provider = register()
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
Anthropic
python
from phoenix.otel import register
from openinference.instrumentation.anthropic import AnthropicInstrumentor

tracer_provider = register()
AnthropicInstrumentor().instrument(tracer_provider=tracer_provider)
Evaluation framework
Built-in evaluators
python
from phoenix.evals import (
    OpenAIModel,
    HallucinationEvaluator,
    RelevanceEvaluator,
    ToxicityEvaluator,
    llm_classify
)

# Setup model for evaluation
eval_model = OpenAIModel(model="gpt-4o")

# Evaluate hallucination
hallucination_eval = HallucinationEvaluator(eval_model)
results = hallucination_eval.evaluate(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
    reference="Paris is the capital of France."
)
Custom evaluators
python
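import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

eval_model = OpenAIModel(model="gpt-4o")

# Define custom evaluation
def evaluate_helpfulness(input_text, output_text):
    template = """
    Evaluate if the response is helpful for the given question.

    Question: {input}
    Response: {output}

    Is this response helpful? Answer 'helpful' or 'not_helpful'.
    """

    # llm_classify works on DataFrame rows; the template variables
    # {input} and {output} are filled from the matching columns.
    data = pd.DataFrame([{"input": input_text, "output": output_text}])
    result = llm_classify(
        data,
        model=eval_model,
        template=template,
        rails=["helpful", "not_helpful"]
    )
    return result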
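The `rails` argument restricts the judge's free-text answer to a fixed set of labels. The sketch below illustrates the general idea with plain substring matching. `snap_to_rails` and the `NOT_PARSABLE` sentinel are hypothetical stand-ins written for this example, not Phoenix's own parsing code.

```python
from typing import List

# Hypothetical sentinel for answers that match no rail or several rails.
NOT_PARSABLE = "NOT_PARSABLE"


def snap_to_rails(raw_output: str, rails: List[str]) -> str:
    """Map a free-text judge answer onto exactly one allowed label."""
    text = raw_output.strip().lower()
    found = []
    # Check longer rails first so "not_helpful" is not read as "helpful".
    for rail in sorted(rails, key=len, reverse=True):
        if rail.lower() in text:
            found.append(rail)
            # Remove the match so shorter rails cannot match inside it.
            text = text.replace(rail.lower(), " ")
    return found[0] if len(found) == 1 else NOT_PARSABLE


rails = ["helpful", "not_helpful"]
for answer in ["Helpful.", "not_helpful", "maybe", "helpful or not_helpful"]:
    print(repr(answer), "->", snap_to_rails(answer, rails))
```

Overlapping labels like these are worth handling deliberately, since one rail is a substring of the other.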
Run evaluations on dataset
python
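from phoenix import Client
from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    RelevanceEvaluator,
    run_evals
)
from phoenix.trace import SpanEvaluations

client = Client()
eval_model = OpenAIModel(model="gpt-4o")

# Get spans to evaluate
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'"
)

# Run evaluations (one result DataFrame per evaluator, in order)
hallucination_df, relevance_df = run_evals(
    dataframe=spans_df,
    evaluators=[
        HallucinationEvaluator(eval_model),
        RelevanceEvaluator(eval_model)
    ],
    provide_explanation=True
)

# Log results back to Phoenix, one named evaluation per DataFrame
client.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_df),
    SpanEvaluations(eval_name="Relevance", dataframe=relevance_df)
)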
Datasets and experiments
Create dataset
python
from phoenix import Client

client = Client()

# Create dataset
dataset = client.create_dataset(
    name="qa-test-set",
    description="QA evaluation dataset"
)

# Add examples
client.add_examples_to_dataset(
    dataset_name="qa-test-set",
    examples=[
        {
            "input": {"question": "What is Python?"},
            "output": {"answer": "A programming language"}
        },
        {
            "input": {"question": "What is ML?"},
            "output": {"answer": "Machine learning"}
        }
    ]
)
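Malformed examples are easier to catch before they are uploaded. A minimal sketch, assuming the example shape used above (an `input` dict and an `output` dict per example); `validate_examples` is a hypothetical helper, not a Phoenix function.

```python
from typing import Any, Dict, List


def validate_examples(examples: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means every example is usable."""
    problems = []
    for index, example in enumerate(examples):
        for key in ("input", "output"):
            value = example.get(key)
            if not isinstance(value, dict) or not value:
                problems.append(f"example {index}: '{key}' must be a non-empty dict")
    return problems


good = [
    {"input": {"question": "What is Python?"},
     "output": {"answer": "A programming language"}},
]
bad = good + [{"input": {"question": "What is ML?"}}]  # missing "output"

print(validate_examples(good))
print(validate_examples(bad))
```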
Run experiment
python
from phoenix import Client
from phoenix.experiments import run_experiment

client = Client()

def my_model(input_data):
    """Your model function."""
    question = input_data["question"]
    # generate_answer is a placeholder for your own model call
    return {"answer": generate_answer(question)}

def accuracy_evaluator(input_data, output, expected):
    """Custom evaluator."""
    # Case-insensitive check that the expected answer appears in the output
    correct = expected["answer"].lower() in output["answer"].lower()
    return {
        "score": 1.0 if correct else 0.0,
        "label": "correct" if correct else "incorrect"
    }

# Run experiment
results = run_experiment(
    dataset_name="qa-test-set",
    task=my_model,
    evaluators=[accuracy_evaluator],
    experiment_name="baseline-v1"
)

print(f"Average accuracy: {results.aggregate_metrics['accuracy']}")
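The substring-based accuracy check and its average can be exercised without a running Phoenix server. The sketch below is stdlib-only and uses made-up records. The two-argument evaluator and the `summarize` helper are hypothetical simplifications for illustration, not the experiment runner's API.

```python
from collections import Counter
from statistics import mean


def accuracy_evaluator(output, expected):
    """Score 1.0 when the expected answer appears in the output (case-insensitive)."""
    correct = expected["answer"].lower() in output["answer"].lower()
    return {"score": 1.0 if correct else 0.0,
            "label": "correct" if correct else "incorrect"}


def summarize(evaluations):
    """Aggregate per-example results into a mean score and label counts."""
    return {
        "accuracy": mean(e["score"] for e in evaluations),
        "labels": dict(Counter(e["label"] for e in evaluations)),
    }


# (model output, expected output) pairs; the data is invented for the example.
records = [
    ({"answer": "Python is a programming language"}, {"answer": "A programming language"}),
    ({"answer": "ML stands for machine learning"}, {"answer": "Machine learning"}),
    ({"answer": "I don't know"}, {"answer": "Paris"}),
    ({"answer": "The capital is Paris."}, {"answer": "Paris"}),
]

evaluations = [accuracy_evaluator(output, expected) for output, expected in records]
summary = summarize(evaluations)
print(summary)
```

Substring matching is lenient: an output that merely mentions the expected phrase counts as correct, so stricter checks may be needed for real test sets.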
Client API
Query traces and spans
python
from phoenix import Client

client = Client(endpoint="http://localhost:6006")

# Get spans as DataFrame
spans_df = client.get_spans_dataframe(
    project_name="my-app",
    filter_condition="span_kind == 'LLM'",
    limit=1000
)

# Get specific span
span = client.get_span(span_id="abc123")

# Get trace
trace = client.get_trace(trace_id="xyz789")
Log feedback
python
from phoenix import Client

client = Client()

# Log user feedback
client.log_annotation(
    span_id="abc123",
    name="user_rating",
    annotator_kind="HUMAN",
    score=0.8,
    label="helpful",
    metadata={"comment": "Good response"}
)
Export data
python
# Export to pandas
df = client.get_spans_dataframe(project_name="my-app")

# Export traces
traces = client.list_traces(project_name="my-app")
Production deployment
Docker
bash
docker run -p 6006:6006 arizephoenix/phoenix:latest
With PostgreSQL
bash
# Set database URL
export PHOENIX_SQL_DATABASE_URL="postgresql://user:pass@host:5432/phoenix"

# Start server
phoenix serve --host 0.0.0.0 --port 6006
Environment variables
| Variable | Description | Default |
|---|---|---|
| PHOENIX_PORT | HTTP server port | 6006 |
| PHOENIX_HOST | Server bind address | 127.0.0.1 |
| PHOENIX_GRPC_PORT | gRPC/OTLP port | 4317 |
| PHOENIX_SQL_DATABASE_URL | Database connection | SQLite temp |
| PHOENIX_WORKING_DIR | Data storage directory | OS temp |
| PHOENIX_ENABLE_AUTH | Enable authentication | false |
| PHOENIX_SECRET | JWT signing secret | Required if auth enabled |
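A deployment script can resolve these variables up front and fail early on a missing secret. The sketch below mirrors the defaults listed in the table. It is an illustration of the lookup logic only, and `resolve_settings` is a hypothetical helper rather than Phoenix's own configuration code.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the table above.
DEFAULTS = {
    "PHOENIX_PORT": "6006",
    "PHOENIX_HOST": "127.0.0.1",
    "PHOENIX_GRPC_PORT": "4317",
    "PHOENIX_ENABLE_AUTH": "false",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge environment values over the defaults and validate the auth pair."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    auth_enabled = settings["PHOENIX_ENABLE_AUTH"].lower() == "true"
    if auth_enabled and not env.get("PHOENIX_SECRET"):
        raise ValueError("PHOENIX_SECRET must be set when PHOENIX_ENABLE_AUTH is true")
    return settings


# Passing a dict keeps the example independent of the real environment.
print(resolve_settings({"PHOENIX_PORT": "7007"}))
```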
With authentication
bash
export PHOENIX_ENABLE_AUTH=true
export PHOENIX_SECRET="your-secret-key-min-32-chars"
export PHOENIX_ADMIN_SECRET="admin-bootstrap-token"
phoenix serve
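Rather than typing a secret by hand, it can be generated from a cryptographically secure source. A minimal sketch using Python's standard `secrets` module; the 32-character minimum is taken from the placeholder value above, and the helper names are hypothetical.

```python
import secrets

# Minimum length implied by the "min-32-chars" placeholder.
MIN_SECRET_LENGTH = 32


def generate_secret(num_bytes: int = 32) -> str:
    """Return a random hex string; each byte yields two hex characters."""
    return secrets.token_hex(num_bytes)


def is_long_enough(secret: str) -> bool:
    """Check the length requirement only; other rules may also apply."""
    return len(secret) >= MIN_SECRET_LENGTH


secret = generate_secret()
print(len(secret), is_long_enough(secret))
```

Store the generated value in a secret manager or deployment environment rather than in source control.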
Best practices
- Use projects: Separate traces by environment (dev/staging/prod)
- Add metadata: Include user IDs, session IDs for debugging
- Evaluate regularly: Run automated evaluations in CI/CD
- Version datasets: Track test set changes over time
- Monitor costs: Track token usage via Phoenix dashboards
- Self-host: Use PostgreSQL for production deployments
Common issues
Traces not appearing:
python
from phoenix.otel import register

# Verify endpoint
tracer_provider = register(
    project_name="my-app",
    endpoint="http://localhost:6006/v1/traces"  # Correct endpoint
)

# Force flush
from opentelemetry import trace

trace.get_tracer_provider().force_flush()
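One way an endpoint ends up wrong is passing the server's base URL without the `/v1/traces` path. The sketch below normalizes a base URL into the form used in the example above. `traces_endpoint` is a hypothetical helper written for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

TRACES_PATH = "/v1/traces"


def traces_endpoint(base_url: str) -> str:
    """Return base_url with the /v1/traces path appended if it is missing."""
    parts = urlsplit(base_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected a full URL such as http://localhost:6006, got {base_url!r}")
    path = parts.path.rstrip("/")
    if not path.endswith(TRACES_PATH):
        path += TRACES_PATH
    # Drop any query string or fragment; they are not part of the endpoint.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


for candidate in ["http://localhost:6006", "http://localhost:6006/", "http://localhost:6006/v1/traces"]:
    print(candidate, "->", traces_endpoint(candidate))
```

A URL without a scheme, such as `localhost:6006`, is rejected with a `ValueError` instead of being silently guessed.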
High memory in notebook:
python
import phoenix as px

# Close session when done
session = px.launch_app()

# ... do work ...

session.close()
px.close_app()
Database connection issues:
bash
# Verify PostgreSQL connection (quoted so special characters survive the shell)
psql "$PHOENIX_SQL_DATABASE_URL" -c "SELECT 1"

# Check Phoenix logs
phoenix serve --log-level debug
References
- [Advanced Usage](references/advanced-usage.md) - Custom evaluators, experiments, production setup
- [Troubleshooting](references/troubleshooting.md) - Common issues, debugging, performance
Resources
- Documentation: https://docs.arize.com/phoenix
- Repository: https://github.com/Arize-ai/phoenix
- Docker Hub: https://hub.docker.com/r/arizephoenix/phoenix
- Version: 12.0.0+
- License: Apache 2.0
<!-- Source: .faos/custom/skills/ai-research/phoenix/SKILL.md -->