Introduction

Tags allow you to add custom metadata to your evaluations, making it easier to organize, filter, and analyze your evaluation data. Use tags to categorize evaluations by environment, version, feature flags, experiments, or any other dimension that helps you track your AI application’s performance.

Why Use Tags?

  • Organize evaluations: Group by environment, version, or feature flags
  • Filter and query: Find evaluations in Metabase or analytics tools
  • Track experiments: Tag with experiment IDs or A/B test variants
  • Monitor deployments: Tag with deployment versions or release numbers

Tag Format and Constraints

Tags are key-value pairs with the following constraints:
  • Keys: Must be strings, maximum 64 characters
  • Values: Must be strings, numbers, or booleans (numbers and booleans are converted to strings), maximum 64 characters
  • No nested structures: Tag values cannot be dictionaries, lists, tuples, or sets
  • Dictionary format: Tags must be provided as a Python dictionary
# ✅ Valid tags
tags = {
    "environment": "production",
    "version": "1.2.3",
    "experiment": "variant_a",
    "deployment_id": "abc123",
    "production": False  # Numbers and bools are converted to strings
}

# ❌ Invalid tags
tags = {
    "metadata": {"key": "value"},  # Error: No nested dicts
    "versions": [1, 2, 3],         # Error: No lists
    "a" * 65: "value"              # Error: Key too long (>64 chars)
}
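
If you want to catch constraint violations before calling the API, a small pre-check can mirror these rules. This is an illustrative sketch only; validate_tags is not part of the SDK, which performs its own validation:
# Hypothetical helper mirroring the documented constraints (not part of the SDK)
def validate_tags(tags):
    if not isinstance(tags, dict):
        raise ValueError("Tags must be provided as a dictionary.")
    for key, value in tags.items():
        if not isinstance(key, str) or len(key) > 64:
            raise ValueError(f"Tag key {key!r} must be a string of at most 64 characters.")
        if isinstance(value, (dict, list, tuple, set)):
            raise ValueError(f"Tag value for {key!r} must not be a nested structure.")
        if not isinstance(value, (str, int, float, bool)):
            raise ValueError(f"Tag value for {key!r} must be a string, number, or boolean.")
        if len(str(value)) > 64:
            raise ValueError(f"Tag value for {key!r} exceeds 64 characters once stringified.")

validate_tags({"environment": "production", "version": "1.2.3"})  # passes silently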

Using Tags

Tags can be added to both evaluate and evaluate_trace calls in synchronous and asynchronous clients.

Basic Usage

from composo import Composo, AsyncComposo

# Synchronous
composo_client = Composo(api_key="your-api-key")
result = composo_client.evaluate(
    messages=[{"role": "user", "content": "Hello"}],
    criteria="Reward helpful responses",
    tags={"environment": "production", "version": "1.0.0"}
)

# Asynchronous (call from within an async function)
async_client = AsyncComposo(api_key="your-api-key")
result = await async_client.evaluate(
    messages=[{"role": "user", "content": "Hello"}],
    criteria="Reward helpful responses",
    tags={"environment": "production", "version": "1.0.0"}
)

Trace Evaluation

from composo import Composo
from composo.tracing import ComposoTracer, Instruments, AgentTracer
from openai import OpenAI

ComposoTracer.init(instruments=Instruments.OPENAI)
composo_client = Composo(api_key="your-api-key")

with AgentTracer("my_agent") as tracer:
    response = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is Python?"}],
        max_tokens=100,
    )

trace = tracer.get_multi_agent_trace()
trace_evaluation = composo_client.evaluate_trace(
    trace=trace,
    criteria=["Reward agents that provide helpful advice"],
    tags={"environment": "production", "agent_version": "2.1.0"}
)

Common Use Cases

Environment and Version Tagging

import os
from composo import Composo

composo_client = Composo(api_key=os.getenv("COMPOSO_API_KEY"))

def evaluate_with_env_tags(messages, criteria):
    return composo_client.evaluate(
        messages=messages,
        criteria=criteria,
        tags={
            "environment": os.getenv("ENVIRONMENT", "development"),
            "version": os.getenv("APP_VERSION", "unknown"),
            "deployment": os.getenv("DEPLOYMENT_ID", "local")
        }
    )
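
A call then picks up its tags from the current environment automatically. For example, reusing the messages and criteria shape from the basic usage above:
# Tags come from ENVIRONMENT, APP_VERSION, and DEPLOYMENT_ID at call time
result = evaluate_with_env_tags(
    messages=[{"role": "user", "content": "Hello"}],
    criteria="Reward helpful responses"
)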

Experiment Tagging

def evaluate_experiment(messages, experiment_id, variant):
    return composo_client.evaluate(
        messages=messages,
        criteria="Reward helpful responses",
        tags={
            "experiment_id": experiment_id,
            "variant": variant,
            "type": "ab_test"
        }
    )

# Usage
control_result = evaluate_experiment(messages, "exp_001", "control")
treatment_result = evaluate_experiment(messages, "exp_001", "treatment")

Querying Tags in Metabase

Tags are stored and indexed for efficient querying in Metabase.

Basic Filtering

  1. Click + New → Question
  2. Select your evaluations table
  3. Click Filter → Tags → Contains
  4. Enter tag key-value pair: {"environment": "production"}
  5. Add multiple filters with + for AND logic

Visualizations

To create visualizations grouped by tag values:
  1. Create a query filtering by date range
  2. Click Summarize
  3. Choose your metric (e.g., Average of Latency (ms) or Count of rows)
  4. Add Group by → Tags → select your tag key (e.g., environment)
  5. Visualize as a Bar chart or Line chart

Best Practices

Consistent Naming

Use consistent tag names and values across your application:
# ✅ Good: Consistent naming
tags = {
    "environment": "production",  # Always "environment", not "env"
    "version": "1.0.0",          # Always semantic versioning
    "deployment_id": "abc123"     # Always "deployment_id"
}

# ❌ Bad: Inconsistent naming
tags = {
    "env": "prod",               # Sometimes "env", sometimes "environment"
    "version": "v1.2.3"         # Sometimes with "v" prefix
}
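
One way to keep names consistent is to build tags in a single helper and merge call-specific values into a shared base dictionary. This is a pattern sketch rather than an SDK feature; build_tags and its defaults are illustrative:
# Illustrative pattern: one place owns the tag names, so "environment" never
# becomes "env" elsewhere in the codebase (not an SDK feature)
BASE_TAGS = {"environment": "production", "version": "1.0.0"}

def build_tags(**extra):
    return {**BASE_TAGS, **{key: str(value) for key, value in extra.items()}}

tags = build_tags(experiment="variant_a")
# {'environment': 'production', 'version': '1.0.0', 'experiment': 'variant_a'}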

Keep Tags Concise

  • Keep keys and values under 64 characters
  • Use concise but meaningful names
  • Avoid excessive tags (3-5 tags per evaluation is usually sufficient)
# ✅ Good: Concise and focused
tags = {"env": "prod", "version": "1.0.0", "experiment": "variant_a"}

# ❌ Bad: Too verbose
tags = {
    "application_environment": "production_environment",
    "experiment_id": "prompt_optimization_experiment_variant_a_2024"
}

Error Handling

Tags are validated automatically. Invalid tags will raise a ValueError:
try:
    result = composo_client.evaluate(
        messages=[{"role": "user", "content": "Hello"}],
        criteria="Reward helpful responses",
        tags={"nested": {"key": "value"}}  # Invalid: nested dict
    )
except ValueError as e:
    print(f"Tag validation error: {e}")
    # Output: Tag values must not be mappings (no nested dicts allowed).
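
If tags are assembled from configuration you don't fully control and you would rather not lose the evaluation when they are malformed, one option is to retry without tags (tags are an optional addition to evaluate calls, per the usage above). This is an illustrative pattern, not an SDK feature:
# Illustrative fallback: retry without tags if validation rejects them
def evaluate_with_optional_tags(messages, criteria, tags):
    try:
        return composo_client.evaluate(messages=messages, criteria=criteria, tags=tags)
    except ValueError:
        # Tags failed validation; keep the evaluation rather than losing it
        return composo_client.evaluate(messages=messages, criteria=criteria)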

Summary

Tags provide a powerful way to organize and filter your evaluations:
  • ✅ Add tags to evaluate and evaluate_trace calls
  • ✅ Use tags to categorize by environment, version, experiments, and more
  • ✅ Filter and visualize tags in Metabase using the UI
  • ✅ Follow best practices for consistent, meaningful tags
  • ✅ Tags are validated automatically with clear error messages
Start tagging your evaluations today to gain better insights into your AI application’s performance!