Introduction

Tags allow you to add custom metadata to your evaluations, making it easier to organize, filter, and analyze your evaluation data. Use tags to categorize evaluations by environment, version, feature flags, experiments, or any other dimension that helps you track your AI application’s performance.

Why Use Tags?

  • Organize evaluations: Group by environment, version, or feature flags
  • Filter and query: Find evaluations in Metabase or analytics tools
  • Track experiments: Tag with experiment IDs or A/B test variants
  • Monitor deployments: Tag with deployment versions or release numbers

Tag Format and Constraints

Tags are key-value pairs with the following constraints:
  • Keys: Must be strings, maximum 64 characters
  • Values: Must be strings, numbers, or booleans (numbers and booleans are converted to strings), maximum 64 characters
  • No nested structures: Tag values cannot be dictionaries, lists, tuples, or sets
  • Dictionary format: Tags must be provided as a Python dictionary
# ✅ Valid tags
tags = {
    "environment": "production",
    "version": "1.2.3",
    "experiment": "variant_a",
    "deployment_id": "abc123",
    "production": False  # Numbers and bools are converted to strings
}

# ❌ Invalid tags
tags = {
    "metadata": {"key": "value"},  # Error: No nested dicts
    "versions": [1, 2, 3],         # Error: No lists
    "a" * 65: "value"              # Error: Key too long (>64 chars)
}
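
If you want to catch constraint violations before calling the API, a small pre-check can mirror these rules. This is an illustrative sketch only; validate_tags is not part of the SDK, which performs its own validation:
# Hypothetical helper mirroring the documented constraints (not part of the SDK)
def validate_tags(tags):
    if not isinstance(tags, dict):
        raise ValueError("Tags must be provided as a dictionary.")
    for key, value in tags.items():
        if not isinstance(key, str) or len(key) > 64:
            raise ValueError(f"Tag key {key!r} must be a string of at most 64 characters.")
        if isinstance(value, (dict, list, tuple, set)):
            raise ValueError(f"Tag value for {key!r} must not be a nested structure.")
        if not isinstance(value, (str, int, float, bool)):
            raise ValueError(f"Tag value for {key!r} must be a string, number, or boolean.")
        if len(str(value)) > 64:
            raise ValueError(f"Tag value for {key!r} exceeds 64 characters once stringified.")

validate_tags({"environment": "production", "version": "1.2.3"})  # passes silently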

Using Tags

Tags can be added to both evaluate and evaluate_trace calls in synchronous and asynchronous clients.

Basic Usage

from composo import Composo, AsyncComposo

# Synchronous
composo_client = Composo(api_key="your-api-key")
result = composo_client.evaluate(
    messages=[{"role": "user", "content": "Hello"}],
    criteria="Reward helpful responses",
    tags={"environment": "production", "version": "1.0.0"}
)

# Asynchronous (call from within an async function)
async_client = AsyncComposo(api_key="your-api-key")
result = await async_client.evaluate(
    messages=[{"role": "user", "content": "Hello"}],
    criteria="Reward helpful responses",
    tags={"environment": "production", "version": "1.0.0"}
)

Trace Evaluation

from composo import Composo
from composo.tracing import ComposoTracer, Instruments, AgentTracer
from openai import OpenAI

ComposoTracer.init(instruments=Instruments.OPENAI)
composo_client = Composo(api_key="your-api-key")

with AgentTracer("my_agent") as tracer:
    response = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is Python?"}],
        max_tokens=100,
    )

trace = tracer.get_multi_agent_trace()
trace_evaluation = composo_client.evaluate_trace(
    trace=trace,
    criteria=["Reward agents that provide helpful advice"],
    tags={"environment": "production", "agent_version": "2.1.0"}
)

Common Use Cases

Environment and Version Tagging

import os
from composo import Composo

composo_client = Composo(api_key=os.getenv("COMPOSO_API_KEY"))

def evaluate_with_env_tags(messages, criteria):
    return composo_client.evaluate(
        messages=messages,
        criteria=criteria,
        tags={
            "environment": os.getenv("ENVIRONMENT", "development"),
            "version": os.getenv("APP_VERSION", "unknown"),
            "deployment": os.getenv("DEPLOYMENT_ID", "local")
        }
    )
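
A call then picks up its tags from the current environment automatically. For example, reusing the messages and criteria shape from the basic usage above:
# Tags come from ENVIRONMENT, APP_VERSION, and DEPLOYMENT_ID at call time
result = evaluate_with_env_tags(
    messages=[{"role": "user", "content": "Hello"}],
    criteria="Reward helpful responses"
)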

Experiment Tagging

def evaluate_experiment(messages, experiment_id, variant):
    return composo_client.evaluate(
        messages=messages,
        criteria="Reward helpful responses",
        tags={
            "experiment_id": experiment_id,
            "variant": variant,
            "type": "ab_test"
        }
    )

# Usage
control_result = evaluate_experiment(messages, "exp_001", "control")
treatment_result = evaluate_experiment(messages, "exp_001", "treatment")

Querying Tags in Metabase

Tags are stored and indexed for efficient querying in Metabase.

Basic Filtering

  1. Click + New → Question
  2. Select your evaluations table
  3. Click Filter → Tags → Contains
  4. Enter tag key-value pair: {"environment": "production"}
  5. Add multiple filters with + for AND logic

Visualizations

To create visualizations grouped by tag values:
  1. Create a query filtering by date range
  2. Click Summarize
  3. Choose your metric (e.g., Average of Latency (ms) or Count of rows)
  4. Add Group by → Tags → select your tag key (e.g., environment)
  5. Visualize as a Bar chart or Line chart

Best Practices

Consistent Naming

Use consistent tag names and values across your application:
# ✅ Good: Consistent naming
tags = {
    "environment": "production",  # Always "environment", not "env"
    "version": "1.0.0",          # Always semantic versioning
    "deployment_id": "abc123"     # Always "deployment_id"
}

# ❌ Bad: Inconsistent naming
tags = {
    "env": "prod",               # Sometimes "env", sometimes "environment"
    "version": "v1.2.3"         # Sometimes with "v" prefix
}
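
One way to keep names consistent is to build tags in a single helper and merge call-specific values into a shared base dictionary. This is a pattern sketch rather than an SDK feature; build_tags and its defaults are illustrative:
# Illustrative pattern: one place owns the tag names, so "environment" never
# becomes "env" elsewhere in the codebase (not an SDK feature)
BASE_TAGS = {"environment": "production", "version": "1.0.0"}

def build_tags(**extra):
    return {**BASE_TAGS, **{key: str(value) for key, value in extra.items()}}

tags = build_tags(experiment="variant_a")
# {'environment': 'production', 'version': '1.0.0', 'experiment': 'variant_a'}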

Keep Tags Concise

  • Keep keys and values under 64 characters
  • Use concise but meaningful names
  • Avoid excessive tags (3-5 tags per evaluation is usually sufficient)
# ✅ Good: Concise and focused
tags = {"env": "prod", "version": "1.0.0", "experiment": "variant_a"}

# ❌ Bad: Too verbose
tags = {
    "application_environment": "production_environment",
    "experiment_id": "prompt_optimization_experiment_variant_a_2024"
}

Error Handling

Tags are validated automatically. Invalid tags will raise a ValueError:
try:
    result = composo_client.evaluate(
        messages=[{"role": "user", "content": "Hello"}],
        criteria="Reward helpful responses",
        tags={"nested": {"key": "value"}}  # Invalid: nested dict
    )
except ValueError as e:
    print(f"Tag validation error: {e}")
    # Output: Tag values must not be mappings (no nested dicts allowed).
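
If tags are assembled from configuration you don't fully control and you would rather not lose the evaluation when they are malformed, one option is to retry without tags (tags are an optional addition to evaluate calls, per the usage above). This is an illustrative pattern, not an SDK feature:
# Illustrative fallback: retry without tags if validation rejects them
def evaluate_with_optional_tags(messages, criteria, tags):
    try:
        return composo_client.evaluate(messages=messages, criteria=criteria, tags=tags)
    except ValueError:
        # Tags failed validation; keep the evaluation rather than losing it
        return composo_client.evaluate(messages=messages, criteria=criteria)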

Summary

Tags provide a powerful way to organize and filter your evaluations:
  • ✅ Add tags to evaluate and evaluate_trace calls
  • ✅ Use tags to categorize by environment, version, experiments, and more
  • ✅ Filter and visualize tags in Metabase using the UI
  • ✅ Follow best practices for consistent, meaningful tags
  • ✅ Tags are validated automatically with clear error messages
Start tagging your evaluations today to gain better insights into your AI application’s performance!