Introduction
Tags allow you to add custom metadata to your evaluations, making it easier to organize, filter, and analyze your evaluation data. Use tags to categorize evaluations by environment, version, feature flags, experiments, or any other dimension that helps you track your AI application’s performance.
Why Use Tags?
- Organize evaluations: Group by environment, version, or feature flags
- Filter and query: Find evaluations in Metabase or analytics tools
- Track experiments: Tag with experiment IDs or A/B test variants
- Monitor deployments: Tag with deployment versions or release numbers
Tag Format and Constraints
Tags are key-value pairs with the following constraints:
- Keys: Must be strings, maximum 64 characters
- Values: Must be strings, numbers, or booleans (converted to strings), maximum 64 characters
- No nested structures: Tag values cannot be dictionaries, lists, tuples, or sets
- Dictionary format: Tags must be provided as a Python dictionary
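For illustration, the sketch below shows tag dictionaries that pass and fail these constraints; the keys and values themselves are invented examples:

```python
# Valid: string keys under 64 chars; string, number, and boolean values
# (numbers and booleans are converted to strings).
valid_tags = {
    "environment": "production",  # stays "production"
    "model_version": 3,           # stored as "3"
    "beta_features": True,        # stored as "True"
}

# Invalid: nested structures are rejected.
invalid_tags = {
    "regions": ["us-east", "eu-west"],  # lists are not allowed as values
}
```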
Using Tags
Tags can be added to both `evaluate` and `evaluate_trace` calls in synchronous and asynchronous clients.
Basic Usage
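A minimal sketch of tagging a synchronous evaluation. The client class, constructor, and the non-tag arguments to `evaluate` are assumptions for illustration; the `tags` dictionary follows the format described above:

```python
from my_eval_sdk import EvalClient  # hypothetical import path

client = EvalClient(api_key="YOUR_API_KEY")  # assumed constructor

result = client.evaluate(
    input="What is the capital of France?",  # assumed parameter names
    output="Paris",
    tags={
        "environment": "production",
        "version": "1.2.0",
    },
)
```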
Trace Evaluation
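And a sketch of the asynchronous variant with `evaluate_trace`; the async client class and the `trace_id` parameter are likewise assumptions:

```python
import asyncio

from my_eval_sdk import AsyncEvalClient  # hypothetical import path

async def main() -> None:
    client = AsyncEvalClient(api_key="YOUR_API_KEY")  # assumed constructor
    await client.evaluate_trace(
        trace_id="trace-123",  # assumed parameter
        tags={
            "environment": "staging",
            "experiment_id": "exp-42",
        },
    )

asyncio.run(main())
```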
Common Use Cases
Environment and Version Tagging
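One common pattern is deriving these tags from deployment configuration so every evaluation is stamped consistently. A sketch, reusing the hypothetical client from the Basic Usage example; the environment variable names are assumptions:

```python
import os

question = "What is the capital of France?"  # placeholder input
answer = "Paris"                             # placeholder output

# Read the deployment context once and reuse it for every evaluation.
deployment_tags = {
    "environment": os.getenv("APP_ENV", "development"),
    "version": os.getenv("RELEASE_VERSION", "0.0.0"),
}

result = client.evaluate(
    input=question,
    output=answer,
    tags=deployment_tags,
)
```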
Experiment Tagging
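For experiments, tag each evaluation with the experiment ID and variant so A/B arms can be compared later in Metabase; again a sketch against the hypothetical client, with invented tag values:

```python
# Tag each evaluation with its experiment and assigned variant.
result = client.evaluate(
    input=question,
    output=answer,
    tags={
        "experiment_id": "prompt-rewrite-06",
        "variant": "B",  # e.g. the A/B arm this request was served
    },
)
```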
Querying Tags in Metabase
Tags are stored and indexed for efficient querying in Metabase.
Basic Filtering
- Click + New → Question
- Select your evaluations table
- Click Filter → Tags → Contains
- Enter tag key-value pair:
{"environment": "production"} - Add multiple filters with + for AND logic
Visualizations
To create visualizations grouped by tag values:
- Create a query filtering by date range
- Click Summarize
- Choose your metric (e.g., Average of Latency (ms) or Count of rows)
- Add Group by → Tags → select your tag key (e.g., `environment`)
- Visualize as a Bar chart or Line chart
Best Practices
Consistent Naming
Use consistent tag names and values across your application; a sketch combining this with the next practice follows the list below.
Keep Tags Concise
- Keep keys and values under 64 characters
- Use concise but meaningful names
- Avoid excessive tags (3-5 tags per evaluation is usually sufficient)
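One way to enforce both practices is to centralize tag keys and builders in a single module so every call site spells them identically. A sketch; the names here are illustrative, not part of any SDK:

```python
# Shared tag vocabulary for the whole application.
ENVIRONMENT = "environment"
VERSION = "version"
EXPERIMENT_ID = "experiment_id"

ALLOWED_ENVIRONMENTS = {"development", "staging", "production"}

def make_tags(environment: str, version: str) -> dict[str, str]:
    """Build a standard tag dictionary, rejecting unknown environments."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return {ENVIRONMENT: environment, VERSION: version}
```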
Error Handling
Tags are validated automatically. Invalid tags will raise a `ValueError`:
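For example, a nested value fails validation. A sketch of catching the error, following the call shape of the hypothetical client above:

```python
try:
    client.evaluate(
        input=question,
        output=answer,
        tags={"metadata": {"nested": "dict"}},  # nested values are rejected
    )
except ValueError as exc:
    print(f"Invalid tags: {exc}")
```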
Summary
Tags provide a powerful way to organize and filter your evaluations:
- ✅ Add tags to `evaluate` and `evaluate_trace` calls
- ✅ Use tags to categorize by environment, version, experiments, and more
- ✅ Filter and visualize tags in Metabase using the UI
- ✅ Follow best practices for consistent, meaningful tags
- ✅ Tags are validated automatically with clear error messages