Overview

Tool calling evaluation assesses how well your LLM selects the appropriate tool for a given task and supplies correct parameters in its function calls.

Evaluation Types

Composo supports two main types of tool calling evaluation:

Immediate Function Evaluation

Evaluates tool calls before the tool response is received. Useful for:
  • Evaluating a tool call in a live application before the tool is invoked
  • Assessing tool call quality without bias from the tool response

Hindsight Function Evaluation

Evaluates tool calls after the tool response is received. Useful when:
  • Maximum insight and evaluation power matter most

We recommend using hindsight evaluation wherever possible. Seeing the result of the tool call provides additional information that can make the evaluation significantly more powerful.

Note: The Composo API automatically chooses between Immediate and Hindsight evaluation depending on whether your messages object includes the tool response.
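
The selection rule in the note above can be illustrated with a short sketch. It assumes that "provides the tool response" means the messages list contains at least one message with role "tool", as in the OpenAI-style format used on this page. The helper infer_evaluation_mode is hypothetical and only mirrors the documented rule; it is not Composo's implementation and does not call the Composo API.

```python
from typing import Any


def infer_evaluation_mode(messages: list[dict[str, Any]]) -> str:
    """Hypothetical helper mirroring the documented rule (assumption: a tool
    response is any message with role "tool"). Returns "hindsight" if a tool
    response is present, otherwise "immediate"."""
    has_tool_response = any(m.get("role") == "tool" for m in messages)
    return "hindsight" if has_tool_response else "immediate"


# Assistant turn that requests a tool call but has no response yet.
tool_call_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"location\": \"Paris, France\"}"},
    }],
}

before = [{"role": "user", "content": "What's the weather in Paris?"}, tool_call_turn]
after = before + [{"role": "tool", "tool_call_id": "call_123",
                   "content": "Currently 15°C with clear skies"}]

print(infer_evaluation_mode(before))  # immediate
print(infer_evaluation_mode(after))   # hindsight
```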

Criteria Format Requirements

Tool calling criteria must start with one of these prefixes:

Continuous Evaluation (0-1 scoring)

  • "Reward tool calls"
  • "Penalize tool calls"

Binary Evaluation (Pass/Fail)

  • "Tool call passes if"
  • "Tool call fails if"

Read our criteria writing guide for detailed advice on writing criteria.
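
As a client-side sanity check before sending a request, you might verify that each criterion starts with one of the prefixes listed above and note which scoring type it implies. The sketch below is a hypothetical helper, not part of any Composo SDK; it assumes prefix matching is exact and case-sensitive, which this page does not state.

```python
# Prefixes as listed in this guide. Exact, case-sensitive matching is an assumption.
CONTINUOUS_PREFIXES = ("Reward tool calls", "Penalize tool calls")
BINARY_PREFIXES = ("Tool call passes if", "Tool call fails if")


def classify_criterion(criterion: str) -> str:
    """Hypothetical helper: return "continuous" (0-1 score) or "binary"
    (pass/fail) based on the criterion's prefix. Raises ValueError if the
    criterion does not start with any supported prefix."""
    text = criterion.strip()
    # str.startswith accepts a tuple and returns True if any prefix matches.
    if text.startswith(CONTINUOUS_PREFIXES):
        return "continuous"
    if text.startswith(BINARY_PREFIXES):
        return "binary"
    allowed = ", ".join(CONTINUOUS_PREFIXES + BINARY_PREFIXES)
    raise ValueError(f"Criterion must start with one of: {allowed}")


print(classify_criterion("Reward tool calls that select appropriate tools"))  # continuous
print(classify_criterion("Tool call fails if the location parameter is missing"))  # binary
```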

Message Format

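Tool calling evaluations use the same message format as response evaluations, but also include tool definitions in a "tools" array. The example below includes the tool response, so it is evaluated in Hindsight mode: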
{
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
        {"role": "assistant", "content": null, "tool_calls": [
            {
                "id": "call_123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"Paris, France\"}"
                }
            }
        ]},
        {"role": "tool", "tool_call_id": "call_123", "content": "Currently 15°C with clear skies"},
        {"role": "assistant", "content": "The weather in Paris is currently 15°C with clear skies."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"}
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    "evaluation_criteria": "Reward tool calls that select appropriate tools and provide accurate parameters"
}
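
To request an Immediate evaluation of the same tool call, one approach is to send the conversation only up to the assistant turn that makes the tool call, omitting the tool response and the final assistant reply. The Python sketch below builds the example payload above as a dict and derives that truncated variant. The helper to_immediate is hypothetical, and no Composo endpoint or client call is shown; how you submit the payload is outside the scope of this sketch.

```python
import json

# The Hindsight example above as a Python dict (tool response included).
hindsight_request = {
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
        {"role": "assistant", "content": None, "tool_calls": [{
            "id": "call_123",
            "type": "function",
            "function": {
                "name": "get_weather",
                # Arguments are a JSON-encoded string, not a nested object.
                "arguments": json.dumps({"location": "Paris, France"}),
            },
        }]},
        {"role": "tool", "tool_call_id": "call_123",
         "content": "Currently 15°C with clear skies"},
        {"role": "assistant",
         "content": "The weather in Paris is currently 15°C with clear skies."},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
    "evaluation_criteria": "Reward tool calls that select appropriate tools "
                           "and provide accurate parameters",
}


def to_immediate(request: dict) -> dict:
    """Hypothetical helper: truncate the conversation right after the last
    assistant turn that contains tool_calls, dropping the tool response and
    any later turns. Raises ValueError if there is no tool call."""
    messages = request["messages"]
    call_indices = [i for i, m in enumerate(messages) if m.get("tool_calls")]
    if not call_indices:
        raise ValueError("request contains no assistant tool call")
    # Build a new top-level dict; the original request's message list is not modified.
    return {**request, "messages": messages[: call_indices[-1] + 1]}


immediate_request = to_immediate(hindsight_request)
print(json.dumps(immediate_request, indent=2, ensure_ascii=False))
```

Because the truncated payload contains no message with role "tool", it would fall under Immediate evaluation according to the rule described earlier on this page.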

Next Steps