Anonymizing your data while maintaining evaluation quality
When dealing with sensitive customer information, you may need to anonymize data before sending it to Composo evaluation services. This guide explains how to effectively anonymize your data while preserving evaluation quality.
For optimal evaluation results, we recommend using a consistent placeholder substitution approach rather than removing or scrambling PII. This preserves relationships between entities that are important for evaluation quality.
For example, replace “Bob sent an email to Sally” with “NAME_1 sent an email to NAME_2”.
The placeholders still show that two distinct people are involved and which one sent the email.
Maintain placeholder consistency across all related content
The same entity should have the same placeholder ID throughout a single evaluation request
Example: If “Sally” is “NAME_2” in one part, it should remain “NAME_2” everywhere in that request
Preserve structure and context
Keep sentence structure, formatting, and non-PII context intact
This ensures evaluations remain accurate and meaningful
Numbering can be omitted if there is only one instance of a particular entity type. For example, if only one name appears in your data, you can simply use “NAME” instead of “NAME_1”.
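The guidelines above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Composo SDK: the `PlaceholderMap` class and the `anonymize` function are hypothetical names, and the sketch assumes you already know which strings are PII and what type each one is.

```python
import re
from collections import defaultdict


class PlaceholderMap:
    """Hypothetical helper: one stable placeholder per (entity type, value)."""

    def __init__(self):
        # entity type -> {original value: placeholder}
        self._ids = defaultdict(dict)

    def placeholder(self, entity_type, value):
        seen = self._ids[entity_type]
        if value not in seen:
            seen[value] = f"{entity_type}_{len(seen) + 1}"
        return seen[value]


def anonymize(text, entities, mapping):
    """Replace each known (value, entity_type) pair in text with its placeholder."""
    # Assign IDs in the order given so the numbering is predictable.
    pairs = [(value, mapping.placeholder(etype, value)) for value, etype in entities]
    # Substitute longer values first so "Bob Smith" is handled before "Bob".
    for value, token in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = re.sub(rf"\b{re.escape(value)}\b", lambda _m, t=token: t, text)
    return text


mapping = PlaceholderMap()
entities = [("Bob", "NAME"), ("Sally", "NAME")]
first = anonymize("Bob sent an email to Sally", entities, mapping)
second = anonymize("Sally replied to Bob", entities, mapping)
# first  -> "NAME_1 sent an email to NAME_2"
# second -> "NAME_2 replied to NAME_1"
```

Reusing the same `mapping` object for every string in a request is what keeps “Sally” as “NAME_2” everywhere. The word-boundary match assumes values that start and end with a letter or digit, which holds for names but not for every kind of PII.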
Original Data:
{
  "messages": [
    {"role": "user", "content": "How do I contact Bob Smith?"},
    {"role": "assistant", "content": "You can reach Bob Smith at [email protected] or call him at (555) 123-4567."}
  ],
  "evaluation_criteria": "Reward responses that provide complete contact information when requested."
}
Anonymized Data:
{
  "messages": [
    {"role": "user", "content": "How do I contact NAME_1?"},
    {"role": "assistant", "content": "You can reach NAME_1 at EMAIL_1 or call him at PHONE_1."}
  ],
  "evaluation_criteria": "Reward responses that provide complete contact information when requested."
}
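One way to produce a payload like the anonymized one above is to run every message through the same placeholder table before sending the request. The sketch below is only an illustration under stated assumptions: `anonymize_request` is a hypothetical function, the email and phone patterns are deliberately simplified, the list of names is supplied by the caller, and the email address in the sample input is made up for the example.

```python
import re

# Simplified, illustrative patterns (assumptions); real PII detection needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")


def anonymize_request(request, known_names):
    """Return a copy of request with PII in message content replaced by placeholders."""
    seen = {}  # (entity type, original value) -> placeholder, shared by all messages

    def token(entity_type, value):
        key = (entity_type, value)
        if key not in seen:
            count = sum(1 for t, _ in seen if t == entity_type)
            seen[key] = f"{entity_type}_{count + 1}"
        return seen[key]

    def scrub(text):
        text = EMAIL_RE.sub(lambda m: token("EMAIL", m.group()), text)
        text = PHONE_RE.sub(lambda m: token("PHONE", m.group()), text)
        for name in known_names:
            if name in text:
                text = text.replace(name, token("NAME", name))
        return text

    # Only message content is rewritten; roles and evaluation_criteria are kept as-is.
    messages = [{**msg, "content": scrub(msg["content"])} for msg in request["messages"]]
    return {**request, "messages": messages}


request = {
    "messages": [
        {"role": "user", "content": "How do I contact Bob Smith?"},
        {
            "role": "assistant",
            # Hypothetical address, used only for this example.
            "content": "You can reach Bob Smith at bob.smith@example.com "
                       "or call him at (555) 123-4567.",
        },
    ],
    "evaluation_criteria": "Reward responses that provide complete contact "
                           "information when requested.",
}
anonymized = anonymize_request(request, known_names=["Bob Smith"])
```

Because the `seen` table lives for the whole call, “Bob Smith” becomes “NAME_1” in both the user and the assistant message, and the original `request` dictionary is left unmodified.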