Conversations¶
Evaluate multi-turn dialogues and conversation-dependent prompts with systematic context management
Conversations enable you to test prompts that require conversation history or multi-turn dialogue context. Whether you're building chatbots, customer support systems, or any LLM application that maintains context across interactions, the conversation feature lets you systematically evaluate how your prompts perform with different conversation histories.
What Are Conversations?¶
Conversations in elluminate are structured message histories that provide context for prompt evaluation. Instead of testing prompts in isolation, conversations let you:
- Test multi-turn dialogues - Evaluate how your LLM and prompt handle ongoing conversations
- Provide conversation history - Give your LLM the context of previous messages
- Evaluate context awareness - Verify your prompt maintains coherence across turns
- Test with realistic scenarios - Use actual conversation transcripts from your system
A conversation is stored as a special payload in your collection that contains:
- Messages - The conversation history (system, user, assistant, tool messages)
- Tools (optional) - Tool definitions available during the conversation
- Tool choice (optional) - How the model should use tools
- Response format (optional) - Structured output requirements
- Metadata (optional) - Additional configuration like merge modes
How Conversations Work¶
The Conversation Column¶
Conversations are stored in a special Conversation column type in your collections. This column:
- Contains structured conversation payloads (not plain text)
- Can only exist once per collection
- Cannot coexist with Raw Input columns
- Cannot be renamed after creation
The Raw Input Column¶
In direct contrast to conversations, a Raw Input is a single prompt that is sent directly to the LLM. It does not allow placeholders and cannot be used in combination with conversations.
Raw Inputs simplify the use case in which you want to test specific variations of a prompt directly against an LLM.
On a technical level, a raw input is a simplified conversation consisting of a single user message.
Message Flow¶
When you run an experiment with conversations, elluminate:
- Optionally adds your template - If provided, we fill your prompt template placeholders with values from the collection
- Appends conversation messages - Adds the conversation history after the prompt template
- Sends to LLM - The model sees the full constructed message history
- Evaluates the response - We rate based on criteria with full conversation context
Example message order:
1. System message from template
2. User message from template (with placeholders filled)
3. User message from conversation payload
4. Assistant message from conversation payload
5. User message from conversation payload
-> LLM generates response here
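For illustration, the assembled request for the order above might look like the following (the message contents are hypothetical placeholders):

```python
# Final message list sent to the LLM: template messages first,
# then the conversation payload's messages.
messages = [
    {"role": "system", "content": "You are a support assistant."},  # from template
    {"role": "user", "content": "Customer tier: premium"},          # from template, placeholder filled
    {"role": "user", "content": "Hello!"},                          # from conversation payload
    {"role": "assistant", "content": "Hi! How can I help?"},        # from conversation payload
    {"role": "user", "content": "I need help with my account."},    # from conversation payload
]
# The LLM generates the next assistant message from this history.
```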
Setting Up Conversations¶
Step 1: Create a Collection with a Conversation Column¶
Via the UI¶
- Navigate to your project's Collections page
- Click "New Collection" or open an existing collection
- Click the three-dot menu and select "Manage Columns"
- Click "Add Column"
- Configure the column:
- Name: Choose a descriptive name (e.g., `conversation`, `chat_history`)
- Type: Select "Conversation"
- Default Value: Leave empty (conversation columns don't use default values at the moment)

Conversation Column Restrictions:
- Only one conversation column per collection
- Cannot coexist with a Raw Input column
- Column name cannot be changed after creation
- Must contain valid conversation payloads
Step 2: Add Conversation Data¶
Conversation Payload Format¶
Conversations use the Messages schema:
"messages": [
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi! How can I help?"},
{"role": "user", "content": "I need help with my account."}
]
Or the Unified Conversation Envelope (UCE) schema (everything except `messages` is optional):
```json
{
  "schema_version": "elluminate.uce/1",
  "input": {
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi! How can I help?"},
      {"role": "user", "content": "I need help with my account."}
    ],
    "tools": [...],
    "tool_choice": "auto",
    "response_format": {...},
    "metadata": {...}
  }
}
```
Message Roles¶
Messages support these roles:
- `system` - System instructions or context
- `user` - User messages
- `assistant` - Assistant responses
- `tool` - Tool execution results
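The `tool` role carries tool execution results back into the history. A minimal sketch of a tool round-trip, assuming OpenAI-style tool-call message fields (`tool_calls`, `tool_call_id`):

```python
# Hypothetical excerpt: the assistant requests a tool call and the
# conversation records the tool's result as a "tool" message.
messages = [
    {"role": "user", "content": "What's my account balance?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",  # hypothetical call id
            "type": "function",
            "function": {
                "name": "get_account_balance",
                "arguments": '{"account_id": "acc_123"}',
            },
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '{"balance": 42.50}'},
]
```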
Adding Conversation Data via UI¶
- Open your collection
- Click "Add Variables"
- For the conversation column, paste a valid JSON payload
- Optionally, fill in other columns (e.g., scenario description)
- Click the checkmark on the right to save

Bulk Import via File Upload¶
Prepare a JSONL file where each conversation value follows the UCE format:
{"conversation": {"schema_version": "elluminate.uce/1", "input": {"messages": [...]}}, "scenario": "password_reset", "category": "account"}
{"conversation": {"schema_version": "elluminate.uce/1", "input": {"messages": [...]}}, "scenario": "billing_inquiry", "category": "support"}
Then upload via:
- Go to the collections page and open the collection
- Click "Upload variables" and select the JSONL file
- Confirm the upload
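If you build the file programmatically, a minimal sketch using only the standard library (the file name and row contents are hypothetical):

```python
import json

rows = [
    {
        "conversation": {
            "schema_version": "elluminate.uce/1",
            "input": {"messages": [
                {"role": "user", "content": "I forgot my password."},
            ]},
        },
        "scenario": "password_reset",
        "category": "account",
    },
]

# JSONL: one JSON object per line.
with open("conversations.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```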
Using Conversations in Experiments¶
Conversations work seamlessly with and without prompt templates. The template provides initial context, and the conversation provides dialogue history.
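For instance, to evaluate the conversations on their own, a sketch assuming `experiments.create` also accepts a conversation collection without a `prompt_template` argument:

```python
# Hypothetical template-free run: the conversation history alone
# forms the input sent to the LLM.
experiment = client.experiments.create(
    name="Conversations Only",
    collection=collection,
)
responses = client.responses.generate(experiment=experiment)
```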

Advanced Features¶
Combining with Other Columns¶
You can mix conversation columns with regular text columns to add metadata:
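A sketch, reusing the SDK calls from the examples below (the `customer_tier` column and its `category` type are illustrative):

```python
# One conversation column plus a regular column carrying metadata.
collection, _ = client.collections.get_or_create(
    name="Support Scenarios",
    columns=[
        {"name": "conversation", "column_type": "conversation"},
        {"name": "customer_tier", "column_type": "category"},
    ],
)

client.template_variables.add_to_collection(
    template_variables={
        "conversation": {
            "schema_version": "elluminate.uce/1",
            "input": {"messages": [{"role": "user", "content": "Hello!"}]},
        },
        "customer_tier": "premium",
    },
    collection=collection,
)
```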


Tool Calling in Conversations¶
Include tool definitions in your conversation payload:
```json
{
  "schema_version": "elluminate.uce/1",
  "input": {
    "messages": [
      {"role": "user", "content": "Check my account balance."}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_account_balance",
          "description": "Retrieves the current account balance",
          "parameters": {
            "type": "object",
            "properties": {
              "account_id": {"type": "string"}
            },
            "required": ["account_id"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }
}
```
Merging Tools¶
When both your template and conversation define tools, they merge by default. Control this with merge modes:
```json
{
  "schema_version": "elluminate.uce/1",
  "input": {
    "messages": [...],
    "tools": [
      // Only these tools will be available
    ],
    "metadata": {
      "merge_mode": {
        "tools": "replace"  // Or "merge" (default)
      }
    }
  }
}
```
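For example, if your template defines a `search_kb` tool and the conversation payload defines `get_account_balance`, the default `merge` mode exposes both tools to the model, while `replace` exposes only `get_account_balance`. (The tool names here are illustrative.)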
Response Format Control¶
Specify output format per conversation with a JSON schema:
```json
{
  "schema_version": "elluminate.uce/1",
  "input": {
    "messages": [...],
    "response_format": {
      "json_schema": {
        "name": "customer_summary",
        "schema": {
          "type": "object",
          "properties": {
            "issue": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium"]}
          },
          "required": ["issue", "priority"]
        }
      }
    }
  }
}
```
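A model response conforming to this schema would look like:

```json
{"issue": "Cannot log in after password reset", "priority": "medium"}
```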
Practical SDK Examples¶
Example 1: Customer Support Chatbot¶
```python
from elluminate import Client

client = Client()

# Create collection
collection, _ = client.collections.get_or_create(
    name="Customer Support Scenarios",
    columns=[
        {"name": "conversation", "column_type": "conversation"},
        {"name": "scenario_type", "column_type": "category"},
    ],
)

# Add a password reset conversation
password_reset_conversation = {
    "schema_version": "elluminate.uce/1",
    "input": {
        "messages": [
            {"role": "user", "content": "I can't log into my account."},
            {"role": "assistant", "content": "I can help you with that. Can you tell me your email address?"},
            {"role": "user", "content": "It's [email protected]"},
            {"role": "assistant", "content": "Thank you. I've found your account. Would you like me to send a password reset link?"},
            {"role": "user", "content": "Yes please."},
        ]
    },
}

client.template_variables.add_to_collection(
    template_variables={
        "conversation": password_reset_conversation,
        "scenario_type": "password_reset",
    },
    collection=collection,
)

# Create evaluation criteria
criteria = [
    "Does the assistant maintain a professional and helpful tone throughout?",
    "Does the assistant successfully guide the user to resolve their issue?",
    "Does the assistant ask appropriate follow-up questions?",
]
criterion_set, _ = client.criterion_sets.get_or_create(
    name="Customer Support Quality",
    criteria=criteria,
)

# Create template
prompt_template, _ = client.prompt_templates.get_or_create(
    user_prompt_template=[
        {"role": "system", "content": "You are a helpful customer support assistant. Continue the conversation naturally based on the history."}
    ],
    name="Support Assistant",
)

# Link criteria to template
criterion_set = client.criterion_sets.add_prompt_template(
    criterion_set=criterion_set,
    prompt_template=prompt_template,
)

# Run experiment
experiment = client.experiments.create(
    name="Support Conversation Quality",
    prompt_template=prompt_template,
    collection=collection,
)

# Generate responses and rate them
responses = client.responses.generate(experiment=experiment)
client.ratings.rate_many(responses)

# View results
experiment.print_results_summary()
```
Example 2: Multi-Turn Technical Support¶
```python
# Technical troubleshooting conversation
tech_support_conversation = {
    "schema_version": "elluminate.uce/1",
    "input": {
        "messages": [
            {"role": "user", "content": "My app keeps crashing."},
            {"role": "assistant", "content": "I'm sorry to hear that. What device are you using?"},
            {"role": "user", "content": "iPhone 14 with iOS 17."},
            {"role": "assistant", "content": "Thank you. Have you tried updating the app to the latest version?"},
            {"role": "user", "content": "Yes, it's already updated."},
            {"role": "assistant", "content": "Let's try clearing the app cache. Go to Settings > Apps > [App Name] > Clear Cache."},
            {"role": "user", "content": "Okay, I did that. Now what?"},
        ]
    },
}

client.template_variables.add_to_collection(
    template_variables={
        "conversation": tech_support_conversation,
        "scenario_type": "technical_troubleshooting",
        "difficulty": "medium",
    },
    collection=collection,
)
```
Best Practices¶
Structuring Conversation Data¶
Keep conversations focused
- Each conversation should test a specific scenario or use case
- Limit conversation length to relevant context (typically 3-10 messages)
- Remove irrelevant small talk or greetings unless testing those specifically
Use realistic conversation patterns
- Include typical user messages (typos, incomplete sentences, varied phrasing)
- Add assistant responses that reflect your system's actual behavior
- Include edge cases (unclear requests, off-topic questions)
Balance your test scenarios
- Happy paths (60-70%) - Normal conversations that should work well
- Edge cases (20-30%) - Unusual but valid conversation flows
- Adversarial cases (10-20%) - Attempts to confuse or break the system
Evaluation Strategy¶
Design conversation-aware criteria
Good criteria reference the conversation history:
- ✅ "Does the assistant maintain consistency with information provided earlier in the conversation?"
- ✅ "Does the response appropriately address the user's follow-up question?"
- ❌ "Is the response helpful?" (too generic)
Test incremental conversation building
Instead of one long conversation, test progression:
- Conversation 1: Initial request
- Conversation 2: Initial request + one follow-up
- Conversation 3: Initial request + two follow-ups
This helps isolate where context awareness breaks down.
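A minimal sketch of generating such incremental variants from one message list, reusing the `add_to_collection` call from the examples above (the slicing logic and scenario names are illustrative):

```python
full_messages = [
    {"role": "user", "content": "My app keeps crashing."},
    {"role": "assistant", "content": "What device are you using?"},
    {"role": "user", "content": "iPhone 14 with iOS 17."},
    {"role": "assistant", "content": "Have you tried updating the app?"},
    {"role": "user", "content": "Yes, it's already updated."},
]

# Each variant ends on a user message so the LLM has a turn to respond to.
for n in range(1, len(full_messages) + 1, 2):
    client.template_variables.add_to_collection(
        template_variables={
            "conversation": {
                "schema_version": "elluminate.uce/1",
                "input": {"messages": full_messages[:n]},
            },
            "scenario_type": f"crash_report_turns_{n}",
        },
        collection=collection,
    )
```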
FAQ¶
Can I edit the conversation payload after adding it?¶
Yes, click the edit icon on the variables table row. Be careful to maintain valid JSON structure.
What happens if my conversation payload is invalid?¶
elluminate validates the payload when you save. You'll see an error message indicating what's wrong (e.g., "Missing required field 'messages'", "Invalid tool definition").
Can I use conversations with batch processing?¶
Yes! Conversations work with all experiment features including batch operations and async SDK methods.
How do conversations differ from Raw Input columns?¶
- Conversations: Structured message histories with optional tools/metadata. Can combine with templates.
- Raw Input: Single user message, cannot combine with templates. For simple, template-free testing.
Can I have multiple conversation columns?¶
No, only one conversation column per collection. This ensures clarity about which payload provides the conversation context.
Do conversation messages count toward token limits?¶
Yes, all messages (template + conversation) are sent to the LLM and count toward the model's context window.
Can I reference conversation data in criteria?¶
The rating model sees the full conversation when evaluating, so your criteria can reference "earlier in the conversation" or "the conversation history."