Test Set Generation
Automatically expand your test datasets with semantically similar synthetic examples
elluminate can generate synthetic test data from the template variables you already have, so a small set of hand-written examples can be expanded automatically with semantically similar ones.
Usage in the SDK
The following example demonstrates how to generate synthetic test data:

    """v1.0 API version of example_sdk_usage_generate_testset.py

    Demonstrates testset generation - automatically expanding test collections
    with LLM-generated variations based on existing examples.

    v1.0 API:
    - collection.generate_variables(prompt_template) - generates a new test case using AI
    """

    from dotenv import load_dotenv
    from elluminate import Client

    load_dotenv(override=True)

    client = Client()

    # v1.0: get_or_create_prompt_template - messages is part of lookup
    template, _ = client.get_or_create_prompt_template(
        name="University Nobel Laureates (Generated)",
        messages="List the most impactful Nobel Prize winners from {{university}} in {{state}} "
        "and their breakthrough discoveries.",
    )

    # v1.0: get_or_create_collection (name is the lookup key)
    collection, _ = client.get_or_create_collection(
        name="Top Universities - Generated",
        defaults={"description": "A collection of prestigious US universities (Generated)"},
    )

    # v1.0: Seed the collection with initial example data
    # This gives the generator examples to learn the pattern from
    collection.add_many(
        variables=[
            {"university": "MIT", "state": "Massachusetts"},
        ]
    )

    # v1.0: Generate new test cases using AI
    # The generator uses the existing entries as examples to create new variations
    generated_values = []
    for _ in range(2):
        generated_var = collection.generate_variables(template)
        generated_values.append(generated_var)

    print("Generated test cases:")
    for template_variables in generated_values:
        print(f"  {template_variables.input_values}")

    # Example output:
    # {'state': 'Alaska', 'university': 'University of Alaska Fairbanks'}
    # {'state': 'Tennessee', 'university': 'Vanderbilt University'}

- Create a prompt template that defines the structure of your prompts. Its template variables (here, university and state) determine which fields are generated.
- Create a collection to store your template variables.
- Add the base template variables to the collection. They serve as the examples that test data generation learns from.
- Generate test data using generate_variables. Each call returns one new entry that keeps the same structure as your base data while varying the content.
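
The statement that generated entries keep the structure of the base data can be checked locally before the entries are used. The following is a minimal sketch, assuming that placeholders use the `{{name}}` syntax shown in the example and that a generated entry's values are available as a plain dict (as `input_values` appears to be in the example output). The helpers `placeholder_names` and `has_expected_structure` are hypothetical and not part of the elluminate SDK.

```python
import re

# Matches {{name}} placeholders, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def placeholder_names(template_text: str) -> set:
    """Return the set of variable names referenced in a template string."""
    return set(PLACEHOLDER.findall(template_text))


def has_expected_structure(values: dict, template_text: str) -> bool:
    """True if the entry has exactly the keys the template expects."""
    return set(values) == placeholder_names(template_text)


template_text = (
    "List the most impactful Nobel Prize winners from {{university}} in {{state}} "
    "and their breakthrough discoveries."
)

# Values copied from the example output above.
generated = {"state": "Tennessee", "university": "Vanderbilt University"}
incomplete = {"state": "Tennessee"}  # illustrative entry with a missing key

print(has_expected_structure(generated, template_text))
print(has_expected_structure(incomplete, template_text))
```

With the SDK example, a check like this could be applied to each `template_variables.input_values` inside the final loop.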
Usage in the Frontend
Open the Template Variable Collection and click on the small Magic Generate Button (with a star) at the bottom. You may click multiple times to generate more examples.

Best Practices
- Quality Base Data: Start with high-quality, representative example data for better synthetic generations.
- Validation: Always review generated synthetic data before using them in production.
- Diversity: Include diverse base examples to get more varied synthetic data.
- Iterative Refinement: Use generated examples to identify potential edge cases and improve your prompt templates.
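
Part of the validation step can be automated before a manual review. The sketch below is one possible approach, not an elluminate feature: it flags generated entries that repeat a seed entry (ignoring case and surrounding whitespace) or that contain blank values, and leaves the rest for a human to inspect. The function names and the sample entries are illustrative assumptions.

```python
def normalize(entry: dict) -> tuple:
    """Build a case- and whitespace-insensitive key for comparing entries."""
    return tuple(sorted((k, str(v).strip().lower()) for k, v in entry.items()))


def review_generated(seed_entries: list, generated_entries: list) -> tuple:
    """Split generated entries into (accepted, flagged) lists."""
    seen = {normalize(e) for e in seed_entries}
    accepted, flagged = [], []
    for entry in generated_entries:
        key = normalize(entry)
        has_blank = any(not str(v).strip() for v in entry.values())
        if has_blank or key in seen:
            flagged.append(entry)  # duplicate of an earlier entry, or empty field
        else:
            seen.add(key)
            accepted.append(entry)
    return accepted, flagged


seeds = [{"university": "MIT", "state": "Massachusetts"}]

# Illustrative data: one plausible entry, one near-duplicate of the seed,
# and one entry with an empty field.
candidates = [
    {"university": "Vanderbilt University", "state": "Tennessee"},
    {"university": "mit", "state": "massachusetts "},
    {"university": "", "state": "Ohio"},
]

accepted, flagged = review_generated(seeds, candidates)
print("accepted:", accepted)
print("flagged:", flagged)
```

A filter like this only catches mechanical problems. Whether a generated university and state actually belong together still needs a human check.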