Structured Outputs¶
Structured outputs enable the developer to enforce that the LLM produces responses formatted in a programatically deterministic manner. Agentic programs make great use of this feature to enable interoperability code paths and LLM responses. This makes evaluating structured outputs an essential part for evaluating agents.
Basic Usage¶
An example showcasing using Pydantic models for structured output generation and evaluation:
-
Define Schema: Create a Pydantic model with field descriptions and basic constraints to define the exact JSON structure you want the LLM to return
-
Create Template: Use the
response_format
parameter when creating a prompt template to specify that responses should follow your Pydantic model structure -
Add Criteria: Define evaluation criteria that reference specific schema fields - criteria may also be auto-generated as per usual
-
Run Experiment: Create and run experiments normally - the structured output format will be enforced automatically for all response generations
-
Access Responses: The structured outputs can be found in the assistant message's
content
key as a JSON string
Schema Definition Methods¶
Pydantic Models¶
Pydantic models provide the most intuitive and recommended way to define structured output schemas. Simply set the response_format
to the Pydantic class definition, and Elluminate handles the rest.
OpenAI JSON Schema Format¶
In addition to Pydantic models, you may also set the response_format
directly with an OpenAI JSON Schema definition:
schema = {
"type": "json_schema",
"json_schema": {
"name": "sentiment",
"schema": {
"type": "object",
"properties": {
"stars": {
"type": "integer",
"description": "Number of stars of the review",
"minimum": 1,
"maximum": 5
},
"sentiment": {
"type": "string",
"description": "The sentiment output, could be positive, negative, or neutral.",
"enum": [
"positive",
"negative",
"neutral"
]
},
"confidence": {
"type": "number",
"description": "Confidence score of the sentiment analysis between 0 and 1",
"minimum": 0,
"maximum": 1
}
},
"required": [
"stars",
"sentiment",
"confidence"
],
"additionalProperties": False
}
}
}
AI-Powered Schema Generation¶
The frontend provides an AI-powered schema generator that creates JSON schemas from natural language descriptions. Simply describe what you want to extract, and Elluminate will generate an appropriate schema.
Evaluating Structured Outputs¶
The rating model has access to all field descriptions from your structured output schema, providing valuable context about what each field should contain and how it should be interpreted. Subsequently to evaluate structured outputs, simply create criteria and run an experiment as per usual.
Using Field Names in Criteria
It may be beneficial to use field names from your schema in the criteria. This helps the rating model understand exactly which part of the JSON structure to focus on. For example, "Does the 'sentiment' field..." is more precise than "Is the sentiment correct?"