ExperimentsResource
ExperimentsResource(client: Union[Client, AsyncClient])
Bases: BaseResource
arun (async)
arun(experiment: Experiment, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = True, timeout: float | None = None, generation_params: GenerationParams | None = None) -> Experiment
Run an existing unrun experiment to generate responses and ratings (async).
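`arun` is a coroutine, so several unrun experiments can be executed concurrently with `asyncio.gather`. The sketch below makes a few assumptions. `experiments` is an `ExperimentsResource` constructed with an `AsyncClient`, and this page does not show how you obtain it from your client. Each item in `pending` is an `Experiment` created without `generate=True`. The helper name `run_all` is hypothetical. Types are annotated as `Any` because this page does not show the library's import paths.

```python
import asyncio
from typing import Any, List, Optional, Sequence


async def run_all(
    experiments: Any,  # assumed: an ExperimentsResource backed by an AsyncClient
    pending: Sequence[Any],  # Experiment objects created without generate=True
    timeout: Optional[float] = None,
) -> List[Any]:
    """Hypothetical helper: run several unrun experiments concurrently via arun."""
    coroutines = [
        experiments.arun(experiment, block=True, timeout=timeout)
        for experiment in pending
    ]
    # gather preserves input order, so results[i] corresponds to pending[i]
    return list(await asyncio.gather(*coroutines))
```

By default `asyncio.gather` propagates the first exception raised by any run. Pass `return_exceptions=True` to it if you would rather collect failures alongside successful results.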
create
create(name: str, prompt_template: PromptTemplate | None, collection: TemplateVariablesCollection, llm_config: LLMConfig | None = None, criterion_set: CriterionSet | None = None, description: str = '', generate: bool = False, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = False, timeout: float | None = None, generation_params: GenerationParams | None = None, rating_version: str | None = None) -> Experiment
Creates a new experiment.
Note: When `block=True` and `generate=True`, this method uses async streaming internally and delegates to the async implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the experiment. | required |
| `prompt_template` | `PromptTemplate \| None` | Prompt template to use for the experiment, or `None`. If `None`, the collection must contain a Conversation or Raw Input column. | required |
| `collection` | `TemplateVariablesCollection` | The collection of template variables to use for the experiment. | required |
| `llm_config` | `LLMConfig \| None` | Optional LLMConfig to use for the experiment. Uses the platform default if not specified. | `None` |
| `criterion_set` | `CriterionSet \| None` | Optional criterion set to evaluate against. If omitted, falls back to the prompt template's linked criterion set (when a template is provided). | `None` |
| `description` | `str` | Optional description for the experiment. | `''` |
| `generate` | `bool` | Whether to generate responses and ratings immediately. | `False` |
| `rating_mode` | `RatingMode` | The rating mode to use when generating responses (only used if `generate=True`). | `DETAILED` |
| `n_epochs` | `int` | Number of times to run the experiment for each input. | `1` |
| `block` | `bool` | Whether to block until the experiment has been executed; only relevant if `generate=True`. | `False` |
| `timeout` | `float \| None` | Timeout for the experiment execution; only relevant if `generate=True` and `block=True`. | `None` |
| `generation_params` | `GenerationParams \| None` | Optional sampling parameters that override the LLMConfig defaults for this experiment. If `None`, the LLMConfig defaults are used. | `None` |
| `rating_version` | `str \| None` | Version of core rating to use. If not provided, the project's `default_rating_version` is used. Use `"mock"` in test environments to avoid actual LLM calls for ratings. | `None` |
Returns:
| Type | Description |
|---|---|
| `Experiment` | The newly created experiment object. If `generate=True`, responses and ratings are generated, and the returned experiment includes a generation task ID that can be used to check the status of the generation. |
Raises:
| Type | Description |
|---|---|
| `HTTPStatusError` | If an experiment with the same name already exists. |
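The following sketch creates an experiment and waits for its responses and ratings in a single call, using only the parameters documented above. It assumes that `experiments` is a synchronous `ExperimentsResource`. The helper name `create_and_generate` and its `mock_ratings` flag are hypothetical, and types are annotated as `Any` because this page does not show the library's import paths. Because `create` raises `HTTPStatusError` for a duplicate name, `get_or_create` may suit reruns better.

```python
from typing import Any, Optional


def create_and_generate(
    experiments: Any,  # assumed: a synchronous ExperimentsResource
    name: str,
    prompt_template: Any,  # PromptTemplate, or None if the collection has a Conversation/Raw Input column
    collection: Any,  # TemplateVariablesCollection
    timeout: Optional[float] = None,
    mock_ratings: bool = False,  # hypothetical flag for test environments
) -> Any:
    """Hypothetical helper: create an experiment and block until generation finishes."""
    return experiments.create(
        name=name,
        prompt_template=prompt_template,
        collection=collection,
        generate=True,  # generate responses and ratings right away
        block=True,  # wait for generation (uses async streaming internally)
        timeout=timeout,  # only honoured because generate=True and block=True
        # "mock" avoids real LLM calls for ratings; None uses the project's default
        rating_version="mock" if mock_ratings else None,
    )
```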
delete
Deletes an experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `experiment` | `Experiment` | The experiment to delete. | required |
Raises:
| Type | Description |
|---|---|
| `HTTPStatusError` | If the experiment doesn't exist or belongs to a different project. |
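`delete` takes an `Experiment` object rather than a name, so deleting by name means looking the experiment up first. The sketch below does this with `get`, which raises `ValueError` when no experiment matches, and skips fetching responses because they are not needed. It assumes `experiments` is a synchronous `ExperimentsResource`. The helper name `delete_if_exists` is hypothetical.

```python
from typing import Any


def delete_if_exists(experiments: Any, name: str) -> bool:
    """Hypothetical helper: delete the experiment called `name` if it exists.

    Returns True if an experiment was deleted, False if none was found.
    """
    try:
        # Responses are not needed just to delete, so skip fetching them.
        experiment = experiments.get(name=name, fetch_responses=False)
    except ValueError:
        # get() raises ValueError when no experiment with this name is found.
        return False
    # May still raise HTTPStatusError, e.g. if the experiment belongs to another project.
    experiments.delete(experiment)
    return True
```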
get
Get an experiment by name or ID. Exactly one of `name` and `id` must be provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str \| None` | The name of the experiment to get. | `None` |
| `id` | `int \| None` | The ID of the experiment to get. | `None` |
| `fetch_responses` | `bool` | Whether to fetch responses for the experiment. Defaults to `True` for backward compatibility; set to `False` to save API calls when responses aren't needed. | `True` |
Returns:
| Type | Description |
|---|---|
| `Experiment` | The experiment object. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If neither or both of `name` and `id` are provided, or if no matching experiment is found. |
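Because `get` raises `ValueError` both for invalid arguments and for a missing experiment, a caller that wants "not found" to mean `None` can validate the arguments first so the two cases stay distinct. This is a minimal sketch assuming `experiments` is a synchronous `ExperimentsResource`. The helper name `find_experiment` and its keyword names are hypothetical.

```python
from typing import Any, Optional


def find_experiment(
    experiments: Any,
    *,
    name: Optional[str] = None,
    experiment_id: Optional[int] = None,
    with_responses: bool = False,
) -> Optional[Any]:
    """Hypothetical helper: look up an experiment, returning None if it is not found."""
    # get() raises ValueError for bad arguments *and* for a missing experiment,
    # so reject bad arguments here with a different exception type.
    if (name is None) == (experiment_id is None):
        raise TypeError("pass exactly one of name or experiment_id")
    try:
        return experiments.get(
            name=name, id=experiment_id, fetch_responses=with_responses
        )
    except ValueError:
        # Arguments are valid, so ValueError here means "not found".
        return None
```

Defaulting `with_responses` to `False` is a design choice for metadata-only lookups. It inverts `get`'s backward-compatible default of `fetch_responses=True`.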
get_or_create
get_or_create(name: str, prompt_template: PromptTemplate | None, collection: TemplateVariablesCollection, llm_config: LLMConfig | None = None, criterion_set: CriterionSet | None = None, description: str = '', generate: bool = False, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = False, timeout: float | None = None, generation_params: GenerationParams | None = None, rating_version: str | None = None) -> tuple[Experiment, bool]
Gets an existing experiment by name or creates a new one if it doesn't exist.
The existence of an experiment is determined solely by its name. If an experiment with the given name exists, it will be returned regardless of its other properties. If no experiment exists with that name, a new one will be created with the provided parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the experiment to get or create. | required |
| `prompt_template` | `PromptTemplate \| None` | Prompt template to use if creating a new experiment, or `None`. If `None`, the collection must contain a Conversation or Raw Input column. | required |
| `collection` | `TemplateVariablesCollection` | The collection of template variables to use if creating a new experiment. | required |
| `llm_config` | `LLMConfig \| None` | Optional LLMConfig to use if creating a new experiment. | `None` |
| `criterion_set` | `CriterionSet \| None` | Optional criterion set to use if creating a new experiment. If omitted, falls back to the prompt template's linked criterion set (when a template is provided). | `None` |
| `description` | `str` | Optional description if creating a new experiment. | `''` |
| `generate` | `bool` | Whether to generate responses and ratings immediately. | `False` |
| `rating_mode` | `RatingMode` | The rating mode to use if generating responses. | `DETAILED` |
| `n_epochs` | `int` | Number of times to run the experiment for each input. | `1` |
| `block` | `bool` | Whether to block until the experiment has been executed when creating a new experiment; only relevant if `generate=True`. | `False` |
| `timeout` | `float \| None` | Timeout for the experiment execution when creating a new experiment; only relevant if `generate=True` and `block=True`. | `None` |
| `generation_params` | `GenerationParams \| None` | Optional sampling parameters that override the LLMConfig defaults for this experiment. If `None`, the LLMConfig defaults are used. | `None` |
| `rating_version` | `str \| None` | Version of core rating to use. If not provided, the project's `default_rating_version` is used. Use `"mock"` in test environments to avoid actual LLM calls for ratings. | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[Experiment, bool]` | A tuple containing the experiment object (either existing or newly created) and a boolean that is `True` if a new experiment was created or `False` if an existing one was returned. |
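Since matching is by name only, an existing experiment is returned unchanged even if the template or collection passed in differs from the one it was created with. The sketch below makes that case visible by checking the returned boolean. It assumes `experiments` is a synchronous `ExperimentsResource`. The helper name `ensure_experiment` is hypothetical, and logging via the standard `logging` module is only an illustrative choice.

```python
import logging
from typing import Any, Tuple

logger = logging.getLogger(__name__)


def ensure_experiment(
    experiments: Any, name: str, prompt_template: Any, collection: Any
) -> Tuple[Any, bool]:
    """Hypothetical helper around get_or_create that reports which branch was taken."""
    experiment, created = experiments.get_or_create(
        name=name,
        prompt_template=prompt_template,
        collection=collection,
    )
    if created:
        logger.info("Created experiment %r", name)
    else:
        # Matched by name only: prompt_template and collection were not applied.
        logger.warning("Experiment %r already exists; new parameters were ignored", name)
    return experiment, created
```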
list
list(prompt_template: PromptTemplate | None = None, collection: TemplateVariablesCollection | None = None, llm_config: LLMConfig | None = None) -> list[Experiment]
Get a list of experiments sorted by creation date.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prompt_template` | `PromptTemplate \| None` | The prompt template to filter by. | `None` |
| `collection` | `TemplateVariablesCollection \| None` | The collection to filter by. | `None` |
| `llm_config` | `LLMConfig \| None` | The LLM config to filter by. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[Experiment]` | A list of experiments matching the given filters. |
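The filters can be combined to check whether a template has already been evaluated, optionally against a specific collection. This is a minimal sketch under two assumptions. `experiments` is a synchronous `ExperimentsResource`, and a filter left as `None` is not applied, which matches the `None` defaults but is not stated explicitly above. The helper name `has_experiments_for` is hypothetical.

```python
from typing import Any, Optional


def has_experiments_for(
    experiments: Any, prompt_template: Any, collection: Optional[Any] = None
) -> bool:
    """Hypothetical helper: True if any experiment uses this template (and collection, if given)."""
    # Assumption: a filter passed as None is not applied, mirroring list()'s defaults.
    matches = experiments.list(prompt_template=prompt_template, collection=collection)
    return len(matches) > 0
```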
run
run(experiment: Experiment, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = True, timeout: float | None = None, generation_params: GenerationParams | None = None) -> Experiment
Run an existing unrun experiment to generate responses and ratings.
This method triggers generation for an experiment that was created without running it (i.e., using client.create_experiment() without generate=True).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `experiment` | `Experiment` | The experiment to run. | required |
| `rating_mode` | `RatingMode` | The rating mode to use (`FAST` or `DETAILED`). | `DETAILED` |
| `n_epochs` | `int` | Number of times to run for each input. | `1` |
| `block` | `bool` | Whether to block until generation completes. | `True` |
| `timeout` | `float \| None` | Optional timeout in seconds; only relevant if `block=True`. | `None` |
| `generation_params` | `GenerationParams \| None` | Optional sampling parameters that override the LLMConfig defaults. | `None` |
Returns:
| Type | Description |
|---|---|
| `Experiment` | The experiment with `generation_task_id` set. If `block=True`, the experiment also includes the generated responses and ratings. |
Raises:
| Type | Description |
|---|---|
| `HTTPStatusError` | If the experiment has already been run or doesn't exist. |
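With `block=False`, `run` returns as soon as generation has been triggered, and the returned experiment carries a `generation_task_id`. The sketch below triggers generation and returns that ID. It assumes `experiments` is a synchronous `ExperimentsResource` and that the task ID is exposed as an attribute named `generation_task_id`, as the Returns table suggests. The helper name `start_generation` is hypothetical. This section does not show how to poll the task's status.

```python
from typing import Any, Optional


def start_generation(experiments: Any, experiment: Any) -> Optional[Any]:
    """Hypothetical helper: trigger generation without waiting and return the task ID."""
    # block=False returns once generation is triggered; responses and ratings
    # are not yet attached to the returned experiment.
    started = experiments.run(experiment, block=False)
    # Assumed attribute name, based on the Returns section ("generation_task_id set").
    return getattr(started, "generation_task_id", None)
```

Call this only once per experiment: running an experiment that has already been run raises `HTTPStatusError`.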