ExperimentsResource

ExperimentsResource(client: Union[Client, AsyncClient])

Bases: BaseResource

arun async

arun(experiment: Experiment, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = True, timeout: float | None = None, generation_params: GenerationParams | None = None) -> Experiment

Run an existing unrun experiment to generate responses and ratings (async).
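A minimal sketch of how a coroutine with this shape is awaited. `StubExperiments` and the `Experiment` dataclass below are hypothetical stand-ins defined only so the example runs without the SDK; only the parameter names `n_epochs`, `block`, and `timeout` are taken from the signature above.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in for the SDK's Experiment model."""
    name: str
    generation_task_id: str | None = None
    responses: list[str] = field(default_factory=list)


class StubExperiments:
    """Hypothetical stand-in that mirrors part of the documented arun signature."""

    async def arun(self, experiment, n_epochs=1, block=True, timeout=None):
        experiment.generation_task_id = "task-1"
        if block:
            # Simulate waiting for generation; the real method would call the API.
            await asyncio.wait_for(asyncio.sleep(0), timeout)
            experiment.responses = [f"response-{i}" for i in range(n_epochs)]
        return experiment


async def main() -> Experiment:
    experiments = StubExperiments()
    return await experiments.arun(Experiment(name="demo"), n_epochs=2, timeout=5.0)


result = asyncio.run(main())
print(result.generation_task_id, len(result.responses))
```

With the real resource, the same `await` pattern would presumably apply inside an already running event loop.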

create

create(name: str, prompt_template: PromptTemplate | None, collection: TemplateVariablesCollection, llm_config: LLMConfig | None = None, criterion_set: CriterionSet | None = None, description: str = '', generate: bool = False, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = False, timeout: float | None = None, generation_params: GenerationParams | None = None, rating_version: str | None = None) -> Experiment

Creates a new experiment.

Note: When generate=True and block=True, this method uses async streaming internally and delegates to the async implementation.

Parameters:

Name Type Description Default
name str

The name of the experiment.

required
prompt_template PromptTemplate | None

Optional prompt template to use for the experiment. If omitted, the collection must contain a Conversation or Raw Input column.

required
collection TemplateVariablesCollection

The collection of template variables to use for the experiment.

required
llm_config LLMConfig | None

Optional LLMConfig to use for the experiment. Uses platform default if not specified.

None
criterion_set CriterionSet | None

Optional criterion set to evaluate against. If omitted, falls back to the prompt template's linked criterion set (if template is provided).

None
description str

Optional description for the experiment.

''
generate bool

Whether to generate responses and ratings immediately. Defaults to False.

False
rating_mode RatingMode

The rating mode to use when generating responses. Only used if generate=True. Defaults to RatingMode.DETAILED.

DETAILED
n_epochs int

Number of times to run the experiment for each input. Defaults to 1.

1
block bool

Whether to block until the experiment is executed, only relevant if generate=True. Defaults to False.

False
timeout float | None

The timeout for the experiment execution, only relevant if generate=True and block=True. Defaults to None.

None
generation_params GenerationParams | None

Optional sampling parameters to override LLMConfig defaults for this experiment. Defaults to None (uses LLMConfig defaults).

None
rating_version str | None

Version of core rating to use. If not provided, uses project's default_rating_version. Use "mock" in test environments to avoid actual LLM calls for ratings.

None

Returns:

Name Type Description
Experiment Experiment

The newly created experiment object. If generate=True, responses and ratings will be generated, and the returned experiment object will then include a generation task ID that can be used to check the status of the generation.

Raises:

Type Description
HTTPStatusError

If an experiment with the same name already exists.
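The parameter descriptions above imply a dependency chain: block matters only when generate=True, and timeout matters only when both are True. The sketch below restates that rule as a small function. `effective_create_options` is a hypothetical helper written for illustration; it is not part of the library.

```python
def effective_create_options(generate=False, block=False, timeout=None):
    """Return which execution options take effect, per the documented rules.

    Hypothetical helper: it only restates the parameter descriptions.
    """
    if not generate:
        # Nothing is generated, so block and timeout are irrelevant.
        return {"generates": False, "blocks": False, "timeout": None}
    if not block:
        # Generation starts, but the call does not wait for it.
        return {"generates": True, "blocks": False, "timeout": None}
    return {"generates": True, "blocks": True, "timeout": timeout}


defaults = effective_create_options()
ignored_block = effective_create_options(block=True, timeout=30.0)
waits = effective_create_options(generate=True, block=True, timeout=30.0)
print(defaults, ignored_block, waits)
```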

delete

delete(experiment: Experiment) -> None

Deletes an experiment.

Parameters:

Name Type Description Default
experiment Experiment

The experiment to delete.

required

Raises:

Type Description
HTTPStatusError

If the experiment doesn't exist or belongs to a different project.

get

get(*, name: str | None = None, id: int | None = None, fetch_responses: bool = True) -> Experiment

Get an experiment by name or id.

Parameters:

Name Type Description Default
name str | None

The name of the experiment to get.

None
id int | None

The id of the experiment to get.

None
fetch_responses bool

Whether to fetch responses for the experiment. Defaults to True for backward compatibility. Set to False to save API calls when responses aren't needed.

True

Returns:

Name Type Description
Experiment Experiment

The experiment object.

Raises:

Type Description
ValueError

If neither or both of name and id are provided, or if the experiment is not found.
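The "exactly one of name or id" rule can be sketched as follows. `check_lookup_args` is a hypothetical function that illustrates the documented ValueError condition; it is an assumption about the rule's shape, not the library's implementation.

```python
def check_lookup_args(*, name=None, id=None):
    """Return the lookup key, or raise ValueError unless exactly one is given.

    Hypothetical helper mirroring the keyword-only arguments of get().
    """
    if (name is None) == (id is None):
        # True when both are missing or both are supplied.
        raise ValueError("Provide exactly one of name or id.")
    return ("name", name) if name is not None else ("id", id)


by_name = check_lookup_args(name="baseline")
by_id = check_lookup_args(id=7)

rejected = []
for kwargs in ({}, {"name": "baseline", "id": 7}):
    try:
        check_lookup_args(**kwargs)
    except ValueError:
        rejected.append(kwargs)
print(by_name, by_id, len(rejected))
```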

get_or_create

get_or_create(name: str, prompt_template: PromptTemplate | None, collection: TemplateVariablesCollection, llm_config: LLMConfig | None = None, criterion_set: CriterionSet | None = None, description: str = '', generate: bool = False, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = False, timeout: float | None = None, generation_params: GenerationParams | None = None, rating_version: str | None = None) -> tuple[Experiment, bool]

Gets an existing experiment by name or creates a new one if it doesn't exist.

The existence of an experiment is determined solely by its name. If an experiment with the given name exists, it will be returned regardless of its other properties. If no experiment exists with that name, a new one will be created with the provided parameters.

Parameters:

Name Type Description Default
name str

The name of the experiment to get or create.

required
prompt_template PromptTemplate | None

Optional prompt template to use if creating a new experiment. If omitted, the collection must contain a Conversation or Raw Input column.

required
collection TemplateVariablesCollection

The collection of template variables to use if creating a new experiment.

required
llm_config LLMConfig | None

Optional LLMConfig to use if creating a new experiment.

None
criterion_set CriterionSet | None

Optional criterion set to use if creating a new experiment. If omitted, falls back to the prompt template's linked criterion set (if template is provided).

None
description str

Optional description if creating a new experiment.

''
generate bool

Whether to generate responses and ratings immediately. Defaults to False.

False
rating_mode RatingMode

The rating mode to use if generating responses. Defaults to RatingMode.DETAILED.

DETAILED
n_epochs int

Number of times to run the experiment for each input. Defaults to 1.

1
block bool

Whether to block until the experiment is executed when creating a new experiment, only relevant if generate=True. Defaults to False.

False
timeout float | None

The timeout for the experiment execution when creating a new experiment, only relevant if generate=True and block=True. Defaults to None.

None
generation_params GenerationParams | None

Optional sampling parameters to override LLMConfig defaults for this experiment. Defaults to None (uses LLMConfig defaults).

None
rating_version str | None

Version of core rating to use. If not provided, uses project's default_rating_version. Use "mock" in test environments to avoid actual LLM calls for ratings.

None

Returns:

Type Description
tuple[Experiment, bool]

A tuple containing:
- The experiment object (either existing or newly created).
- A boolean indicating whether a new experiment was created (True) or an existing one was returned (False).
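Because existence is determined solely by name, a second call with different parameters returns the original experiment unchanged. The in-memory sketch below illustrates that behavior. `InMemoryExperiments` is a hypothetical stand-in that stores plain dictionaries; the real resource talks to the API and returns Experiment objects.

```python
class InMemoryExperiments:
    """Hypothetical stand-in illustrating name-only get_or_create semantics."""

    def __init__(self):
        self._by_name = {}

    def get_or_create(self, name, description=""):
        existing = self._by_name.get(name)
        if existing is not None:
            # The name matched, so other arguments are ignored.
            return existing, False
        created = {"name": name, "description": description}
        self._by_name[name] = created
        return created, True


experiments = InMemoryExperiments()
first, first_created = experiments.get_or_create("baseline", description="v1")
second, second_created = experiments.get_or_create("baseline", description="v2")
print(first_created, second_created, second["description"])
```

If the other properties matter, compare them on the returned object whenever the boolean is False.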

list

list(prompt_template: PromptTemplate | None = None, collection: TemplateVariablesCollection | None = None, llm_config: LLMConfig | None = None) -> list[Experiment]

Get a list of experiments sorted by creation date.

Parameters:

Name Type Description Default
prompt_template PromptTemplate | None

The prompt template to filter by.

None
collection TemplateVariablesCollection | None

The collection to filter by.

None
llm_config LLMConfig | None

The LLM config to filter by.

None

Returns:

Type Description
list[Experiment]

A list of experiments.

run

run(experiment: Experiment, rating_mode: RatingMode = DETAILED, n_epochs: int = 1, block: bool = True, timeout: float | None = None, generation_params: GenerationParams | None = None) -> Experiment

Run an existing unrun experiment to generate responses and ratings.

This method triggers generation for an experiment that was created without running it (i.e., using client.create_experiment() without generate=True).

Parameters:

Name Type Description Default
experiment Experiment

The experiment to run.

required
rating_mode RatingMode

The rating mode to use (FAST or DETAILED). Defaults to DETAILED.

DETAILED
n_epochs int

Number of times to run for each input. Defaults to 1.

1
block bool

Whether to block until generation completes. Defaults to True.

True
timeout float | None

Optional timeout in seconds. Only relevant if block=True.

None
generation_params GenerationParams | None

Optional sampling parameters to override LLMConfig defaults.

None

Returns:

Type Description
Experiment

The experiment with generation_task_id set. If block=True, the experiment will include the generated responses and ratings.

Raises:

Type Description
HTTPStatusError

If the experiment has already been run or doesn't exist.
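A minimal sketch of the two-step workflow described above: create without generating, then run. `StubExperiments`, `Experiment`, and `AlreadyRunError` are hypothetical stand-ins defined only for this example. The real method raises HTTPStatusError and performs generation through the API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in for the SDK's Experiment model."""
    name: str
    generation_task_id: str | None = None
    responses: list[str] = field(default_factory=list)


class AlreadyRunError(Exception):
    """Stand-in for the HTTPStatusError documented above."""


class StubExperiments:
    """Hypothetical stand-in mirroring the documented create/run flow."""

    def create(self, name: str) -> Experiment:
        return Experiment(name=name)

    def run(self, experiment, n_epochs=1, block=True):
        if experiment.generation_task_id is not None:
            raise AlreadyRunError(f"{experiment.name} has already been run")
        experiment.generation_task_id = f"task-{experiment.name}"
        if block:
            # Only a blocking run returns with responses attached.
            experiment.responses = [f"epoch-{i}" for i in range(n_epochs)]
        return experiment


experiments = StubExperiments()
blocking = experiments.run(experiments.create("a"), n_epochs=2)
non_blocking = experiments.run(experiments.create("b"), block=False)

try:
    experiments.run(blocking)
    second_run_rejected = False
except AlreadyRunError:
    second_run_rejected = True
print(len(blocking.responses), non_blocking.generation_task_id, second_run_rejected)
```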